Training: 2022-04-10 23:13:31,380-rank_id: 0
Training: 2022-04-10 23:13:45,393-: margin_list [1.0, 0.5, 0.0]
Training: 2022-04-10 23:13:45,393-: network r100
Training: 2022-04-10 23:13:45,393-: resume False
Training: 2022-04-10 23:13:45,393-: output work_dirs/ms1mv3_r100
Training: 2022-04-10 23:13:45,393-: embedding_size 512
Training: 2022-04-10 23:13:45,393-: sample_rate 1.0
Training: 2022-04-10 23:13:45,393-: interclass_filtering_threshold 0
Training: 2022-04-10 23:13:45,393-: fp16 True
Training: 2022-04-10 23:13:45,393-: batch_size 128
Training: 2022-04-10 23:13:45,393-: optimizer sgd
Training: 2022-04-10 23:13:45,394-: lr 0.1
Training: 2022-04-10 23:13:45,394-: momentum 0.9
Training: 2022-04-10 23:13:45,394-: weight_decay 0.0005
Training: 2022-04-10 23:13:45,394-: verbose 2000
Training: 2022-04-10 23:13:45,394-: frequent 10
Training: 2022-04-10 23:13:45,394-: dali False
Training: 2022-04-10 23:13:45,394-: rec /train_tmp/ms1m-retinaface-t1
Training: 2022-04-10 23:13:45,394-: num_classes 93431
Training: 2022-04-10 23:13:45,394-: num_image 5179510
Training: 2022-04-10 23:13:45,394-: num_epoch 20
Training: 2022-04-10 23:13:45,394-: warmup_epoch 0
Training: 2022-04-10 23:13:45,394-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:13:45,394-: total_batch_size 1024
Training: 2022-04-10 23:13:45,394-: warmup_step 0
Training: 2022-04-10 23:13:45,394-: total_step 101160
Training: 2022-04-10 23:14:52,660-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-10 23:14:58,211-Speed 3389.74 samples/sec Loss 47.3273 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-10 23:15:01,211-Speed 3415.40 samples/sec Loss 48.1647 LearningRate 0.0999 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-10 23:15:04,203-Speed 3423.93 samples/sec Loss 48.8566 LearningRate 0.0999 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-10 23:15:07,154-Speed 3470.90 samples/sec Loss 48.2520 LearningRate 0.0999 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-10 23:15:10,093-Speed 3486.58 samples/sec Loss 48.0507 LearningRate 0.0999 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-10 23:15:13,110-Speed 3394.80 samples/sec Loss 47.8045 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-10 23:15:16,120-Speed 3402.99 samples/sec Loss 47.9883 LearningRate 0.0998 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-10 23:15:19,057-Speed 3488.52 samples/sec Loss 47.7421 LearningRate 0.0998 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-10 23:15:21,999-Speed 3481.71 samples/sec Loss 47.3690 LearningRate 0.0998 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-10 23:15:24,975-Speed 3443.00 samples/sec Loss 47.0640 LearningRate 0.0998 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-10 23:15:27,908-Speed 3492.52 samples/sec Loss 46.9104 LearningRate 0.0998 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-10 23:15:30,855-Speed 3475.43 samples/sec Loss 46.8356 LearningRate 0.0997 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:33,784-Speed 3498.00 samples/sec Loss 46.7744 
LearningRate 0.0997 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:36,734-Speed 3471.41 samples/sec Loss 46.4647 LearningRate 0.0997 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:39,687-Speed 3469.98 samples/sec Loss 46.3438 LearningRate 0.0997 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:42,641-Speed 3466.93 samples/sec Loss 46.1223 LearningRate 0.0997 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:45,583-Speed 3482.05 samples/sec Loss 45.9521 LearningRate 0.0996 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:48,529-Speed 3476.79 samples/sec Loss 45.9236 LearningRate 0.0996 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:51,466-Speed 3487.14 samples/sec Loss 45.6441 LearningRate 0.0996 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:15:54,416-Speed 3472.69 samples/sec Loss 45.5037 LearningRate 0.0996 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:15:57,364-Speed 3475.02 samples/sec Loss 45.3420 LearningRate 0.0996 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:00,320-Speed 3464.86 samples/sec Loss 45.1344 LearningRate 0.0995 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:03,276-Speed 3465.52 samples/sec Loss 45.1216 LearningRate 0.0995 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:06,221-Speed 3477.70 samples/sec Loss 44.9215 LearningRate 0.0995 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:09,162-Speed 3482.34 samples/sec Loss 44.7898 LearningRate 0.0995 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-10 23:16:12,112-Speed 3471.78 samples/sec Loss 44.5184 LearningRate 0.0995 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:15,059-Speed 3476.27 samples/sec Loss 44.4373 LearningRate 0.0994 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:18,002-Speed 3480.02 samples/sec Loss 44.2242 LearningRate 0.0994 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:20,956-Speed 3467.10 samples/sec Loss 44.0992 LearningRate 0.0994 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:16:23,896-Speed 3485.04 samples/sec Loss 43.9775 LearningRate 0.0994 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:26,857-Speed 3458.23 samples/sec Loss 43.7149 LearningRate 0.0994 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:29,809-Speed 3470.43 samples/sec Loss 43.6324 LearningRate 0.0993 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:32,770-Speed 3458.30 samples/sec Loss 43.4586 LearningRate 0.0993 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:35,750-Speed 3437.56 samples/sec Loss 43.3129 LearningRate 0.0993 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:38,691-Speed 3482.52 samples/sec Loss 43.2031 LearningRate 0.0993 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:41,637-Speed 3476.53 samples/sec Loss 42.9775 LearningRate 0.0993 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:44,578-Speed 3483.58 samples/sec Loss 42.7885 LearningRate 0.0993 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:47,520-Speed 3481.99 
samples/sec Loss 42.5459 LearningRate 0.0992 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:50,468-Speed 3474.45 samples/sec Loss 42.5405 LearningRate 0.0992 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:16:53,413-Speed 3478.08 samples/sec Loss 42.3622 LearningRate 0.0992 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:16:56,346-Speed 3491.10 samples/sec Loss 42.1799 LearningRate 0.0992 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:16:59,306-Speed 3460.78 samples/sec Loss 41.9984 LearningRate 0.0992 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:17:02,284-Speed 3439.09 samples/sec Loss 41.9358 LearningRate 0.0991 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:17:05,227-Speed 3480.40 samples/sec Loss 41.7629 LearningRate 0.0991 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:17:08,172-Speed 3478.18 samples/sec Loss 41.5721 LearningRate 0.0991 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:17:11,122-Speed 3471.46 samples/sec Loss 41.4248 LearningRate 0.0991 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:14,068-Speed 3477.57 samples/sec Loss 41.2698 LearningRate 0.0991 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:17,003-Speed 3489.43 samples/sec Loss 41.0891 LearningRate 0.0990 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:19,946-Speed 3481.57 samples/sec Loss 40.9541 LearningRate 0.0990 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:22,891-Speed 3478.61 samples/sec Loss 40.8092 LearningRate 0.0990 Epoch: 0 Global Step: 
510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:25,836-Speed 3478.02 samples/sec Loss 40.6291 LearningRate 0.0990 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:28,780-Speed 3478.47 samples/sec Loss 40.4760 LearningRate 0.0990 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:31,716-Speed 3488.51 samples/sec Loss 40.4010 LearningRate 0.0989 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:34,651-Speed 3489.78 samples/sec Loss 40.2737 LearningRate 0.0989 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:37,591-Speed 3484.00 samples/sec Loss 40.1611 LearningRate 0.0989 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:40,512-Speed 3507.13 samples/sec Loss 39.9895 LearningRate 0.0989 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:43,449-Speed 3489.13 samples/sec Loss 39.7490 LearningRate 0.0989 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:46,459-Speed 3402.57 samples/sec Loss 39.4536 LearningRate 0.0988 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:49,393-Speed 3490.80 samples/sec Loss 39.3885 LearningRate 0.0988 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:52,343-Speed 3471.68 samples/sec Loss 39.2799 LearningRate 0.0988 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:55,280-Speed 3487.30 samples/sec Loss 39.1567 LearningRate 0.0988 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:17:58,220-Speed 3483.58 samples/sec Loss 39.0062 LearningRate 0.0988 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 
23:18:01,167-Speed 3475.76 samples/sec Loss 38.8358 LearningRate 0.0987 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:04,119-Speed 3470.01 samples/sec Loss 38.6928 LearningRate 0.0987 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:07,065-Speed 3477.25 samples/sec Loss 38.5280 LearningRate 0.0987 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:09,991-Speed 3500.30 samples/sec Loss 38.4055 LearningRate 0.0987 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:12,928-Speed 3487.83 samples/sec Loss 38.1617 LearningRate 0.0987 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:15,861-Speed 3492.78 samples/sec Loss 38.0510 LearningRate 0.0986 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:18,798-Speed 3486.53 samples/sec Loss 37.8527 LearningRate 0.0986 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:21,737-Speed 3485.37 samples/sec Loss 37.6648 LearningRate 0.0986 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:24,681-Speed 3479.34 samples/sec Loss 37.5056 LearningRate 0.0986 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:27,616-Speed 3490.16 samples/sec Loss 37.4346 LearningRate 0.0986 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:30,577-Speed 3459.36 samples/sec Loss 37.2564 LearningRate 0.0985 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:33,522-Speed 3477.18 samples/sec Loss 37.1208 LearningRate 0.0985 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:36,469-Speed 3476.14 samples/sec Loss 36.9085 LearningRate 
0.0985 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:39,407-Speed 3485.78 samples/sec Loss 36.7451 LearningRate 0.0985 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:18:42,345-Speed 3486.93 samples/sec Loss 36.7186 LearningRate 0.0985 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:45,282-Speed 3487.04 samples/sec Loss 36.4872 LearningRate 0.0984 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:48,230-Speed 3474.73 samples/sec Loss 36.2792 LearningRate 0.0984 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:51,177-Speed 3475.37 samples/sec Loss 36.1131 LearningRate 0.0984 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:54,120-Speed 3480.27 samples/sec Loss 35.9162 LearningRate 0.0984 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:18:57,069-Speed 3473.48 samples/sec Loss 35.8651 LearningRate 0.0984 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:00,003-Speed 3491.15 samples/sec Loss 35.5578 LearningRate 0.0983 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:02,948-Speed 3478.07 samples/sec Loss 35.5299 LearningRate 0.0983 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:05,888-Speed 3483.80 samples/sec Loss 35.4439 LearningRate 0.0983 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:08,825-Speed 3488.10 samples/sec Loss 35.0125 LearningRate 0.0983 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:11,752-Speed 3498.44 samples/sec Loss 34.9572 LearningRate 0.0983 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-04-10 23:19:14,691-Speed 3485.20 samples/sec Loss 34.7387 LearningRate 0.0982 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:17,635-Speed 3479.18 samples/sec Loss 34.5511 LearningRate 0.0982 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:19:20,574-Speed 3485.09 samples/sec Loss 34.4487 LearningRate 0.0982 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:23,519-Speed 3478.15 samples/sec Loss 34.4024 LearningRate 0.0982 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:26,474-Speed 3466.25 samples/sec Loss 34.1118 LearningRate 0.0982 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:29,413-Speed 3485.20 samples/sec Loss 34.0529 LearningRate 0.0982 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:32,377-Speed 3455.86 samples/sec Loss 33.8308 LearningRate 0.0981 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:35,319-Speed 3482.07 samples/sec Loss 33.5956 LearningRate 0.0981 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:38,260-Speed 3482.51 samples/sec Loss 33.4252 LearningRate 0.0981 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:41,204-Speed 3478.44 samples/sec Loss 33.5193 LearningRate 0.0981 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:19:44,136-Speed 3493.20 samples/sec Loss 33.2280 LearningRate 0.0981 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:47,113-Speed 3441.34 samples/sec Loss 33.0301 LearningRate 0.0980 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:50,058-Speed 3477.38 
samples/sec Loss 32.8899 LearningRate 0.0980 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:53,008-Speed 3472.16 samples/sec Loss 32.6804 LearningRate 0.0980 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:55,960-Speed 3469.50 samples/sec Loss 32.5569 LearningRate 0.0980 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:58,928-Speed 3451.85 samples/sec Loss 32.3279 LearningRate 0.0980 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:01,879-Speed 3470.70 samples/sec Loss 32.0823 LearningRate 0.0979 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:04,826-Speed 3475.49 samples/sec Loss 32.1485 LearningRate 0.0979 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:07,768-Speed 3481.71 samples/sec Loss 31.7788 LearningRate 0.0979 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:10,711-Speed 3480.24 samples/sec Loss 31.5781 LearningRate 0.0979 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:13,651-Speed 3484.24 samples/sec Loss 31.3841 LearningRate 0.0979 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:16,605-Speed 3467.35 samples/sec Loss 31.3892 LearningRate 0.0978 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:19,552-Speed 3475.16 samples/sec Loss 31.1897 LearningRate 0.0978 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:22,498-Speed 3476.58 samples/sec Loss 31.0769 LearningRate 0.0978 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:25,442-Speed 3479.85 samples/sec Loss 30.5718 LearningRate 0.0978 Epoch: 0 
Global Step: 1130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:28,392-Speed 3471.65 samples/sec Loss 30.8429 LearningRate 0.0978 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:31,337-Speed 3478.60 samples/sec Loss 30.5111 LearningRate 0.0977 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:34,289-Speed 3469.38 samples/sec Loss 30.3706 LearningRate 0.0977 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:37,238-Speed 3473.65 samples/sec Loss 30.1590 LearningRate 0.0977 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:40,191-Speed 3469.12 samples/sec Loss 29.9044 LearningRate 0.0977 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:43,125-Speed 3490.05 samples/sec Loss 29.8432 LearningRate 0.0977 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:46,068-Speed 3480.99 samples/sec Loss 29.5799 LearningRate 0.0976 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:49,016-Speed 3474.23 samples/sec Loss 29.5285 LearningRate 0.0976 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:51,964-Speed 3474.44 samples/sec Loss 29.4015 LearningRate 0.0976 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:54,924-Speed 3460.49 samples/sec Loss 29.3319 LearningRate 0.0976 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:57,868-Speed 3479.75 samples/sec Loss 29.0804 LearningRate 0.0976 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:00,816-Speed 3473.84 samples/sec Loss 29.0961 LearningRate 0.0975 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-10 23:21:03,803-Speed 3428.84 samples/sec Loss 28.7543 LearningRate 0.0975 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:06,752-Speed 3472.93 samples/sec Loss 28.5811 LearningRate 0.0975 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:09,699-Speed 3476.19 samples/sec Loss 28.5935 LearningRate 0.0975 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:12,639-Speed 3483.28 samples/sec Loss 28.3610 LearningRate 0.0975 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:15,605-Speed 3454.25 samples/sec Loss 28.2017 LearningRate 0.0974 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:18,555-Speed 3471.79 samples/sec Loss 28.3622 LearningRate 0.0974 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:21,507-Speed 3468.97 samples/sec Loss 27.8651 LearningRate 0.0974 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:24,475-Speed 3451.60 samples/sec Loss 27.7559 LearningRate 0.0974 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:27,429-Speed 3468.17 samples/sec Loss 27.8152 LearningRate 0.0974 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:30,378-Speed 3473.05 samples/sec Loss 27.3315 LearningRate 0.0973 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:33,327-Speed 3473.35 samples/sec Loss 27.1738 LearningRate 0.0973 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:36,275-Speed 3473.92 samples/sec Loss 27.1084 LearningRate 0.0973 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:39,221-Speed 3477.03 
samples/sec Loss 27.0984 LearningRate 0.0973 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:42,160-Speed 3484.94 samples/sec Loss 26.8077 LearningRate 0.0973 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:45,115-Speed 3466.37 samples/sec Loss 26.8609 LearningRate 0.0973 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:48,064-Speed 3473.08 samples/sec Loss 26.9001 LearningRate 0.0972 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:51,021-Speed 3463.80 samples/sec Loss 26.5722 LearningRate 0.0972 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:53,973-Speed 3470.16 samples/sec Loss 26.4863 LearningRate 0.0972 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:56,931-Speed 3462.80 samples/sec Loss 26.5693 LearningRate 0.0972 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:59,878-Speed 3475.87 samples/sec Loss 26.2359 LearningRate 0.0972 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:02,831-Speed 3468.21 samples/sec Loss 26.2516 LearningRate 0.0971 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:05,779-Speed 3474.89 samples/sec Loss 25.5802 LearningRate 0.0971 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:08,730-Speed 3470.39 samples/sec Loss 25.5674 LearningRate 0.0971 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:11,673-Speed 3480.25 samples/sec Loss 25.4488 LearningRate 0.0971 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:14,625-Speed 3469.62 samples/sec Loss 25.5923 LearningRate 0.0971 Epoch: 0 
Global Step: 1500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:17,616-Speed 3424.36 samples/sec Loss 25.2241 LearningRate 0.0970 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:20,578-Speed 3459.12 samples/sec Loss 25.0249 LearningRate 0.0970 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:23,541-Speed 3457.13 samples/sec Loss 25.1807 LearningRate 0.0970 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:26,494-Speed 3467.95 samples/sec Loss 25.2119 LearningRate 0.0970 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:29,453-Speed 3461.34 samples/sec Loss 24.9304 LearningRate 0.0970 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:32,439-Speed 3430.89 samples/sec Loss 24.8900 LearningRate 0.0969 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:35,394-Speed 3466.59 samples/sec Loss 24.7611 LearningRate 0.0969 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:38,349-Speed 3466.44 samples/sec Loss 24.4857 LearningRate 0.0969 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:41,302-Speed 3468.95 samples/sec Loss 24.2593 LearningRate 0.0969 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:44,256-Speed 3467.30 samples/sec Loss 24.2353 LearningRate 0.0969 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:47,214-Speed 3463.05 samples/sec Loss 24.2247 LearningRate 0.0968 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:50,182-Speed 3450.58 samples/sec Loss 24.2177 LearningRate 0.0968 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-10 23:22:53,137-Speed 3466.93 samples/sec Loss 24.1287 LearningRate 0.0968 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:56,091-Speed 3467.05 samples/sec Loss 23.9479 LearningRate 0.0968 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:22:59,098-Speed 3406.65 samples/sec Loss 23.8429 LearningRate 0.0968 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:02,088-Speed 3425.12 samples/sec Loss 23.6208 LearningRate 0.0967 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:05,046-Speed 3462.85 samples/sec Loss 23.5947 LearningRate 0.0967 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:08,000-Speed 3466.79 samples/sec Loss 23.4163 LearningRate 0.0967 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:10,955-Speed 3466.74 samples/sec Loss 23.2704 LearningRate 0.0967 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:23:13,898-Speed 3479.75 samples/sec Loss 23.5878 LearningRate 0.0967 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:16,856-Speed 3463.99 samples/sec Loss 23.2828 LearningRate 0.0966 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:19,809-Speed 3467.98 samples/sec Loss 23.1734 LearningRate 0.0966 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:22,762-Speed 3469.22 samples/sec Loss 22.7724 LearningRate 0.0966 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:25,720-Speed 3462.49 samples/sec Loss 22.8237 LearningRate 0.0966 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:28,681-Speed 3459.42 
samples/sec Loss 22.6503 LearningRate 0.0966 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:31,634-Speed 3467.84 samples/sec Loss 22.8688 LearningRate 0.0966 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:34,592-Speed 3462.81 samples/sec Loss 22.3705 LearningRate 0.0965 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:37,553-Speed 3459.40 samples/sec Loss 22.1893 LearningRate 0.0965 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:40,520-Speed 3451.67 samples/sec Loss 22.2668 LearningRate 0.0965 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:43,468-Speed 3474.16 samples/sec Loss 22.4202 LearningRate 0.0965 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:46,424-Speed 3466.07 samples/sec Loss 22.1842 LearningRate 0.0965 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:49,379-Speed 3466.07 samples/sec Loss 21.9713 LearningRate 0.0964 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:52,334-Speed 3466.75 samples/sec Loss 22.0354 LearningRate 0.0964 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:55,286-Speed 3469.84 samples/sec Loss 21.8810 LearningRate 0.0964 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:23:58,235-Speed 3472.30 samples/sec Loss 21.7592 LearningRate 0.0964 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:01,243-Speed 3405.59 samples/sec Loss 21.7708 LearningRate 0.0964 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:04,188-Speed 3478.23 samples/sec Loss 21.5837 LearningRate 0.0963 Epoch: 0 
Global Step: 1870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:07,160-Speed 3445.49 samples/sec Loss 21.6370 LearningRate 0.0963 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:10,109-Speed 3473.67 samples/sec Loss 21.4710 LearningRate 0.0963 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:13,063-Speed 3467.61 samples/sec Loss 21.3677 LearningRate 0.0963 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:16,021-Speed 3463.87 samples/sec Loss 21.2666 LearningRate 0.0963 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:18,969-Speed 3473.72 samples/sec Loss 21.0998 LearningRate 0.0962 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:21,917-Speed 3474.56 samples/sec Loss 21.1563 LearningRate 0.0962 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:24,872-Speed 3465.90 samples/sec Loss 21.0580 LearningRate 0.0962 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:27,827-Speed 3466.21 samples/sec Loss 21.0078 LearningRate 0.0962 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:30,797-Speed 3448.84 samples/sec Loss 20.8539 LearningRate 0.0962 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:24:33,748-Speed 3470.46 samples/sec Loss 20.7606 LearningRate 0.0961 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:36,713-Speed 3454.45 samples/sec Loss 20.7871 LearningRate 0.0961 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:39,674-Speed 3459.98 samples/sec Loss 20.5702 LearningRate 0.0961 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-10 23:24:42,626-Speed 3469.07 samples/sec Loss 20.4358 LearningRate 0.0961 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:26,979-[lfw][2000]XNorm: 21.043712 Training: 2022-04-10 23:25:26,980-[lfw][2000]Accuracy-Flip: 0.98433+-0.00389 Training: 2022-04-10 23:25:26,980-[lfw][2000]Accuracy-Highest: 0.98433 Training: 2022-04-10 23:26:18,162-[cfp_fp][2000]XNorm: 18.296019 Training: 2022-04-10 23:26:18,163-[cfp_fp][2000]Accuracy-Flip: 0.81600+-0.02091 Training: 2022-04-10 23:26:18,164-[cfp_fp][2000]Accuracy-Highest: 0.81600 Training: 2022-04-10 23:27:02,046-[agedb_30][2000]XNorm: 20.396503 Training: 2022-04-10 23:27:02,047-[agedb_30][2000]Accuracy-Flip: 0.88017+-0.02066 Training: 2022-04-10 23:27:02,048-[agedb_30][2000]Accuracy-Highest: 0.88017 Training: 2022-04-10 23:27:04,986-Speed 71.93 samples/sec Loss 20.4094 LearningRate 0.0961 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:07,914-Speed 3498.12 samples/sec Loss 20.3666 LearningRate 0.0960 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:10,844-Speed 3495.79 samples/sec Loss 20.3427 LearningRate 0.0960 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:13,775-Speed 3494.18 samples/sec Loss 20.4137 LearningRate 0.0960 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:16,728-Speed 3468.54 samples/sec Loss 20.1456 LearningRate 0.0960 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:19,663-Speed 3491.03 samples/sec Loss 20.0656 LearningRate 0.0960 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:22,587-Speed 3502.86 samples/sec Loss 20.2408 LearningRate 0.0959 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:25,536-Speed 
3472.92 samples/sec Loss 19.9981 LearningRate 0.0959 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:28,470-Speed 3490.93 samples/sec Loss 19.8215 LearningRate 0.0959 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:31,448-Speed 3439.43 samples/sec Loss 19.8146 LearningRate 0.0959 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:34,388-Speed 3484.56 samples/sec Loss 19.6353 LearningRate 0.0959 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:37,339-Speed 3470.56 samples/sec Loss 19.8534 LearningRate 0.0959 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:40,285-Speed 3477.10 samples/sec Loss 19.5803 LearningRate 0.0958 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:43,227-Speed 3481.63 samples/sec Loss 19.6852 LearningRate 0.0958 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:46,169-Speed 3481.01 samples/sec Loss 19.5680 LearningRate 0.0958 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:49,113-Speed 3479.80 samples/sec Loss 19.4155 LearningRate 0.0958 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:52,042-Speed 3496.17 samples/sec Loss 19.2233 LearningRate 0.0958 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:54,990-Speed 3474.57 samples/sec Loss 19.4111 LearningRate 0.0957 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:27:57,939-Speed 3474.17 samples/sec Loss 19.0449 LearningRate 0.0957 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:00,893-Speed 3466.70 samples/sec Loss 19.2510 
LearningRate 0.0957 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:03,840-Speed 3476.37 samples/sec Loss 19.3937 LearningRate 0.0957 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:06,782-Speed 3480.85 samples/sec Loss 19.0701 LearningRate 0.0957 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:09,725-Speed 3481.32 samples/sec Loss 19.0803 LearningRate 0.0956 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:12,669-Speed 3478.46 samples/sec Loss 18.9225 LearningRate 0.0956 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:15,624-Speed 3466.03 samples/sec Loss 19.0080 LearningRate 0.0956 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:18,562-Speed 3486.16 samples/sec Loss 18.9760 LearningRate 0.0956 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:21,501-Speed 3485.45 samples/sec Loss 18.9391 LearningRate 0.0956 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:24,437-Speed 3488.25 samples/sec Loss 18.8347 LearningRate 0.0955 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:27,399-Speed 3458.55 samples/sec Loss 18.8076 LearningRate 0.0955 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:30,339-Speed 3483.79 samples/sec Loss 18.6863 LearningRate 0.0955 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:33,283-Speed 3479.12 samples/sec Loss 18.6960 LearningRate 0.0955 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:36,232-Speed 3473.09 samples/sec Loss 18.6664 LearningRate 0.0955 Epoch: 0 Global Step: 
2320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:39,172-Speed 3484.59 samples/sec Loss 18.6709 LearningRate 0.0954 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:42,109-Speed 3487.53 samples/sec Loss 18.5786 LearningRate 0.0954 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:45,053-Speed 3479.01 samples/sec Loss 18.4264 LearningRate 0.0954 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:48,018-Speed 3454.40 samples/sec Loss 18.4042 LearningRate 0.0954 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:50,987-Speed 3450.31 samples/sec Loss 18.3494 LearningRate 0.0954 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-10 23:28:53,935-Speed 3474.47 samples/sec Loss 18.4600 LearningRate 0.0953 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:56,881-Speed 3476.03 samples/sec Loss 18.2423 LearningRate 0.0953 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:28:59,828-Speed 3476.09 samples/sec Loss 18.2164 LearningRate 0.0953 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:02,764-Speed 3488.61 samples/sec Loss 18.2572 LearningRate 0.0953 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:05,715-Speed 3470.65 samples/sec Loss 18.2174 LearningRate 0.0953 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:08,653-Speed 3486.65 samples/sec Loss 18.2329 LearningRate 0.0953 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:11,615-Speed 3458.76 samples/sec Loss 18.1085 LearningRate 0.0952 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-10 23:29:14,578-Speed 3456.43 samples/sec Loss 17.9340 LearningRate 0.0952 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:17,522-Speed 3478.52 samples/sec Loss 17.8165 LearningRate 0.0952 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:20,464-Speed 3482.64 samples/sec Loss 17.8689 LearningRate 0.0952 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:23,389-Speed 3500.99 samples/sec Loss 17.8758 LearningRate 0.0952 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:26,332-Speed 3480.74 samples/sec Loss 17.6671 LearningRate 0.0951 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:29,272-Speed 3484.66 samples/sec Loss 17.8096 LearningRate 0.0951 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:32,212-Speed 3483.60 samples/sec Loss 17.5732 LearningRate 0.0951 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:35,154-Speed 3481.80 samples/sec Loss 17.6953 LearningRate 0.0951 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:38,113-Speed 3461.07 samples/sec Loss 17.5401 LearningRate 0.0951 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:41,054-Speed 3482.29 samples/sec Loss 17.5734 LearningRate 0.0950 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:44,011-Speed 3463.57 samples/sec Loss 17.6846 LearningRate 0.0950 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:46,953-Speed 3482.04 samples/sec Loss 17.5826 LearningRate 0.0950 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 
23:29:49,897-Speed 3479.69 samples/sec Loss 17.3718 LearningRate 0.0950 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:52,830-Speed 3491.44 samples/sec Loss 17.4996 LearningRate 0.0950 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:55,779-Speed 3473.79 samples/sec Loss 17.4728 LearningRate 0.0949 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:29:58,732-Speed 3468.61 samples/sec Loss 17.3714 LearningRate 0.0949 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:01,676-Speed 3479.26 samples/sec Loss 17.3197 LearningRate 0.0949 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:04,631-Speed 3465.83 samples/sec Loss 17.2131 LearningRate 0.0949 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:07,578-Speed 3475.29 samples/sec Loss 17.2678 LearningRate 0.0949 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:10,525-Speed 3475.31 samples/sec Loss 17.3222 LearningRate 0.0948 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:13,472-Speed 3476.89 samples/sec Loss 17.2733 LearningRate 0.0948 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:16,429-Speed 3463.74 samples/sec Loss 17.2502 LearningRate 0.0948 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:19,374-Speed 3478.50 samples/sec Loss 17.2241 LearningRate 0.0948 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:22,313-Speed 3484.57 samples/sec Loss 17.4127 LearningRate 0.0948 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:25,258-Speed 3478.35 samples/sec Loss 
17.0828 LearningRate 0.0948 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:28,201-Speed 3479.47 samples/sec Loss 16.9950 LearningRate 0.0947 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:31,151-Speed 3472.03 samples/sec Loss 17.0025 LearningRate 0.0947 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:34,105-Speed 3467.44 samples/sec Loss 16.8680 LearningRate 0.0947 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:37,065-Speed 3460.35 samples/sec Loss 16.8243 LearningRate 0.0947 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:40,009-Speed 3479.54 samples/sec Loss 16.7777 LearningRate 0.0947 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:42,954-Speed 3478.19 samples/sec Loss 16.9445 LearningRate 0.0946 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:45,900-Speed 3476.84 samples/sec Loss 16.7681 LearningRate 0.0946 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:48,844-Speed 3479.49 samples/sec Loss 16.5929 LearningRate 0.0946 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:51,780-Speed 3488.31 samples/sec Loss 16.8676 LearningRate 0.0946 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:54,742-Speed 3457.88 samples/sec Loss 17.0562 LearningRate 0.0946 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:30:57,689-Speed 3475.65 samples/sec Loss 16.6161 LearningRate 0.0945 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:31:00,634-Speed 3477.56 samples/sec Loss 16.5615 LearningRate 0.0945 Epoch: 0 Global 
Step: 2810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:31:03,581-Speed 3476.36 samples/sec Loss 16.7329 LearningRate 0.0945 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:06,525-Speed 3478.89 samples/sec Loss 16.6488 LearningRate 0.0945 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:09,466-Speed 3482.91 samples/sec Loss 16.3466 LearningRate 0.0945 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:12,411-Speed 3477.75 samples/sec Loss 16.6189 LearningRate 0.0944 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:15,354-Speed 3480.23 samples/sec Loss 16.5782 LearningRate 0.0944 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:18,296-Speed 3481.89 samples/sec Loss 16.2836 LearningRate 0.0944 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:21,226-Speed 3495.12 samples/sec Loss 16.4012 LearningRate 0.0944 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:24,173-Speed 3476.39 samples/sec Loss 16.3922 LearningRate 0.0944 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:27,116-Speed 3480.24 samples/sec Loss 16.5935 LearningRate 0.0943 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:30,065-Speed 3473.79 samples/sec Loss 16.5198 LearningRate 0.0943 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:33,021-Speed 3464.50 samples/sec Loss 16.2683 LearningRate 0.0943 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:31:35,967-Speed 3477.19 samples/sec Loss 16.2620 LearningRate 0.0943 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-10 23:31:38,911-Speed 3478.68 samples/sec Loss 16.2917 LearningRate 0.0943 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:41,858-Speed 3475.30 samples/sec Loss 16.3678 LearningRate 0.0943 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:44,805-Speed 3476.50 samples/sec Loss 16.0961 LearningRate 0.0942 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:47,761-Speed 3464.85 samples/sec Loss 16.2049 LearningRate 0.0942 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:50,728-Speed 3452.01 samples/sec Loss 16.1717 LearningRate 0.0942 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:53,668-Speed 3483.21 samples/sec Loss 15.9842 LearningRate 0.0942 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:56,610-Speed 3482.16 samples/sec Loss 16.3690 LearningRate 0.0942 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:31:59,552-Speed 3481.41 samples/sec Loss 16.0127 LearningRate 0.0941 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:02,495-Speed 3480.49 samples/sec Loss 15.9795 LearningRate 0.0941 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:05,442-Speed 3475.89 samples/sec Loss 15.8814 LearningRate 0.0941 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:08,386-Speed 3479.67 samples/sec Loss 16.0682 LearningRate 0.0941 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:11,441-Speed 3352.08 samples/sec Loss 16.0123 LearningRate 0.0941 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:14,394-Speed 3468.51 samples/sec Loss 
16.2272 LearningRate 0.0940 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:17,345-Speed 3471.21 samples/sec Loss 16.0537 LearningRate 0.0940 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:20,287-Speed 3481.88 samples/sec Loss 16.0204 LearningRate 0.0940 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:23,284-Speed 3416.51 samples/sec Loss 15.7814 LearningRate 0.0940 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:26,234-Speed 3472.04 samples/sec Loss 15.8375 LearningRate 0.0940 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:32:29,178-Speed 3479.70 samples/sec Loss 15.8500 LearningRate 0.0939 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:32,132-Speed 3467.08 samples/sec Loss 15.7463 LearningRate 0.0939 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:35,074-Speed 3481.77 samples/sec Loss 15.9577 LearningRate 0.0939 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:38,017-Speed 3480.07 samples/sec Loss 15.6909 LearningRate 0.0939 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:40,972-Speed 3466.24 samples/sec Loss 15.8044 LearningRate 0.0939 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:43,914-Speed 3482.49 samples/sec Loss 15.7948 LearningRate 0.0939 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:46,858-Speed 3478.70 samples/sec Loss 15.6278 LearningRate 0.0938 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:49,804-Speed 3476.36 samples/sec Loss 15.6456 LearningRate 0.0938 Epoch: 0 Global Step: 3180 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:52,791-Speed 3429.24 samples/sec Loss 15.6605 LearningRate 0.0938 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:55,739-Speed 3473.82 samples/sec Loss 15.5953 LearningRate 0.0938 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:32:58,697-Speed 3463.75 samples/sec Loss 15.3188 LearningRate 0.0938 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:01,642-Speed 3477.99 samples/sec Loss 15.5015 LearningRate 0.0937 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:04,586-Speed 3478.92 samples/sec Loss 15.4277 LearningRate 0.0937 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:07,531-Speed 3478.40 samples/sec Loss 15.2575 LearningRate 0.0937 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:10,473-Speed 3481.55 samples/sec Loss 15.3591 LearningRate 0.0937 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:13,430-Speed 3463.82 samples/sec Loss 15.4233 LearningRate 0.0937 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:16,375-Speed 3477.33 samples/sec Loss 15.5429 LearningRate 0.0936 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:19,319-Speed 3479.16 samples/sec Loss 15.2401 LearningRate 0.0936 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:22,266-Speed 3476.00 samples/sec Loss 15.4057 LearningRate 0.0936 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:25,215-Speed 3473.34 samples/sec Loss 15.5889 LearningRate 0.0936 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 
23:33:28,147-Speed 3493.72 samples/sec Loss 15.4451 LearningRate 0.0936 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:31,093-Speed 3476.86 samples/sec Loss 15.2470 LearningRate 0.0935 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:34,036-Speed 3480.23 samples/sec Loss 15.2365 LearningRate 0.0935 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:36,982-Speed 3476.83 samples/sec Loss 15.1041 LearningRate 0.0935 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:39,925-Speed 3480.64 samples/sec Loss 15.2607 LearningRate 0.0935 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:42,872-Speed 3475.43 samples/sec Loss 15.0939 LearningRate 0.0935 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:45,818-Speed 3476.47 samples/sec Loss 15.2522 LearningRate 0.0934 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:48,767-Speed 3472.78 samples/sec Loss 15.2954 LearningRate 0.0934 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:51,713-Speed 3477.38 samples/sec Loss 15.2037 LearningRate 0.0934 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:54,657-Speed 3478.81 samples/sec Loss 15.2174 LearningRate 0.0934 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:33:57,591-Speed 3490.82 samples/sec Loss 15.2697 LearningRate 0.0934 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:00,537-Speed 3476.97 samples/sec Loss 15.0435 LearningRate 0.0934 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:03,483-Speed 3476.84 samples/sec Loss 14.8872 
LearningRate 0.0933 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:06,427-Speed 3479.16 samples/sec Loss 15.1141 LearningRate 0.0933 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:09,371-Speed 3479.39 samples/sec Loss 15.1307 LearningRate 0.0933 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:12,319-Speed 3474.52 samples/sec Loss 15.0686 LearningRate 0.0933 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:15,263-Speed 3478.52 samples/sec Loss 15.1910 LearningRate 0.0933 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:18,206-Speed 3480.48 samples/sec Loss 14.9751 LearningRate 0.0932 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:21,154-Speed 3474.62 samples/sec Loss 15.2985 LearningRate 0.0932 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:24,102-Speed 3473.92 samples/sec Loss 14.8868 LearningRate 0.0932 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:34:27,048-Speed 3477.10 samples/sec Loss 15.0179 LearningRate 0.0932 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:29,995-Speed 3476.56 samples/sec Loss 14.8241 LearningRate 0.0932 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:32,946-Speed 3470.52 samples/sec Loss 14.8029 LearningRate 0.0931 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:35,894-Speed 3474.74 samples/sec Loss 14.7983 LearningRate 0.0931 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:38,849-Speed 3465.48 samples/sec Loss 14.6971 LearningRate 0.0931 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-04-10 23:34:41,794-Speed 3478.54 samples/sec Loss 14.7619 LearningRate 0.0931 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:44,741-Speed 3476.51 samples/sec Loss 14.7581 LearningRate 0.0931 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:47,685-Speed 3478.57 samples/sec Loss 14.8906 LearningRate 0.0930 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:50,640-Speed 3466.95 samples/sec Loss 14.6776 LearningRate 0.0930 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:53,589-Speed 3473.33 samples/sec Loss 14.7566 LearningRate 0.0930 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:56,528-Speed 3484.81 samples/sec Loss 14.8743 LearningRate 0.0930 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:34:59,486-Speed 3462.80 samples/sec Loss 14.7309 LearningRate 0.0930 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:02,437-Speed 3470.41 samples/sec Loss 14.7657 LearningRate 0.0930 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:05,389-Speed 3469.77 samples/sec Loss 14.8496 LearningRate 0.0929 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:08,360-Speed 3448.48 samples/sec Loss 14.8394 LearningRate 0.0929 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:11,326-Speed 3452.61 samples/sec Loss 14.7216 LearningRate 0.0929 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:14,360-Speed 3375.97 samples/sec Loss 14.4789 LearningRate 0.0929 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 
23:35:17,344-Speed 3432.86 samples/sec Loss 14.8183 LearningRate 0.0929 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:20,290-Speed 3476.99 samples/sec Loss 14.5628 LearningRate 0.0928 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:23,239-Speed 3473.84 samples/sec Loss 14.4129 LearningRate 0.0928 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:26,172-Speed 3491.42 samples/sec Loss 14.6359 LearningRate 0.0928 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:29,117-Speed 3478.92 samples/sec Loss 14.4456 LearningRate 0.0928 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:32,064-Speed 3475.01 samples/sec Loss 14.6669 LearningRate 0.0928 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:35,011-Speed 3475.23 samples/sec Loss 14.6601 LearningRate 0.0927 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:37,968-Speed 3464.79 samples/sec Loss 14.5969 LearningRate 0.0927 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:40,923-Speed 3465.07 samples/sec Loss 14.6361 LearningRate 0.0927 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:43,879-Speed 3465.47 samples/sec Loss 14.4914 LearningRate 0.0927 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:46,842-Speed 3456.58 samples/sec Loss 14.5048 LearningRate 0.0927 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:49,788-Speed 3478.10 samples/sec Loss 14.3368 LearningRate 0.0926 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:52,736-Speed 3474.11 samples/sec Loss 14.1301 
LearningRate 0.0926 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:35:55,680-Speed 3478.91 samples/sec Loss 14.5078 LearningRate 0.0926 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:35:58,614-Speed 3491.32 samples/sec Loss 14.3325 LearningRate 0.0926 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:01,564-Speed 3471.11 samples/sec Loss 14.4112 LearningRate 0.0926 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:04,517-Speed 3468.88 samples/sec Loss 14.3881 LearningRate 0.0926 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:07,460-Speed 3480.20 samples/sec Loss 14.4072 LearningRate 0.0925 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:10,404-Speed 3480.38 samples/sec Loss 14.5736 LearningRate 0.0925 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:13,350-Speed 3476.13 samples/sec Loss 14.3342 LearningRate 0.0925 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:16,303-Speed 3468.44 samples/sec Loss 14.4362 LearningRate 0.0925 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:19,250-Speed 3475.85 samples/sec Loss 14.1982 LearningRate 0.0925 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:22,197-Speed 3475.31 samples/sec Loss 14.4430 LearningRate 0.0924 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:25,145-Speed 3474.92 samples/sec Loss 14.3865 LearningRate 0.0924 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:28,078-Speed 3491.46 samples/sec Loss 14.3897 LearningRate 0.0924 Epoch: 0 Global Step: 3920 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:36:30,998-Speed 3507.90 samples/sec Loss 14.2903 LearningRate 0.0924 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:33,941-Speed 3480.16 samples/sec Loss 14.4002 LearningRate 0.0924 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:36,896-Speed 3465.93 samples/sec Loss 14.3475 LearningRate 0.0923 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:39,853-Speed 3464.66 samples/sec Loss 14.1693 LearningRate 0.0923 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:42,800-Speed 3474.89 samples/sec Loss 14.3017 LearningRate 0.0923 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:45,753-Speed 3469.60 samples/sec Loss 14.2195 LearningRate 0.0923 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:48,705-Speed 3469.64 samples/sec Loss 14.4081 LearningRate 0.0923 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:36:51,654-Speed 3473.34 samples/sec Loss 14.0259 LearningRate 0.0922 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-10 23:37:35,780-[lfw][4000]XNorm: 22.088006 Training: 2022-04-10 23:37:35,781-[lfw][4000]Accuracy-Flip: 0.99150+-0.00456 Training: 2022-04-10 23:37:35,781-[lfw][4000]Accuracy-Highest: 0.99150 Training: 2022-04-10 23:38:27,089-[cfp_fp][4000]XNorm: 19.753306 Training: 2022-04-10 23:38:27,090-[cfp_fp][4000]Accuracy-Flip: 0.90800+-0.01447 Training: 2022-04-10 23:38:27,090-[cfp_fp][4000]Accuracy-Highest: 0.90800 Training: 2022-04-10 23:39:11,310-[agedb_30][4000]XNorm: 21.772712 Training: 2022-04-10 23:39:11,311-[agedb_30][4000]Accuracy-Flip: 0.94500+-0.01254 Training: 2022-04-10 23:39:11,311-[agedb_30][4000]Accuracy-Highest: 0.94500 Training: 2022-04-10 
23:39:14,250-Speed 71.81 samples/sec Loss 14.1161 LearningRate 0.0922 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:39:17,193-Speed 3480.36 samples/sec Loss 14.0775 LearningRate 0.0922 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-10 23:39:20,122-Speed 3496.31 samples/sec Loss 14.3225 LearningRate 0.0922 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:23,050-Speed 3498.92 samples/sec Loss 13.9899 LearningRate 0.0922 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:25,980-Speed 3495.71 samples/sec Loss 14.0594 LearningRate 0.0922 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:28,911-Speed 3495.47 samples/sec Loss 14.2243 LearningRate 0.0921 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:31,841-Speed 3495.20 samples/sec Loss 14.0425 LearningRate 0.0921 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:34,775-Speed 3491.11 samples/sec Loss 14.0778 LearningRate 0.0921 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:37,711-Speed 3488.62 samples/sec Loss 14.2550 LearningRate 0.0921 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:40,645-Speed 3490.25 samples/sec Loss 13.8633 LearningRate 0.0921 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:43,585-Speed 3485.05 samples/sec Loss 13.9405 LearningRate 0.0920 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:46,521-Speed 3487.97 samples/sec Loss 14.0109 LearningRate 0.0920 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-10 23:39:49,458-Speed 3487.54 samples/sec Loss 14.0522 
LearningRate 0.0920 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:39:52,392-Speed 3491.02 samples/sec Loss 14.0803 LearningRate 0.0920 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:39:55,326-Speed 3491.83 samples/sec Loss 14.0520 LearningRate 0.0920 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:39:58,260-Speed 3491.13 samples/sec Loss 13.9538 LearningRate 0.0919 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:01,226-Speed 3453.12 samples/sec Loss 14.0756 LearningRate 0.0919 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:04,161-Speed 3489.84 samples/sec Loss 13.9413 LearningRate 0.0919 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:07,102-Speed 3482.09 samples/sec Loss 13.8732 LearningRate 0.0919 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:10,041-Speed 3485.58 samples/sec Loss 13.9583 LearningRate 0.0919 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:12,994-Speed 3468.36 samples/sec Loss 14.0690 LearningRate 0.0918 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:15,946-Speed 3470.04 samples/sec Loss 14.0037 LearningRate 0.0918 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:18,889-Speed 3480.77 samples/sec Loss 13.9181 LearningRate 0.0918 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-10 23:40:21,815-Speed 3499.99 samples/sec Loss 13.6873 LearningRate 0.0918 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:24,753-Speed 3486.55 samples/sec Loss 13.8838 LearningRate 0.0918 Epoch: 0 Global Step: 
4250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:27,691-Speed 3486.02 samples/sec Loss 13.8686 LearningRate 0.0918 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:30,670-Speed 3438.41 samples/sec Loss 13.8681 LearningRate 0.0917 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:33,608-Speed 3486.61 samples/sec Loss 13.8516 LearningRate 0.0917 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:36,549-Speed 3482.46 samples/sec Loss 13.9355 LearningRate 0.0917 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:39,502-Speed 3468.47 samples/sec Loss 13.8635 LearningRate 0.0917 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:42,448-Speed 3476.64 samples/sec Loss 13.7553 LearningRate 0.0917 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:45,387-Speed 3485.64 samples/sec Loss 13.9129 LearningRate 0.0916 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:48,323-Speed 3488.19 samples/sec Loss 13.8145 LearningRate 0.0916 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:51,249-Speed 3500.79 samples/sec Loss 13.6574 LearningRate 0.0916 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:54,209-Speed 3460.82 samples/sec Loss 13.6588 LearningRate 0.0916 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:40:57,149-Speed 3483.72 samples/sec Loss 13.5343 LearningRate 0.0916 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:00,095-Speed 3475.83 samples/sec Loss 13.7914 LearningRate 0.0915 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-10 23:41:03,055-Speed 3461.51 samples/sec Loss 13.8595 LearningRate 0.0915 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:06,001-Speed 3476.05 samples/sec Loss 13.8218 LearningRate 0.0915 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:08,943-Speed 3481.65 samples/sec Loss 13.4874 LearningRate 0.0915 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:11,887-Speed 3480.07 samples/sec Loss 13.9341 LearningRate 0.0915 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:14,825-Speed 3485.15 samples/sec Loss 13.7402 LearningRate 0.0915 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:17,777-Speed 3469.83 samples/sec Loss 13.6212 LearningRate 0.0914 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:20,706-Speed 3496.79 samples/sec Loss 13.7647 LearningRate 0.0914 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:23,648-Speed 3481.75 samples/sec Loss 13.7384 LearningRate 0.0914 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:26,608-Speed 3459.93 samples/sec Loss 13.6635 LearningRate 0.0914 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:29,554-Speed 3477.71 samples/sec Loss 13.8048 LearningRate 0.0914 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:32,494-Speed 3483.17 samples/sec Loss 13.5424 LearningRate 0.0913 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:35,433-Speed 3485.97 samples/sec Loss 13.7155 LearningRate 0.0913 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 
23:41:38,379-Speed 3477.18 samples/sec Loss 13.5128 LearningRate 0.0913 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:41,322-Speed 3480.12 samples/sec Loss 13.5312 LearningRate 0.0913 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:44,274-Speed 3468.99 samples/sec Loss 13.4966 LearningRate 0.0913 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:47,213-Speed 3485.80 samples/sec Loss 13.6289 LearningRate 0.0912 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:50,140-Speed 3498.48 samples/sec Loss 13.5116 LearningRate 0.0912 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:53,085-Speed 3478.03 samples/sec Loss 13.5298 LearningRate 0.0912 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:56,028-Speed 3480.70 samples/sec Loss 13.5292 LearningRate 0.0912 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:41:58,976-Speed 3474.85 samples/sec Loss 13.6103 LearningRate 0.0912 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:01,916-Speed 3484.33 samples/sec Loss 13.4669 LearningRate 0.0912 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:04,883-Speed 3451.67 samples/sec Loss 13.4326 LearningRate 0.0911 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:07,826-Speed 3480.21 samples/sec Loss 13.4220 LearningRate 0.0911 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:10,769-Speed 3480.12 samples/sec Loss 13.4041 LearningRate 0.0911 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:13,706-Speed 3487.74 samples/sec Loss 
13.4651 LearningRate 0.0911 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:16,652-Speed 3476.78 samples/sec Loss 13.4470 LearningRate 0.0911 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:19,588-Speed 3488.80 samples/sec Loss 13.4588 LearningRate 0.0910 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:22,534-Speed 3476.65 samples/sec Loss 13.4937 LearningRate 0.0910 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:25,475-Speed 3482.77 samples/sec Loss 13.3665 LearningRate 0.0910 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:28,425-Speed 3472.21 samples/sec Loss 13.5306 LearningRate 0.0910 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:31,365-Speed 3482.88 samples/sec Loss 13.4390 LearningRate 0.0910 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:34,308-Speed 3480.93 samples/sec Loss 13.5699 LearningRate 0.0909 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:37,246-Speed 3486.22 samples/sec Loss 13.3970 LearningRate 0.0909 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:40,191-Speed 3478.26 samples/sec Loss 13.4241 LearningRate 0.0909 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:43,137-Speed 3476.73 samples/sec Loss 13.5188 LearningRate 0.0909 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:46,080-Speed 3480.07 samples/sec Loss 13.3743 LearningRate 0.0909 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:49,011-Speed 3494.59 samples/sec Loss 13.4729 LearningRate 0.0908 Epoch: 0 Global 
Step: 4740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:51,955-Speed 3479.33 samples/sec Loss 13.3007 LearningRate 0.0908 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:54,908-Speed 3469.13 samples/sec Loss 13.3960 LearningRate 0.0908 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:42:57,850-Speed 3481.60 samples/sec Loss 13.3533 LearningRate 0.0908 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:43:00,796-Speed 3475.76 samples/sec Loss 13.2949 LearningRate 0.0908 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:43:03,738-Speed 3481.88 samples/sec Loss 13.5648 LearningRate 0.0908 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:43:06,681-Speed 3480.92 samples/sec Loss 13.4591 LearningRate 0.0907 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:43:09,623-Speed 3481.68 samples/sec Loss 13.3121 LearningRate 0.0907 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:12,573-Speed 3471.21 samples/sec Loss 13.2440 LearningRate 0.0907 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:15,511-Speed 3485.79 samples/sec Loss 13.3317 LearningRate 0.0907 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:18,465-Speed 3468.75 samples/sec Loss 13.2900 LearningRate 0.0907 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:21,422-Speed 3462.93 samples/sec Loss 13.2052 LearningRate 0.0906 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:24,394-Speed 3447.18 samples/sec Loss 13.1424 LearningRate 0.0906 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-10 23:43:27,370-Speed 3441.56 samples/sec Loss 13.0181 LearningRate 0.0906 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:30,316-Speed 3475.89 samples/sec Loss 13.1294 LearningRate 0.0906 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:33,255-Speed 3485.99 samples/sec Loss 13.3682 LearningRate 0.0906 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:36,196-Speed 3482.19 samples/sec Loss 13.1552 LearningRate 0.0905 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:39,137-Speed 3483.38 samples/sec Loss 13.3048 LearningRate 0.0905 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:42,082-Speed 3477.34 samples/sec Loss 12.9094 LearningRate 0.0905 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:45,032-Speed 3472.36 samples/sec Loss 13.1425 LearningRate 0.0905 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:47,975-Speed 3480.39 samples/sec Loss 12.9680 LearningRate 0.0905 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:43:50,908-Speed 3492.31 samples/sec Loss 13.0194 LearningRate 0.0905 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:53,858-Speed 3473.48 samples/sec Loss 13.1550 LearningRate 0.0904 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:56,802-Speed 3479.01 samples/sec Loss 13.2820 LearningRate 0.0904 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:43:59,745-Speed 3480.39 samples/sec Loss 13.0207 LearningRate 0.0904 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:02,679-Speed 3490.24 
samples/sec Loss 13.2569 LearningRate 0.0904 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:05,633-Speed 3466.94 samples/sec Loss 13.0524 LearningRate 0.0904 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:08,577-Speed 3479.39 samples/sec Loss 13.2455 LearningRate 0.0903 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:11,518-Speed 3483.71 samples/sec Loss 13.1097 LearningRate 0.0903 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:14,458-Speed 3484.62 samples/sec Loss 13.3204 LearningRate 0.0903 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:17,400-Speed 3481.01 samples/sec Loss 13.0376 LearningRate 0.0903 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:20,416-Speed 3396.82 samples/sec Loss 13.0390 LearningRate 0.0903 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:33,052-Speed 810.41 samples/sec Loss 12.8230 LearningRate 0.0902 Epoch: 1 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:36,006-Speed 3468.78 samples/sec Loss 12.1781 LearningRate 0.0902 Epoch: 1 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:38,966-Speed 3459.94 samples/sec Loss 12.0808 LearningRate 0.0902 Epoch: 1 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:44:41,913-Speed 3475.03 samples/sec Loss 12.1553 LearningRate 0.0902 Epoch: 1 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:44,860-Speed 3476.36 samples/sec Loss 12.2992 LearningRate 0.0902 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:47,808-Speed 3473.83 samples/sec Loss 12.0251 LearningRate 0.0902 Epoch: 1 Global Step: 
5110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:50,748-Speed 3483.78 samples/sec Loss 12.1304 LearningRate 0.0901 Epoch: 1 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:53,715-Speed 3452.31 samples/sec Loss 12.2892 LearningRate 0.0901 Epoch: 1 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:56,693-Speed 3438.92 samples/sec Loss 12.3380 LearningRate 0.0901 Epoch: 1 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:44:59,643-Speed 3473.26 samples/sec Loss 12.0149 LearningRate 0.0901 Epoch: 1 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:02,588-Speed 3477.41 samples/sec Loss 12.4538 LearningRate 0.0901 Epoch: 1 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:05,532-Speed 3479.43 samples/sec Loss 12.2624 LearningRate 0.0900 Epoch: 1 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:08,482-Speed 3472.06 samples/sec Loss 12.3121 LearningRate 0.0900 Epoch: 1 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:11,432-Speed 3470.95 samples/sec Loss 12.1283 LearningRate 0.0900 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:45:14,370-Speed 3486.87 samples/sec Loss 12.3784 LearningRate 0.0900 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:17,341-Speed 3447.72 samples/sec Loss 12.3968 LearningRate 0.0900 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:20,303-Speed 3458.54 samples/sec Loss 12.3967 LearningRate 0.0899 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:23,250-Speed 3475.54 samples/sec Loss 12.3252 LearningRate 0.0899 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-10 23:45:26,199-Speed 3472.66 samples/sec Loss 12.2831 LearningRate 0.0899 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:29,141-Speed 3482.05 samples/sec Loss 12.3086 LearningRate 0.0899 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:32,106-Speed 3454.74 samples/sec Loss 12.5128 LearningRate 0.0899 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:35,055-Speed 3473.40 samples/sec Loss 12.3510 LearningRate 0.0899 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:38,004-Speed 3472.34 samples/sec Loss 12.2736 LearningRate 0.0898 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:40,949-Speed 3477.95 samples/sec Loss 12.4296 LearningRate 0.0898 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:44,238-Speed 3114.66 samples/sec Loss 12.3285 LearningRate 0.0898 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:47,185-Speed 3474.92 samples/sec Loss 12.3063 LearningRate 0.0898 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:50,178-Speed 3422.06 samples/sec Loss 12.5340 LearningRate 0.0898 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:53,124-Speed 3477.98 samples/sec Loss 12.5000 LearningRate 0.0897 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:56,066-Speed 3480.56 samples/sec Loss 12.4060 LearningRate 0.0897 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:45:59,012-Speed 3477.76 samples/sec Loss 12.3636 LearningRate 0.0897 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:01,972-Speed 3459.21 
samples/sec Loss 12.4789 LearningRate 0.0897 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:04,922-Speed 3472.28 samples/sec Loss 12.2905 LearningRate 0.0897 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:07,877-Speed 3466.97 samples/sec Loss 12.4001 LearningRate 0.0896 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:10,827-Speed 3471.74 samples/sec Loss 12.5907 LearningRate 0.0896 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:13,762-Speed 3490.21 samples/sec Loss 12.3940 LearningRate 0.0896 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:16,726-Speed 3455.08 samples/sec Loss 12.5151 LearningRate 0.0896 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:19,672-Speed 3478.16 samples/sec Loss 12.6263 LearningRate 0.0896 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:22,618-Speed 3475.59 samples/sec Loss 12.5725 LearningRate 0.0896 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:25,582-Speed 3456.14 samples/sec Loss 12.3621 LearningRate 0.0895 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:28,573-Speed 3424.12 samples/sec Loss 12.4196 LearningRate 0.0895 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:31,517-Speed 3479.13 samples/sec Loss 12.7327 LearningRate 0.0895 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:34,464-Speed 3475.39 samples/sec Loss 12.2393 LearningRate 0.0895 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:37,411-Speed 3475.45 samples/sec Loss 12.5070 LearningRate 0.0895 Epoch: 1 
Global Step: 5480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:40,357-Speed 3477.29 samples/sec Loss 12.3363 LearningRate 0.0894 Epoch: 1 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:43,306-Speed 3472.93 samples/sec Loss 12.4028 LearningRate 0.0894 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:46:46,241-Speed 3490.58 samples/sec Loss 12.5042 LearningRate 0.0894 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:49,193-Speed 3469.76 samples/sec Loss 12.5440 LearningRate 0.0894 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:52,144-Speed 3470.28 samples/sec Loss 12.1591 LearningRate 0.0894 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:55,105-Speed 3459.71 samples/sec Loss 12.4319 LearningRate 0.0893 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:46:58,051-Speed 3477.07 samples/sec Loss 12.2398 LearningRate 0.0893 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:00,995-Speed 3478.27 samples/sec Loss 12.4642 LearningRate 0.0893 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:03,944-Speed 3474.06 samples/sec Loss 12.4296 LearningRate 0.0893 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:06,889-Speed 3477.17 samples/sec Loss 12.4890 LearningRate 0.0893 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:09,837-Speed 3475.01 samples/sec Loss 12.5329 LearningRate 0.0893 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:12,784-Speed 3475.10 samples/sec Loss 12.2382 LearningRate 0.0892 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-10 23:47:15,723-Speed 3485.97 samples/sec Loss 12.6216 LearningRate 0.0892 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:18,679-Speed 3464.43 samples/sec Loss 12.4460 LearningRate 0.0892 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:21,623-Speed 3479.02 samples/sec Loss 12.3326 LearningRate 0.0892 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:24,570-Speed 3475.81 samples/sec Loss 12.5092 LearningRate 0.0892 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:27,514-Speed 3479.43 samples/sec Loss 12.4432 LearningRate 0.0891 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:30,462-Speed 3473.99 samples/sec Loss 12.3638 LearningRate 0.0891 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:33,406-Speed 3479.20 samples/sec Loss 12.5555 LearningRate 0.0891 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:36,357-Speed 3470.63 samples/sec Loss 12.4436 LearningRate 0.0891 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:39,300-Speed 3480.85 samples/sec Loss 12.4471 LearningRate 0.0891 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:42,244-Speed 3479.47 samples/sec Loss 12.4754 LearningRate 0.0890 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:45,175-Speed 3494.65 samples/sec Loss 12.4369 LearningRate 0.0890 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:48,122-Speed 3475.31 samples/sec Loss 12.5726 LearningRate 0.0890 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:51,064-Speed 3481.58 
samples/sec Loss 12.4748 LearningRate 0.0890 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:54,013-Speed 3472.70 samples/sec Loss 12.5552 LearningRate 0.0890 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:56,962-Speed 3473.83 samples/sec Loss 12.2501 LearningRate 0.0890 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:47:59,943-Speed 3436.05 samples/sec Loss 12.2534 LearningRate 0.0889 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:02,888-Speed 3477.96 samples/sec Loss 12.4558 LearningRate 0.0889 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:05,880-Speed 3423.72 samples/sec Loss 12.2403 LearningRate 0.0889 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:08,823-Speed 3479.53 samples/sec Loss 12.3632 LearningRate 0.0889 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:11,768-Speed 3478.26 samples/sec Loss 12.3575 LearningRate 0.0889 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:14,715-Speed 3475.62 samples/sec Loss 12.2082 LearningRate 0.0888 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:48:17,650-Speed 3489.83 samples/sec Loss 12.2937 LearningRate 0.0888 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:20,611-Speed 3459.58 samples/sec Loss 12.3293 LearningRate 0.0888 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:23,556-Speed 3478.08 samples/sec Loss 12.2936 LearningRate 0.0888 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:26,507-Speed 3470.72 samples/sec Loss 12.2519 LearningRate 0.0888 Epoch: 1 
Global Step: 5850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:29,460-Speed 3468.11 samples/sec Loss 12.4462 LearningRate 0.0887 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:32,404-Speed 3479.99 samples/sec Loss 12.4253 LearningRate 0.0887 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:35,347-Speed 3479.91 samples/sec Loss 12.3812 LearningRate 0.0887 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:38,294-Speed 3476.00 samples/sec Loss 12.4489 LearningRate 0.0887 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:41,242-Speed 3474.35 samples/sec Loss 12.4769 LearningRate 0.0887 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:44,187-Speed 3477.57 samples/sec Loss 12.3100 LearningRate 0.0887 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:47,119-Speed 3492.83 samples/sec Loss 12.3306 LearningRate 0.0886 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:50,067-Speed 3474.73 samples/sec Loss 12.4002 LearningRate 0.0886 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:53,023-Speed 3464.96 samples/sec Loss 12.4834 LearningRate 0.0886 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:55,976-Speed 3470.69 samples/sec Loss 12.2588 LearningRate 0.0886 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:48:58,920-Speed 3480.08 samples/sec Loss 12.3713 LearningRate 0.0886 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:49:01,865-Speed 3477.09 samples/sec Loss 12.2901 LearningRate 0.0885 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-10 23:49:04,812-Speed 3475.80 samples/sec Loss 12.3571 LearningRate 0.0885 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:49:07,758-Speed 3476.41 samples/sec Loss 12.2130 LearningRate 0.0885 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:49:10,713-Speed 3466.61 samples/sec Loss 12.3169 LearningRate 0.0885 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:49:54,741-[lfw][6000]XNorm: 22.300878 Training: 2022-04-10 23:49:54,741-[lfw][6000]Accuracy-Flip: 0.99417+-0.00327 Training: 2022-04-10 23:49:54,742-[lfw][6000]Accuracy-Highest: 0.99417 Training: 2022-04-10 23:50:45,888-[cfp_fp][6000]XNorm: 19.629825 Training: 2022-04-10 23:50:45,889-[cfp_fp][6000]Accuracy-Flip: 0.92186+-0.01157 Training: 2022-04-10 23:50:45,889-[cfp_fp][6000]Accuracy-Highest: 0.92186 Training: 2022-04-10 23:51:30,104-[agedb_30][6000]XNorm: 22.114248 Training: 2022-04-10 23:51:30,105-[agedb_30][6000]Accuracy-Flip: 0.95683+-0.00603 Training: 2022-04-10 23:51:30,106-[agedb_30][6000]Accuracy-Highest: 0.95683 Training: 2022-04-10 23:51:33,048-Speed 71.94 samples/sec Loss 12.4022 LearningRate 0.0885 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:35,978-Speed 3495.99 samples/sec Loss 12.4552 LearningRate 0.0885 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:38,934-Speed 3464.31 samples/sec Loss 12.3357 LearningRate 0.0884 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:41,869-Speed 3490.47 samples/sec Loss 12.1732 LearningRate 0.0884 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:45,887-Speed 2549.02 samples/sec Loss 12.4505 LearningRate 0.0884 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:48,816-Speed 
3497.24 samples/sec Loss 12.0741 LearningRate 0.0884 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:51,763-Speed 3475.40 samples/sec Loss 12.4680 LearningRate 0.0884 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:54,728-Speed 3454.19 samples/sec Loss 12.1639 LearningRate 0.0883 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:51:57,682-Speed 3467.61 samples/sec Loss 12.2994 LearningRate 0.0883 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:00,668-Speed 3430.11 samples/sec Loss 12.2925 LearningRate 0.0883 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:03,663-Speed 3419.71 samples/sec Loss 12.1398 LearningRate 0.0883 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:06,623-Speed 3461.51 samples/sec Loss 12.1477 LearningRate 0.0883 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:09,577-Speed 3467.17 samples/sec Loss 12.3056 LearningRate 0.0882 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:12,515-Speed 3486.35 samples/sec Loss 12.3175 LearningRate 0.0882 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:15,466-Speed 3470.22 samples/sec Loss 12.1865 LearningRate 0.0882 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:18,408-Speed 3481.42 samples/sec Loss 12.4013 LearningRate 0.0882 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:21,353-Speed 3478.05 samples/sec Loss 12.2280 LearningRate 0.0882 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:24,344-Speed 3424.30 samples/sec Loss 12.3790 
LearningRate 0.0882 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:27,297-Speed 3469.03 samples/sec Loss 12.1374 LearningRate 0.0881 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:30,282-Speed 3430.88 samples/sec Loss 12.3414 LearningRate 0.0881 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:33,239-Speed 3464.52 samples/sec Loss 12.2641 LearningRate 0.0881 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:36,171-Speed 3492.90 samples/sec Loss 12.1719 LearningRate 0.0881 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:39,125-Speed 3468.17 samples/sec Loss 12.1504 LearningRate 0.0881 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:42,072-Speed 3474.89 samples/sec Loss 12.1188 LearningRate 0.0880 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:45,031-Speed 3462.40 samples/sec Loss 12.2398 LearningRate 0.0880 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:47,985-Speed 3466.93 samples/sec Loss 12.2128 LearningRate 0.0880 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:50,946-Speed 3458.58 samples/sec Loss 12.2987 LearningRate 0.0880 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:53,913-Speed 3452.47 samples/sec Loss 12.1412 LearningRate 0.0880 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:56,859-Speed 3476.85 samples/sec Loss 12.1731 LearningRate 0.0880 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:52:59,814-Speed 3466.65 samples/sec Loss 12.3244 LearningRate 0.0879 Epoch: 1 Global Step: 
6300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:02,756-Speed 3481.88 samples/sec Loss 11.9956 LearningRate 0.0879 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:05,694-Speed 3485.49 samples/sec Loss 12.2231 LearningRate 0.0879 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:08,638-Speed 3479.18 samples/sec Loss 12.1260 LearningRate 0.0879 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:11,588-Speed 3471.94 samples/sec Loss 12.1495 LearningRate 0.0879 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:14,625-Speed 3372.96 samples/sec Loss 12.0013 LearningRate 0.0878 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:17,613-Speed 3428.55 samples/sec Loss 12.0774 LearningRate 0.0878 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:20,557-Speed 3478.23 samples/sec Loss 12.1664 LearningRate 0.0878 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:23,501-Speed 3479.29 samples/sec Loss 12.1729 LearningRate 0.0878 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:26,455-Speed 3468.03 samples/sec Loss 12.2487 LearningRate 0.0878 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:29,402-Speed 3475.54 samples/sec Loss 12.1561 LearningRate 0.0877 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:32,345-Speed 3480.50 samples/sec Loss 12.2183 LearningRate 0.0877 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:35,275-Speed 3495.90 samples/sec Loss 12.1373 LearningRate 0.0877 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-10 23:53:38,242-Speed 3451.82 samples/sec Loss 11.9975 LearningRate 0.0877 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:41,182-Speed 3483.88 samples/sec Loss 12.2602 LearningRate 0.0877 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:44,123-Speed 3483.04 samples/sec Loss 12.0863 LearningRate 0.0877 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:47,062-Speed 3484.96 samples/sec Loss 11.8688 LearningRate 0.0876 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:50,005-Speed 3480.47 samples/sec Loss 12.1537 LearningRate 0.0876 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:52,947-Speed 3481.66 samples/sec Loss 12.1329 LearningRate 0.0876 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:55,889-Speed 3481.12 samples/sec Loss 11.9842 LearningRate 0.0876 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:53:58,840-Speed 3471.28 samples/sec Loss 12.1020 LearningRate 0.0876 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:01,796-Speed 3465.61 samples/sec Loss 12.0479 LearningRate 0.0875 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:04,755-Speed 3460.57 samples/sec Loss 11.9309 LearningRate 0.0875 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-10 23:54:07,687-Speed 3493.72 samples/sec Loss 11.8735 LearningRate 0.0875 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:10,628-Speed 3483.46 samples/sec Loss 12.0220 LearningRate 0.0875 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 
23:54:13,573-Speed 3477.36 samples/sec Loss 11.8366 LearningRate 0.0875 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:16,523-Speed 3471.83 samples/sec Loss 11.8319 LearningRate 0.0875 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:19,484-Speed 3459.01 samples/sec Loss 12.0006 LearningRate 0.0874 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:22,439-Speed 3466.84 samples/sec Loss 11.8724 LearningRate 0.0874 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-10 23:54:25,414-Speed 3443.33 samples/sec Loss 11.9883 LearningRate 0.0874 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:28,389-Speed 3441.88 samples/sec Loss 12.0070 LearningRate 0.0874 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:31,335-Speed 3477.30 samples/sec Loss 12.2978 LearningRate 0.0874 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:34,274-Speed 3485.31 samples/sec Loss 11.9962 LearningRate 0.0873 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:37,210-Speed 3488.84 samples/sec Loss 12.0382 LearningRate 0.0873 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:40,151-Speed 3481.59 samples/sec Loss 12.0740 LearningRate 0.0873 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:43,094-Speed 3480.70 samples/sec Loss 12.0850 LearningRate 0.0873 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:46,039-Speed 3478.41 samples/sec Loss 11.8458 LearningRate 0.0873 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:49,049-Speed 3402.16 samples/sec Loss 11.9904 
LearningRate 0.0872 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:52,007-Speed 3463.78 samples/sec Loss 11.9327 LearningRate 0.0872 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:54,960-Speed 3467.87 samples/sec Loss 12.0475 LearningRate 0.0872 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:54:57,929-Speed 3450.29 samples/sec Loss 12.0875 LearningRate 0.0872 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:00,875-Speed 3476.84 samples/sec Loss 11.9386 LearningRate 0.0872 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:03,831-Speed 3464.67 samples/sec Loss 11.8308 LearningRate 0.0872 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:06,767-Speed 3488.28 samples/sec Loss 12.0707 LearningRate 0.0871 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:09,714-Speed 3476.22 samples/sec Loss 11.8525 LearningRate 0.0871 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:12,657-Speed 3480.21 samples/sec Loss 11.7587 LearningRate 0.0871 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:15,609-Speed 3468.80 samples/sec Loss 11.9857 LearningRate 0.0871 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:18,568-Speed 3462.30 samples/sec Loss 12.0411 LearningRate 0.0871 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:21,513-Speed 3478.04 samples/sec Loss 11.9991 LearningRate 0.0870 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:24,456-Speed 3480.34 samples/sec Loss 11.7230 LearningRate 0.0870 Epoch: 1 Global Step: 6790 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:27,408-Speed 3469.98 samples/sec Loss 11.8984 LearningRate 0.0870 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:30,353-Speed 3477.22 samples/sec Loss 11.7486 LearningRate 0.0870 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:33,299-Speed 3477.97 samples/sec Loss 11.9252 LearningRate 0.0870 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:36,245-Speed 3475.95 samples/sec Loss 12.0043 LearningRate 0.0870 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:55:39,177-Speed 3492.85 samples/sec Loss 11.7754 LearningRate 0.0869 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:42,167-Speed 3426.54 samples/sec Loss 11.9587 LearningRate 0.0869 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:45,117-Speed 3471.59 samples/sec Loss 11.8065 LearningRate 0.0869 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:48,082-Speed 3455.46 samples/sec Loss 11.7068 LearningRate 0.0869 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:51,030-Speed 3474.59 samples/sec Loss 11.6219 LearningRate 0.0869 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:53,987-Speed 3462.85 samples/sec Loss 11.8841 LearningRate 0.0868 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:56,934-Speed 3476.21 samples/sec Loss 11.8361 LearningRate 0.0868 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:55:59,904-Speed 3448.61 samples/sec Loss 12.0788 LearningRate 0.0868 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-10 23:56:02,931-Speed 3383.28 samples/sec Loss 11.8765 LearningRate 0.0868 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:05,943-Speed 3400.84 samples/sec Loss 11.8234 LearningRate 0.0868 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:08,876-Speed 3491.61 samples/sec Loss 11.6733 LearningRate 0.0867 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:11,832-Speed 3465.86 samples/sec Loss 11.7718 LearningRate 0.0867 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:14,782-Speed 3471.17 samples/sec Loss 11.8572 LearningRate 0.0867 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:17,746-Speed 3456.28 samples/sec Loss 11.8900 LearningRate 0.0867 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:20,690-Speed 3479.33 samples/sec Loss 11.7922 LearningRate 0.0867 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:23,640-Speed 3472.29 samples/sec Loss 11.5616 LearningRate 0.0867 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:26,589-Speed 3473.38 samples/sec Loss 11.9076 LearningRate 0.0866 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:29,537-Speed 3474.19 samples/sec Loss 11.6790 LearningRate 0.0866 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:32,487-Speed 3471.68 samples/sec Loss 11.8545 LearningRate 0.0866 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:35,442-Speed 3466.67 samples/sec Loss 11.6806 LearningRate 0.0866 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:38,380-Speed 3485.73 samples/sec Loss 
11.7693 LearningRate 0.0866 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:41,346-Speed 3453.15 samples/sec Loss 11.5106 LearningRate 0.0865 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:44,297-Speed 3470.39 samples/sec Loss 11.7681 LearningRate 0.0865 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:47,319-Speed 3390.80 samples/sec Loss 11.7098 LearningRate 0.0865 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:50,336-Speed 3394.22 samples/sec Loss 11.6533 LearningRate 0.0865 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:53,298-Speed 3458.11 samples/sec Loss 11.7890 LearningRate 0.0865 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:56,246-Speed 3474.33 samples/sec Loss 11.9361 LearningRate 0.0865 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:56:59,196-Speed 3472.19 samples/sec Loss 11.7460 LearningRate 0.0864 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:02,148-Speed 3470.18 samples/sec Loss 11.8761 LearningRate 0.0864 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:05,104-Speed 3464.00 samples/sec Loss 11.7792 LearningRate 0.0864 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:08,044-Speed 3484.52 samples/sec Loss 11.6496 LearningRate 0.0864 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:11,006-Speed 3457.43 samples/sec Loss 11.9215 LearningRate 0.0864 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:13,976-Speed 3449.08 samples/sec Loss 11.7510 LearningRate 0.0863 Epoch: 1 Global Step: 7160 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:17,028-Speed 3356.48 samples/sec Loss 11.7474 LearningRate 0.0863 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:19,980-Speed 3469.04 samples/sec Loss 11.6004 LearningRate 0.0863 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:22,938-Speed 3462.82 samples/sec Loss 11.6602 LearningRate 0.0863 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:25,887-Speed 3473.66 samples/sec Loss 11.7124 LearningRate 0.0863 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:28,844-Speed 3463.92 samples/sec Loss 11.7582 LearningRate 0.0863 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:31,793-Speed 3473.24 samples/sec Loss 11.8634 LearningRate 0.0862 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:34,756-Speed 3456.32 samples/sec Loss 11.6674 LearningRate 0.0862 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:37,711-Speed 3466.67 samples/sec Loss 11.7760 LearningRate 0.0862 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-10 23:57:40,665-Speed 3466.97 samples/sec Loss 11.6824 LearningRate 0.0862 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:43,614-Speed 3473.52 samples/sec Loss 11.7328 LearningRate 0.0862 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:46,563-Speed 3473.95 samples/sec Loss 11.6366 LearningRate 0.0861 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:49,508-Speed 3476.72 samples/sec Loss 11.5545 LearningRate 0.0861 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-10 23:57:52,471-Speed 3457.66 samples/sec Loss 11.7007 LearningRate 0.0861 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:55,419-Speed 3474.31 samples/sec Loss 11.7966 LearningRate 0.0861 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:57:58,377-Speed 3462.27 samples/sec Loss 11.6010 LearningRate 0.0861 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:01,345-Speed 3450.79 samples/sec Loss 11.6120 LearningRate 0.0861 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:04,300-Speed 3466.25 samples/sec Loss 11.7669 LearningRate 0.0860 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:07,245-Speed 3478.37 samples/sec Loss 11.6287 LearningRate 0.0860 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:10,188-Speed 3480.93 samples/sec Loss 11.6830 LearningRate 0.0860 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:13,135-Speed 3475.55 samples/sec Loss 11.4124 LearningRate 0.0860 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:16,082-Speed 3475.07 samples/sec Loss 11.7005 LearningRate 0.0860 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:19,032-Speed 3472.02 samples/sec Loss 11.6631 LearningRate 0.0859 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:21,979-Speed 3475.31 samples/sec Loss 11.6418 LearningRate 0.0859 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:24,978-Speed 3415.28 samples/sec Loss 11.6721 LearningRate 0.0859 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:58:27,963-Speed 3431.80 samples/sec Loss 
11.5858 LearningRate 0.0859 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:30,920-Speed 3463.28 samples/sec Loss 11.7076 LearningRate 0.0859 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:33,867-Speed 3476.18 samples/sec Loss 11.5258 LearningRate 0.0858 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:36,817-Speed 3472.61 samples/sec Loss 11.5597 LearningRate 0.0858 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:39,778-Speed 3458.78 samples/sec Loss 11.5078 LearningRate 0.0858 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:42,727-Speed 3473.72 samples/sec Loss 11.6566 LearningRate 0.0858 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:45,672-Speed 3477.02 samples/sec Loss 11.6676 LearningRate 0.0858 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:48,630-Speed 3462.79 samples/sec Loss 11.3536 LearningRate 0.0858 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:51,605-Speed 3443.06 samples/sec Loss 11.7683 LearningRate 0.0857 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:54,570-Speed 3454.11 samples/sec Loss 11.5962 LearningRate 0.0857 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:58:57,517-Speed 3476.90 samples/sec Loss 11.5463 LearningRate 0.0857 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:00,478-Speed 3458.07 samples/sec Loss 11.4329 LearningRate 0.0857 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:03,431-Speed 3469.71 samples/sec Loss 11.4502 LearningRate 0.0857 Epoch: 1 Global Step: 7530 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:06,378-Speed 3474.60 samples/sec Loss 11.4844 LearningRate 0.0856 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:09,348-Speed 3448.94 samples/sec Loss 11.4424 LearningRate 0.0856 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:12,319-Speed 3447.75 samples/sec Loss 11.7001 LearningRate 0.0856 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:15,288-Speed 3450.48 samples/sec Loss 11.6527 LearningRate 0.0856 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:18,247-Speed 3461.80 samples/sec Loss 11.6383 LearningRate 0.0856 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:21,195-Speed 3473.43 samples/sec Loss 11.3164 LearningRate 0.0856 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:24,177-Speed 3435.00 samples/sec Loss 11.5831 LearningRate 0.0855 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:27,139-Speed 3458.45 samples/sec Loss 11.5515 LearningRate 0.0855 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-10 23:59:30,090-Speed 3470.92 samples/sec Loss 11.4223 LearningRate 0.0855 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:33,060-Speed 3448.78 samples/sec Loss 11.4480 LearningRate 0.0855 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:36,053-Speed 3421.28 samples/sec Loss 11.4409 LearningRate 0.0855 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:39,035-Speed 3435.34 samples/sec Loss 11.4609 LearningRate 0.0854 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 
23:59:41,984-Speed 3474.12 samples/sec Loss 11.4083 LearningRate 0.0854 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:44,933-Speed 3472.70 samples/sec Loss 11.4867 LearningRate 0.0854 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:47,878-Speed 3478.39 samples/sec Loss 11.4616 LearningRate 0.0854 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:50,831-Speed 3467.59 samples/sec Loss 11.4773 LearningRate 0.0854 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:53,782-Speed 3471.91 samples/sec Loss 11.5718 LearningRate 0.0854 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:56,733-Speed 3469.77 samples/sec Loss 11.3854 LearningRate 0.0853 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-10 23:59:59,708-Speed 3443.20 samples/sec Loss 11.3365 LearningRate 0.0853 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:02,667-Speed 3462.18 samples/sec Loss 11.4711 LearningRate 0.0853 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:05,650-Speed 3433.14 samples/sec Loss 11.4543 LearningRate 0.0853 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:08,631-Speed 3436.80 samples/sec Loss 11.3512 LearningRate 0.0853 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:11,580-Speed 3473.31 samples/sec Loss 11.4632 LearningRate 0.0852 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:14,531-Speed 3470.49 samples/sec Loss 11.3771 LearningRate 0.0852 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:17,484-Speed 3468.76 samples/sec Loss 11.5698 
LearningRate 0.0852 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:20,434-Speed 3472.08 samples/sec Loss 11.5089 LearningRate 0.0852 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:23,405-Speed 3447.55 samples/sec Loss 11.4040 LearningRate 0.0852 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:26,356-Speed 3470.32 samples/sec Loss 11.4682 LearningRate 0.0852 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:29,301-Speed 3478.02 samples/sec Loss 11.2888 LearningRate 0.0851 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:32,247-Speed 3476.45 samples/sec Loss 11.3165 LearningRate 0.0851 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:00:35,188-Speed 3483.23 samples/sec Loss 11.5606 LearningRate 0.0851 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:38,144-Speed 3465.67 samples/sec Loss 11.2641 LearningRate 0.0851 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:41,098-Speed 3467.59 samples/sec Loss 11.4801 LearningRate 0.0851 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:44,044-Speed 3476.15 samples/sec Loss 11.3758 LearningRate 0.0850 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:46,995-Speed 3470.66 samples/sec Loss 11.3019 LearningRate 0.0850 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:49,946-Speed 3471.21 samples/sec Loss 11.2423 LearningRate 0.0850 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:52,898-Speed 3469.89 samples/sec Loss 11.3940 LearningRate 0.0850 Epoch: 1 Global Step: 7900 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:55,889-Speed 3424.19 samples/sec Loss 11.3758 LearningRate 0.0850 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:00:58,839-Speed 3472.34 samples/sec Loss 11.5091 LearningRate 0.0850 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:01:01,797-Speed 3462.83 samples/sec Loss 11.5032 LearningRate 0.0849 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:01:04,753-Speed 3465.10 samples/sec Loss 11.2371 LearningRate 0.0849 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:07,701-Speed 3473.97 samples/sec Loss 11.2223 LearningRate 0.0849 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:10,705-Speed 3410.46 samples/sec Loss 11.2812 LearningRate 0.0849 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:13,757-Speed 3356.00 samples/sec Loss 11.2735 LearningRate 0.0849 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:16,737-Speed 3437.13 samples/sec Loss 11.3711 LearningRate 0.0848 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:19,686-Speed 3473.06 samples/sec Loss 11.1554 LearningRate 0.0848 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:01:22,634-Speed 3474.39 samples/sec Loss 11.3769 LearningRate 0.0848 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:02:07,094-[lfw][8000]XNorm: 21.365277 Training: 2022-04-11 00:02:07,095-[lfw][8000]Accuracy-Flip: 0.99467+-0.00340 Training: 2022-04-11 00:02:07,095-[lfw][8000]Accuracy-Highest: 0.99467 Training: 2022-04-11 00:02:58,495-[cfp_fp][8000]XNorm: 18.598027 Training: 2022-04-11 00:02:58,496-[cfp_fp][8000]Accuracy-Flip: 
0.91271+-0.01554 Training: 2022-04-11 00:02:58,496-[cfp_fp][8000]Accuracy-Highest: 0.92186 Training: 2022-04-11 00:03:42,819-[agedb_30][8000]XNorm: 20.583255 Training: 2022-04-11 00:03:42,820-[agedb_30][8000]Accuracy-Flip: 0.95633+-0.00636 Training: 2022-04-11 00:03:42,821-[agedb_30][8000]Accuracy-Highest: 0.95683 Training: 2022-04-11 00:03:45,755-Speed 71.55 samples/sec Loss 11.3956 LearningRate 0.0848 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 00:03:48,693-Speed 3485.69 samples/sec Loss 11.3459 LearningRate 0.0848 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 00:03:51,622-Speed 3497.56 samples/sec Loss 11.4367 LearningRate 0.0848 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:03:54,551-Speed 3496.95 samples/sec Loss 11.4454 LearningRate 0.0847 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:03:57,487-Speed 3488.88 samples/sec Loss 11.2438 LearningRate 0.0847 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:00,421-Speed 3491.40 samples/sec Loss 11.4209 LearningRate 0.0847 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:03,359-Speed 3485.61 samples/sec Loss 11.2991 LearningRate 0.0847 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:06,360-Speed 3413.33 samples/sec Loss 11.1986 LearningRate 0.0847 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:09,346-Speed 3429.94 samples/sec Loss 11.2959 LearningRate 0.0846 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:12,282-Speed 3488.64 samples/sec Loss 11.4003 LearningRate 0.0846 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:15,217-Speed 3489.56 samples/sec 
Loss 11.1969 LearningRate 0.0846 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:18,159-Speed 3481.72 samples/sec Loss 11.2675 LearningRate 0.0846 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:21,096-Speed 3488.27 samples/sec Loss 11.2961 LearningRate 0.0846 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:24,021-Speed 3502.00 samples/sec Loss 11.2967 LearningRate 0.0846 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:26,977-Speed 3464.04 samples/sec Loss 11.2988 LearningRate 0.0845 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:29,917-Speed 3484.71 samples/sec Loss 11.1302 LearningRate 0.0845 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:32,855-Speed 3486.02 samples/sec Loss 11.1367 LearningRate 0.0845 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:35,794-Speed 3485.08 samples/sec Loss 11.2728 LearningRate 0.0845 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:04:38,719-Speed 3501.77 samples/sec Loss 11.0672 LearningRate 0.0845 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:41,666-Speed 3475.53 samples/sec Loss 11.3449 LearningRate 0.0844 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:44,607-Speed 3482.80 samples/sec Loss 11.4069 LearningRate 0.0844 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:47,551-Speed 3479.87 samples/sec Loss 11.3550 LearningRate 0.0844 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:50,511-Speed 3460.39 samples/sec Loss 11.2673 LearningRate 0.0844 Epoch: 1 Global Step: 8230 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:53,451-Speed 3483.79 samples/sec Loss 11.4142 LearningRate 0.0844 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:56,390-Speed 3484.55 samples/sec Loss 11.2936 LearningRate 0.0844 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:04:59,322-Speed 3494.09 samples/sec Loss 11.3169 LearningRate 0.0843 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:02,270-Speed 3473.48 samples/sec Loss 11.3213 LearningRate 0.0843 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:05,209-Speed 3485.69 samples/sec Loss 11.2319 LearningRate 0.0843 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:08,146-Speed 3487.00 samples/sec Loss 11.0289 LearningRate 0.0843 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:11,091-Speed 3477.72 samples/sec Loss 11.2461 LearningRate 0.0843 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:14,028-Speed 3488.47 samples/sec Loss 11.3124 LearningRate 0.0842 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:17,006-Speed 3439.53 samples/sec Loss 11.2455 LearningRate 0.0842 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:19,943-Speed 3487.20 samples/sec Loss 11.2093 LearningRate 0.0842 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:22,880-Speed 3487.37 samples/sec Loss 11.3841 LearningRate 0.0842 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 00:05:25,817-Speed 3488.42 samples/sec Loss 11.0479 LearningRate 0.0842 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 
00:05:28,754-Speed 3486.38 samples/sec Loss 11.1351 LearningRate 0.0842 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:31,711-Speed 3463.53 samples/sec Loss 11.4995 LearningRate 0.0841 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:34,659-Speed 3475.87 samples/sec Loss 11.1889 LearningRate 0.0841 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:37,605-Speed 3476.56 samples/sec Loss 11.2275 LearningRate 0.0841 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:40,593-Speed 3427.72 samples/sec Loss 11.0607 LearningRate 0.0841 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:43,531-Speed 3485.66 samples/sec Loss 11.2486 LearningRate 0.0841 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:46,477-Speed 3476.89 samples/sec Loss 11.2538 LearningRate 0.0840 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:49,446-Speed 3450.35 samples/sec Loss 11.1443 LearningRate 0.0840 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:52,389-Speed 3481.06 samples/sec Loss 11.1824 LearningRate 0.0840 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:55,325-Speed 3487.98 samples/sec Loss 11.0909 LearningRate 0.0840 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:05:58,264-Speed 3484.93 samples/sec Loss 11.1965 LearningRate 0.0840 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:06:01,217-Speed 3468.53 samples/sec Loss 11.2537 LearningRate 0.0840 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:06:04,162-Speed 3479.02 samples/sec Loss 11.1734 LearningRate 
0.0839 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:06:07,128-Speed 3453.48 samples/sec Loss 11.1457 LearningRate 0.0839 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:06:10,070-Speed 3481.99 samples/sec Loss 11.0985 LearningRate 0.0839 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:06:13,097-Speed 3383.17 samples/sec Loss 11.1213 LearningRate 0.0839 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:16,051-Speed 3467.72 samples/sec Loss 11.1169 LearningRate 0.0839 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:18,991-Speed 3483.56 samples/sec Loss 11.0478 LearningRate 0.0838 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:21,939-Speed 3474.42 samples/sec Loss 11.0467 LearningRate 0.0838 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:24,889-Speed 3472.04 samples/sec Loss 11.0596 LearningRate 0.0838 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:27,838-Speed 3472.99 samples/sec Loss 11.3323 LearningRate 0.0838 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:30,801-Speed 3457.54 samples/sec Loss 11.1410 LearningRate 0.0838 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:33,744-Speed 3479.83 samples/sec Loss 11.1214 LearningRate 0.0838 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:36,683-Speed 3485.60 samples/sec Loss 11.2429 LearningRate 0.0837 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:39,623-Speed 3483.37 samples/sec Loss 11.0917 LearningRate 0.0837 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 00:06:42,555-Speed 3493.16 samples/sec Loss 11.3084 LearningRate 0.0837 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:45,494-Speed 3485.46 samples/sec Loss 11.0126 LearningRate 0.0837 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:48,437-Speed 3480.33 samples/sec Loss 11.0178 LearningRate 0.0837 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:51,410-Speed 3444.94 samples/sec Loss 11.2206 LearningRate 0.0836 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:54,371-Speed 3458.98 samples/sec Loss 11.1990 LearningRate 0.0836 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:06:57,320-Speed 3474.28 samples/sec Loss 11.1074 LearningRate 0.0836 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:00,261-Speed 3482.83 samples/sec Loss 11.1375 LearningRate 0.0836 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:03,202-Speed 3482.16 samples/sec Loss 11.1194 LearningRate 0.0836 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:06,145-Speed 3479.93 samples/sec Loss 11.0757 LearningRate 0.0836 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:09,086-Speed 3482.99 samples/sec Loss 11.1700 LearningRate 0.0835 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:12,031-Speed 3478.35 samples/sec Loss 11.2349 LearningRate 0.0835 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:14,977-Speed 3476.87 samples/sec Loss 11.0493 LearningRate 0.0835 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:18,010-Speed 3377.06 
samples/sec Loss 11.0805 LearningRate 0.0835 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:20,951-Speed 3482.89 samples/sec Loss 11.2576 LearningRate 0.0835 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:23,897-Speed 3476.72 samples/sec Loss 11.1401 LearningRate 0.0834 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:26,837-Speed 3484.07 samples/sec Loss 10.9691 LearningRate 0.0834 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:29,779-Speed 3482.22 samples/sec Loss 10.9638 LearningRate 0.0834 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:32,718-Speed 3484.27 samples/sec Loss 11.0016 LearningRate 0.0834 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:35,660-Speed 3481.65 samples/sec Loss 10.9286 LearningRate 0.0834 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:07:38,591-Speed 3494.82 samples/sec Loss 11.2384 LearningRate 0.0834 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:41,530-Speed 3484.73 samples/sec Loss 11.0215 LearningRate 0.0833 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:44,469-Speed 3484.61 samples/sec Loss 10.9443 LearningRate 0.0833 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:47,420-Speed 3470.64 samples/sec Loss 10.8001 LearningRate 0.0833 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:50,369-Speed 3474.04 samples/sec Loss 11.0068 LearningRate 0.0833 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:53,314-Speed 3478.08 samples/sec Loss 11.0704 LearningRate 0.0833 Epoch: 1 Global 
Step: 8850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:56,256-Speed 3482.37 samples/sec Loss 10.8681 LearningRate 0.0833 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:07:59,200-Speed 3478.82 samples/sec Loss 11.0944 LearningRate 0.0832 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:08:02,145-Speed 3477.33 samples/sec Loss 10.9767 LearningRate 0.0832 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:08:05,090-Speed 3479.01 samples/sec Loss 10.9145 LearningRate 0.0832 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:08:08,032-Speed 3480.90 samples/sec Loss 10.8843 LearningRate 0.0832 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:10,971-Speed 3485.13 samples/sec Loss 11.0924 LearningRate 0.0832 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:13,920-Speed 3472.69 samples/sec Loss 11.0196 LearningRate 0.0831 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:16,864-Speed 3479.84 samples/sec Loss 10.9484 LearningRate 0.0831 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:19,811-Speed 3476.32 samples/sec Loss 11.0023 LearningRate 0.0831 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:22,748-Speed 3487.24 samples/sec Loss 11.0219 LearningRate 0.0831 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:25,693-Speed 3477.52 samples/sec Loss 10.9338 LearningRate 0.0831 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:28,636-Speed 3479.96 samples/sec Loss 11.0331 LearningRate 0.0831 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 00:08:31,581-Speed 3478.00 samples/sec Loss 11.0181 LearningRate 0.0830 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:34,525-Speed 3479.31 samples/sec Loss 10.9242 LearningRate 0.0830 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:37,465-Speed 3484.48 samples/sec Loss 10.9845 LearningRate 0.0830 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:40,414-Speed 3473.35 samples/sec Loss 11.1768 LearningRate 0.0830 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:43,356-Speed 3482.11 samples/sec Loss 10.8713 LearningRate 0.0830 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:46,299-Speed 3480.17 samples/sec Loss 10.9125 LearningRate 0.0829 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:49,280-Speed 3435.28 samples/sec Loss 11.1757 LearningRate 0.0829 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:52,228-Speed 3474.13 samples/sec Loss 10.8795 LearningRate 0.0829 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:55,174-Speed 3477.41 samples/sec Loss 10.9134 LearningRate 0.0829 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:08:58,115-Speed 3482.09 samples/sec Loss 10.9638 LearningRate 0.0829 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:01,059-Speed 3480.00 samples/sec Loss 10.9109 LearningRate 0.0829 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:04,003-Speed 3478.46 samples/sec Loss 11.1490 LearningRate 0.0828 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:06,937-Speed 3491.64 
samples/sec Loss 10.8609 LearningRate 0.0828 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:09,881-Speed 3479.06 samples/sec Loss 10.8922 LearningRate 0.0828 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:12,830-Speed 3472.88 samples/sec Loss 10.8471 LearningRate 0.0828 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:15,781-Speed 3471.70 samples/sec Loss 10.9641 LearningRate 0.0828 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:18,727-Speed 3475.81 samples/sec Loss 10.9927 LearningRate 0.0827 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:21,674-Speed 3475.29 samples/sec Loss 10.8262 LearningRate 0.0827 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:24,618-Speed 3479.92 samples/sec Loss 10.9571 LearningRate 0.0827 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:27,569-Speed 3470.16 samples/sec Loss 10.8033 LearningRate 0.0827 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:30,512-Speed 3481.44 samples/sec Loss 10.7804 LearningRate 0.0827 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:33,459-Speed 3475.11 samples/sec Loss 10.8388 LearningRate 0.0827 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:36,392-Speed 3492.68 samples/sec Loss 10.8360 LearningRate 0.0826 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:39,340-Speed 3474.47 samples/sec Loss 11.0047 LearningRate 0.0826 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:42,290-Speed 3471.44 samples/sec Loss 10.8708 LearningRate 0.0826 Epoch: 1 
Global Step: 9220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:45,240-Speed 3473.27 samples/sec Loss 10.8828 LearningRate 0.0826 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:48,186-Speed 3475.96 samples/sec Loss 10.7763 LearningRate 0.0826 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:51,141-Speed 3466.18 samples/sec Loss 10.9823 LearningRate 0.0825 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:54,097-Speed 3465.36 samples/sec Loss 10.8639 LearningRate 0.0825 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:09:57,055-Speed 3461.97 samples/sec Loss 10.7900 LearningRate 0.0825 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:00,092-Speed 3373.28 samples/sec Loss 10.6864 LearningRate 0.0825 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:03,042-Speed 3471.88 samples/sec Loss 10.8869 LearningRate 0.0825 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:05,989-Speed 3476.89 samples/sec Loss 10.7884 LearningRate 0.0825 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:10:08,928-Speed 3484.46 samples/sec Loss 10.9562 LearningRate 0.0824 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:11,871-Speed 3479.93 samples/sec Loss 10.8761 LearningRate 0.0824 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:14,820-Speed 3473.06 samples/sec Loss 10.7988 LearningRate 0.0824 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:17,768-Speed 3475.16 samples/sec Loss 10.8431 LearningRate 0.0824 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 00:10:20,718-Speed 3471.97 samples/sec Loss 10.7508 LearningRate 0.0824 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:23,664-Speed 3476.04 samples/sec Loss 11.0304 LearningRate 0.0824 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:26,614-Speed 3473.20 samples/sec Loss 10.7336 LearningRate 0.0823 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:29,564-Speed 3472.44 samples/sec Loss 10.8956 LearningRate 0.0823 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:32,508-Speed 3478.24 samples/sec Loss 10.9184 LearningRate 0.0823 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:35,472-Speed 3455.78 samples/sec Loss 10.7233 LearningRate 0.0823 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:38,429-Speed 3463.99 samples/sec Loss 10.8200 LearningRate 0.0823 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:10:41,366-Speed 3487.92 samples/sec Loss 10.7712 LearningRate 0.0822 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:44,316-Speed 3471.25 samples/sec Loss 10.8268 LearningRate 0.0822 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:10:47,253-Speed 3487.84 samples/sec Loss 11.1065 LearningRate 0.0822 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:10:50,210-Speed 3463.54 samples/sec Loss 10.6985 LearningRate 0.0822 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:10:53,182-Speed 3447.30 samples/sec Loss 10.9344 LearningRate 0.0822 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:10:56,128-Speed 3476.89 
samples/sec Loss 10.7886 LearningRate 0.0822 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:10:59,075-Speed 3475.59 samples/sec Loss 10.6894 LearningRate 0.0821 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:02,023-Speed 3473.91 samples/sec Loss 10.8038 LearningRate 0.0821 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:04,964-Speed 3482.58 samples/sec Loss 10.8463 LearningRate 0.0821 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:07,912-Speed 3474.17 samples/sec Loss 10.7818 LearningRate 0.0821 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:10,861-Speed 3473.91 samples/sec Loss 10.7658 LearningRate 0.0821 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:13,808-Speed 3475.93 samples/sec Loss 10.7793 LearningRate 0.0820 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:11:16,759-Speed 3469.65 samples/sec Loss 10.6763 LearningRate 0.0820 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:19,708-Speed 3474.25 samples/sec Loss 10.6945 LearningRate 0.0820 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:22,654-Speed 3476.25 samples/sec Loss 10.8963 LearningRate 0.0820 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:25,607-Speed 3469.84 samples/sec Loss 10.5752 LearningRate 0.0820 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:28,551-Speed 3478.42 samples/sec Loss 10.9165 LearningRate 0.0820 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:31,514-Speed 3456.72 samples/sec Loss 11.0723 LearningRate 0.0819 Epoch: 1 Global 
Step: 9590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:34,468-Speed 3467.13 samples/sec Loss 10.8353 LearningRate 0.0819 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:37,412-Speed 3478.91 samples/sec Loss 10.6384 LearningRate 0.0819 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:40,391-Speed 3439.00 samples/sec Loss 10.6024 LearningRate 0.0819 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:43,355-Speed 3455.71 samples/sec Loss 10.5457 LearningRate 0.0819 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:46,294-Speed 3484.86 samples/sec Loss 10.8110 LearningRate 0.0818 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:49,242-Speed 3475.11 samples/sec Loss 10.7032 LearningRate 0.0818 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:52,189-Speed 3475.43 samples/sec Loss 10.8520 LearningRate 0.0818 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:55,132-Speed 3480.20 samples/sec Loss 10.6608 LearningRate 0.0818 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:11:58,080-Speed 3474.48 samples/sec Loss 10.6234 LearningRate 0.0818 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:01,025-Speed 3478.06 samples/sec Loss 10.7683 LearningRate 0.0818 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:03,971-Speed 3476.31 samples/sec Loss 10.8788 LearningRate 0.0817 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:06,916-Speed 3478.15 samples/sec Loss 10.6470 LearningRate 0.0817 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 00:12:09,863-Speed 3475.67 samples/sec Loss 10.7109 LearningRate 0.0817 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:12,812-Speed 3472.98 samples/sec Loss 10.5228 LearningRate 0.0817 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:15,776-Speed 3456.48 samples/sec Loss 10.6189 LearningRate 0.0817 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:18,724-Speed 3474.30 samples/sec Loss 10.6864 LearningRate 0.0817 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:21,675-Speed 3470.38 samples/sec Loss 10.5862 LearningRate 0.0816 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:24,617-Speed 3481.72 samples/sec Loss 10.5983 LearningRate 0.0816 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:27,565-Speed 3474.63 samples/sec Loss 10.6979 LearningRate 0.0816 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:30,516-Speed 3470.41 samples/sec Loss 10.6763 LearningRate 0.0816 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:33,464-Speed 3474.98 samples/sec Loss 10.8362 LearningRate 0.0816 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:36,405-Speed 3482.36 samples/sec Loss 10.6316 LearningRate 0.0815 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:39,352-Speed 3476.14 samples/sec Loss 10.6581 LearningRate 0.0815 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:42,298-Speed 3477.20 samples/sec Loss 10.5686 LearningRate 0.0815 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:45,243-Speed 3478.40 
samples/sec Loss 10.6004 LearningRate 0.0815 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:12:48,187-Speed 3479.11 samples/sec Loss 10.7134 LearningRate 0.0815 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:51,131-Speed 3478.25 samples/sec Loss 10.8971 LearningRate 0.0815 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:54,088-Speed 3464.40 samples/sec Loss 10.6830 LearningRate 0.0814 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:12:57,053-Speed 3454.71 samples/sec Loss 10.7138 LearningRate 0.0814 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:13:00,001-Speed 3473.65 samples/sec Loss 10.7532 LearningRate 0.0814 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:13:02,969-Speed 3452.01 samples/sec Loss 10.6971 LearningRate 0.0814 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:13:05,905-Speed 3488.35 samples/sec Loss 10.7690 LearningRate 0.0814 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:08,849-Speed 3479.37 samples/sec Loss 10.5667 LearningRate 0.0813 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:11,794-Speed 3478.11 samples/sec Loss 10.5397 LearningRate 0.0813 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:14,752-Speed 3461.88 samples/sec Loss 10.8904 LearningRate 0.0813 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:17,704-Speed 3470.37 samples/sec Loss 10.6242 LearningRate 0.0813 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:20,646-Speed 3481.79 samples/sec Loss 10.5891 LearningRate 0.0813 Epoch: 1 Global 
Step: 9960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:23,607-Speed 3458.19 samples/sec Loss 10.4777 LearningRate 0.0813 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:26,554-Speed 3476.26 samples/sec Loss 10.6451 LearningRate 0.0812 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:29,500-Speed 3477.27 samples/sec Loss 10.7612 LearningRate 0.0812 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:13:32,443-Speed 3480.05 samples/sec Loss 10.6047 LearningRate 0.0812 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:14:16,575-[lfw][10000]XNorm: 21.927683 Training: 2022-04-11 00:14:16,576-[lfw][10000]Accuracy-Flip: 0.99517+-0.00320 Training: 2022-04-11 00:14:16,576-[lfw][10000]Accuracy-Highest: 0.99517 Training: 2022-04-11 00:15:07,799-[cfp_fp][10000]XNorm: 18.705165 Training: 2022-04-11 00:15:07,800-[cfp_fp][10000]Accuracy-Flip: 0.91743+-0.01279 Training: 2022-04-11 00:15:07,800-[cfp_fp][10000]Accuracy-Highest: 0.92186 Training: 2022-04-11 00:15:51,796-[agedb_30][10000]XNorm: 21.474425 Training: 2022-04-11 00:15:51,796-[agedb_30][10000]Accuracy-Flip: 0.96000+-0.00913 Training: 2022-04-11 00:15:51,797-[agedb_30][10000]Accuracy-Highest: 0.96000 Training: 2022-04-11 00:15:54,743-Speed 71.96 samples/sec Loss 10.5565 LearningRate 0.0812 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:15:57,670-Speed 3499.08 samples/sec Loss 10.5759 LearningRate 0.0812 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:00,594-Speed 3503.00 samples/sec Loss 10.6135 LearningRate 0.0812 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:03,532-Speed 3487.30 samples/sec Loss 10.7502 LearningRate 0.0811 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 00:16:06,459-Speed 3499.10 samples/sec Loss 10.6382 LearningRate 0.0811 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:09,390-Speed 3494.53 samples/sec Loss 10.4997 LearningRate 0.0811 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:12,318-Speed 3498.81 samples/sec Loss 10.5600 LearningRate 0.0811 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:15,265-Speed 3474.59 samples/sec Loss 10.5732 LearningRate 0.0811 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:18,198-Speed 3493.10 samples/sec Loss 10.5349 LearningRate 0.0810 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:21,128-Speed 3495.71 samples/sec Loss 10.6567 LearningRate 0.0810 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:16:24,102-Speed 3443.50 samples/sec Loss 10.4294 LearningRate 0.0810 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:37,672-Speed 754.70 samples/sec Loss 10.1366 LearningRate 0.0810 Epoch: 2 Global Step: 10120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:40,634-Speed 3458.63 samples/sec Loss 9.6732 LearningRate 0.0810 Epoch: 2 Global Step: 10130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:43,584-Speed 3472.39 samples/sec Loss 9.9733 LearningRate 0.0810 Epoch: 2 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:46,542-Speed 3462.91 samples/sec Loss 10.0234 LearningRate 0.0809 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:49,500-Speed 3463.31 samples/sec Loss 9.7080 LearningRate 0.0809 Epoch: 2 Global Step: 10160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
00:16:52,456-Speed 3465.04 samples/sec Loss 9.6377 LearningRate 0.0809 Epoch: 2 Global Step: 10170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:55,405-Speed 3473.58 samples/sec Loss 9.8527 LearningRate 0.0809 Epoch: 2 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:16:58,382-Speed 3440.48 samples/sec Loss 9.8739 LearningRate 0.0809 Epoch: 2 Global Step: 10190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:01,331-Speed 3473.65 samples/sec Loss 9.7343 LearningRate 0.0809 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:04,270-Speed 3484.89 samples/sec Loss 9.8940 LearningRate 0.0808 Epoch: 2 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:07,215-Speed 3477.82 samples/sec Loss 9.9619 LearningRate 0.0808 Epoch: 2 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:10,161-Speed 3476.84 samples/sec Loss 9.8723 LearningRate 0.0808 Epoch: 2 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:13,098-Speed 3487.64 samples/sec Loss 9.9105 LearningRate 0.0808 Epoch: 2 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:16,037-Speed 3485.28 samples/sec Loss 9.6189 LearningRate 0.0808 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:19,000-Speed 3457.46 samples/sec Loss 9.7835 LearningRate 0.0807 Epoch: 2 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:21,942-Speed 3480.83 samples/sec Loss 9.8646 LearningRate 0.0807 Epoch: 2 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:24,892-Speed 3472.55 samples/sec Loss 9.9167 LearningRate 0.0807 Epoch: 2 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:27,848-Speed 3464.80 samples/sec Loss 9.9956 
LearningRate 0.0807 Epoch: 2 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:30,784-Speed 3488.60 samples/sec Loss 9.9784 LearningRate 0.0807 Epoch: 2 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:33,710-Speed 3501.40 samples/sec Loss 9.9448 LearningRate 0.0807 Epoch: 2 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:36,653-Speed 3480.04 samples/sec Loss 9.9315 LearningRate 0.0806 Epoch: 2 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:17:39,609-Speed 3465.18 samples/sec Loss 9.9768 LearningRate 0.0806 Epoch: 2 Global Step: 10330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:42,548-Speed 3484.26 samples/sec Loss 10.0251 LearningRate 0.0806 Epoch: 2 Global Step: 10340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:45,490-Speed 3481.03 samples/sec Loss 10.0981 LearningRate 0.0806 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:48,431-Speed 3484.04 samples/sec Loss 10.0712 LearningRate 0.0806 Epoch: 2 Global Step: 10360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:51,378-Speed 3475.70 samples/sec Loss 10.0176 LearningRate 0.0805 Epoch: 2 Global Step: 10370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:54,323-Speed 3478.12 samples/sec Loss 10.1186 LearningRate 0.0805 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:17:57,265-Speed 3480.49 samples/sec Loss 9.9449 LearningRate 0.0805 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:00,209-Speed 3479.57 samples/sec Loss 10.0306 LearningRate 0.0805 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:03,161-Speed 3470.16 samples/sec Loss 10.1588 LearningRate 0.0805 Epoch: 2 Global Step: 10410 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:06,105-Speed 3478.75 samples/sec Loss 9.9971 LearningRate 0.0805 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:09,049-Speed 3479.59 samples/sec Loss 10.1750 LearningRate 0.0804 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:11,992-Speed 3480.39 samples/sec Loss 10.1048 LearningRate 0.0804 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:14,938-Speed 3476.92 samples/sec Loss 10.0031 LearningRate 0.0804 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:17,881-Speed 3480.67 samples/sec Loss 9.9977 LearningRate 0.0804 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:20,824-Speed 3479.88 samples/sec Loss 9.8892 LearningRate 0.0804 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:23,803-Speed 3438.06 samples/sec Loss 9.9977 LearningRate 0.0804 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:26,747-Speed 3479.46 samples/sec Loss 9.9510 LearningRate 0.0803 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:29,708-Speed 3459.83 samples/sec Loss 10.1474 LearningRate 0.0803 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:32,656-Speed 3473.50 samples/sec Loss 10.0370 LearningRate 0.0803 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:35,602-Speed 3476.76 samples/sec Loss 10.0398 LearningRate 0.0803 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:38,550-Speed 3475.32 samples/sec Loss 10.0722 LearningRate 0.0803 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 00:18:41,513-Speed 3456.89 samples/sec Loss 10.3676 LearningRate 0.0802 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:44,455-Speed 3481.86 samples/sec Loss 10.2310 LearningRate 0.0802 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:47,399-Speed 3478.69 samples/sec Loss 10.1496 LearningRate 0.0802 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:18:50,326-Speed 3499.07 samples/sec Loss 10.2190 LearningRate 0.0802 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:53,270-Speed 3479.04 samples/sec Loss 10.2912 LearningRate 0.0802 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:56,211-Speed 3483.19 samples/sec Loss 10.2302 LearningRate 0.0802 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:18:59,155-Speed 3479.06 samples/sec Loss 10.1436 LearningRate 0.0801 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:02,106-Speed 3470.66 samples/sec Loss 10.2062 LearningRate 0.0801 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:05,061-Speed 3465.94 samples/sec Loss 10.2153 LearningRate 0.0801 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:08,016-Speed 3466.20 samples/sec Loss 10.1390 LearningRate 0.0801 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:10,957-Speed 3483.51 samples/sec Loss 10.1548 LearningRate 0.0801 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:13,905-Speed 3474.36 samples/sec Loss 10.2886 LearningRate 0.0801 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:16,847-Speed 3481.35 samples/sec 
Loss 10.0659 LearningRate 0.0800 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:19:19,787-Speed 3484.28 samples/sec Loss 10.1937 LearningRate 0.0800 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:22,725-Speed 3486.03 samples/sec Loss 10.3913 LearningRate 0.0800 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:25,668-Speed 3479.99 samples/sec Loss 10.2654 LearningRate 0.0800 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:28,615-Speed 3476.53 samples/sec Loss 10.3601 LearningRate 0.0800 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:31,565-Speed 3472.10 samples/sec Loss 10.2640 LearningRate 0.0799 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:34,508-Speed 3479.93 samples/sec Loss 10.2876 LearningRate 0.0799 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:37,454-Speed 3477.52 samples/sec Loss 10.2673 LearningRate 0.0799 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:40,400-Speed 3476.66 samples/sec Loss 10.2056 LearningRate 0.0799 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:43,356-Speed 3464.40 samples/sec Loss 10.2850 LearningRate 0.0799 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:46,299-Speed 3480.27 samples/sec Loss 10.0498 LearningRate 0.0799 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:49,232-Speed 3492.51 samples/sec Loss 10.2209 LearningRate 0.0798 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:52,176-Speed 3479.13 samples/sec Loss 10.3262 LearningRate 0.0798 Epoch: 2 
Global Step: 10780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:55,129-Speed 3469.26 samples/sec Loss 10.2149 LearningRate 0.0798 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:19:58,069-Speed 3483.31 samples/sec Loss 10.1068 LearningRate 0.0798 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:01,019-Speed 3471.90 samples/sec Loss 10.1653 LearningRate 0.0798 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:03,973-Speed 3468.25 samples/sec Loss 10.3409 LearningRate 0.0798 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:06,913-Speed 3483.57 samples/sec Loss 10.2709 LearningRate 0.0797 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:09,855-Speed 3481.43 samples/sec Loss 10.0901 LearningRate 0.0797 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:12,803-Speed 3474.29 samples/sec Loss 10.1778 LearningRate 0.0797 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:15,748-Speed 3477.40 samples/sec Loss 10.3112 LearningRate 0.0797 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:18,697-Speed 3473.84 samples/sec Loss 10.1159 LearningRate 0.0797 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:20:21,630-Speed 3492.55 samples/sec Loss 10.1064 LearningRate 0.0796 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:24,582-Speed 3469.73 samples/sec Loss 10.1144 LearningRate 0.0796 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:27,527-Speed 3477.60 samples/sec Loss 10.1725 LearningRate 0.0796 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 00:20:30,469-Speed 3481.89 samples/sec Loss 10.2907 LearningRate 0.0796 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:33,416-Speed 3476.18 samples/sec Loss 10.0941 LearningRate 0.0796 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:36,356-Speed 3482.81 samples/sec Loss 10.1132 LearningRate 0.0796 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:39,300-Speed 3479.79 samples/sec Loss 10.3284 LearningRate 0.0795 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:42,244-Speed 3479.01 samples/sec Loss 10.0877 LearningRate 0.0795 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:45,195-Speed 3471.26 samples/sec Loss 10.1720 LearningRate 0.0795 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:48,139-Speed 3478.70 samples/sec Loss 10.1693 LearningRate 0.0795 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:51,095-Speed 3464.39 samples/sec Loss 10.2850 LearningRate 0.0795 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:54,049-Speed 3468.09 samples/sec Loss 10.3233 LearningRate 0.0795 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:56,994-Speed 3478.58 samples/sec Loss 10.2199 LearningRate 0.0794 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:20:59,937-Speed 3479.80 samples/sec Loss 10.2208 LearningRate 0.0794 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:21:02,886-Speed 3473.94 samples/sec Loss 10.1829 LearningRate 0.0794 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
00:21:05,885-Speed 3414.32 samples/sec Loss 10.2947 LearningRate 0.0794 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:21:08,830-Speed 3478.93 samples/sec Loss 10.1567 LearningRate 0.0794 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:21:11,781-Speed 3470.41 samples/sec Loss 10.3202 LearningRate 0.0793 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:21:14,742-Speed 3459.23 samples/sec Loss 10.1414 LearningRate 0.0793 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:21:17,708-Speed 3453.71 samples/sec Loss 10.0655 LearningRate 0.0793 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:20,660-Speed 3468.86 samples/sec Loss 10.3676 LearningRate 0.0793 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:23,603-Speed 3480.74 samples/sec Loss 10.1590 LearningRate 0.0793 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:26,551-Speed 3474.33 samples/sec Loss 10.1222 LearningRate 0.0793 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:29,496-Speed 3478.65 samples/sec Loss 10.1634 LearningRate 0.0792 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:32,437-Speed 3482.98 samples/sec Loss 10.2929 LearningRate 0.0792 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:35,386-Speed 3472.97 samples/sec Loss 10.2778 LearningRate 0.0792 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:38,331-Speed 3477.80 samples/sec Loss 10.1859 LearningRate 0.0792 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:41,284-Speed 3467.82 samples/sec Loss 10.2221 
LearningRate 0.0792 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:44,227-Speed 3480.86 samples/sec Loss 10.1977 LearningRate 0.0792 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:47,165-Speed 3486.22 samples/sec Loss 10.1989 LearningRate 0.0791 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:50,112-Speed 3476.36 samples/sec Loss 10.1648 LearningRate 0.0791 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:53,057-Speed 3477.14 samples/sec Loss 10.1744 LearningRate 0.0791 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:56,002-Speed 3478.74 samples/sec Loss 10.1891 LearningRate 0.0791 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:21:58,958-Speed 3464.96 samples/sec Loss 10.3090 LearningRate 0.0791 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:01,900-Speed 3480.56 samples/sec Loss 10.1413 LearningRate 0.0790 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:04,846-Speed 3477.36 samples/sec Loss 10.1574 LearningRate 0.0790 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:07,788-Speed 3481.47 samples/sec Loss 10.1244 LearningRate 0.0790 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:10,745-Speed 3464.43 samples/sec Loss 10.2755 LearningRate 0.0790 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:13,712-Speed 3452.08 samples/sec Loss 10.2731 LearningRate 0.0790 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:22:16,661-Speed 3472.83 samples/sec Loss 10.3630 LearningRate 0.0790 Epoch: 2 Global Step: 11270 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:19,612-Speed 3472.17 samples/sec Loss 10.1987 LearningRate 0.0789 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:22,564-Speed 3469.23 samples/sec Loss 10.2661 LearningRate 0.0789 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:25,515-Speed 3470.30 samples/sec Loss 10.1114 LearningRate 0.0789 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:28,459-Speed 3480.12 samples/sec Loss 10.0523 LearningRate 0.0789 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:31,411-Speed 3469.21 samples/sec Loss 10.1899 LearningRate 0.0789 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:34,360-Speed 3473.50 samples/sec Loss 10.1066 LearningRate 0.0789 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:37,309-Speed 3472.83 samples/sec Loss 10.2930 LearningRate 0.0788 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:40,262-Speed 3468.25 samples/sec Loss 10.2239 LearningRate 0.0788 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:43,212-Speed 3471.67 samples/sec Loss 10.1424 LearningRate 0.0788 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:46,149-Speed 3488.50 samples/sec Loss 10.1483 LearningRate 0.0788 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:49,099-Speed 3472.65 samples/sec Loss 10.1128 LearningRate 0.0788 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:52,203-Speed 3299.81 samples/sec Loss 10.2886 LearningRate 0.0787 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 00:22:55,192-Speed 3426.65 samples/sec Loss 10.3126 LearningRate 0.0787 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:22:58,156-Speed 3455.27 samples/sec Loss 10.2704 LearningRate 0.0787 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:01,109-Speed 3468.29 samples/sec Loss 10.3630 LearningRate 0.0787 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:04,066-Speed 3464.36 samples/sec Loss 10.1821 LearningRate 0.0787 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:07,018-Speed 3469.47 samples/sec Loss 10.0344 LearningRate 0.0787 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:09,984-Speed 3453.23 samples/sec Loss 10.3817 LearningRate 0.0786 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:12,938-Speed 3467.59 samples/sec Loss 10.2356 LearningRate 0.0786 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:15,891-Speed 3468.53 samples/sec Loss 10.1494 LearningRate 0.0786 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:18,843-Speed 3469.77 samples/sec Loss 10.3678 LearningRate 0.0786 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:21,793-Speed 3472.58 samples/sec Loss 10.2936 LearningRate 0.0786 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:24,734-Speed 3482.21 samples/sec Loss 10.1895 LearningRate 0.0786 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:27,690-Speed 3464.72 samples/sec Loss 10.3514 LearningRate 0.0785 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:30,639-Speed 
3472.75 samples/sec Loss 10.3094 LearningRate 0.0785 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:33,586-Speed 3476.14 samples/sec Loss 10.0682 LearningRate 0.0785 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:36,531-Speed 3477.69 samples/sec Loss 10.0601 LearningRate 0.0785 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:39,484-Speed 3469.17 samples/sec Loss 10.0765 LearningRate 0.0785 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:42,451-Speed 3451.92 samples/sec Loss 10.0818 LearningRate 0.0785 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:45,390-Speed 3485.26 samples/sec Loss 10.0914 LearningRate 0.0784 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:48,338-Speed 3474.37 samples/sec Loss 10.0568 LearningRate 0.0784 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:51,288-Speed 3472.45 samples/sec Loss 10.0414 LearningRate 0.0784 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:54,248-Speed 3460.72 samples/sec Loss 10.2063 LearningRate 0.0784 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:23:57,192-Speed 3477.92 samples/sec Loss 10.2756 LearningRate 0.0784 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:00,152-Speed 3460.69 samples/sec Loss 10.0915 LearningRate 0.0783 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:03,119-Speed 3452.66 samples/sec Loss 10.1502 LearningRate 0.0783 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:06,067-Speed 3473.79 samples/sec Loss 10.0497 
LearningRate 0.0783 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:09,025-Speed 3463.36 samples/sec Loss 10.1324 LearningRate 0.0783 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:11,985-Speed 3460.72 samples/sec Loss 10.0955 LearningRate 0.0783 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:14,927-Speed 3481.25 samples/sec Loss 10.0003 LearningRate 0.0783 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:24:17,865-Speed 3485.80 samples/sec Loss 10.0109 LearningRate 0.0782 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:20,810-Speed 3478.64 samples/sec Loss 10.1873 LearningRate 0.0782 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:23,778-Speed 3450.51 samples/sec Loss 10.1648 LearningRate 0.0782 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:26,743-Speed 3455.09 samples/sec Loss 10.1423 LearningRate 0.0782 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:29,689-Speed 3477.18 samples/sec Loss 10.1896 LearningRate 0.0782 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:32,640-Speed 3469.98 samples/sec Loss 10.1062 LearningRate 0.0782 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:35,587-Speed 3476.33 samples/sec Loss 10.0507 LearningRate 0.0781 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:38,537-Speed 3471.65 samples/sec Loss 10.2359 LearningRate 0.0781 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:41,486-Speed 3473.41 samples/sec Loss 10.0763 LearningRate 0.0781 Epoch: 2 Global Step: 
11760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:44,434-Speed 3474.82 samples/sec Loss 9.9989 LearningRate 0.0781 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:47,369-Speed 3489.35 samples/sec Loss 9.9400 LearningRate 0.0781 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:50,317-Speed 3474.82 samples/sec Loss 10.1238 LearningRate 0.0780 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:53,261-Speed 3478.88 samples/sec Loss 9.9622 LearningRate 0.0780 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:56,207-Speed 3476.30 samples/sec Loss 10.1916 LearningRate 0.0780 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:24:59,158-Speed 3470.72 samples/sec Loss 10.1340 LearningRate 0.0780 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:02,126-Speed 3450.41 samples/sec Loss 10.1384 LearningRate 0.0780 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:05,071-Speed 3478.61 samples/sec Loss 10.0256 LearningRate 0.0780 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:08,025-Speed 3467.47 samples/sec Loss 10.0877 LearningRate 0.0779 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:10,969-Speed 3478.70 samples/sec Loss 10.0810 LearningRate 0.0779 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:13,924-Speed 3467.27 samples/sec Loss 10.2245 LearningRate 0.0779 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:16,869-Speed 3477.52 samples/sec Loss 9.9986 LearningRate 0.0779 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 00:25:19,813-Speed 3478.74 samples/sec Loss 10.0716 LearningRate 0.0779 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:22,767-Speed 3468.16 samples/sec Loss 10.2958 LearningRate 0.0779 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:25,713-Speed 3476.35 samples/sec Loss 10.0581 LearningRate 0.0778 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:28,656-Speed 3480.54 samples/sec Loss 10.1387 LearningRate 0.0778 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:31,603-Speed 3475.99 samples/sec Loss 10.1102 LearningRate 0.0778 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:34,553-Speed 3471.90 samples/sec Loss 9.9909 LearningRate 0.0778 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:37,514-Speed 3459.04 samples/sec Loss 9.9988 LearningRate 0.0778 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:40,459-Speed 3478.50 samples/sec Loss 9.9152 LearningRate 0.0778 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:43,417-Speed 3462.62 samples/sec Loss 10.2338 LearningRate 0.0777 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:46,360-Speed 3480.00 samples/sec Loss 10.0492 LearningRate 0.0777 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:25:49,294-Speed 3490.33 samples/sec Loss 10.0646 LearningRate 0.0777 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:25:52,241-Speed 3475.78 samples/sec Loss 10.1239 LearningRate 0.0777 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
00:26:36,602-[lfw][12000]XNorm: 21.938753 Training: 2022-04-11 00:26:36,603-[lfw][12000]Accuracy-Flip: 0.99667+-0.00422 Training: 2022-04-11 00:26:36,603-[lfw][12000]Accuracy-Highest: 0.99667 Training: 2022-04-11 00:27:28,170-[cfp_fp][12000]XNorm: 19.444285 Training: 2022-04-11 00:27:28,171-[cfp_fp][12000]Accuracy-Flip: 0.93443+-0.00824 Training: 2022-04-11 00:27:28,171-[cfp_fp][12000]Accuracy-Highest: 0.93443 Training: 2022-04-11 00:28:12,271-[agedb_30][12000]XNorm: 21.466312 Training: 2022-04-11 00:28:12,271-[agedb_30][12000]Accuracy-Flip: 0.96250+-0.00720 Training: 2022-04-11 00:28:12,272-[agedb_30][12000]Accuracy-Highest: 0.96250 Training: 2022-04-11 00:28:15,202-Speed 71.63 samples/sec Loss 9.8680 LearningRate 0.0777 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:18,127-Speed 3501.23 samples/sec Loss 10.0220 LearningRate 0.0776 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:21,052-Speed 3500.88 samples/sec Loss 10.0308 LearningRate 0.0776 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:23,982-Speed 3496.50 samples/sec Loss 10.0333 LearningRate 0.0776 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:26,913-Speed 3494.99 samples/sec Loss 9.9918 LearningRate 0.0776 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:29,842-Speed 3496.86 samples/sec Loss 10.0190 LearningRate 0.0776 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:32,779-Speed 3487.07 samples/sec Loss 10.1706 LearningRate 0.0776 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:35,722-Speed 3480.38 samples/sec Loss 10.0897 LearningRate 0.0775 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:38,648-Speed 3500.91 
samples/sec Loss 10.1149 LearningRate 0.0775 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:41,590-Speed 3481.63 samples/sec Loss 10.0674 LearningRate 0.0775 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:44,521-Speed 3494.07 samples/sec Loss 10.0346 LearningRate 0.0775 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:47,522-Speed 3413.54 samples/sec Loss 9.9361 LearningRate 0.0775 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:50,588-Speed 3340.54 samples/sec Loss 10.0987 LearningRate 0.0775 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:53,536-Speed 3474.29 samples/sec Loss 10.0346 LearningRate 0.0774 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:56,474-Speed 3486.14 samples/sec Loss 10.0401 LearningRate 0.0774 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:28:59,422-Speed 3475.27 samples/sec Loss 10.0083 LearningRate 0.0774 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:02,364-Speed 3480.37 samples/sec Loss 10.0419 LearningRate 0.0774 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:05,346-Speed 3435.68 samples/sec Loss 10.0734 LearningRate 0.0774 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:08,276-Speed 3495.71 samples/sec Loss 10.1108 LearningRate 0.0774 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:11,219-Speed 3480.05 samples/sec Loss 10.1133 LearningRate 0.0773 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:14,162-Speed 3480.58 samples/sec Loss 9.9720 LearningRate 0.0773 
Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:17,103-Speed 3481.94 samples/sec Loss 9.8578 LearningRate 0.0773 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:20,048-Speed 3478.14 samples/sec Loss 10.0405 LearningRate 0.0773 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:22,987-Speed 3485.28 samples/sec Loss 10.1310 LearningRate 0.0773 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:25,936-Speed 3473.51 samples/sec Loss 10.1484 LearningRate 0.0772 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:28,884-Speed 3473.67 samples/sec Loss 10.0598 LearningRate 0.0772 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:31,828-Speed 3479.22 samples/sec Loss 10.0615 LearningRate 0.0772 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:34,791-Speed 3457.78 samples/sec Loss 9.8628 LearningRate 0.0772 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:29:37,709-Speed 3509.42 samples/sec Loss 10.0013 LearningRate 0.0772 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:40,662-Speed 3468.99 samples/sec Loss 9.9669 LearningRate 0.0772 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:43,607-Speed 3477.87 samples/sec Loss 10.0582 LearningRate 0.0771 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:46,563-Speed 3464.85 samples/sec Loss 10.1298 LearningRate 0.0771 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:49,523-Speed 3460.93 samples/sec Loss 10.0152 LearningRate 0.0771 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 00:29:52,464-Speed 3481.98 samples/sec Loss 9.8918 LearningRate 0.0771 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:55,417-Speed 3468.42 samples/sec Loss 9.8080 LearningRate 0.0771 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:29:58,355-Speed 3486.27 samples/sec Loss 9.8347 LearningRate 0.0771 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:01,311-Speed 3465.67 samples/sec Loss 9.9167 LearningRate 0.0770 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:04,267-Speed 3464.83 samples/sec Loss 10.0324 LearningRate 0.0770 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:07,209-Speed 3481.91 samples/sec Loss 9.8912 LearningRate 0.0770 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:10,144-Speed 3490.94 samples/sec Loss 9.9471 LearningRate 0.0770 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:13,096-Speed 3469.73 samples/sec Loss 9.9826 LearningRate 0.0770 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:16,045-Speed 3472.10 samples/sec Loss 9.9954 LearningRate 0.0770 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:19,002-Speed 3477.00 samples/sec Loss 9.9182 LearningRate 0.0769 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:21,938-Speed 3489.08 samples/sec Loss 9.9411 LearningRate 0.0769 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:24,908-Speed 3448.89 samples/sec Loss 9.9870 LearningRate 0.0769 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:27,846-Speed 
3485.64 samples/sec Loss 9.7935 LearningRate 0.0769 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:30,792-Speed 3476.60 samples/sec Loss 9.8425 LearningRate 0.0769 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:33,736-Speed 3479.97 samples/sec Loss 9.9496 LearningRate 0.0768 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:30:36,665-Speed 3496.13 samples/sec Loss 9.8958 LearningRate 0.0768 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:39,609-Speed 3479.39 samples/sec Loss 9.9050 LearningRate 0.0768 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:42,553-Speed 3479.46 samples/sec Loss 9.9236 LearningRate 0.0768 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:45,496-Speed 3480.36 samples/sec Loss 9.8512 LearningRate 0.0768 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:48,436-Speed 3483.40 samples/sec Loss 10.1636 LearningRate 0.0768 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:51,399-Speed 3456.88 samples/sec Loss 10.0044 LearningRate 0.0767 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:54,347-Speed 3474.44 samples/sec Loss 9.8958 LearningRate 0.0767 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:30:57,296-Speed 3472.83 samples/sec Loss 10.1365 LearningRate 0.0767 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:31:00,237-Speed 3483.38 samples/sec Loss 9.8862 LearningRate 0.0767 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:31:03,182-Speed 3477.52 samples/sec Loss 9.8115 LearningRate 0.0767 Epoch: 2 
Global Step: 12580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:31:06,129-Speed 3475.93 samples/sec Loss 9.8728 LearningRate 0.0767 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:09,071-Speed 3481.92 samples/sec Loss 9.9812 LearningRate 0.0766 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:12,014-Speed 3479.72 samples/sec Loss 9.7707 LearningRate 0.0766 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:14,957-Speed 3480.51 samples/sec Loss 9.7266 LearningRate 0.0766 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:17,905-Speed 3474.67 samples/sec Loss 9.9295 LearningRate 0.0766 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:20,841-Speed 3488.65 samples/sec Loss 9.8625 LearningRate 0.0766 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:23,780-Speed 3484.58 samples/sec Loss 10.0104 LearningRate 0.0766 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:26,726-Speed 3476.79 samples/sec Loss 9.8294 LearningRate 0.0765 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:29,668-Speed 3481.37 samples/sec Loss 9.8044 LearningRate 0.0765 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:32,610-Speed 3481.76 samples/sec Loss 9.9243 LearningRate 0.0765 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:35,554-Speed 3479.35 samples/sec Loss 9.8680 LearningRate 0.0765 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:38,516-Speed 3457.74 samples/sec Loss 9.9869 LearningRate 0.0765 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 00:31:41,458-Speed 3481.26 samples/sec Loss 9.9466 LearningRate 0.0765 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:44,408-Speed 3472.19 samples/sec Loss 9.8115 LearningRate 0.0764 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:47,369-Speed 3459.41 samples/sec Loss 9.8337 LearningRate 0.0764 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:50,314-Speed 3478.73 samples/sec Loss 9.7123 LearningRate 0.0764 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:53,251-Speed 3486.88 samples/sec Loss 9.8252 LearningRate 0.0764 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:56,193-Speed 3481.33 samples/sec Loss 9.9863 LearningRate 0.0764 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:31:59,137-Speed 3479.14 samples/sec Loss 9.9289 LearningRate 0.0763 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:32:02,083-Speed 3477.53 samples/sec Loss 9.8744 LearningRate 0.0763 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:32:05,012-Speed 3496.62 samples/sec Loss 9.7733 LearningRate 0.0763 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:32:07,965-Speed 3468.46 samples/sec Loss 9.9080 LearningRate 0.0763 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:32:10,904-Speed 3484.54 samples/sec Loss 9.9513 LearningRate 0.0763 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:13,873-Speed 3450.28 samples/sec Loss 9.7982 LearningRate 0.0763 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:16,814-Speed 3482.17 
samples/sec Loss 9.8683 LearningRate 0.0762 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:19,772-Speed 3463.81 samples/sec Loss 9.8924 LearningRate 0.0762 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:22,706-Speed 3490.25 samples/sec Loss 9.9606 LearningRate 0.0762 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:25,646-Speed 3483.28 samples/sec Loss 9.9361 LearningRate 0.0762 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:28,590-Speed 3479.74 samples/sec Loss 9.7970 LearningRate 0.0762 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:31,552-Speed 3457.70 samples/sec Loss 9.8542 LearningRate 0.0762 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:34,496-Speed 3479.12 samples/sec Loss 9.8963 LearningRate 0.0761 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:37,445-Speed 3473.43 samples/sec Loss 9.6312 LearningRate 0.0761 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:40,390-Speed 3478.31 samples/sec Loss 9.9222 LearningRate 0.0761 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:43,334-Speed 3479.14 samples/sec Loss 9.9380 LearningRate 0.0761 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:46,292-Speed 3461.74 samples/sec Loss 9.8566 LearningRate 0.0761 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:49,255-Speed 3457.85 samples/sec Loss 9.7941 LearningRate 0.0761 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:52,214-Speed 3461.22 samples/sec Loss 9.7632 LearningRate 0.0760 Epoch: 2 Global Step: 
12950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:55,168-Speed 3467.53 samples/sec Loss 9.6781 LearningRate 0.0760 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:32:58,144-Speed 3441.42 samples/sec Loss 9.9019 LearningRate 0.0760 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:33:01,096-Speed 3470.16 samples/sec Loss 9.7168 LearningRate 0.0760 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:33:04,037-Speed 3483.18 samples/sec Loss 9.9061 LearningRate 0.0760 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:33:06,977-Speed 3482.91 samples/sec Loss 9.6492 LearningRate 0.0759 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:33:09,917-Speed 3484.62 samples/sec Loss 9.7487 LearningRate 0.0759 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:12,997-Speed 3325.44 samples/sec Loss 9.9081 LearningRate 0.0759 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:15,943-Speed 3476.23 samples/sec Loss 9.9340 LearningRate 0.0759 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:18,897-Speed 3467.78 samples/sec Loss 9.7068 LearningRate 0.0759 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:21,864-Speed 3451.27 samples/sec Loss 9.7997 LearningRate 0.0759 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:24,827-Speed 3457.51 samples/sec Loss 9.9323 LearningRate 0.0758 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:27,783-Speed 3465.31 samples/sec Loss 9.7419 LearningRate 0.0758 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 00:33:30,730-Speed 3474.96 samples/sec Loss 9.8883 LearningRate 0.0758 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:33,676-Speed 3477.70 samples/sec Loss 9.8366 LearningRate 0.0758 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:36,644-Speed 3449.85 samples/sec Loss 9.8144 LearningRate 0.0758 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:39,627-Speed 3434.31 samples/sec Loss 9.8951 LearningRate 0.0758 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:33:42,559-Speed 3493.65 samples/sec Loss 9.9003 LearningRate 0.0757 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:45,501-Speed 3481.75 samples/sec Loss 9.6523 LearningRate 0.0757 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:48,480-Speed 3437.21 samples/sec Loss 9.6702 LearningRate 0.0757 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:51,439-Speed 3461.17 samples/sec Loss 9.7681 LearningRate 0.0757 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:54,387-Speed 3474.94 samples/sec Loss 9.7408 LearningRate 0.0757 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:33:57,335-Speed 3475.06 samples/sec Loss 9.7661 LearningRate 0.0757 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:00,283-Speed 3474.61 samples/sec Loss 9.7333 LearningRate 0.0756 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:03,224-Speed 3482.05 samples/sec Loss 9.7453 LearningRate 0.0756 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:06,174-Speed 3472.40 samples/sec Loss 
9.8725 LearningRate 0.0756 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:09,121-Speed 3475.04 samples/sec Loss 9.9518 LearningRate 0.0756 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:12,076-Speed 3466.72 samples/sec Loss 9.7194 LearningRate 0.0756 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:15,018-Speed 3481.56 samples/sec Loss 9.7270 LearningRate 0.0756 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:17,959-Speed 3482.64 samples/sec Loss 9.9181 LearningRate 0.0755 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:20,912-Speed 3468.70 samples/sec Loss 9.9028 LearningRate 0.0755 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:23,856-Speed 3479.69 samples/sec Loss 9.7772 LearningRate 0.0755 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:26,806-Speed 3471.50 samples/sec Loss 9.7652 LearningRate 0.0755 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:34:29,757-Speed 3470.92 samples/sec Loss 9.7983 LearningRate 0.0755 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:34:32,719-Speed 3457.56 samples/sec Loss 9.6997 LearningRate 0.0755 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:34:35,655-Speed 3488.93 samples/sec Loss 9.8253 LearningRate 0.0754 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:38,616-Speed 3459.26 samples/sec Loss 9.8321 LearningRate 0.0754 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:41,559-Speed 3479.88 samples/sec Loss 9.6488 LearningRate 0.0754 Epoch: 2 Global Step: 13320 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:44,517-Speed 3463.35 samples/sec Loss 9.8277 LearningRate 0.0754 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:47,512-Speed 3419.39 samples/sec Loss 9.7660 LearningRate 0.0754 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:50,469-Speed 3463.70 samples/sec Loss 9.8617 LearningRate 0.0753 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:53,438-Speed 3450.62 samples/sec Loss 9.8579 LearningRate 0.0753 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:56,380-Speed 3481.48 samples/sec Loss 9.6904 LearningRate 0.0753 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:34:59,347-Speed 3457.12 samples/sec Loss 9.7853 LearningRate 0.0753 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:02,316-Speed 3449.77 samples/sec Loss 9.7171 LearningRate 0.0753 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:05,260-Speed 3480.07 samples/sec Loss 9.8099 LearningRate 0.0753 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:08,212-Speed 3468.80 samples/sec Loss 9.6157 LearningRate 0.0752 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:11,172-Speed 3461.21 samples/sec Loss 9.8527 LearningRate 0.0752 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:14,134-Speed 3458.17 samples/sec Loss 9.8391 LearningRate 0.0752 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:17,077-Speed 3479.85 samples/sec Loss 9.7945 LearningRate 0.0752 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
00:35:20,046-Speed 3449.65 samples/sec Loss 9.8227 LearningRate 0.0752 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:23,005-Speed 3462.07 samples/sec Loss 9.5863 LearningRate 0.0752 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:25,967-Speed 3457.84 samples/sec Loss 9.8524 LearningRate 0.0751 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:28,914-Speed 3475.56 samples/sec Loss 9.6595 LearningRate 0.0751 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:31,875-Speed 3458.88 samples/sec Loss 9.7884 LearningRate 0.0751 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:34,818-Speed 3480.91 samples/sec Loss 9.7783 LearningRate 0.0751 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:37,774-Speed 3465.21 samples/sec Loss 9.7517 LearningRate 0.0751 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:40,726-Speed 3469.11 samples/sec Loss 9.8072 LearningRate 0.0751 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:43,680-Speed 3467.81 samples/sec Loss 9.7926 LearningRate 0.0750 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:35:46,643-Speed 3456.00 samples/sec Loss 9.5404 LearningRate 0.0750 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:49,601-Speed 3463.47 samples/sec Loss 9.6945 LearningRate 0.0750 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:52,548-Speed 3476.22 samples/sec Loss 9.7041 LearningRate 0.0750 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:55,488-Speed 3483.46 samples/sec Loss 9.6416 LearningRate 
0.0750 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:35:58,434-Speed 3476.69 samples/sec Loss 9.6934 LearningRate 0.0750 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:01,402-Speed 3450.37 samples/sec Loss 9.6871 LearningRate 0.0749 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:04,346-Speed 3479.96 samples/sec Loss 9.6387 LearningRate 0.0749 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:07,308-Speed 3458.14 samples/sec Loss 9.4735 LearningRate 0.0749 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:10,272-Speed 3454.72 samples/sec Loss 9.5100 LearningRate 0.0749 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:13,250-Speed 3439.94 samples/sec Loss 9.6190 LearningRate 0.0749 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:16,204-Speed 3467.35 samples/sec Loss 9.8004 LearningRate 0.0749 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:19,151-Speed 3475.52 samples/sec Loss 9.7631 LearningRate 0.0748 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:22,101-Speed 3472.64 samples/sec Loss 9.7602 LearningRate 0.0748 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:25,072-Speed 3446.94 samples/sec Loss 9.7543 LearningRate 0.0748 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:28,097-Speed 3386.41 samples/sec Loss 9.7730 LearningRate 0.0748 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:36:31,041-Speed 3479.08 samples/sec Loss 9.7210 LearningRate 0.0748 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 00:36:34,017-Speed 3440.74 samples/sec Loss 9.6029 LearningRate 0.0747 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:36,967-Speed 3471.94 samples/sec Loss 9.7087 LearningRate 0.0747 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:39,931-Speed 3456.88 samples/sec Loss 9.8465 LearningRate 0.0747 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:42,884-Speed 3467.98 samples/sec Loss 9.5611 LearningRate 0.0747 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:45,839-Speed 3466.26 samples/sec Loss 9.7864 LearningRate 0.0747 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:48,821-Speed 3435.06 samples/sec Loss 9.6425 LearningRate 0.0747 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:51,788-Speed 3452.25 samples/sec Loss 9.7526 LearningRate 0.0746 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:54,752-Speed 3455.92 samples/sec Loss 9.4969 LearningRate 0.0746 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:36:57,693-Speed 3481.67 samples/sec Loss 9.6464 LearningRate 0.0746 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:00,662-Speed 3450.18 samples/sec Loss 9.5830 LearningRate 0.0746 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:03,613-Speed 3471.20 samples/sec Loss 9.7357 LearningRate 0.0746 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:06,576-Speed 3455.84 samples/sec Loss 9.6069 LearningRate 0.0746 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
00:37:09,537-Speed 3459.83 samples/sec Loss 9.6552 LearningRate 0.0745 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:12,484-Speed 3476.09 samples/sec Loss 9.6316 LearningRate 0.0745 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:15,436-Speed 3470.21 samples/sec Loss 9.4862 LearningRate 0.0745 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:18,377-Speed 3481.60 samples/sec Loss 9.5470 LearningRate 0.0745 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:21,323-Speed 3477.29 samples/sec Loss 9.5061 LearningRate 0.0745 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:24,271-Speed 3474.68 samples/sec Loss 9.7181 LearningRate 0.0745 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:27,228-Speed 3463.43 samples/sec Loss 9.5638 LearningRate 0.0744 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:30,196-Speed 3451.27 samples/sec Loss 9.6382 LearningRate 0.0744 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:33,151-Speed 3466.27 samples/sec Loss 9.6084 LearningRate 0.0744 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:36,128-Speed 3440.81 samples/sec Loss 9.6391 LearningRate 0.0744 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:39,076-Speed 3474.61 samples/sec Loss 9.6264 LearningRate 0.0744 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:42,019-Speed 3479.52 samples/sec Loss 9.4870 LearningRate 0.0744 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:37:44,987-Speed 3451.74 samples/sec Loss 9.6476 
LearningRate 0.0743 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:37:47,942-Speed 3466.28 samples/sec Loss 9.6736 LearningRate 0.0743 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:37:50,890-Speed 3473.60 samples/sec Loss 9.7638 LearningRate 0.0743 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:37:53,831-Speed 3482.98 samples/sec Loss 9.6652 LearningRate 0.0743 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:37:56,790-Speed 3460.90 samples/sec Loss 9.7038 LearningRate 0.0743 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:37:59,753-Speed 3456.96 samples/sec Loss 9.6262 LearningRate 0.0743 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:38:02,707-Speed 3467.97 samples/sec Loss 9.5170 LearningRate 0.0742 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:38:46,998-[lfw][14000]XNorm: 21.923426
Training: 2022-04-11 00:38:46,998-[lfw][14000]Accuracy-Flip: 0.99617+-0.00366
Training: 2022-04-11 00:38:46,999-[lfw][14000]Accuracy-Highest: 0.99667
Training: 2022-04-11 00:39:38,479-[cfp_fp][14000]XNorm: 19.558783
Training: 2022-04-11 00:39:38,480-[cfp_fp][14000]Accuracy-Flip: 0.94229+-0.00967
Training: 2022-04-11 00:39:38,480-[cfp_fp][14000]Accuracy-Highest: 0.94229
Training: 2022-04-11 00:40:22,629-[agedb_30][14000]XNorm: 21.703387
Training: 2022-04-11 00:40:22,629-[agedb_30][14000]Accuracy-Flip: 0.96667+-0.00969
Training: 2022-04-11 00:40:22,630-[agedb_30][14000]Accuracy-Highest: 0.96667
Training: 2022-04-11 00:40:25,567-Speed 71.68 samples/sec Loss 9.6770 LearningRate 0.0742 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 00:40:28,517-Speed 3471.69 samples/sec Loss 9.6177 LearningRate 0.0742 Epoch: 2 Global Step: 
14020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:31,443-Speed 3499.89 samples/sec Loss 9.4818 LearningRate 0.0742 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:34,372-Speed 3497.96 samples/sec Loss 9.4125 LearningRate 0.0742 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:37,303-Speed 3493.71 samples/sec Loss 9.5620 LearningRate 0.0742 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:40,255-Speed 3469.81 samples/sec Loss 9.6779 LearningRate 0.0741 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:43,216-Speed 3459.12 samples/sec Loss 9.6255 LearningRate 0.0741 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:46,181-Speed 3454.41 samples/sec Loss 9.4516 LearningRate 0.0741 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:49,149-Speed 3451.90 samples/sec Loss 9.7405 LearningRate 0.0741 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:52,127-Speed 3439.71 samples/sec Loss 9.4861 LearningRate 0.0741 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:40:55,113-Speed 3429.80 samples/sec Loss 9.5171 LearningRate 0.0740 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:40:58,076-Speed 3457.39 samples/sec Loss 9.5814 LearningRate 0.0740 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:01,024-Speed 3474.08 samples/sec Loss 9.5803 LearningRate 0.0740 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:03,973-Speed 3472.65 samples/sec Loss 9.6806 LearningRate 0.0740 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 00:41:06,948-Speed 3443.03 samples/sec Loss 9.4502 LearningRate 0.0740 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:09,894-Speed 3477.79 samples/sec Loss 9.7680 LearningRate 0.0740 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:12,856-Speed 3457.62 samples/sec Loss 9.5583 LearningRate 0.0739 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:15,814-Speed 3463.32 samples/sec Loss 9.5171 LearningRate 0.0739 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:18,766-Speed 3469.61 samples/sec Loss 9.6826 LearningRate 0.0739 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:21,714-Speed 3473.61 samples/sec Loss 9.5349 LearningRate 0.0739 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:24,674-Speed 3461.29 samples/sec Loss 9.5859 LearningRate 0.0739 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:41:27,616-Speed 3481.49 samples/sec Loss 9.4997 LearningRate 0.0739 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:41:30,549-Speed 3491.70 samples/sec Loss 9.6657 LearningRate 0.0738 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:33,492-Speed 3480.14 samples/sec Loss 9.6374 LearningRate 0.0738 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:36,437-Speed 3478.71 samples/sec Loss 9.5466 LearningRate 0.0738 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:39,387-Speed 3472.40 samples/sec Loss 9.4953 LearningRate 0.0738 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:42,328-Speed 3482.65 samples/sec Loss 9.5309 
LearningRate 0.0738 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:45,279-Speed 3470.37 samples/sec Loss 9.4478 LearningRate 0.0738 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:48,238-Speed 3461.73 samples/sec Loss 9.4340 LearningRate 0.0737 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:51,205-Speed 3452.34 samples/sec Loss 9.6822 LearningRate 0.0737 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:54,164-Speed 3461.61 samples/sec Loss 9.5854 LearningRate 0.0737 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:41:57,113-Speed 3473.76 samples/sec Loss 9.8083 LearningRate 0.0737 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:00,068-Speed 3466.57 samples/sec Loss 9.6451 LearningRate 0.0737 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:03,002-Speed 3490.74 samples/sec Loss 9.7147 LearningRate 0.0737 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:05,968-Speed 3453.09 samples/sec Loss 9.6601 LearningRate 0.0736 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:08,948-Speed 3437.99 samples/sec Loss 9.5535 LearningRate 0.0736 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:11,913-Speed 3454.03 samples/sec Loss 9.8244 LearningRate 0.0736 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:14,883-Speed 3448.71 samples/sec Loss 9.5734 LearningRate 0.0736 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:17,860-Speed 3440.56 samples/sec Loss 9.5068 LearningRate 0.0736 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 00:42:20,835-Speed 3443.00 samples/sec Loss 9.6177 LearningRate 0.0736 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:23,785-Speed 3472.47 samples/sec Loss 9.3836 LearningRate 0.0735 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:26,751-Speed 3453.20 samples/sec Loss 9.5097 LearningRate 0.0735 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:29,704-Speed 3469.56 samples/sec Loss 9.7503 LearningRate 0.0735 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:42:32,647-Speed 3479.28 samples/sec Loss 9.5778 LearningRate 0.0735 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:35,597-Speed 3472.63 samples/sec Loss 9.5491 LearningRate 0.0735 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:38,533-Speed 3487.80 samples/sec Loss 9.4678 LearningRate 0.0735 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:41,499-Speed 3453.79 samples/sec Loss 9.5195 LearningRate 0.0734 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:44,447-Speed 3474.76 samples/sec Loss 9.4482 LearningRate 0.0734 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:47,440-Speed 3421.59 samples/sec Loss 9.7919 LearningRate 0.0734 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:50,408-Speed 3451.24 samples/sec Loss 9.5517 LearningRate 0.0734 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:53,403-Speed 3420.27 samples/sec Loss 9.6874 LearningRate 0.0734 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
00:42:56,366-Speed 3456.82 samples/sec Loss 9.3809 LearningRate 0.0734 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:42:59,336-Speed 3449.21 samples/sec Loss 9.5937 LearningRate 0.0733 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:43:02,298-Speed 3457.47 samples/sec Loss 9.5450 LearningRate 0.0733 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:05,276-Speed 3440.01 samples/sec Loss 9.5679 LearningRate 0.0733 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:08,225-Speed 3473.09 samples/sec Loss 9.5972 LearningRate 0.0733 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:11,179-Speed 3466.48 samples/sec Loss 9.6184 LearningRate 0.0733 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:14,132-Speed 3469.30 samples/sec Loss 9.5035 LearningRate 0.0733 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:17,215-Speed 3322.08 samples/sec Loss 9.4878 LearningRate 0.0732 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:20,155-Speed 3484.23 samples/sec Loss 9.4687 LearningRate 0.0732 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:23,101-Speed 3477.37 samples/sec Loss 9.4931 LearningRate 0.0732 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:26,058-Speed 3463.44 samples/sec Loss 9.5728 LearningRate 0.0732 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:29,010-Speed 3470.33 samples/sec Loss 9.6299 LearningRate 0.0732 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:31,963-Speed 3467.58 samples/sec Loss 9.6385 LearningRate 
0.0732 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:43:34,905-Speed 3482.03 samples/sec Loss 9.5287 LearningRate 0.0731 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:43:37,874-Speed 3449.70 samples/sec Loss 9.4722 LearningRate 0.0731 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:43:40,844-Speed 3448.84 samples/sec Loss 9.6254 LearningRate 0.0731 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:43:43,796-Speed 3469.26 samples/sec Loss 9.4434 LearningRate 0.0731 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:46,748-Speed 3470.36 samples/sec Loss 9.5526 LearningRate 0.0731 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:49,716-Speed 3451.53 samples/sec Loss 9.4908 LearningRate 0.0730 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:52,692-Speed 3440.76 samples/sec Loss 9.4986 LearningRate 0.0730 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:55,645-Speed 3469.61 samples/sec Loss 9.4429 LearningRate 0.0730 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:43:58,592-Speed 3475.51 samples/sec Loss 9.6485 LearningRate 0.0730 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:01,542-Speed 3471.46 samples/sec Loss 9.5113 LearningRate 0.0730 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:04,533-Speed 3425.39 samples/sec Loss 9.4456 LearningRate 0.0730 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:07,482-Speed 3472.16 samples/sec Loss 9.5816 LearningRate 0.0729 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 00:44:10,449-Speed 3453.03 samples/sec Loss 9.5263 LearningRate 0.0729 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:13,402-Speed 3467.96 samples/sec Loss 9.5076 LearningRate 0.0729 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:44:16,373-Speed 3448.20 samples/sec Loss 9.4207 LearningRate 0.0729 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:19,325-Speed 3469.48 samples/sec Loss 9.4481 LearningRate 0.0729 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:22,298-Speed 3445.65 samples/sec Loss 9.6597 LearningRate 0.0729 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:25,255-Speed 3463.42 samples/sec Loss 9.3225 LearningRate 0.0728 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:28,196-Speed 3482.26 samples/sec Loss 9.3751 LearningRate 0.0728 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:31,145-Speed 3473.18 samples/sec Loss 9.3780 LearningRate 0.0728 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:34,097-Speed 3470.55 samples/sec Loss 9.3688 LearningRate 0.0728 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:37,075-Speed 3438.44 samples/sec Loss 9.4319 LearningRate 0.0728 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:40,038-Speed 3458.25 samples/sec Loss 9.3590 LearningRate 0.0728 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:43,004-Speed 3453.08 samples/sec Loss 9.5596 LearningRate 0.0727 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:44:45,985-Speed 3436.70 
samples/sec Loss 9.5427 LearningRate 0.0727 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:44:48,948-Speed 3456.46 samples/sec Loss 9.4663 LearningRate 0.0727 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:44:51,910-Speed 3458.13 samples/sec Loss 9.4194 LearningRate 0.0727 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:44:54,879-Speed 3449.77 samples/sec Loss 9.2537 LearningRate 0.0727 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:44:57,841-Speed 3458.29 samples/sec Loss 9.4605 LearningRate 0.0727 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:00,825-Speed 3432.85 samples/sec Loss 9.5765 LearningRate 0.0726 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:03,825-Speed 3413.90 samples/sec Loss 9.5575 LearningRate 0.0726 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:06,775-Speed 3470.95 samples/sec Loss 9.3635 LearningRate 0.0726 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:09,745-Speed 3448.99 samples/sec Loss 9.3955 LearningRate 0.0726 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:12,686-Speed 3483.44 samples/sec Loss 9.3479 LearningRate 0.0726 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:15,697-Speed 3401.66 samples/sec Loss 9.4412 LearningRate 0.0726 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:18,684-Speed 3428.65 samples/sec Loss 9.7677 LearningRate 0.0725 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:21,640-Speed 3464.99 samples/sec Loss 9.4409 LearningRate 0.0725 Epoch: 2 
Global Step: 15010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:24,594-Speed 3467.36 samples/sec Loss 9.5401 LearningRate 0.0725 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:27,593-Speed 3415.70 samples/sec Loss 9.4827 LearningRate 0.0725 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:30,587-Speed 3421.07 samples/sec Loss 9.3116 LearningRate 0.0725 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:33,546-Speed 3461.40 samples/sec Loss 9.2753 LearningRate 0.0725 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:36,514-Speed 3450.55 samples/sec Loss 9.5322 LearningRate 0.0724 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:39,483-Speed 3451.06 samples/sec Loss 9.2781 LearningRate 0.0724 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:45:42,458-Speed 3442.51 samples/sec Loss 9.5776 LearningRate 0.0724 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:45,449-Speed 3424.65 samples/sec Loss 9.5741 LearningRate 0.0724 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:48,418-Speed 3450.08 samples/sec Loss 9.3886 LearningRate 0.0724 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:51,388-Speed 3448.38 samples/sec Loss 9.4053 LearningRate 0.0724 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:54,362-Speed 3444.05 samples/sec Loss 9.4148 LearningRate 0.0723 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:45:57,334-Speed 3446.64 samples/sec Loss 9.2706 LearningRate 0.0723 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 00:46:00,309-Speed 3443.26 samples/sec Loss 9.3397 LearningRate 0.0723 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:03,322-Speed 3398.78 samples/sec Loss 9.4930 LearningRate 0.0723 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:06,294-Speed 3446.81 samples/sec Loss 9.5672 LearningRate 0.0723 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:09,325-Speed 3378.96 samples/sec Loss 9.2819 LearningRate 0.0723 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:22,161-Speed 797.84 samples/sec Loss 9.0798 LearningRate 0.0722 Epoch: 3 Global Step: 15180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:25,157-Speed 3420.51 samples/sec Loss 8.5934 LearningRate 0.0722 Epoch: 3 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:28,246-Speed 3315.53 samples/sec Loss 8.5682 LearningRate 0.0722 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:31,231-Speed 3432.77 samples/sec Loss 8.4610 LearningRate 0.0722 Epoch: 3 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:34,232-Speed 3414.63 samples/sec Loss 8.5846 LearningRate 0.0722 Epoch: 3 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:37,191-Speed 3461.51 samples/sec Loss 8.5228 LearningRate 0.0722 Epoch: 3 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:40,156-Speed 3454.49 samples/sec Loss 8.6295 LearningRate 0.0721 Epoch: 3 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:46:43,126-Speed 3449.11 samples/sec Loss 8.5577 LearningRate 0.0721 Epoch: 3 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:46,092-Speed 3454.21 samples/sec Loss 
8.8430 LearningRate 0.0721 Epoch: 3 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:49,116-Speed 3387.25 samples/sec Loss 8.6771 LearningRate 0.0721 Epoch: 3 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:52,178-Speed 3344.86 samples/sec Loss 8.8033 LearningRate 0.0721 Epoch: 3 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:55,140-Speed 3458.11 samples/sec Loss 8.6733 LearningRate 0.0721 Epoch: 3 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:46:58,116-Speed 3442.45 samples/sec Loss 8.8458 LearningRate 0.0720 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:01,110-Speed 3421.29 samples/sec Loss 8.8210 LearningRate 0.0720 Epoch: 3 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:04,112-Speed 3411.88 samples/sec Loss 8.8028 LearningRate 0.0720 Epoch: 3 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:07,133-Speed 3390.26 samples/sec Loss 8.7743 LearningRate 0.0720 Epoch: 3 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:10,094-Speed 3460.24 samples/sec Loss 8.7730 LearningRate 0.0720 Epoch: 3 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:13,060-Speed 3453.41 samples/sec Loss 8.8098 LearningRate 0.0720 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:16,064-Speed 3410.71 samples/sec Loss 8.7565 LearningRate 0.0719 Epoch: 3 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:19,043-Speed 3437.96 samples/sec Loss 8.7937 LearningRate 0.0719 Epoch: 3 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:22,002-Speed 3461.02 samples/sec Loss 8.8227 LearningRate 0.0719 Epoch: 3 Global Step: 15380 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:24,965-Speed 3457.66 samples/sec Loss 8.9030 LearningRate 0.0719 Epoch: 3 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:28,089-Speed 3278.18 samples/sec Loss 8.9754 LearningRate 0.0719 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:31,203-Speed 3289.63 samples/sec Loss 8.6840 LearningRate 0.0719 Epoch: 3 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:34,159-Speed 3464.80 samples/sec Loss 8.7727 LearningRate 0.0718 Epoch: 3 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:37,121-Speed 3459.00 samples/sec Loss 8.6750 LearningRate 0.0718 Epoch: 3 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:47:40,074-Speed 3468.58 samples/sec Loss 8.7623 LearningRate 0.0718 Epoch: 3 Global Step: 15440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:43,053-Speed 3437.51 samples/sec Loss 8.8885 LearningRate 0.0718 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:46,046-Speed 3423.00 samples/sec Loss 8.8977 LearningRate 0.0718 Epoch: 3 Global Step: 15460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:49,027-Speed 3436.16 samples/sec Loss 8.8639 LearningRate 0.0718 Epoch: 3 Global Step: 15470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:51,985-Speed 3463.61 samples/sec Loss 8.9185 LearningRate 0.0717 Epoch: 3 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:54,963-Speed 3439.08 samples/sec Loss 8.9685 LearningRate 0.0717 Epoch: 3 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:47:57,952-Speed 3427.13 samples/sec Loss 9.0139 LearningRate 0.0717 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
00:48:00,936-Speed 3433.11 samples/sec Loss 8.9678 LearningRate 0.0717 Epoch: 3 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:03,932-Speed 3418.61 samples/sec Loss 8.9816 LearningRate 0.0717 Epoch: 3 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:06,908-Speed 3441.10 samples/sec Loss 8.8973 LearningRate 0.0717 Epoch: 3 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:09,879-Speed 3448.45 samples/sec Loss 8.9878 LearningRate 0.0716 Epoch: 3 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:12,852-Speed 3445.30 samples/sec Loss 9.0036 LearningRate 0.0716 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:15,843-Speed 3425.04 samples/sec Loss 8.9717 LearningRate 0.0716 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:18,842-Speed 3415.17 samples/sec Loss 8.8409 LearningRate 0.0716 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:21,813-Speed 3447.86 samples/sec Loss 8.7602 LearningRate 0.0716 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:24,805-Speed 3423.19 samples/sec Loss 8.9194 LearningRate 0.0716 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:27,793-Speed 3427.97 samples/sec Loss 9.0528 LearningRate 0.0715 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:30,789-Speed 3418.62 samples/sec Loss 9.0788 LearningRate 0.0715 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:33,737-Speed 3474.14 samples/sec Loss 8.9407 LearningRate 0.0715 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:48:36,681-Speed 3479.33 samples/sec Loss 8.8296 
LearningRate 0.0715 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:39,664-Speed 3433.98 samples/sec Loss 9.0283 LearningRate 0.0715 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:42,658-Speed 3421.46 samples/sec Loss 9.0809 LearningRate 0.0715 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:45,647-Speed 3427.47 samples/sec Loss 8.9635 LearningRate 0.0714 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:48,602-Speed 3466.05 samples/sec Loss 9.1361 LearningRate 0.0714 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:51,591-Speed 3426.43 samples/sec Loss 8.9753 LearningRate 0.0714 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:54,613-Speed 3389.83 samples/sec Loss 8.9423 LearningRate 0.0714 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:48:57,607-Speed 3421.21 samples/sec Loss 8.7794 LearningRate 0.0714 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:49:00,663-Speed 3351.44 samples/sec Loss 9.0464 LearningRate 0.0714 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:49:03,707-Speed 3365.00 samples/sec Loss 9.1324 LearningRate 0.0713 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:49:06,710-Speed 3410.47 samples/sec Loss 9.0828 LearningRate 0.0713 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:09,725-Speed 3397.74 samples/sec Loss 8.9976 LearningRate 0.0713 Epoch: 3 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:12,694-Speed 3449.97 samples/sec Loss 8.9621 LearningRate 0.0713 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-04-11 00:49:15,660-Speed 3453.60 samples/sec Loss 9.0713 LearningRate 0.0713 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:18,628-Speed 3450.21 samples/sec Loss 8.7920 LearningRate 0.0713 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:21,604-Speed 3442.61 samples/sec Loss 9.0916 LearningRate 0.0712 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:24,568-Speed 3456.19 samples/sec Loss 9.1287 LearningRate 0.0712 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:27,534-Speed 3453.74 samples/sec Loss 9.1006 LearningRate 0.0712 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:30,509-Speed 3442.54 samples/sec Loss 9.1528 LearningRate 0.0712 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:33,462-Speed 3468.79 samples/sec Loss 9.0359 LearningRate 0.0712 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:36,421-Speed 3461.71 samples/sec Loss 9.0522 LearningRate 0.0712 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:39,416-Speed 3420.16 samples/sec Loss 8.9895 LearningRate 0.0711 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:42,389-Speed 3445.87 samples/sec Loss 8.8675 LearningRate 0.0711 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:45,405-Speed 3396.65 samples/sec Loss 9.2030 LearningRate 0.0711 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:48,385-Speed 3436.84 samples/sec Loss 9.0171 LearningRate 0.0711 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
00:49:51,364-Speed 3438.27 samples/sec Loss 9.1811 LearningRate 0.0711 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:54,336-Speed 3446.15 samples/sec Loss 9.0621 LearningRate 0.0711 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:49:57,294-Speed 3462.98 samples/sec Loss 8.9758 LearningRate 0.0710 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:00,278-Speed 3433.23 samples/sec Loss 9.2585 LearningRate 0.0710 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:03,259-Speed 3436.08 samples/sec Loss 9.0530 LearningRate 0.0710 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:06,206-Speed 3476.09 samples/sec Loss 9.0952 LearningRate 0.0710 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:09,187-Speed 3436.26 samples/sec Loss 9.2475 LearningRate 0.0710 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:12,174-Speed 3428.41 samples/sec Loss 9.1878 LearningRate 0.0710 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:50:15,151-Speed 3451.21 samples/sec Loss 9.0750 LearningRate 0.0709 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:50:18,147-Speed 3418.95 samples/sec Loss 9.2180 LearningRate 0.0709 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:50:21,113-Speed 3453.91 samples/sec Loss 9.2224 LearningRate 0.0709 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:50:24,105-Speed 3423.10 samples/sec Loss 9.1421 LearningRate 0.0709 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:50:27,069-Speed 3455.64 samples/sec Loss 8.9650 
LearningRate 0.0709 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:51:11,213-[lfw][16000]XNorm: 21.934007 Training: 2022-04-11 00:51:11,214-[lfw][16000]Accuracy-Flip: 0.99533+-0.00340 Training: 2022-04-11 00:51:11,214-[lfw][16000]Accuracy-Highest: 0.99667 Training: 2022-04-11 00:52:02,617-[cfp_fp][16000]XNorm: 19.244206 Training: 2022-04-11 00:52:02,618-[cfp_fp][16000]Accuracy-Flip: 0.95514+-0.01221 Training: 2022-04-11 00:52:02,618-[cfp_fp][16000]Accuracy-Highest: 0.95514 Training: 2022-04-11 00:52:46,694-[agedb_30][16000]XNorm: 21.524140 Training: 2022-04-11 00:52:46,694-[agedb_30][16000]Accuracy-Flip: 0.97133+-0.00816 Training: 2022-04-11 00:52:46,695-[agedb_30][16000]Accuracy-Highest: 0.97133 Training: 2022-04-11 00:52:49,664-Speed 71.81 samples/sec Loss 9.0826 LearningRate 0.0709 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:52:52,627-Speed 3456.11 samples/sec Loss 9.0774 LearningRate 0.0708 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:52:55,605-Speed 3439.50 samples/sec Loss 9.2613 LearningRate 0.0708 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:52:58,582-Speed 3440.12 samples/sec Loss 8.9511 LearningRate 0.0708 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:53:01,544-Speed 3458.66 samples/sec Loss 9.0930 LearningRate 0.0708 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:53:04,512-Speed 3451.87 samples/sec Loss 9.1432 LearningRate 0.0708 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:07,472-Speed 3460.03 samples/sec Loss 9.1107 LearningRate 0.0708 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:10,461-Speed 3426.35 samples/sec Loss 8.9173 LearningRate 0.0707 Epoch: 3 Global Step: 16080 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:13,434-Speed 3445.66 samples/sec Loss 8.9715 LearningRate 0.0707 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:16,493-Speed 3348.38 samples/sec Loss 9.1316 LearningRate 0.0707 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:19,472-Speed 3439.08 samples/sec Loss 9.1335 LearningRate 0.0707 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:22,432-Speed 3460.13 samples/sec Loss 9.1764 LearningRate 0.0707 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:25,399-Speed 3453.29 samples/sec Loss 9.2110 LearningRate 0.0707 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:28,354-Speed 3466.30 samples/sec Loss 9.0278 LearningRate 0.0706 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:31,334-Speed 3436.48 samples/sec Loss 9.1643 LearningRate 0.0706 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:34,272-Speed 3486.22 samples/sec Loss 9.1909 LearningRate 0.0706 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:37,253-Speed 3436.75 samples/sec Loss 9.0281 LearningRate 0.0706 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:40,214-Speed 3459.43 samples/sec Loss 8.9820 LearningRate 0.0706 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:43,198-Speed 3432.90 samples/sec Loss 9.1402 LearningRate 0.0706 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:46,159-Speed 3457.98 samples/sec Loss 9.0885 LearningRate 0.0705 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 00:53:49,137-Speed 3440.94 samples/sec Loss 9.1159 LearningRate 0.0705 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:52,088-Speed 3470.88 samples/sec Loss 9.1206 LearningRate 0.0705 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:55,050-Speed 3457.66 samples/sec Loss 9.0779 LearningRate 0.0705 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:53:58,069-Speed 3393.29 samples/sec Loss 9.1311 LearningRate 0.0705 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:54:01,045-Speed 3441.91 samples/sec Loss 9.1850 LearningRate 0.0705 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:04,021-Speed 3441.66 samples/sec Loss 9.0682 LearningRate 0.0704 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:06,991-Speed 3449.14 samples/sec Loss 9.1243 LearningRate 0.0704 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:09,964-Speed 3445.64 samples/sec Loss 9.0400 LearningRate 0.0704 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:12,933-Speed 3449.39 samples/sec Loss 9.0981 LearningRate 0.0704 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:15,910-Speed 3440.91 samples/sec Loss 9.0383 LearningRate 0.0704 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:18,893-Speed 3433.42 samples/sec Loss 9.1362 LearningRate 0.0704 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:21,869-Speed 3441.28 samples/sec Loss 9.0091 LearningRate 0.0703 Epoch: 3 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:24,848-Speed 3438.54 samples/sec Loss 9.2026 
LearningRate 0.0703 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:27,838-Speed 3426.00 samples/sec Loss 8.9861 LearningRate 0.0703 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:30,812-Speed 3444.68 samples/sec Loss 9.2124 LearningRate 0.0703 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:54:33,771-Speed 3460.76 samples/sec Loss 9.1494 LearningRate 0.0703 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:36,772-Speed 3413.62 samples/sec Loss 9.2285 LearningRate 0.0703 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:39,751-Speed 3438.52 samples/sec Loss 8.9151 LearningRate 0.0702 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:42,732-Speed 3435.90 samples/sec Loss 9.1652 LearningRate 0.0702 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:45,712-Speed 3436.97 samples/sec Loss 8.9324 LearningRate 0.0702 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:48,689-Speed 3441.38 samples/sec Loss 9.0167 LearningRate 0.0702 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:51,692-Speed 3409.88 samples/sec Loss 9.1214 LearningRate 0.0702 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:54,721-Speed 3382.19 samples/sec Loss 9.1767 LearningRate 0.0702 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:54:57,667-Speed 3476.83 samples/sec Loss 9.0319 LearningRate 0.0701 Epoch: 3 Global Step: 16440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:00,647-Speed 3437.31 samples/sec Loss 8.9848 LearningRate 0.0701 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 00:55:03,635-Speed 3427.29 samples/sec Loss 9.0639 LearningRate 0.0701 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:06,618-Speed 3434.31 samples/sec Loss 9.0549 LearningRate 0.0701 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:09,589-Speed 3447.43 samples/sec Loss 9.1362 LearningRate 0.0701 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:12,536-Speed 3475.91 samples/sec Loss 8.9514 LearningRate 0.0701 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:15,499-Speed 3457.21 samples/sec Loss 9.0937 LearningRate 0.0700 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:18,487-Speed 3427.59 samples/sec Loss 9.2047 LearningRate 0.0700 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:21,452-Speed 3455.43 samples/sec Loss 9.1047 LearningRate 0.0700 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:24,434-Speed 3435.25 samples/sec Loss 9.1832 LearningRate 0.0700 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:27,438-Speed 3409.36 samples/sec Loss 8.9555 LearningRate 0.0700 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:30,424-Speed 3429.21 samples/sec Loss 9.3129 LearningRate 0.0700 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:33,416-Speed 3423.95 samples/sec Loss 9.2515 LearningRate 0.0699 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:55:36,410-Speed 3420.99 samples/sec Loss 9.1217 LearningRate 0.0699 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:39,374-Speed 
3455.61 samples/sec Loss 8.9313 LearningRate 0.0699 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:42,353-Speed 3438.63 samples/sec Loss 9.0086 LearningRate 0.0699 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:45,325-Speed 3445.94 samples/sec Loss 9.0601 LearningRate 0.0699 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:48,308-Speed 3433.64 samples/sec Loss 9.2597 LearningRate 0.0699 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:51,305-Speed 3418.18 samples/sec Loss 9.0229 LearningRate 0.0698 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:54,266-Speed 3459.35 samples/sec Loss 9.0036 LearningRate 0.0698 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:55:57,254-Speed 3427.89 samples/sec Loss 9.1638 LearningRate 0.0698 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:00,199-Speed 3477.65 samples/sec Loss 8.9586 LearningRate 0.0698 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:03,170-Speed 3447.35 samples/sec Loss 9.1081 LearningRate 0.0698 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:06,136-Speed 3453.41 samples/sec Loss 9.0237 LearningRate 0.0698 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 00:56:09,082-Speed 3477.38 samples/sec Loss 9.1242 LearningRate 0.0697 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:12,045-Speed 3456.52 samples/sec Loss 9.2884 LearningRate 0.0697 Epoch: 3 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:14,993-Speed 3475.09 samples/sec Loss 9.0096 LearningRate 0.0697 
Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:17,975-Speed 3434.70 samples/sec Loss 9.0843 LearningRate 0.0697 Epoch: 3 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:20,961-Speed 3429.97 samples/sec Loss 9.1160 LearningRate 0.0697 Epoch: 3 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:23,964-Speed 3410.10 samples/sec Loss 9.0858 LearningRate 0.0697 Epoch: 3 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:27,066-Speed 3302.21 samples/sec Loss 9.2949 LearningRate 0.0696 Epoch: 3 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:30,122-Speed 3351.41 samples/sec Loss 9.0445 LearningRate 0.0696 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:33,086-Speed 3456.04 samples/sec Loss 8.9160 LearningRate 0.0696 Epoch: 3 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:36,080-Speed 3421.01 samples/sec Loss 9.0163 LearningRate 0.0696 Epoch: 3 Global Step: 16770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:39,071-Speed 3425.36 samples/sec Loss 9.1208 LearningRate 0.0696 Epoch: 3 Global Step: 16780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:42,082-Speed 3401.30 samples/sec Loss 9.0468 LearningRate 0.0696 Epoch: 3 Global Step: 16790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:56:45,082-Speed 3414.77 samples/sec Loss 9.1335 LearningRate 0.0695 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:48,046-Speed 3455.81 samples/sec Loss 8.9640 LearningRate 0.0695 Epoch: 3 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:51,031-Speed 3430.57 samples/sec Loss 9.0520 LearningRate 0.0695 Epoch: 3 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 00:56:54,015-Speed 3432.81 samples/sec Loss 9.1075 LearningRate 0.0695 Epoch: 3 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:56,995-Speed 3437.90 samples/sec Loss 9.0948 LearningRate 0.0695 Epoch: 3 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:56:59,983-Speed 3428.23 samples/sec Loss 9.1700 LearningRate 0.0695 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:02,977-Speed 3420.67 samples/sec Loss 9.1150 LearningRate 0.0694 Epoch: 3 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:05,951-Speed 3444.52 samples/sec Loss 8.9959 LearningRate 0.0694 Epoch: 3 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:08,940-Speed 3426.51 samples/sec Loss 9.1411 LearningRate 0.0694 Epoch: 3 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:11,936-Speed 3419.14 samples/sec Loss 9.0144 LearningRate 0.0694 Epoch: 3 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:14,908-Speed 3445.74 samples/sec Loss 9.0130 LearningRate 0.0694 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:17,908-Speed 3414.06 samples/sec Loss 8.9562 LearningRate 0.0694 Epoch: 3 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 00:57:20,892-Speed 3433.26 samples/sec Loss 9.2799 LearningRate 0.0693 Epoch: 3 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:23,880-Speed 3428.06 samples/sec Loss 9.1079 LearningRate 0.0693 Epoch: 3 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:27,009-Speed 3273.35 samples/sec Loss 9.1873 LearningRate 0.0693 Epoch: 3 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:29,961-Speed 3469.69 
samples/sec Loss 9.0273 LearningRate 0.0693 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:32,947-Speed 3430.17 samples/sec Loss 9.1868 LearningRate 0.0693 Epoch: 3 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:35,942-Speed 3419.27 samples/sec Loss 8.9775 LearningRate 0.0693 Epoch: 3 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:38,929-Speed 3429.68 samples/sec Loss 9.0405 LearningRate 0.0692 Epoch: 3 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:41,926-Speed 3417.76 samples/sec Loss 8.9929 LearningRate 0.0692 Epoch: 3 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:44,911-Speed 3431.86 samples/sec Loss 8.9240 LearningRate 0.0692 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 00:57:47,896-Speed 3430.58 samples/sec Loss 9.0052 LearningRate 0.0692 Epoch: 3 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:57:50,902-Speed 3407.03 samples/sec Loss 9.0505 LearningRate 0.0692 Epoch: 3 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:57:53,914-Speed 3401.89 samples/sec Loss 9.1428 LearningRate 0.0692 Epoch: 3 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:57:56,885-Speed 3447.35 samples/sec Loss 8.9986 LearningRate 0.0691 Epoch: 3 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:57:59,891-Speed 3406.96 samples/sec Loss 8.9888 LearningRate 0.0691 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:02,871-Speed 3437.00 samples/sec Loss 9.0572 LearningRate 0.0691 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:05,863-Speed 3423.50 samples/sec Loss 9.1276 LearningRate 0.0691 Epoch: 3 Global Step: 
17070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:08,862-Speed 3415.84 samples/sec Loss 9.3065 LearningRate 0.0691 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:11,861-Speed 3415.40 samples/sec Loss 9.0047 LearningRate 0.0691 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:14,866-Speed 3409.02 samples/sec Loss 9.0966 LearningRate 0.0690 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:17,847-Speed 3435.89 samples/sec Loss 8.8553 LearningRate 0.0690 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:20,834-Speed 3429.16 samples/sec Loss 9.2012 LearningRate 0.0690 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:23,835-Speed 3413.10 samples/sec Loss 9.1500 LearningRate 0.0690 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:26,826-Speed 3424.83 samples/sec Loss 9.0452 LearningRate 0.0690 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:29,827-Speed 3413.21 samples/sec Loss 9.1370 LearningRate 0.0690 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:32,803-Speed 3441.09 samples/sec Loss 9.0833 LearningRate 0.0690 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:35,787-Speed 3432.66 samples/sec Loss 9.0475 LearningRate 0.0689 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:38,750-Speed 3457.00 samples/sec Loss 9.0310 LearningRate 0.0689 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:41,732-Speed 3435.37 samples/sec Loss 8.9994 LearningRate 0.0689 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 00:58:44,720-Speed 3428.57 samples/sec Loss 8.9021 LearningRate 0.0689 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:47,699-Speed 3437.42 samples/sec Loss 9.0955 LearningRate 0.0689 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:50,719-Speed 3391.09 samples/sec Loss 9.0208 LearningRate 0.0689 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:58:53,707-Speed 3428.65 samples/sec Loss 8.9463 LearningRate 0.0688 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:56,733-Speed 3384.96 samples/sec Loss 8.9916 LearningRate 0.0688 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:58:59,740-Speed 3406.36 samples/sec Loss 9.1441 LearningRate 0.0688 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:02,764-Speed 3387.05 samples/sec Loss 8.9943 LearningRate 0.0688 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:05,752-Speed 3428.31 samples/sec Loss 9.0257 LearningRate 0.0688 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:08,744-Speed 3423.48 samples/sec Loss 9.0480 LearningRate 0.0688 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:11,716-Speed 3446.37 samples/sec Loss 9.0147 LearningRate 0.0687 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:14,711-Speed 3420.42 samples/sec Loss 8.8374 LearningRate 0.0687 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:17,694-Speed 3433.64 samples/sec Loss 8.9573 LearningRate 0.0687 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:20,688-Speed 3420.94 samples/sec Loss 9.0215 
LearningRate 0.0687 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 00:59:23,674-Speed 3430.32 samples/sec Loss 8.9377 LearningRate 0.0687 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:26,660-Speed 3430.34 samples/sec Loss 8.8517 LearningRate 0.0687 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:29,659-Speed 3415.95 samples/sec Loss 9.0739 LearningRate 0.0686 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:32,647-Speed 3427.86 samples/sec Loss 8.9152 LearningRate 0.0686 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:35,648-Speed 3413.01 samples/sec Loss 8.9737 LearningRate 0.0686 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:38,678-Speed 3381.19 samples/sec Loss 9.1094 LearningRate 0.0686 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:41,734-Speed 3351.50 samples/sec Loss 8.9966 LearningRate 0.0686 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:44,724-Speed 3425.46 samples/sec Loss 9.0993 LearningRate 0.0686 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:47,732-Speed 3405.51 samples/sec Loss 9.0864 LearningRate 0.0685 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:50,732-Speed 3413.38 samples/sec Loss 9.0527 LearningRate 0.0685 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:53,719-Speed 3430.71 samples/sec Loss 9.0029 LearningRate 0.0685 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:56,708-Speed 3426.00 samples/sec Loss 8.9973 LearningRate 0.0685 Epoch: 3 Global Step: 17440 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 00:59:59,688-Speed 3438.46 samples/sec Loss 9.1688 LearningRate 0.0685 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:02,658-Speed 3448.06 samples/sec Loss 8.9225 LearningRate 0.0685 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:05,640-Speed 3434.58 samples/sec Loss 8.9811 LearningRate 0.0684 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:08,638-Speed 3416.94 samples/sec Loss 9.0601 LearningRate 0.0684 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:11,643-Speed 3408.56 samples/sec Loss 9.0804 LearningRate 0.0684 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:14,652-Speed 3404.21 samples/sec Loss 9.0487 LearningRate 0.0684 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:17,653-Speed 3413.01 samples/sec Loss 9.0912 LearningRate 0.0684 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:20,623-Speed 3449.16 samples/sec Loss 8.7135 LearningRate 0.0684 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:23,585-Speed 3458.11 samples/sec Loss 9.0592 LearningRate 0.0683 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:26,586-Speed 3412.99 samples/sec Loss 8.9324 LearningRate 0.0683 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:29,574-Speed 3428.69 samples/sec Loss 9.1435 LearningRate 0.0683 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:32,567-Speed 3421.36 samples/sec Loss 9.0951 LearningRate 0.0683 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 01:00:35,579-Speed 3400.72 samples/sec Loss 8.9493 LearningRate 0.0683 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:38,568-Speed 3427.89 samples/sec Loss 8.8952 LearningRate 0.0683 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:41,554-Speed 3429.40 samples/sec Loss 9.0218 LearningRate 0.0682 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:44,555-Speed 3413.51 samples/sec Loss 8.8041 LearningRate 0.0682 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:47,564-Speed 3404.14 samples/sec Loss 8.8750 LearningRate 0.0682 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:50,557-Speed 3422.29 samples/sec Loss 8.8677 LearningRate 0.0682 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:53,548-Speed 3424.73 samples/sec Loss 8.7401 LearningRate 0.0682 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:00:56,520-Speed 3446.25 samples/sec Loss 8.9814 LearningRate 0.0682 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:00:59,533-Speed 3399.00 samples/sec Loss 8.9586 LearningRate 0.0681 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:02,576-Speed 3366.13 samples/sec Loss 8.9060 LearningRate 0.0681 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:05,593-Speed 3395.54 samples/sec Loss 9.0184 LearningRate 0.0681 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:08,588-Speed 3420.23 samples/sec Loss 8.9150 LearningRate 0.0681 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:11,582-Speed 3420.70 samples/sec Loss 
8.8865 LearningRate 0.0681 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:14,585-Speed 3411.62 samples/sec Loss 8.8681 LearningRate 0.0681 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:17,594-Speed 3403.82 samples/sec Loss 8.9130 LearningRate 0.0681 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:20,595-Speed 3412.23 samples/sec Loss 8.8698 LearningRate 0.0680 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:23,581-Speed 3430.57 samples/sec Loss 8.8157 LearningRate 0.0680 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:26,586-Speed 3408.40 samples/sec Loss 8.9498 LearningRate 0.0680 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:29,576-Speed 3425.57 samples/sec Loss 8.8608 LearningRate 0.0680 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:01:32,565-Speed 3427.83 samples/sec Loss 8.8063 LearningRate 0.0680 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:35,575-Speed 3402.39 samples/sec Loss 8.9013 LearningRate 0.0680 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:38,574-Speed 3415.60 samples/sec Loss 8.7488 LearningRate 0.0679 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:41,560-Speed 3430.07 samples/sec Loss 8.8586 LearningRate 0.0679 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:44,563-Speed 3410.72 samples/sec Loss 8.9031 LearningRate 0.0679 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:47,570-Speed 3405.83 samples/sec Loss 8.9856 LearningRate 0.0679 Epoch: 3 Global Step: 17810 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:50,578-Speed 3405.56 samples/sec Loss 9.0899 LearningRate 0.0679 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:53,578-Speed 3415.04 samples/sec Loss 8.9345 LearningRate 0.0679 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:56,578-Speed 3414.06 samples/sec Loss 8.9206 LearningRate 0.0678 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:01:59,583-Speed 3408.07 samples/sec Loss 8.9677 LearningRate 0.0678 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:02,573-Speed 3426.74 samples/sec Loss 8.9122 LearningRate 0.0678 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:05,567-Speed 3421.44 samples/sec Loss 8.8924 LearningRate 0.0678 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:08,577-Speed 3402.19 samples/sec Loss 9.0935 LearningRate 0.0678 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:11,577-Speed 3414.10 samples/sec Loss 8.9469 LearningRate 0.0678 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:14,555-Speed 3439.11 samples/sec Loss 8.9367 LearningRate 0.0677 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:02:17,576-Speed 3390.52 samples/sec Loss 8.9326 LearningRate 0.0677 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:20,570-Speed 3422.34 samples/sec Loss 8.8337 LearningRate 0.0677 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:23,572-Speed 3411.77 samples/sec Loss 8.9930 LearningRate 0.0677 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:02:26,585-Speed 3398.72 samples/sec Loss 8.9308 LearningRate 0.0677 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:29,674-Speed 3315.89 samples/sec Loss 8.9228 LearningRate 0.0677 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:32,724-Speed 3359.13 samples/sec Loss 9.0076 LearningRate 0.0676 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:35,714-Speed 3425.05 samples/sec Loss 8.9107 LearningRate 0.0676 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:38,714-Speed 3413.97 samples/sec Loss 8.9100 LearningRate 0.0676 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:41,722-Speed 3405.44 samples/sec Loss 9.0249 LearningRate 0.0676 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:02:44,700-Speed 3440.09 samples/sec Loss 9.0672 LearningRate 0.0676 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:03:29,188-[lfw][18000]XNorm: 22.461839 Training: 2022-04-11 01:03:29,189-[lfw][18000]Accuracy-Flip: 0.99717+-0.00299 Training: 2022-04-11 01:03:29,189-[lfw][18000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:04:20,696-[cfp_fp][18000]XNorm: 19.853714 Training: 2022-04-11 01:04:20,697-[cfp_fp][18000]Accuracy-Flip: 0.95000+-0.01088 Training: 2022-04-11 01:04:20,697-[cfp_fp][18000]Accuracy-Highest: 0.95514 Training: 2022-04-11 01:05:04,838-[agedb_30][18000]XNorm: 21.988876 Training: 2022-04-11 01:05:04,838-[agedb_30][18000]Accuracy-Flip: 0.97050+-0.00946 Training: 2022-04-11 01:05:04,839-[agedb_30][18000]Accuracy-Highest: 0.97133 Training: 2022-04-11 01:05:07,827-Speed 71.55 samples/sec Loss 8.8999 LearningRate 0.0676 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:10,795-Speed 3450.22 samples/sec Loss 
8.8670 LearningRate 0.0675 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:13,757-Speed 3458.86 samples/sec Loss 8.8806 LearningRate 0.0675 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:16,726-Speed 3450.12 samples/sec Loss 8.7271 LearningRate 0.0675 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:19,716-Speed 3425.55 samples/sec Loss 9.1107 LearningRate 0.0675 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:22,704-Speed 3428.81 samples/sec Loss 8.9628 LearningRate 0.0675 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:25,702-Speed 3416.19 samples/sec Loss 8.8780 LearningRate 0.0675 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:28,681-Speed 3438.31 samples/sec Loss 8.8771 LearningRate 0.0674 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:31,667-Speed 3430.83 samples/sec Loss 9.0826 LearningRate 0.0674 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:34,646-Speed 3438.46 samples/sec Loss 8.8487 LearningRate 0.0674 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:37,607-Speed 3459.10 samples/sec Loss 8.7722 LearningRate 0.0674 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:05:40,563-Speed 3465.51 samples/sec Loss 8.9420 LearningRate 0.0674 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:43,561-Speed 3415.50 samples/sec Loss 8.9321 LearningRate 0.0674 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:46,549-Speed 3428.72 samples/sec Loss 8.9457 LearningRate 0.0674 Epoch: 3 Global Step: 18140 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:49,551-Speed 3411.87 samples/sec Loss 8.8875 LearningRate 0.0673 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:52,555-Speed 3410.08 samples/sec Loss 9.0006 LearningRate 0.0673 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:55,545-Speed 3425.76 samples/sec Loss 8.8514 LearningRate 0.0673 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:05:58,526-Speed 3434.96 samples/sec Loss 8.9087 LearningRate 0.0673 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:01,616-Speed 3314.94 samples/sec Loss 8.9002 LearningRate 0.0673 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:04,597-Speed 3436.47 samples/sec Loss 9.0205 LearningRate 0.0673 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:07,576-Speed 3438.18 samples/sec Loss 8.8662 LearningRate 0.0672 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:10,562-Speed 3430.66 samples/sec Loss 9.0302 LearningRate 0.0672 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:06:13,564-Speed 3412.40 samples/sec Loss 9.0116 LearningRate 0.0672 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:06:16,533-Speed 3449.51 samples/sec Loss 8.8629 LearningRate 0.0672 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:19,530-Speed 3417.59 samples/sec Loss 8.6938 LearningRate 0.0672 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:22,525-Speed 3420.36 samples/sec Loss 8.8230 LearningRate 0.0672 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:06:25,524-Speed 3415.43 samples/sec Loss 8.6712 LearningRate 0.0671 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:28,503-Speed 3438.38 samples/sec Loss 8.8571 LearningRate 0.0671 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:31,493-Speed 3426.53 samples/sec Loss 8.7130 LearningRate 0.0671 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:34,484-Speed 3423.85 samples/sec Loss 8.8119 LearningRate 0.0671 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:37,470-Speed 3431.23 samples/sec Loss 8.8824 LearningRate 0.0671 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:40,469-Speed 3414.69 samples/sec Loss 8.6971 LearningRate 0.0671 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:43,461-Speed 3423.49 samples/sec Loss 8.9363 LearningRate 0.0670 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:46,457-Speed 3418.52 samples/sec Loss 8.8736 LearningRate 0.0670 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:06:49,469-Speed 3401.44 samples/sec Loss 8.6873 LearningRate 0.0670 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:06:52,463-Speed 3421.05 samples/sec Loss 8.7357 LearningRate 0.0670 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:06:55,447-Speed 3431.50 samples/sec Loss 8.6361 LearningRate 0.0670 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:06:58,452-Speed 3409.38 samples/sec Loss 8.8752 LearningRate 0.0670 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:01,443-Speed 3424.80 samples/sec Loss 8.8478 LearningRate 
0.0669 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:04,429-Speed 3429.90 samples/sec Loss 8.7619 LearningRate 0.0669 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:07,446-Speed 3396.23 samples/sec Loss 8.7969 LearningRate 0.0669 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:10,437-Speed 3424.33 samples/sec Loss 8.7560 LearningRate 0.0669 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:13,433-Speed 3418.20 samples/sec Loss 8.9430 LearningRate 0.0669 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:16,414-Speed 3436.43 samples/sec Loss 8.7389 LearningRate 0.0669 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:19,397-Speed 3434.58 samples/sec Loss 8.7172 LearningRate 0.0668 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:22,377-Speed 3436.21 samples/sec Loss 8.7412 LearningRate 0.0668 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:07:25,377-Speed 3414.61 samples/sec Loss 8.6675 LearningRate 0.0668 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:28,358-Speed 3436.13 samples/sec Loss 8.7077 LearningRate 0.0668 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:31,343-Speed 3431.52 samples/sec Loss 8.8050 LearningRate 0.0668 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:34,334-Speed 3424.24 samples/sec Loss 8.6779 LearningRate 0.0668 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:37,343-Speed 3404.89 samples/sec Loss 8.7474 LearningRate 0.0668 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 01:07:40,350-Speed 3405.73 samples/sec Loss 8.9763 LearningRate 0.0667 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:43,344-Speed 3421.11 samples/sec Loss 8.9897 LearningRate 0.0667 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:46,334-Speed 3427.66 samples/sec Loss 8.7172 LearningRate 0.0667 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:49,330-Speed 3418.24 samples/sec Loss 8.7177 LearningRate 0.0667 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:52,324-Speed 3421.09 samples/sec Loss 8.7839 LearningRate 0.0667 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:55,295-Speed 3447.83 samples/sec Loss 8.8772 LearningRate 0.0667 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:07:58,280-Speed 3431.39 samples/sec Loss 8.8002 LearningRate 0.0666 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:01,308-Speed 3382.61 samples/sec Loss 8.9007 LearningRate 0.0666 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:04,336-Speed 3383.01 samples/sec Loss 8.9260 LearningRate 0.0666 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:07,330-Speed 3421.31 samples/sec Loss 9.0223 LearningRate 0.0666 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:10,323-Speed 3422.26 samples/sec Loss 8.8349 LearningRate 0.0666 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:13,401-Speed 3326.86 samples/sec Loss 8.8933 LearningRate 0.0666 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:08:16,401-Speed 3415.03 samples/sec Loss 8.7761 LearningRate 0.0665 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:19,392-Speed 3423.56 samples/sec Loss 8.7750 LearningRate 0.0665 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:22,377-Speed 3431.87 samples/sec Loss 8.8522 LearningRate 0.0665 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:25,376-Speed 3415.67 samples/sec Loss 8.7622 LearningRate 0.0665 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:28,376-Speed 3414.03 samples/sec Loss 8.9232 LearningRate 0.0665 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:31,365-Speed 3427.12 samples/sec Loss 8.8527 LearningRate 0.0665 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:34,357-Speed 3422.70 samples/sec Loss 8.7982 LearningRate 0.0664 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:37,352-Speed 3421.14 samples/sec Loss 8.8430 LearningRate 0.0664 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:40,355-Speed 3410.28 samples/sec Loss 8.9512 LearningRate 0.0664 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:43,363-Speed 3404.51 samples/sec Loss 8.7736 LearningRate 0.0664 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:46,367-Speed 3409.50 samples/sec Loss 8.7112 LearningRate 0.0664 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:49,370-Speed 3411.31 samples/sec Loss 8.9191 LearningRate 0.0664 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:08:52,411-Speed 3368.14 samples/sec Loss 8.6506 LearningRate 
0.0663 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:55,411-Speed 3414.80 samples/sec Loss 8.7510 LearningRate 0.0663 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:08:58,400-Speed 3426.87 samples/sec Loss 8.7333 LearningRate 0.0663 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:01,449-Speed 3358.82 samples/sec Loss 8.7952 LearningRate 0.0663 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:04,500-Speed 3356.95 samples/sec Loss 8.7669 LearningRate 0.0663 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:07,491-Speed 3425.13 samples/sec Loss 8.6716 LearningRate 0.0663 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:10,469-Speed 3439.01 samples/sec Loss 8.9788 LearningRate 0.0663 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:13,455-Speed 3430.38 samples/sec Loss 8.6354 LearningRate 0.0662 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:16,448-Speed 3422.05 samples/sec Loss 8.6929 LearningRate 0.0662 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:19,460-Speed 3400.26 samples/sec Loss 8.6685 LearningRate 0.0662 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:22,465-Speed 3408.91 samples/sec Loss 8.7209 LearningRate 0.0662 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:25,471-Speed 3407.83 samples/sec Loss 8.8900 LearningRate 0.0662 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:28,475-Speed 3409.63 samples/sec Loss 8.8479 LearningRate 0.0662 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 01:09:31,481-Speed 3407.02 samples/sec Loss 8.7274 LearningRate 0.0661 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:34,473-Speed 3423.52 samples/sec Loss 8.8031 LearningRate 0.0661 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:37,474-Speed 3413.24 samples/sec Loss 8.7649 LearningRate 0.0661 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:09:40,472-Speed 3415.45 samples/sec Loss 8.9432 LearningRate 0.0661 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:43,476-Speed 3409.89 samples/sec Loss 8.7628 LearningRate 0.0661 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:46,478-Speed 3412.92 samples/sec Loss 8.8025 LearningRate 0.0661 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:49,478-Speed 3414.09 samples/sec Loss 8.5616 LearningRate 0.0660 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:52,485-Speed 3406.30 samples/sec Loss 8.8637 LearningRate 0.0660 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:55,488-Speed 3410.52 samples/sec Loss 8.8843 LearningRate 0.0660 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:09:58,495-Speed 3406.66 samples/sec Loss 8.7057 LearningRate 0.0660 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:01,516-Speed 3390.81 samples/sec Loss 8.5612 LearningRate 0.0660 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:04,517-Speed 3413.59 samples/sec Loss 8.6371 LearningRate 0.0660 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:07,520-Speed 
3410.41 samples/sec Loss 8.6144 LearningRate 0.0659 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:10,538-Speed 3394.34 samples/sec Loss 8.7578 LearningRate 0.0659 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:10:13,527-Speed 3426.93 samples/sec Loss 8.8130 LearningRate 0.0659 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:16,550-Speed 3388.20 samples/sec Loss 8.8404 LearningRate 0.0659 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:19,560-Speed 3402.82 samples/sec Loss 8.7137 LearningRate 0.0659 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:22,560-Speed 3414.58 samples/sec Loss 8.8068 LearningRate 0.0659 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:25,574-Speed 3398.45 samples/sec Loss 8.7159 LearningRate 0.0659 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:28,585-Speed 3400.91 samples/sec Loss 8.7294 LearningRate 0.0658 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:31,621-Speed 3375.02 samples/sec Loss 8.7640 LearningRate 0.0658 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:34,615-Speed 3420.12 samples/sec Loss 8.7434 LearningRate 0.0658 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:37,624-Speed 3404.44 samples/sec Loss 8.7383 LearningRate 0.0658 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:40,629-Speed 3408.36 samples/sec Loss 8.7196 LearningRate 0.0658 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:43,618-Speed 3426.33 samples/sec Loss 8.5937 LearningRate 0.0658 
Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:10:46,607-Speed 3427.40 samples/sec Loss 8.7891 LearningRate 0.0657 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:10:49,593-Speed 3429.82 samples/sec Loss 8.5750 LearningRate 0.0657 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:10:52,614-Speed 3391.42 samples/sec Loss 8.5894 LearningRate 0.0657 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:10:55,620-Speed 3406.73 samples/sec Loss 8.6399 LearningRate 0.0657 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:10:58,640-Speed 3391.14 samples/sec Loss 8.7530 LearningRate 0.0657 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:01,664-Speed 3387.89 samples/sec Loss 8.7735 LearningRate 0.0657 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:04,669-Speed 3407.76 samples/sec Loss 8.8180 LearningRate 0.0656 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:07,683-Speed 3398.34 samples/sec Loss 8.6980 LearningRate 0.0656 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:10,684-Speed 3413.86 samples/sec Loss 8.7162 LearningRate 0.0656 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:13,702-Speed 3393.79 samples/sec Loss 8.8165 LearningRate 0.0656 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:11:16,755-Speed 3354.61 samples/sec Loss 8.8343 LearningRate 0.0656 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:19,762-Speed 3406.54 samples/sec Loss 8.6947 LearningRate 0.0656 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 01:11:22,764-Speed 3411.93 samples/sec Loss 8.6956 LearningRate 0.0655 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:25,845-Speed 3324.44 samples/sec Loss 8.6983 LearningRate 0.0655 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:28,846-Speed 3412.60 samples/sec Loss 8.5763 LearningRate 0.0655 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:31,857-Speed 3401.72 samples/sec Loss 8.8091 LearningRate 0.0655 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:34,861-Speed 3410.09 samples/sec Loss 8.6342 LearningRate 0.0655 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:37,896-Speed 3374.10 samples/sec Loss 8.7413 LearningRate 0.0655 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:40,915-Speed 3393.17 samples/sec Loss 8.7462 LearningRate 0.0655 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:43,917-Speed 3412.47 samples/sec Loss 8.8068 LearningRate 0.0654 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:46,902-Speed 3430.55 samples/sec Loss 8.8250 LearningRate 0.0654 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:50,000-Speed 3306.48 samples/sec Loss 8.6592 LearningRate 0.0654 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:53,025-Speed 3385.92 samples/sec Loss 8.8568 LearningRate 0.0654 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:56,030-Speed 3408.45 samples/sec Loss 8.6718 LearningRate 0.0654 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:11:59,034-Speed 3409.61 
samples/sec Loss 8.5398 LearningRate 0.0654 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:02,040-Speed 3407.00 samples/sec Loss 8.6428 LearningRate 0.0653 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:05,048-Speed 3404.90 samples/sec Loss 8.6480 LearningRate 0.0653 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:08,048-Speed 3414.49 samples/sec Loss 8.7641 LearningRate 0.0653 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:11,070-Speed 3389.79 samples/sec Loss 8.7091 LearningRate 0.0653 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:14,078-Speed 3405.34 samples/sec Loss 8.7386 LearningRate 0.0653 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:17,084-Speed 3407.57 samples/sec Loss 8.5979 LearningRate 0.0653 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:12:20,071-Speed 3428.73 samples/sec Loss 8.5106 LearningRate 0.0652 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:23,113-Speed 3367.31 samples/sec Loss 8.7614 LearningRate 0.0652 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:26,129-Speed 3395.56 samples/sec Loss 8.8436 LearningRate 0.0652 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:12:29,162-Speed 3377.35 samples/sec Loss 8.7638 LearningRate 0.0652 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:32,173-Speed 3401.26 samples/sec Loss 8.6956 LearningRate 0.0652 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:35,181-Speed 3405.79 samples/sec Loss 8.6757 LearningRate 0.0652 Epoch: 3 
Global Step: 19500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:38,185-Speed 3410.12 samples/sec Loss 8.7454 LearningRate 0.0651 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:41,191-Speed 3406.96 samples/sec Loss 8.5709 LearningRate 0.0651 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:44,196-Speed 3408.18 samples/sec Loss 8.5046 LearningRate 0.0651 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:47,213-Speed 3395.28 samples/sec Loss 8.6968 LearningRate 0.0651 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:50,212-Speed 3415.52 samples/sec Loss 8.5625 LearningRate 0.0651 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:53,236-Speed 3387.48 samples/sec Loss 8.5954 LearningRate 0.0651 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:56,295-Speed 3347.61 samples/sec Loss 8.5746 LearningRate 0.0651 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:12:59,350-Speed 3352.46 samples/sec Loss 8.6435 LearningRate 0.0650 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:02,374-Speed 3387.84 samples/sec Loss 8.6076 LearningRate 0.0650 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:05,393-Speed 3392.99 samples/sec Loss 8.7069 LearningRate 0.0650 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:08,397-Speed 3409.53 samples/sec Loss 8.7575 LearningRate 0.0650 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:11,394-Speed 3417.70 samples/sec Loss 8.6430 LearningRate 0.0650 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 01:13:14,397-Speed 3410.61 samples/sec Loss 8.7186 LearningRate 0.0650 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:17,406-Speed 3404.48 samples/sec Loss 8.5570 LearningRate 0.0649 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:20,420-Speed 3397.62 samples/sec Loss 8.6423 LearningRate 0.0649 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:23,435-Speed 3398.61 samples/sec Loss 8.4995 LearningRate 0.0649 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:26,447-Speed 3400.20 samples/sec Loss 8.7057 LearningRate 0.0649 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:29,451-Speed 3409.89 samples/sec Loss 8.7664 LearningRate 0.0649 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:32,460-Speed 3403.66 samples/sec Loss 8.5523 LearningRate 0.0649 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:35,468-Speed 3405.03 samples/sec Loss 8.6150 LearningRate 0.0648 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:38,474-Speed 3408.01 samples/sec Loss 8.6944 LearningRate 0.0648 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:41,474-Speed 3414.36 samples/sec Loss 8.6300 LearningRate 0.0648 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:44,476-Speed 3411.92 samples/sec Loss 8.6859 LearningRate 0.0648 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:47,480-Speed 3409.01 samples/sec Loss 8.6237 LearningRate 0.0648 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:50,503-Speed 3389.09 
samples/sec Loss 8.5800 LearningRate 0.0648 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:53,513-Speed 3404.17 samples/sec Loss 8.6411 LearningRate 0.0647 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:56,510-Speed 3416.59 samples/sec Loss 8.7080 LearningRate 0.0647 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:13:59,495-Speed 3431.93 samples/sec Loss 8.7329 LearningRate 0.0647 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:02,520-Speed 3386.15 samples/sec Loss 8.5889 LearningRate 0.0647 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:05,520-Speed 3414.19 samples/sec Loss 8.5219 LearningRate 0.0647 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:08,527-Speed 3407.23 samples/sec Loss 8.5968 LearningRate 0.0647 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:11,529-Speed 3411.57 samples/sec Loss 8.8441 LearningRate 0.0647 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:14,526-Speed 3417.74 samples/sec Loss 8.6431 LearningRate 0.0646 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:17,529-Speed 3411.36 samples/sec Loss 8.4534 LearningRate 0.0646 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:20,545-Speed 3395.55 samples/sec Loss 8.4954 LearningRate 0.0646 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:23,561-Speed 3396.68 samples/sec Loss 8.5594 LearningRate 0.0646 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:26,583-Speed 3389.48 samples/sec Loss 8.5640 LearningRate 0.0646 Epoch: 3 
Global Step: 19870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:29,581-Speed 3416.28 samples/sec Loss 8.5773 LearningRate 0.0646 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:32,585-Speed 3409.11 samples/sec Loss 8.6844 LearningRate 0.0645 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:35,588-Speed 3410.83 samples/sec Loss 8.5049 LearningRate 0.0645 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:38,597-Speed 3404.50 samples/sec Loss 8.6241 LearningRate 0.0645 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:41,602-Speed 3408.27 samples/sec Loss 8.6316 LearningRate 0.0645 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:44,607-Speed 3408.53 samples/sec Loss 8.6823 LearningRate 0.0645 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:14:47,591-Speed 3433.40 samples/sec Loss 8.5173 LearningRate 0.0645 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:14:50,607-Speed 3396.15 samples/sec Loss 8.5082 LearningRate 0.0644 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:14:53,633-Speed 3384.85 samples/sec Loss 8.6103 LearningRate 0.0644 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:14:56,652-Speed 3393.01 samples/sec Loss 8.4069 LearningRate 0.0644 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:14:59,681-Speed 3380.96 samples/sec Loss 8.5895 LearningRate 0.0644 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:15:02,823-Speed 3260.20 samples/sec Loss 8.6539 LearningRate 0.0644 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 01:15:05,847-Speed 3386.93 samples/sec Loss 8.6190 LearningRate 0.0644 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:15:50,084-[lfw][20000]XNorm: 19.793800
Training: 2022-04-11 01:15:50,084-[lfw][20000]Accuracy-Flip: 0.99683+-0.00337
Training: 2022-04-11 01:15:50,085-[lfw][20000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:16:41,379-[cfp_fp][20000]XNorm: 17.406228
Training: 2022-04-11 01:16:41,380-[cfp_fp][20000]Accuracy-Flip: 0.95314+-0.01157
Training: 2022-04-11 01:16:41,380-[cfp_fp][20000]Accuracy-Highest: 0.95514
Training: 2022-04-11 01:17:25,749-[agedb_30][20000]XNorm: 19.689191
Training: 2022-04-11 01:17:25,750-[agedb_30][20000]Accuracy-Flip: 0.97383+-0.00727
Training: 2022-04-11 01:17:25,750-[agedb_30][20000]Accuracy-Highest: 0.97383
Training: 2022-04-11 01:17:28,754-Speed 71.66 samples/sec Loss 8.5656 LearningRate 0.0644 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:17:31,743-Speed 3426.74 samples/sec Loss 8.4944 LearningRate 0.0643 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:17:34,748-Speed 3407.90 samples/sec Loss 8.6717 LearningRate 0.0643 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:17:37,737-Speed 3427.76 samples/sec Loss 8.5959 LearningRate 0.0643 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:17:40,729-Speed 3423.02 samples/sec Loss 8.4088 LearningRate 0.0643 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:17:43,727-Speed 3416.92 samples/sec Loss 8.7080 LearningRate 0.0643 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:17:46,719-Speed 3422.87 samples/sec Loss 8.6402 LearningRate 0.0643 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:17:49,707-Speed 
3428.27 samples/sec Loss 8.5998 LearningRate 0.0642 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:17:52,714-Speed 3406.50 samples/sec Loss 8.5893 LearningRate 0.0642 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:17:55,694-Speed 3437.39 samples/sec Loss 8.5082 LearningRate 0.0642 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:17:58,698-Speed 3409.63 samples/sec Loss 8.5903 LearningRate 0.0642 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:01,673-Speed 3442.29 samples/sec Loss 8.3820 LearningRate 0.0642 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:04,663-Speed 3426.03 samples/sec Loss 8.6337 LearningRate 0.0642 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:07,664-Speed 3413.63 samples/sec Loss 8.4736 LearningRate 0.0641 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:10,664-Speed 3414.05 samples/sec Loss 8.6034 LearningRate 0.0641 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:13,659-Speed 3419.90 samples/sec Loss 8.5430 LearningRate 0.0641 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:16,652-Speed 3422.10 samples/sec Loss 8.6362 LearningRate 0.0641 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:19,649-Speed 3417.19 samples/sec Loss 8.7580 LearningRate 0.0641 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:22,682-Speed 3377.93 samples/sec Loss 8.5636 LearningRate 0.0641 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:25,697-Speed 3396.14 samples/sec Loss 8.5825 LearningRate 0.0641 Epoch: 3 
Global Step: 20200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:28,704-Speed 3407.31 samples/sec Loss 8.4958 LearningRate 0.0640 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:18:31,711-Speed 3406.05 samples/sec Loss 8.6632 LearningRate 0.0640 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:34,781-Speed 3336.51 samples/sec Loss 8.4433 LearningRate 0.0640 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:48,132-Speed 767.02 samples/sec Loss 7.9343 LearningRate 0.0640 Epoch: 4 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:51,350-Speed 3183.28 samples/sec Loss 7.7936 LearningRate 0.0640 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:54,385-Speed 3375.51 samples/sec Loss 7.7007 LearningRate 0.0640 Epoch: 4 Global Step: 20260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:18:57,519-Speed 3268.48 samples/sec Loss 7.9811 LearningRate 0.0639 Epoch: 4 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:00,546-Speed 3384.09 samples/sec Loss 7.8085 LearningRate 0.0639 Epoch: 4 Global Step: 20280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:03,613-Speed 3338.87 samples/sec Loss 7.8509 LearningRate 0.0639 Epoch: 4 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:06,637-Speed 3387.15 samples/sec Loss 7.7147 LearningRate 0.0639 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:09,637-Speed 3414.48 samples/sec Loss 7.7122 LearningRate 0.0639 Epoch: 4 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:12,604-Speed 3453.25 samples/sec Loss 7.8469 LearningRate 0.0639 Epoch: 4 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 01:19:15,610-Speed 3406.77 samples/sec Loss 7.9488 LearningRate 0.0638 Epoch: 4 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:18,630-Speed 3392.38 samples/sec Loss 7.8415 LearningRate 0.0638 Epoch: 4 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:21,642-Speed 3400.76 samples/sec Loss 7.9803 LearningRate 0.0638 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:24,640-Speed 3416.02 samples/sec Loss 8.0284 LearningRate 0.0638 Epoch: 4 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:27,658-Speed 3394.04 samples/sec Loss 7.9587 LearningRate 0.0638 Epoch: 4 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:30,655-Speed 3417.40 samples/sec Loss 7.9091 LearningRate 0.0638 Epoch: 4 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:33,652-Speed 3418.13 samples/sec Loss 7.9865 LearningRate 0.0638 Epoch: 4 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:36,652-Speed 3414.90 samples/sec Loss 7.8677 LearningRate 0.0637 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:39,662-Speed 3401.83 samples/sec Loss 7.8742 LearningRate 0.0637 Epoch: 4 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:42,663-Speed 3413.22 samples/sec Loss 8.0067 LearningRate 0.0637 Epoch: 4 Global Step: 20420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:19:45,645-Speed 3435.33 samples/sec Loss 8.0516 LearningRate 0.0637 Epoch: 4 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:48,644-Speed 3415.70 samples/sec Loss 8.0048 LearningRate 0.0637 Epoch: 4 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:51,635-Speed 3423.94 
samples/sec Loss 8.0004 LearningRate 0.0637 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:19:54,640-Speed 3408.86 samples/sec Loss 7.8946 LearningRate 0.0636 Epoch: 4 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:19:57,643-Speed 3411.22 samples/sec Loss 7.9057 LearningRate 0.0636 Epoch: 4 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:00,642-Speed 3415.50 samples/sec Loss 8.0310 LearningRate 0.0636 Epoch: 4 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:03,651-Speed 3403.36 samples/sec Loss 8.0589 LearningRate 0.0636 Epoch: 4 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:06,651-Speed 3414.53 samples/sec Loss 7.8693 LearningRate 0.0636 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:09,650-Speed 3415.01 samples/sec Loss 8.0731 LearningRate 0.0636 Epoch: 4 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:12,644-Speed 3421.82 samples/sec Loss 8.0470 LearningRate 0.0635 Epoch: 4 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:15,643-Speed 3415.23 samples/sec Loss 8.1586 LearningRate 0.0635 Epoch: 4 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:18,641-Speed 3416.76 samples/sec Loss 8.0575 LearningRate 0.0635 Epoch: 4 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:21,641-Speed 3413.64 samples/sec Loss 8.0228 LearningRate 0.0635 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:20:24,676-Speed 3375.26 samples/sec Loss 8.0619 LearningRate 0.0635 Epoch: 4 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:27,680-Speed 3409.12 samples/sec Loss 8.0964 LearningRate 0.0635 Epoch: 4 Global Step: 
20570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:30,678-Speed 3416.98 samples/sec Loss 8.3191 LearningRate 0.0635 Epoch: 4 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:33,686-Speed 3405.35 samples/sec Loss 8.1784 LearningRate 0.0634 Epoch: 4 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:36,698-Speed 3399.87 samples/sec Loss 8.1069 LearningRate 0.0634 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:39,709-Speed 3401.44 samples/sec Loss 8.1440 LearningRate 0.0634 Epoch: 4 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:42,713-Speed 3410.10 samples/sec Loss 8.2036 LearningRate 0.0634 Epoch: 4 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:45,713-Speed 3414.35 samples/sec Loss 8.0395 LearningRate 0.0634 Epoch: 4 Global Step: 20630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:48,714-Speed 3413.36 samples/sec Loss 8.0591 LearningRate 0.0634 Epoch: 4 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:51,717-Speed 3410.60 samples/sec Loss 8.1078 LearningRate 0.0633 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:54,695-Speed 3438.67 samples/sec Loss 7.8944 LearningRate 0.0633 Epoch: 4 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:20:57,702-Speed 3406.63 samples/sec Loss 8.2395 LearningRate 0.0633 Epoch: 4 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:00,715-Speed 3399.81 samples/sec Loss 8.2620 LearningRate 0.0633 Epoch: 4 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:03,720-Speed 3408.74 samples/sec Loss 8.2104 LearningRate 0.0633 Epoch: 4 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 01:21:06,722-Speed 3411.35 samples/sec Loss 8.2013 LearningRate 0.0633 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:09,720-Speed 3416.20 samples/sec Loss 8.1628 LearningRate 0.0632 Epoch: 4 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:12,719-Speed 3415.52 samples/sec Loss 8.1601 LearningRate 0.0632 Epoch: 4 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:15,729-Speed 3403.05 samples/sec Loss 7.9581 LearningRate 0.0632 Epoch: 4 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:18,736-Speed 3406.04 samples/sec Loss 8.2598 LearningRate 0.0632 Epoch: 4 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:21,734-Speed 3416.39 samples/sec Loss 8.1643 LearningRate 0.0632 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:24,728-Speed 3421.41 samples/sec Loss 8.1644 LearningRate 0.0632 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:21:27,834-Speed 3297.70 samples/sec Loss 8.1745 LearningRate 0.0632 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:31,120-Speed 3117.06 samples/sec Loss 8.2299 LearningRate 0.0631 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:34,123-Speed 3410.86 samples/sec Loss 8.1230 LearningRate 0.0631 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:37,128-Speed 3407.81 samples/sec Loss 8.1912 LearningRate 0.0631 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:40,145-Speed 3395.38 samples/sec Loss 8.2189 LearningRate 0.0631 Epoch: 4 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:43,163-Speed 3394.21 
samples/sec Loss 8.0781 LearningRate 0.0631 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:46,161-Speed 3416.45 samples/sec Loss 8.1212 LearningRate 0.0631 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:49,161-Speed 3414.42 samples/sec Loss 8.1477 LearningRate 0.0630 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:52,164-Speed 3410.18 samples/sec Loss 8.1470 LearningRate 0.0630 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:55,171-Speed 3406.60 samples/sec Loss 8.1489 LearningRate 0.0630 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:21:58,150-Speed 3438.46 samples/sec Loss 8.3649 LearningRate 0.0630 Epoch: 4 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:01,174-Speed 3386.97 samples/sec Loss 8.2322 LearningRate 0.0630 Epoch: 4 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:04,174-Speed 3415.34 samples/sec Loss 8.1957 LearningRate 0.0630 Epoch: 4 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:07,175-Speed 3413.18 samples/sec Loss 8.2540 LearningRate 0.0629 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:10,176-Speed 3412.91 samples/sec Loss 8.1195 LearningRate 0.0629 Epoch: 4 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:13,179-Speed 3410.12 samples/sec Loss 8.0945 LearningRate 0.0629 Epoch: 4 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:16,184-Speed 3408.41 samples/sec Loss 8.2974 LearningRate 0.0629 Epoch: 4 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:19,168-Speed 3433.17 samples/sec Loss 8.2632 LearningRate 0.0629 Epoch: 4 
Global Step: 20940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:22,170-Speed 3412.06 samples/sec Loss 8.0493 LearningRate 0.0629 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:25,174-Speed 3409.48 samples/sec Loss 8.2155 LearningRate 0.0629 Epoch: 4 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:28,195-Speed 3389.76 samples/sec Loss 8.4436 LearningRate 0.0628 Epoch: 4 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:31,201-Speed 3407.86 samples/sec Loss 8.1387 LearningRate 0.0628 Epoch: 4 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:34,203-Speed 3412.93 samples/sec Loss 8.2299 LearningRate 0.0628 Epoch: 4 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:37,222-Speed 3392.41 samples/sec Loss 8.1877 LearningRate 0.0628 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:40,224-Speed 3411.13 samples/sec Loss 8.2844 LearningRate 0.0628 Epoch: 4 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:43,224-Speed 3414.69 samples/sec Loss 8.2174 LearningRate 0.0628 Epoch: 4 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:46,227-Speed 3409.89 samples/sec Loss 8.2709 LearningRate 0.0627 Epoch: 4 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:22:49,242-Speed 3397.70 samples/sec Loss 8.1808 LearningRate 0.0627 Epoch: 4 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:52,267-Speed 3385.80 samples/sec Loss 8.3889 LearningRate 0.0627 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:22:55,276-Speed 3404.39 samples/sec Loss 8.2528 LearningRate 0.0627 Epoch: 4 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 01:22:58,292-Speed 3396.39 samples/sec Loss 8.1934 LearningRate 0.0627 Epoch: 4 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:01,306-Speed 3398.62 samples/sec Loss 8.3933 LearningRate 0.0627 Epoch: 4 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:04,360-Speed 3353.41 samples/sec Loss 8.1675 LearningRate 0.0627 Epoch: 4 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:07,372-Speed 3400.18 samples/sec Loss 8.4810 LearningRate 0.0626 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:10,386-Speed 3398.73 samples/sec Loss 8.2008 LearningRate 0.0626 Epoch: 4 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:13,390-Speed 3409.36 samples/sec Loss 8.3888 LearningRate 0.0626 Epoch: 4 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:16,400-Speed 3402.79 samples/sec Loss 8.2012 LearningRate 0.0626 Epoch: 4 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:19,406-Speed 3406.98 samples/sec Loss 8.2221 LearningRate 0.0626 Epoch: 4 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:22,423-Speed 3395.51 samples/sec Loss 8.3858 LearningRate 0.0626 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:25,436-Speed 3399.78 samples/sec Loss 8.3609 LearningRate 0.0625 Epoch: 4 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:28,436-Speed 3414.26 samples/sec Loss 8.3393 LearningRate 0.0625 Epoch: 4 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:31,439-Speed 3410.39 samples/sec Loss 8.2111 LearningRate 0.0625 Epoch: 4 Global Step: 21180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:34,444-Speed 3408.39 samples/sec Loss 
8.4739 LearningRate 0.0625 Epoch: 4 Global Step: 21190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:37,446-Speed 3411.63 samples/sec Loss 8.2169 LearningRate 0.0625 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:40,450-Speed 3409.87 samples/sec Loss 8.2932 LearningRate 0.0625 Epoch: 4 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:43,450-Speed 3414.23 samples/sec Loss 8.2490 LearningRate 0.0624 Epoch: 4 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:23:46,436-Speed 3429.48 samples/sec Loss 8.2745 LearningRate 0.0624 Epoch: 4 Global Step: 21230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:49,459-Speed 3388.52 samples/sec Loss 8.4396 LearningRate 0.0624 Epoch: 4 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:52,483-Speed 3387.49 samples/sec Loss 8.3301 LearningRate 0.0624 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:55,498-Speed 3396.95 samples/sec Loss 8.3204 LearningRate 0.0624 Epoch: 4 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:23:58,512-Speed 3398.57 samples/sec Loss 8.2095 LearningRate 0.0624 Epoch: 4 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:01,530-Speed 3393.47 samples/sec Loss 8.2336 LearningRate 0.0624 Epoch: 4 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:04,548-Speed 3393.93 samples/sec Loss 8.0777 LearningRate 0.0623 Epoch: 4 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:07,566-Speed 3394.03 samples/sec Loss 8.2049 LearningRate 0.0623 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:10,572-Speed 3407.05 samples/sec Loss 8.2864 LearningRate 0.0623 Epoch: 4 Global Step: 21310 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:13,590-Speed 3394.41 samples/sec Loss 8.4832 LearningRate 0.0623 Epoch: 4 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:16,596-Speed 3406.56 samples/sec Loss 8.3607 LearningRate 0.0623 Epoch: 4 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:19,603-Speed 3406.52 samples/sec Loss 8.3362 LearningRate 0.0623 Epoch: 4 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:22,604-Speed 3413.46 samples/sec Loss 8.3772 LearningRate 0.0622 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:25,621-Speed 3394.59 samples/sec Loss 8.2905 LearningRate 0.0622 Epoch: 4 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:28,626-Speed 3409.14 samples/sec Loss 8.0992 LearningRate 0.0622 Epoch: 4 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:31,638-Speed 3399.84 samples/sec Loss 8.3528 LearningRate 0.0622 Epoch: 4 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:34,644-Speed 3407.78 samples/sec Loss 8.3298 LearningRate 0.0622 Epoch: 4 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:37,648-Speed 3409.93 samples/sec Loss 8.2069 LearningRate 0.0622 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:24:40,641-Speed 3421.73 samples/sec Loss 8.1928 LearningRate 0.0622 Epoch: 4 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:43,650-Speed 3404.99 samples/sec Loss 8.3812 LearningRate 0.0621 Epoch: 4 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:46,654-Speed 3409.93 samples/sec Loss 8.1377 LearningRate 0.0621 Epoch: 4 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:24:49,656-Speed 3411.85 samples/sec Loss 8.1878 LearningRate 0.0621 Epoch: 4 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:52,676-Speed 3391.98 samples/sec Loss 8.3613 LearningRate 0.0621 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:55,683-Speed 3405.79 samples/sec Loss 8.3640 LearningRate 0.0621 Epoch: 4 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:24:58,698-Speed 3397.33 samples/sec Loss 8.4452 LearningRate 0.0621 Epoch: 4 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:01,702-Speed 3409.38 samples/sec Loss 8.1906 LearningRate 0.0620 Epoch: 4 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:04,718-Speed 3395.45 samples/sec Loss 8.1506 LearningRate 0.0620 Epoch: 4 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:07,725-Speed 3406.65 samples/sec Loss 8.3639 LearningRate 0.0620 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:10,844-Speed 3284.39 samples/sec Loss 8.3483 LearningRate 0.0620 Epoch: 4 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:13,861-Speed 3395.17 samples/sec Loss 8.2253 LearningRate 0.0620 Epoch: 4 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:16,866-Speed 3408.55 samples/sec Loss 8.2404 LearningRate 0.0620 Epoch: 4 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:19,876-Speed 3402.80 samples/sec Loss 8.0321 LearningRate 0.0619 Epoch: 4 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:22,879-Speed 3411.21 samples/sec Loss 8.3055 LearningRate 0.0619 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:25,891-Speed 3400.59 samples/sec Loss 8.3655 
LearningRate 0.0619 Epoch: 4 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:28,896-Speed 3408.04 samples/sec Loss 8.4950 LearningRate 0.0619 Epoch: 4 Global Step: 21570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:31,906-Speed 3402.73 samples/sec Loss 8.2882 LearningRate 0.0619 Epoch: 4 Global Step: 21580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:34,913-Speed 3406.56 samples/sec Loss 8.1481 LearningRate 0.0619 Epoch: 4 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:37,938-Speed 3386.26 samples/sec Loss 8.3222 LearningRate 0.0619 Epoch: 4 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:40,949-Speed 3400.43 samples/sec Loss 8.1510 LearningRate 0.0618 Epoch: 4 Global Step: 21610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 01:25:43,959-Speed 3404.06 samples/sec Loss 8.1721 LearningRate 0.0618 Epoch: 4 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:25:46,946-Speed 3428.84 samples/sec Loss 8.2161 LearningRate 0.0618 Epoch: 4 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:49,957-Speed 3401.15 samples/sec Loss 8.1645 LearningRate 0.0618 Epoch: 4 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:52,961-Speed 3410.73 samples/sec Loss 8.2286 LearningRate 0.0618 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:55,968-Speed 3406.25 samples/sec Loss 8.2025 LearningRate 0.0618 Epoch: 4 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:25:58,985-Speed 3394.79 samples/sec Loss 8.3900 LearningRate 0.0617 Epoch: 4 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:01,999-Speed 3398.16 samples/sec Loss 8.3408 LearningRate 0.0617 Epoch: 4 Global Step: 21680 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:05,009-Speed 3403.20 samples/sec Loss 8.3014 LearningRate 0.0617 Epoch: 4 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:08,018-Speed 3403.70 samples/sec Loss 8.2647 LearningRate 0.0617 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:11,029-Speed 3402.12 samples/sec Loss 8.2530 LearningRate 0.0617 Epoch: 4 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:14,037-Speed 3405.26 samples/sec Loss 8.2628 LearningRate 0.0617 Epoch: 4 Global Step: 21720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:17,047-Speed 3402.57 samples/sec Loss 8.3624 LearningRate 0.0617 Epoch: 4 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:20,058-Speed 3401.12 samples/sec Loss 8.2826 LearningRate 0.0616 Epoch: 4 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:23,063-Speed 3409.30 samples/sec Loss 8.1801 LearningRate 0.0616 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:26,069-Speed 3407.70 samples/sec Loss 8.3768 LearningRate 0.0616 Epoch: 4 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:29,082-Speed 3399.48 samples/sec Loss 8.4329 LearningRate 0.0616 Epoch: 4 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:32,092-Speed 3402.68 samples/sec Loss 8.2338 LearningRate 0.0616 Epoch: 4 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:35,102-Speed 3402.35 samples/sec Loss 8.2003 LearningRate 0.0616 Epoch: 4 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:38,115-Speed 3399.46 samples/sec Loss 8.1915 LearningRate 0.0615 Epoch: 4 Global Step: 21800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:26:41,127-Speed 3401.41 samples/sec Loss 8.3168 LearningRate 0.0615 Epoch: 4 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:44,140-Speed 3399.55 samples/sec Loss 8.1880 LearningRate 0.0615 Epoch: 4 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:47,136-Speed 3418.86 samples/sec Loss 8.3953 LearningRate 0.0615 Epoch: 4 Global Step: 21830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:50,146-Speed 3402.35 samples/sec Loss 8.2251 LearningRate 0.0615 Epoch: 4 Global Step: 21840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:53,157-Speed 3402.17 samples/sec Loss 8.1693 LearningRate 0.0615 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:26:56,182-Speed 3385.81 samples/sec Loss 8.2804 LearningRate 0.0615 Epoch: 4 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:26:59,195-Speed 3399.76 samples/sec Loss 8.3251 LearningRate 0.0614 Epoch: 4 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:02,218-Speed 3387.15 samples/sec Loss 8.3297 LearningRate 0.0614 Epoch: 4 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:05,231-Speed 3399.86 samples/sec Loss 8.1510 LearningRate 0.0614 Epoch: 4 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:08,251-Speed 3391.77 samples/sec Loss 8.1682 LearningRate 0.0614 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:11,264-Speed 3399.44 samples/sec Loss 8.2686 LearningRate 0.0614 Epoch: 4 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:14,270-Speed 3407.48 samples/sec Loss 8.2927 LearningRate 0.0614 Epoch: 4 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:27:17,290-Speed 3391.60 samples/sec Loss 8.3009 
LearningRate 0.0613 Epoch: 4 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:27:20,300-Speed 3403.07 samples/sec Loss 8.3254 LearningRate 0.0613 Epoch: 4 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:27:23,305-Speed 3408.07 samples/sec Loss 8.1348 LearningRate 0.0613 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-11 01:27:26,315-Speed 3403.15 samples/sec Loss 8.3036 LearningRate 0.0613 Epoch: 4 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:27:29,325-Speed 3403.64 samples/sec Loss 8.1635 LearningRate 0.0613 Epoch: 4 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:27:32,334-Speed 3404.36 samples/sec Loss 8.1914 LearningRate 0.0613 Epoch: 4 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:27:35,339-Speed 3407.77 samples/sec Loss 8.2857 LearningRate 0.0612 Epoch: 4 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:27:38,348-Speed 3404.68 samples/sec Loss 8.1695 LearningRate 0.0612 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-11 01:28:22,637-[lfw][22000]XNorm: 23.486292
Training: 2022-04-11 01:28:22,637-[lfw][22000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-04-11 01:28:22,638-[lfw][22000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:29:14,206-[cfp_fp][22000]XNorm: 20.841552
Training: 2022-04-11 01:29:14,206-[cfp_fp][22000]Accuracy-Flip: 0.95629+-0.01297
Training: 2022-04-11 01:29:14,207-[cfp_fp][22000]Accuracy-Highest: 0.95629
Training: 2022-04-11 01:29:58,255-[agedb_30][22000]XNorm: 23.208466
Training: 2022-04-11 01:29:58,255-[agedb_30][22000]Accuracy-Flip: 0.97200+-0.00963
Training: 2022-04-11 01:29:58,256-[agedb_30][22000]Accuracy-Highest: 0.97383
Training: 2022-04-11 01:30:01,257-Speed 71.65 samples/sec Loss 8.3200 LearningRate 0.0612 Epoch: 4 Global Step:
22010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:04,259-Speed 3411.89 samples/sec Loss 8.2228 LearningRate 0.0612 Epoch: 4 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:07,254-Speed 3420.19 samples/sec Loss 8.2490 LearningRate 0.0612 Epoch: 4 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:10,872-Speed 2831.29 samples/sec Loss 8.3996 LearningRate 0.0612 Epoch: 4 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:14,661-Speed 2702.80 samples/sec Loss 8.2947 LearningRate 0.0612 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:17,646-Speed 3431.26 samples/sec Loss 8.2320 LearningRate 0.0611 Epoch: 4 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:20,642-Speed 3418.80 samples/sec Loss 8.2560 LearningRate 0.0611 Epoch: 4 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:23,656-Speed 3398.91 samples/sec Loss 8.3180 LearningRate 0.0611 Epoch: 4 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:26,649-Speed 3421.22 samples/sec Loss 8.4702 LearningRate 0.0611 Epoch: 4 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:30:29,635-Speed 3430.68 samples/sec Loss 8.1963 LearningRate 0.0611 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:32,636-Speed 3413.14 samples/sec Loss 8.2689 LearningRate 0.0611 Epoch: 4 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:35,645-Speed 3404.66 samples/sec Loss 8.3522 LearningRate 0.0610 Epoch: 4 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:38,651-Speed 3407.05 samples/sec Loss 8.2672 LearningRate 0.0610 Epoch: 4 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 01:30:41,648-Speed 3417.04 samples/sec Loss 8.2812 LearningRate 0.0610 Epoch: 4 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:44,649-Speed 3414.08 samples/sec Loss 8.1231 LearningRate 0.0610 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:47,644-Speed 3419.96 samples/sec Loss 8.3632 LearningRate 0.0610 Epoch: 4 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:50,639-Speed 3420.18 samples/sec Loss 8.2104 LearningRate 0.0610 Epoch: 4 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:53,694-Speed 3352.82 samples/sec Loss 8.3058 LearningRate 0.0610 Epoch: 4 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:56,693-Speed 3414.42 samples/sec Loss 8.1909 LearningRate 0.0609 Epoch: 4 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:30:59,696-Speed 3411.40 samples/sec Loss 8.3534 LearningRate 0.0609 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:31:02,708-Speed 3400.02 samples/sec Loss 8.3550 LearningRate 0.0609 Epoch: 4 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:31:05,702-Speed 3421.74 samples/sec Loss 8.1106 LearningRate 0.0609 Epoch: 4 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:08,704-Speed 3411.39 samples/sec Loss 8.3815 LearningRate 0.0609 Epoch: 4 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:11,719-Speed 3397.20 samples/sec Loss 8.1367 LearningRate 0.0609 Epoch: 4 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:14,744-Speed 3386.49 samples/sec Loss 8.0509 LearningRate 0.0608 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:17,751-Speed 3406.23 samples/sec Loss 8.0390 
LearningRate 0.0608 Epoch: 4 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:20,751-Speed 3413.89 samples/sec Loss 8.2850 LearningRate 0.0608 Epoch: 4 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:23,768-Speed 3395.21 samples/sec Loss 8.1429 LearningRate 0.0608 Epoch: 4 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:26,768-Speed 3414.54 samples/sec Loss 8.1983 LearningRate 0.0608 Epoch: 4 Global Step: 22290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:29,786-Speed 3393.38 samples/sec Loss 8.3444 LearningRate 0.0608 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:32,796-Speed 3402.81 samples/sec Loss 8.1696 LearningRate 0.0608 Epoch: 4 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:35,806-Speed 3403.55 samples/sec Loss 8.1886 LearningRate 0.0607 Epoch: 4 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:31:38,807-Speed 3412.25 samples/sec Loss 8.2778 LearningRate 0.0607 Epoch: 4 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:31:41,805-Speed 3416.33 samples/sec Loss 8.2510 LearningRate 0.0607 Epoch: 4 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:31:44,783-Speed 3439.65 samples/sec Loss 8.2025 LearningRate 0.0607 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:47,784-Speed 3412.86 samples/sec Loss 8.2331 LearningRate 0.0607 Epoch: 4 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:50,783-Speed 3416.36 samples/sec Loss 8.1805 LearningRate 0.0607 Epoch: 4 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:53,781-Speed 3415.80 samples/sec Loss 8.1194 LearningRate 0.0606 Epoch: 4 Global Step: 22380 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 01:31:56,787-Speed 3407.77 samples/sec Loss 8.4033 LearningRate 0.0606 Epoch: 4 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:31:59,784-Speed 3416.94 samples/sec Loss 8.1784 LearningRate 0.0606 Epoch: 4 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:02,789-Speed 3408.82 samples/sec Loss 8.2102 LearningRate 0.0606 Epoch: 4 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:05,785-Speed 3419.90 samples/sec Loss 8.1689 LearningRate 0.0606 Epoch: 4 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:08,782-Speed 3417.38 samples/sec Loss 8.1987 LearningRate 0.0606 Epoch: 4 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:11,781-Speed 3414.95 samples/sec Loss 8.3401 LearningRate 0.0606 Epoch: 4 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:14,782-Speed 3414.52 samples/sec Loss 7.9934 LearningRate 0.0605 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:32:17,781-Speed 3415.32 samples/sec Loss 8.3744 LearningRate 0.0605 Epoch: 4 Global Step: 22460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:32:20,778-Speed 3417.58 samples/sec Loss 8.3695 LearningRate 0.0605 Epoch: 4 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:32:23,777-Speed 3415.36 samples/sec Loss 8.3371 LearningRate 0.0605 Epoch: 4 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:32:26,786-Speed 3403.80 samples/sec Loss 8.3959 LearningRate 0.0605 Epoch: 4 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:32:29,778-Speed 3422.91 samples/sec Loss 8.2073 LearningRate 0.0605 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:32:32,790-Speed 3400.05 samples/sec Loss 8.1874 LearningRate 0.0604 Epoch: 4 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:35,789-Speed 3416.49 samples/sec Loss 8.2731 LearningRate 0.0604 Epoch: 4 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:38,785-Speed 3418.29 samples/sec Loss 8.2355 LearningRate 0.0604 Epoch: 4 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:41,789-Speed 3410.28 samples/sec Loss 8.1561 LearningRate 0.0604 Epoch: 4 Global Step: 22540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:44,784-Speed 3420.23 samples/sec Loss 8.1498 LearningRate 0.0604 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:47,782-Speed 3416.20 samples/sec Loss 8.2323 LearningRate 0.0604 Epoch: 4 Global Step: 22560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:50,778-Speed 3418.31 samples/sec Loss 8.2034 LearningRate 0.0604 Epoch: 4 Global Step: 22570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:53,783-Speed 3408.15 samples/sec Loss 8.1938 LearningRate 0.0603 Epoch: 4 Global Step: 22580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:56,782-Speed 3415.80 samples/sec Loss 8.1246 LearningRate 0.0603 Epoch: 4 Global Step: 22590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:32:59,780-Speed 3416.75 samples/sec Loss 8.1567 LearningRate 0.0603 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:02,778-Speed 3416.71 samples/sec Loss 8.2261 LearningRate 0.0603 Epoch: 4 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:05,815-Speed 3372.95 samples/sec Loss 8.0918 LearningRate 0.0603 Epoch: 4 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:08,855-Speed 3368.47 samples/sec Loss 8.2448 LearningRate 
0.0603 Epoch: 4 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:11,851-Speed 3418.86 samples/sec Loss 8.2412 LearningRate 0.0602 Epoch: 4 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:14,859-Speed 3405.61 samples/sec Loss 8.2117 LearningRate 0.0602 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:17,864-Speed 3408.43 samples/sec Loss 8.1741 LearningRate 0.0602 Epoch: 4 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:20,868-Speed 3410.48 samples/sec Loss 8.2635 LearningRate 0.0602 Epoch: 4 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:23,869-Speed 3412.23 samples/sec Loss 8.2326 LearningRate 0.0602 Epoch: 4 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:26,888-Speed 3393.31 samples/sec Loss 8.1451 LearningRate 0.0602 Epoch: 4 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:29,878-Speed 3425.92 samples/sec Loss 8.0802 LearningRate 0.0602 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:32,890-Speed 3400.65 samples/sec Loss 8.2442 LearningRate 0.0601 Epoch: 4 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:35,888-Speed 3416.40 samples/sec Loss 8.2694 LearningRate 0.0601 Epoch: 4 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:38,917-Speed 3381.08 samples/sec Loss 8.3074 LearningRate 0.0601 Epoch: 4 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:41,915-Speed 3416.86 samples/sec Loss 8.2417 LearningRate 0.0601 Epoch: 4 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:44,918-Speed 3411.07 samples/sec Loss 8.2055 LearningRate 0.0601 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 01:33:47,915-Speed 3417.70 samples/sec Loss 8.2032 LearningRate 0.0601 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:50,914-Speed 3415.10 samples/sec Loss 8.1770 LearningRate 0.0600 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:53,912-Speed 3416.76 samples/sec Loss 8.2619 LearningRate 0.0600 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:56,916-Speed 3409.56 samples/sec Loss 8.2343 LearningRate 0.0600 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:33:59,897-Speed 3435.66 samples/sec Loss 8.2389 LearningRate 0.0600 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:02,899-Speed 3412.70 samples/sec Loss 8.1869 LearningRate 0.0600 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:05,915-Speed 3395.89 samples/sec Loss 8.1551 LearningRate 0.0600 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:08,917-Speed 3411.54 samples/sec Loss 8.1237 LearningRate 0.0600 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:11,916-Speed 3415.06 samples/sec Loss 8.0531 LearningRate 0.0599 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:14,915-Speed 3416.13 samples/sec Loss 8.1452 LearningRate 0.0599 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:17,919-Speed 3409.48 samples/sec Loss 8.1046 LearningRate 0.0599 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:20,916-Speed 3417.23 samples/sec Loss 8.2249 LearningRate 0.0599 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:34:23,922-Speed 3406.81 samples/sec Loss 8.3966 LearningRate 0.0599 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:26,945-Speed 3389.13 samples/sec Loss 8.0804 LearningRate 0.0599 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:29,929-Speed 3432.87 samples/sec Loss 8.2382 LearningRate 0.0598 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:32,935-Speed 3407.22 samples/sec Loss 8.0732 LearningRate 0.0598 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:35,942-Speed 3405.76 samples/sec Loss 8.1616 LearningRate 0.0598 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:38,941-Speed 3415.18 samples/sec Loss 8.0908 LearningRate 0.0598 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:34:41,923-Speed 3434.97 samples/sec Loss 8.1425 LearningRate 0.0598 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:44,930-Speed 3406.25 samples/sec Loss 8.2797 LearningRate 0.0598 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:47,938-Speed 3404.98 samples/sec Loss 8.3595 LearningRate 0.0598 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:50,952-Speed 3398.52 samples/sec Loss 8.1791 LearningRate 0.0597 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:53,954-Speed 3412.03 samples/sec Loss 8.1469 LearningRate 0.0597 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:56,969-Speed 3397.16 samples/sec Loss 8.0420 LearningRate 0.0597 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:34:59,994-Speed 3386.35 samples/sec Loss 8.2114 
LearningRate 0.0597 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:03,020-Speed 3385.02 samples/sec Loss 8.0636 LearningRate 0.0597 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:06,025-Speed 3408.67 samples/sec Loss 8.1125 LearningRate 0.0597 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:09,044-Speed 3392.92 samples/sec Loss 8.1330 LearningRate 0.0597 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:12,046-Speed 3411.27 samples/sec Loss 8.1459 LearningRate 0.0596 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:15,061-Speed 3397.01 samples/sec Loss 8.2145 LearningRate 0.0596 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:18,068-Speed 3406.21 samples/sec Loss 8.1602 LearningRate 0.0596 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:21,074-Speed 3407.95 samples/sec Loss 8.0803 LearningRate 0.0596 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:24,073-Speed 3414.84 samples/sec Loss 8.1580 LearningRate 0.0596 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:27,076-Speed 3411.59 samples/sec Loss 8.0839 LearningRate 0.0596 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:30,077-Speed 3413.07 samples/sec Loss 8.0778 LearningRate 0.0595 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:33,080-Speed 3409.92 samples/sec Loss 8.1515 LearningRate 0.0595 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:36,093-Speed 3399.62 samples/sec Loss 8.0621 LearningRate 0.0595 Epoch: 4 Global Step: 23120 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 01:35:39,084-Speed 3424.83 samples/sec Loss 8.2105 LearningRate 0.0595 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:42,088-Speed 3409.18 samples/sec Loss 8.1254 LearningRate 0.0595 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:45,099-Speed 3402.39 samples/sec Loss 8.0255 LearningRate 0.0595 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:48,107-Speed 3404.92 samples/sec Loss 8.1329 LearningRate 0.0595 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:51,109-Speed 3411.79 samples/sec Loss 8.2252 LearningRate 0.0594 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:54,113-Speed 3409.46 samples/sec Loss 8.2022 LearningRate 0.0594 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:35:57,117-Speed 3410.40 samples/sec Loss 8.0545 LearningRate 0.0594 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:36:00,122-Speed 3408.09 samples/sec Loss 8.2053 LearningRate 0.0594 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:36:03,136-Speed 3398.75 samples/sec Loss 8.1135 LearningRate 0.0594 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:36:06,147-Speed 3401.90 samples/sec Loss 8.0220 LearningRate 0.0594 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:36:09,150-Speed 3410.07 samples/sec Loss 8.0186 LearningRate 0.0593 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:12,160-Speed 3403.25 samples/sec Loss 8.0450 LearningRate 0.0593 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:36:15,171-Speed 3401.32 samples/sec Loss 8.0324 LearningRate 0.0593 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:18,186-Speed 3397.72 samples/sec Loss 7.9976 LearningRate 0.0593 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:21,192-Speed 3407.66 samples/sec Loss 8.2415 LearningRate 0.0593 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:24,220-Speed 3382.60 samples/sec Loss 8.0015 LearningRate 0.0593 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:27,228-Speed 3404.25 samples/sec Loss 8.0992 LearningRate 0.0593 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:30,241-Speed 3400.00 samples/sec Loss 8.1633 LearningRate 0.0592 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:33,245-Speed 3410.35 samples/sec Loss 8.1471 LearningRate 0.0592 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:36,245-Speed 3413.74 samples/sec Loss 8.1440 LearningRate 0.0592 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:39,250-Speed 3408.91 samples/sec Loss 8.1009 LearningRate 0.0592 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:42,254-Speed 3409.72 samples/sec Loss 8.1107 LearningRate 0.0592 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:45,255-Speed 3412.91 samples/sec Loss 8.1241 LearningRate 0.0592 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:48,263-Speed 3405.47 samples/sec Loss 8.2195 LearningRate 0.0591 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:51,268-Speed 3409.12 samples/sec Loss 8.1908 
LearningRate 0.0591 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:54,275-Speed 3405.10 samples/sec Loss 8.0809 LearningRate 0.0591 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:36:57,278-Speed 3411.59 samples/sec Loss 8.1963 LearningRate 0.0591 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:00,284-Speed 3407.07 samples/sec Loss 8.0470 LearningRate 0.0591 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:03,301-Speed 3395.63 samples/sec Loss 8.0621 LearningRate 0.0591 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:06,313-Speed 3399.84 samples/sec Loss 8.2121 LearningRate 0.0591 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:09,317-Speed 3410.16 samples/sec Loss 8.0010 LearningRate 0.0590 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:12,321-Speed 3409.76 samples/sec Loss 8.1549 LearningRate 0.0590 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:15,337-Speed 3395.77 samples/sec Loss 8.1419 LearningRate 0.0590 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:18,345-Speed 3404.99 samples/sec Loss 8.1074 LearningRate 0.0590 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:21,351-Speed 3407.96 samples/sec Loss 8.2449 LearningRate 0.0590 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:24,360-Speed 3403.34 samples/sec Loss 8.0465 LearningRate 0.0590 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:27,383-Speed 3388.81 samples/sec Loss 8.2949 LearningRate 0.0590 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 01:37:30,387-Speed 3409.69 samples/sec Loss 8.0493 LearningRate 0.0589 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:33,402-Speed 3397.23 samples/sec Loss 8.0787 LearningRate 0.0589 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:36,402-Speed 3413.84 samples/sec Loss 8.0224 LearningRate 0.0589 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:39,409-Speed 3406.34 samples/sec Loss 8.0661 LearningRate 0.0589 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:42,415-Speed 3407.16 samples/sec Loss 8.1630 LearningRate 0.0589 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:45,419-Speed 3409.36 samples/sec Loss 8.2053 LearningRate 0.0589 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:37:48,423-Speed 3410.08 samples/sec Loss 8.1354 LearningRate 0.0588 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:51,436-Speed 3399.34 samples/sec Loss 7.8267 LearningRate 0.0588 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:54,444-Speed 3405.10 samples/sec Loss 7.9867 LearningRate 0.0588 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:37:57,454-Speed 3403.21 samples/sec Loss 8.0945 LearningRate 0.0588 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:00,466-Speed 3400.53 samples/sec Loss 8.1085 LearningRate 0.0588 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:03,469-Speed 3410.61 samples/sec Loss 8.2847 LearningRate 0.0588 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:38:06,470-Speed 3413.50 samples/sec Loss 8.1574 LearningRate 0.0588 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:09,476-Speed 3407.78 samples/sec Loss 8.2138 LearningRate 0.0587 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:12,484-Speed 3404.51 samples/sec Loss 8.0572 LearningRate 0.0587 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:15,494-Speed 3403.59 samples/sec Loss 7.9426 LearningRate 0.0587 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:18,509-Speed 3396.34 samples/sec Loss 8.0106 LearningRate 0.0587 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:21,512-Speed 3412.00 samples/sec Loss 7.9277 LearningRate 0.0587 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:24,517-Speed 3408.58 samples/sec Loss 7.9842 LearningRate 0.0587 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:27,529-Speed 3400.05 samples/sec Loss 8.2156 LearningRate 0.0586 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:30,531-Speed 3411.43 samples/sec Loss 8.1039 LearningRate 0.0586 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:33,534-Speed 3411.52 samples/sec Loss 8.1977 LearningRate 0.0586 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:36,549-Speed 3397.20 samples/sec Loss 7.9335 LearningRate 0.0586 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:39,550-Speed 3412.21 samples/sec Loss 8.1715 LearningRate 0.0586 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:42,557-Speed 3406.79 samples/sec Loss 8.0161 
LearningRate 0.0586 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:45,618-Speed 3345.95 samples/sec Loss 7.9748 LearningRate 0.0586 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:48,608-Speed 3425.61 samples/sec Loss 7.8902 LearningRate 0.0585 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:51,611-Speed 3411.06 samples/sec Loss 8.1437 LearningRate 0.0585 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:38:54,608-Speed 3417.44 samples/sec Loss 8.0447 LearningRate 0.0585 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:38:57,612-Speed 3410.42 samples/sec Loss 7.9334 LearningRate 0.0585 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:00,617-Speed 3408.03 samples/sec Loss 7.8058 LearningRate 0.0585 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:03,632-Speed 3397.85 samples/sec Loss 8.1027 LearningRate 0.0585 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:06,644-Speed 3400.24 samples/sec Loss 8.1372 LearningRate 0.0585 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:09,657-Speed 3398.91 samples/sec Loss 8.2689 LearningRate 0.0584 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:12,671-Speed 3398.89 samples/sec Loss 8.1387 LearningRate 0.0584 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:15,679-Speed 3405.88 samples/sec Loss 7.9949 LearningRate 0.0584 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:18,687-Speed 3404.58 samples/sec Loss 8.2649 LearningRate 0.0584 Epoch: 4 Global Step: 23860 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:21,697-Speed 3403.69 samples/sec Loss 7.9268 LearningRate 0.0584 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:24,700-Speed 3410.65 samples/sec Loss 7.9764 LearningRate 0.0584 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:39:27,709-Speed 3402.91 samples/sec Loss 8.1264 LearningRate 0.0583 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:39:30,692-Speed 3434.18 samples/sec Loss 8.0581 LearningRate 0.0583 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:33,701-Speed 3403.55 samples/sec Loss 8.0882 LearningRate 0.0583 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:36,719-Speed 3394.32 samples/sec Loss 8.0243 LearningRate 0.0583 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:39,732-Speed 3400.51 samples/sec Loss 8.1658 LearningRate 0.0583 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:42,739-Speed 3405.55 samples/sec Loss 8.0538 LearningRate 0.0583 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:45,740-Speed 3414.09 samples/sec Loss 7.9450 LearningRate 0.0583 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:48,743-Speed 3410.11 samples/sec Loss 8.0545 LearningRate 0.0582 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:51,765-Speed 3389.65 samples/sec Loss 8.0820 LearningRate 0.0582 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:39:54,768-Speed 3410.14 samples/sec Loss 8.0087 LearningRate 0.0582 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:39:57,770-Speed 3411.93 samples/sec Loss 8.0503 LearningRate 0.0582 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:40:00,772-Speed 3411.84 samples/sec Loss 7.9623 LearningRate 0.0582 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:40:45,085-[lfw][24000]XNorm: 22.871274 Training: 2022-04-11 01:40:45,086-[lfw][24000]Accuracy-Flip: 0.99683+-0.00337 Training: 2022-04-11 01:40:45,086-[lfw][24000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:41:36,739-[cfp_fp][24000]XNorm: 19.961849 Training: 2022-04-11 01:41:36,740-[cfp_fp][24000]Accuracy-Flip: 0.94914+-0.00917 Training: 2022-04-11 01:41:36,740-[cfp_fp][24000]Accuracy-Highest: 0.95629 Training: 2022-04-11 01:42:20,873-[agedb_30][24000]XNorm: 22.410717 Training: 2022-04-11 01:42:20,874-[agedb_30][24000]Accuracy-Flip: 0.97450+-0.00723 Training: 2022-04-11 01:42:20,874-[agedb_30][24000]Accuracy-Highest: 0.97450 Training: 2022-04-11 01:42:23,882-Speed 71.55 samples/sec Loss 7.9803 LearningRate 0.0582 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:42:26,862-Speed 3436.77 samples/sec Loss 8.0887 LearningRate 0.0581 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:42:29,825-Speed 3456.36 samples/sec Loss 7.9664 LearningRate 0.0581 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:32,821-Speed 3418.71 samples/sec Loss 7.9786 LearningRate 0.0581 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:35,805-Speed 3432.90 samples/sec Loss 8.1880 LearningRate 0.0581 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:38,793-Speed 3428.32 samples/sec Loss 7.8214 LearningRate 0.0581 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:41,786-Speed 3420.91 samples/sec Loss 
7.9863 LearningRate 0.0581 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:44,773-Speed 3428.90 samples/sec Loss 8.0939 LearningRate 0.0581 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:47,769-Speed 3419.90 samples/sec Loss 7.9175 LearningRate 0.0580 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:50,761-Speed 3422.89 samples/sec Loss 7.9904 LearningRate 0.0580 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:53,773-Speed 3401.57 samples/sec Loss 8.0475 LearningRate 0.0580 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:56,771-Speed 3415.87 samples/sec Loss 8.1432 LearningRate 0.0580 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:42:59,769-Speed 3415.94 samples/sec Loss 7.9351 LearningRate 0.0580 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:02,766-Speed 3417.86 samples/sec Loss 7.9011 LearningRate 0.0580 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:05,772-Speed 3407.18 samples/sec Loss 8.0043 LearningRate 0.0580 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:08,768-Speed 3419.93 samples/sec Loss 8.0722 LearningRate 0.0579 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:11,760-Speed 3422.52 samples/sec Loss 8.1036 LearningRate 0.0579 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:14,755-Speed 3420.66 samples/sec Loss 7.9229 LearningRate 0.0579 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:17,750-Speed 3419.18 samples/sec Loss 7.9833 LearningRate 0.0579 Epoch: 4 Global Step: 24190 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:20,747-Speed 3418.04 samples/sec Loss 7.8508 LearningRate 0.0579 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:43:23,721-Speed 3444.33 samples/sec Loss 8.0976 LearningRate 0.0579 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:26,718-Speed 3417.26 samples/sec Loss 8.1750 LearningRate 0.0578 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:29,724-Speed 3407.09 samples/sec Loss 7.9565 LearningRate 0.0578 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:32,718-Speed 3421.40 samples/sec Loss 8.0038 LearningRate 0.0578 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:35,721-Speed 3410.24 samples/sec Loss 8.1407 LearningRate 0.0578 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:38,786-Speed 3343.00 samples/sec Loss 8.1029 LearningRate 0.0578 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:41,793-Speed 3406.64 samples/sec Loss 8.0219 LearningRate 0.0578 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:44,785-Speed 3422.34 samples/sec Loss 7.8985 LearningRate 0.0578 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:47,781-Speed 3419.88 samples/sec Loss 8.0982 LearningRate 0.0577 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:50,778-Speed 3417.15 samples/sec Loss 8.0760 LearningRate 0.0577 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:53,771-Speed 3422.00 samples/sec Loss 8.0525 LearningRate 0.0577 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:43:56,745-Speed 3444.43 samples/sec Loss 7.8486 LearningRate 0.0577 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:43:59,738-Speed 3422.10 samples/sec Loss 8.2300 LearningRate 0.0577 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:02,753-Speed 3397.26 samples/sec Loss 8.0383 LearningRate 0.0577 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:05,749-Speed 3419.16 samples/sec Loss 8.0867 LearningRate 0.0577 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:08,742-Speed 3421.70 samples/sec Loss 8.1538 LearningRate 0.0576 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:11,742-Speed 3414.83 samples/sec Loss 7.9436 LearningRate 0.0576 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:14,737-Speed 3419.46 samples/sec Loss 8.0565 LearningRate 0.0576 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:17,731-Speed 3421.11 samples/sec Loss 7.9799 LearningRate 0.0576 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:20,727-Speed 3417.95 samples/sec Loss 8.0079 LearningRate 0.0576 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:23,722-Speed 3420.95 samples/sec Loss 8.0614 LearningRate 0.0576 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:26,741-Speed 3391.78 samples/sec Loss 7.9493 LearningRate 0.0575 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:44:29,718-Speed 3440.88 samples/sec Loss 7.9825 LearningRate 0.0575 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:32,716-Speed 3416.82 samples/sec Loss 8.2419 LearningRate 
0.0575 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:35,730-Speed 3398.14 samples/sec Loss 8.0893 LearningRate 0.0575 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:38,726-Speed 3419.50 samples/sec Loss 7.9135 LearningRate 0.0575 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:41,722-Speed 3418.20 samples/sec Loss 8.1192 LearningRate 0.0575 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:44,720-Speed 3417.04 samples/sec Loss 8.1430 LearningRate 0.0575 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:47,711-Speed 3424.44 samples/sec Loss 7.8852 LearningRate 0.0574 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:50,710-Speed 3415.25 samples/sec Loss 8.0522 LearningRate 0.0574 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:53,722-Speed 3400.39 samples/sec Loss 8.0097 LearningRate 0.0574 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:56,729-Speed 3406.74 samples/sec Loss 7.8832 LearningRate 0.0574 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:44:59,767-Speed 3371.11 samples/sec Loss 7.9350 LearningRate 0.0574 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:02,772-Speed 3408.98 samples/sec Loss 8.0160 LearningRate 0.0574 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:05,773-Speed 3411.98 samples/sec Loss 7.9974 LearningRate 0.0574 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:08,778-Speed 3409.29 samples/sec Loss 7.8817 LearningRate 0.0573 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 01:45:11,786-Speed 3405.03 samples/sec Loss 7.9494 LearningRate 0.0573 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:14,784-Speed 3415.90 samples/sec Loss 8.0212 LearningRate 0.0573 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:17,783-Speed 3415.40 samples/sec Loss 8.1173 LearningRate 0.0573 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:20,791-Speed 3405.53 samples/sec Loss 7.9745 LearningRate 0.0573 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:23,770-Speed 3439.16 samples/sec Loss 8.0771 LearningRate 0.0573 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:26,764-Speed 3420.35 samples/sec Loss 7.8896 LearningRate 0.0572 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:29,762-Speed 3416.92 samples/sec Loss 8.0049 LearningRate 0.0572 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:32,756-Speed 3420.94 samples/sec Loss 8.1722 LearningRate 0.0572 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:35,760-Speed 3409.00 samples/sec Loss 7.9710 LearningRate 0.0572 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:38,770-Speed 3403.56 samples/sec Loss 8.0750 LearningRate 0.0572 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:41,774-Speed 3409.46 samples/sec Loss 8.0118 LearningRate 0.0572 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:44,778-Speed 3409.21 samples/sec Loss 7.8834 LearningRate 0.0572 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:47,774-Speed 
3419.36 samples/sec Loss 8.1744 LearningRate 0.0571 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:50,772-Speed 3416.82 samples/sec Loss 7.8624 LearningRate 0.0571 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:45:53,782-Speed 3403.24 samples/sec Loss 7.9533 LearningRate 0.0571 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:56,778-Speed 3418.76 samples/sec Loss 7.8043 LearningRate 0.0571 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:45:59,775-Speed 3416.76 samples/sec Loss 8.0956 LearningRate 0.0571 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:46:02,771-Speed 3419.67 samples/sec Loss 7.9805 LearningRate 0.0571 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:46:05,764-Speed 3421.27 samples/sec Loss 7.9039 LearningRate 0.0571 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:46:08,769-Speed 3408.62 samples/sec Loss 8.0059 LearningRate 0.0570 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:46:11,766-Speed 3417.80 samples/sec Loss 7.9401 LearningRate 0.0570 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:46:14,745-Speed 3438.00 samples/sec Loss 7.9900 LearningRate 0.0570 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:17,741-Speed 3419.08 samples/sec Loss 8.0222 LearningRate 0.0570 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:20,738-Speed 3418.24 samples/sec Loss 7.9707 LearningRate 0.0570 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:23,735-Speed 3417.54 samples/sec Loss 7.8647 LearningRate 0.0570 Epoch: 4 
Global Step: 24810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:26,733-Speed 3416.13 samples/sec Loss 7.9280 LearningRate 0.0569 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:29,728-Speed 3419.77 samples/sec Loss 7.9872 LearningRate 0.0569 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:32,739-Speed 3401.90 samples/sec Loss 8.0460 LearningRate 0.0569 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:35,743-Speed 3409.90 samples/sec Loss 8.0555 LearningRate 0.0569 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:38,760-Speed 3395.46 samples/sec Loss 7.8577 LearningRate 0.0569 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:41,760-Speed 3413.99 samples/sec Loss 7.9800 LearningRate 0.0569 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:44,746-Speed 3430.02 samples/sec Loss 7.9375 LearningRate 0.0569 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:47,742-Speed 3418.32 samples/sec Loss 8.0620 LearningRate 0.0568 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:50,743-Speed 3413.94 samples/sec Loss 7.9962 LearningRate 0.0568 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:53,780-Speed 3372.48 samples/sec Loss 7.8722 LearningRate 0.0568 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:56,890-Speed 3293.23 samples/sec Loss 7.8915 LearningRate 0.0568 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:46:59,889-Speed 3414.56 samples/sec Loss 7.9555 LearningRate 0.0568 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 01:47:02,891-Speed 3412.03 samples/sec Loss 7.8563 LearningRate 0.0568 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:05,885-Speed 3421.70 samples/sec Loss 7.8015 LearningRate 0.0568 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:08,901-Speed 3396.87 samples/sec Loss 8.0361 LearningRate 0.0567 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:11,898-Speed 3416.75 samples/sec Loss 7.8517 LearningRate 0.0567 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:14,908-Speed 3403.14 samples/sec Loss 7.8080 LearningRate 0.0567 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:47:17,942-Speed 3375.63 samples/sec Loss 7.8429 LearningRate 0.0567 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:47:20,942-Speed 3414.18 samples/sec Loss 7.9784 LearningRate 0.0567 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:47:23,920-Speed 3439.88 samples/sec Loss 8.1426 LearningRate 0.0567 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:26,916-Speed 3418.23 samples/sec Loss 7.8636 LearningRate 0.0567 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:29,921-Speed 3408.87 samples/sec Loss 7.7171 LearningRate 0.0566 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:32,956-Speed 3375.68 samples/sec Loss 7.8285 LearningRate 0.0566 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:35,953-Speed 3416.93 samples/sec Loss 7.9650 LearningRate 0.0566 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:38,953-Speed 3414.28 samples/sec Loss 
7.9996 LearningRate 0.0566 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:41,949-Speed 3419.00 samples/sec Loss 8.0439 LearningRate 0.0566 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:44,947-Speed 3415.62 samples/sec Loss 8.0714 LearningRate 0.0566 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:47,950-Speed 3411.29 samples/sec Loss 7.8529 LearningRate 0.0565 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:50,954-Speed 3409.83 samples/sec Loss 8.0884 LearningRate 0.0565 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:47:53,956-Speed 3412.31 samples/sec Loss 7.9054 LearningRate 0.0565 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:47:56,955-Speed 3414.94 samples/sec Loss 7.8863 LearningRate 0.0565 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:47:59,974-Speed 3392.81 samples/sec Loss 7.8643 LearningRate 0.0565 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:02,959-Speed 3431.01 samples/sec Loss 7.9486 LearningRate 0.0565 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:05,961-Speed 3412.18 samples/sec Loss 7.7737 LearningRate 0.0565 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:08,961-Speed 3413.94 samples/sec Loss 8.0480 LearningRate 0.0564 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:11,962-Speed 3412.94 samples/sec Loss 8.1008 LearningRate 0.0564 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:14,972-Speed 3402.60 samples/sec Loss 7.9399 LearningRate 0.0564 Epoch: 4 Global Step: 25180 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:17,971-Speed 3415.96 samples/sec Loss 7.8763 LearningRate 0.0564 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:20,967-Speed 3418.17 samples/sec Loss 7.8744 LearningRate 0.0564 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:23,967-Speed 3414.83 samples/sec Loss 7.7873 LearningRate 0.0564 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:26,980-Speed 3399.89 samples/sec Loss 7.9565 LearningRate 0.0564 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:29,982-Speed 3412.24 samples/sec Loss 8.0266 LearningRate 0.0563 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:48:32,980-Speed 3416.69 samples/sec Loss 7.8999 LearningRate 0.0563 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:35,982-Speed 3410.82 samples/sec Loss 7.8591 LearningRate 0.0563 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:38,984-Speed 3412.39 samples/sec Loss 7.9845 LearningRate 0.0563 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:41,993-Speed 3404.22 samples/sec Loss 7.7760 LearningRate 0.0563 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:45,150-Speed 3244.41 samples/sec Loss 8.0723 LearningRate 0.0563 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:48:48,147-Speed 3417.59 samples/sec Loss 8.0390 LearningRate 0.0563 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:49:00,566-Speed 824.62 samples/sec Loss 7.1180 LearningRate 0.0562 Epoch: 5 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:49:03,602-Speed 3373.41 samples/sec Loss 7.2235 LearningRate 0.0562 Epoch: 5 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:06,626-Speed 3387.55 samples/sec Loss 7.2190 LearningRate 0.0562 Epoch: 5 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:09,644-Speed 3393.81 samples/sec Loss 7.1779 LearningRate 0.0562 Epoch: 5 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:12,662-Speed 3393.89 samples/sec Loss 7.1422 LearningRate 0.0562 Epoch: 5 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:15,659-Speed 3417.75 samples/sec Loss 7.1941 LearningRate 0.0562 Epoch: 5 Global Step: 25350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:18,675-Speed 3395.99 samples/sec Loss 7.2570 LearningRate 0.0561 Epoch: 5 Global Step: 25360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:21,677-Speed 3411.14 samples/sec Loss 7.1732 LearningRate 0.0561 Epoch: 5 Global Step: 25370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:24,679-Speed 3412.26 samples/sec Loss 7.3868 LearningRate 0.0561 Epoch: 5 Global Step: 25380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:27,694-Speed 3396.80 samples/sec Loss 7.1631 LearningRate 0.0561 Epoch: 5 Global Step: 25390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:30,696-Speed 3412.38 samples/sec Loss 7.2904 LearningRate 0.0561 Epoch: 5 Global Step: 25400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:33,706-Speed 3403.39 samples/sec Loss 7.2792 LearningRate 0.0561 Epoch: 5 Global Step: 25410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:36,711-Speed 3408.20 samples/sec Loss 7.1899 LearningRate 0.0561 Epoch: 5 Global Step: 25420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:39,709-Speed 3416.62 samples/sec Loss 7.1550 LearningRate 
0.0560 Epoch: 5 Global Step: 25430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:42,707-Speed 3416.16 samples/sec Loss 7.1979 LearningRate 0.0560 Epoch: 5 Global Step: 25440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:49:45,712-Speed 3409.15 samples/sec Loss 7.2293 LearningRate 0.0560 Epoch: 5 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:48,708-Speed 3418.40 samples/sec Loss 7.4133 LearningRate 0.0560 Epoch: 5 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:51,716-Speed 3404.87 samples/sec Loss 7.3540 LearningRate 0.0560 Epoch: 5 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:54,717-Speed 3413.41 samples/sec Loss 7.3273 LearningRate 0.0560 Epoch: 5 Global Step: 25480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:49:57,715-Speed 3416.25 samples/sec Loss 7.3367 LearningRate 0.0560 Epoch: 5 Global Step: 25490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:00,795-Speed 3325.39 samples/sec Loss 7.2419 LearningRate 0.0559 Epoch: 5 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:03,798-Speed 3410.72 samples/sec Loss 7.1822 LearningRate 0.0559 Epoch: 5 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:06,802-Speed 3410.36 samples/sec Loss 7.2984 LearningRate 0.0559 Epoch: 5 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:09,798-Speed 3418.82 samples/sec Loss 7.2403 LearningRate 0.0559 Epoch: 5 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:12,798-Speed 3413.35 samples/sec Loss 7.3233 LearningRate 0.0559 Epoch: 5 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:15,809-Speed 3402.39 samples/sec Loss 7.2374 LearningRate 0.0559 Epoch: 5 Global Step: 25550 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 01:50:18,804-Speed 3419.03 samples/sec Loss 7.1350 LearningRate 0.0559 Epoch: 5 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:21,802-Speed 3416.62 samples/sec Loss 7.2129 LearningRate 0.0558 Epoch: 5 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:24,811-Speed 3404.48 samples/sec Loss 7.3841 LearningRate 0.0558 Epoch: 5 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:27,812-Speed 3413.18 samples/sec Loss 7.3386 LearningRate 0.0558 Epoch: 5 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:30,815-Speed 3411.62 samples/sec Loss 7.3506 LearningRate 0.0558 Epoch: 5 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:33,824-Speed 3402.76 samples/sec Loss 7.5393 LearningRate 0.0558 Epoch: 5 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:36,831-Speed 3407.06 samples/sec Loss 7.3969 LearningRate 0.0558 Epoch: 5 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:39,853-Speed 3388.83 samples/sec Loss 7.2992 LearningRate 0.0557 Epoch: 5 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:42,858-Speed 3408.09 samples/sec Loss 7.3999 LearningRate 0.0557 Epoch: 5 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:45,846-Speed 3428.33 samples/sec Loss 7.5072 LearningRate 0.0557 Epoch: 5 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:48,848-Speed 3412.75 samples/sec Loss 7.4292 LearningRate 0.0557 Epoch: 5 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:50:51,847-Speed 3415.41 samples/sec Loss 7.4997 LearningRate 0.0557 Epoch: 5 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
01:50:54,831-Speed 3431.65 samples/sec Loss 7.4021 LearningRate 0.0557 Epoch: 5 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:50:57,853-Speed 3389.40 samples/sec Loss 7.4545 LearningRate 0.0557 Epoch: 5 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:00,870-Speed 3395.17 samples/sec Loss 7.5583 LearningRate 0.0556 Epoch: 5 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:03,897-Speed 3383.84 samples/sec Loss 7.5954 LearningRate 0.0556 Epoch: 5 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:06,906-Speed 3403.37 samples/sec Loss 7.3078 LearningRate 0.0556 Epoch: 5 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:09,916-Speed 3403.39 samples/sec Loss 7.3772 LearningRate 0.0556 Epoch: 5 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:12,918-Speed 3412.12 samples/sec Loss 7.4906 LearningRate 0.0556 Epoch: 5 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:15,927-Speed 3404.58 samples/sec Loss 7.3786 LearningRate 0.0556 Epoch: 5 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:18,939-Speed 3400.34 samples/sec Loss 7.5862 LearningRate 0.0556 Epoch: 5 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:21,943-Speed 3409.52 samples/sec Loss 7.2858 LearningRate 0.0555 Epoch: 5 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:51:24,944-Speed 3413.36 samples/sec Loss 7.4396 LearningRate 0.0555 Epoch: 5 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:27,954-Speed 3402.82 samples/sec Loss 7.5536 LearningRate 0.0555 Epoch: 5 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:30,958-Speed 3408.97 samples/sec Loss 7.5191 LearningRate 
0.0555 Epoch: 5 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:33,959-Speed 3412.92 samples/sec Loss 7.5207 LearningRate 0.0555 Epoch: 5 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:36,958-Speed 3416.16 samples/sec Loss 7.5696 LearningRate 0.0555 Epoch: 5 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:39,957-Speed 3415.32 samples/sec Loss 7.4559 LearningRate 0.0555 Epoch: 5 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:42,967-Speed 3402.70 samples/sec Loss 7.5467 LearningRate 0.0554 Epoch: 5 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:45,969-Speed 3412.23 samples/sec Loss 7.3428 LearningRate 0.0554 Epoch: 5 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:48,971-Speed 3411.85 samples/sec Loss 7.5932 LearningRate 0.0554 Epoch: 5 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:51,973-Speed 3411.25 samples/sec Loss 7.3833 LearningRate 0.0554 Epoch: 5 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:54,972-Speed 3416.18 samples/sec Loss 7.6029 LearningRate 0.0554 Epoch: 5 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:51:57,973-Speed 3412.18 samples/sec Loss 7.7389 LearningRate 0.0554 Epoch: 5 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:00,982-Speed 3405.18 samples/sec Loss 7.4219 LearningRate 0.0553 Epoch: 5 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:03,984-Speed 3411.14 samples/sec Loss 7.5718 LearningRate 0.0553 Epoch: 5 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:06,994-Speed 3403.46 samples/sec Loss 7.6516 LearningRate 0.0553 Epoch: 5 Global Step: 25920 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 01:52:09,998-Speed 3409.95 samples/sec Loss 7.4581 LearningRate 0.0553 Epoch: 5 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:12,999-Speed 3412.79 samples/sec Loss 7.4958 LearningRate 0.0553 Epoch: 5 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:15,999-Speed 3413.96 samples/sec Loss 7.6096 LearningRate 0.0553 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:19,009-Speed 3403.19 samples/sec Loss 7.5800 LearningRate 0.0553 Epoch: 5 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:22,009-Speed 3412.93 samples/sec Loss 7.5428 LearningRate 0.0552 Epoch: 5 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:24,991-Speed 3435.88 samples/sec Loss 7.4913 LearningRate 0.0552 Epoch: 5 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:28,001-Speed 3402.42 samples/sec Loss 7.6043 LearningRate 0.0552 Epoch: 5 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:52:31,001-Speed 3414.17 samples/sec Loss 7.5785 LearningRate 0.0552 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:53:15,416-[lfw][26000]XNorm: 23.011558 Training: 2022-04-11 01:53:15,417-[lfw][26000]Accuracy-Flip: 0.99683+-0.00241 Training: 2022-04-11 01:53:15,418-[lfw][26000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:54:07,014-[cfp_fp][26000]XNorm: 20.137223 Training: 2022-04-11 01:54:07,014-[cfp_fp][26000]Accuracy-Flip: 0.95486+-0.01184 Training: 2022-04-11 01:54:07,015-[cfp_fp][26000]Accuracy-Highest: 0.95629 Training: 2022-04-11 01:54:53,442-[agedb_30][26000]XNorm: 22.549544 Training: 2022-04-11 01:54:53,442-[agedb_30][26000]Accuracy-Flip: 0.97317+-0.00864 Training: 2022-04-11 01:54:53,443-[agedb_30][26000]Accuracy-Highest: 0.97450 Training: 
2022-04-11 01:54:56,465-Speed 70.40 samples/sec Loss 7.5950 LearningRate 0.0552 Epoch: 5 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:54:59,452-Speed 3429.81 samples/sec Loss 7.4485 LearningRate 0.0552 Epoch: 5 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:02,438-Speed 3430.34 samples/sec Loss 7.7481 LearningRate 0.0552 Epoch: 5 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:05,419-Speed 3436.26 samples/sec Loss 7.6064 LearningRate 0.0551 Epoch: 5 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:08,411-Speed 3422.91 samples/sec Loss 7.5792 LearningRate 0.0551 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:11,395-Speed 3432.45 samples/sec Loss 7.6063 LearningRate 0.0551 Epoch: 5 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:14,375-Speed 3436.74 samples/sec Loss 7.5860 LearningRate 0.0551 Epoch: 5 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:17,355-Speed 3436.52 samples/sec Loss 7.5786 LearningRate 0.0551 Epoch: 5 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:20,342-Speed 3430.45 samples/sec Loss 7.7716 LearningRate 0.0551 Epoch: 5 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:23,331-Speed 3426.34 samples/sec Loss 7.5577 LearningRate 0.0551 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:26,319-Speed 3428.01 samples/sec Loss 7.6029 LearningRate 0.0550 Epoch: 5 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:29,312-Speed 3422.81 samples/sec Loss 7.6474 LearningRate 0.0550 Epoch: 5 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:55:32,267-Speed 3465.98 samples/sec Loss 
7.5822 LearningRate 0.0550 Epoch: 5 Global Step: 26130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:35,281-Speed 3397.81 samples/sec Loss 7.5879 LearningRate 0.0550 Epoch: 5 Global Step: 26140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:38,271-Speed 3426.15 samples/sec Loss 7.3217 LearningRate 0.0550 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:41,264-Speed 3422.18 samples/sec Loss 7.5876 LearningRate 0.0550 Epoch: 5 Global Step: 26160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:44,259-Speed 3419.52 samples/sec Loss 7.5292 LearningRate 0.0550 Epoch: 5 Global Step: 26170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:47,250-Speed 3423.84 samples/sec Loss 7.7101 LearningRate 0.0549 Epoch: 5 Global Step: 26180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:50,255-Speed 3408.57 samples/sec Loss 7.5154 LearningRate 0.0549 Epoch: 5 Global Step: 26190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:53,249-Speed 3421.76 samples/sec Loss 7.6260 LearningRate 0.0549 Epoch: 5 Global Step: 26200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:56,241-Speed 3423.62 samples/sec Loss 7.5149 LearningRate 0.0549 Epoch: 5 Global Step: 26210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:55:59,247-Speed 3407.34 samples/sec Loss 7.5823 LearningRate 0.0549 Epoch: 5 Global Step: 26220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 01:56:02,242-Speed 3419.14 samples/sec Loss 7.6630 LearningRate 0.0549 Epoch: 5 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:05,236-Speed 3422.09 samples/sec Loss 7.5494 LearningRate 0.0549 Epoch: 5 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:08,277-Speed 3367.38 samples/sec Loss 7.5700 LearningRate 0.0548 Epoch: 5 Global Step: 26250 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:11,270-Speed 3422.99 samples/sec Loss 7.7256 LearningRate 0.0548 Epoch: 5 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:14,277-Speed 3405.00 samples/sec Loss 7.6175 LearningRate 0.0548 Epoch: 5 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:17,274-Speed 3417.66 samples/sec Loss 7.6450 LearningRate 0.0548 Epoch: 5 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:20,277-Speed 3411.18 samples/sec Loss 7.5899 LearningRate 0.0548 Epoch: 5 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:23,282-Speed 3408.86 samples/sec Loss 7.6892 LearningRate 0.0548 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:26,278-Speed 3418.80 samples/sec Loss 7.5807 LearningRate 0.0547 Epoch: 5 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:29,271-Speed 3422.99 samples/sec Loss 7.5893 LearningRate 0.0547 Epoch: 5 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:32,262-Speed 3423.75 samples/sec Loss 7.5949 LearningRate 0.0547 Epoch: 5 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:56:35,286-Speed 3386.86 samples/sec Loss 7.6107 LearningRate 0.0547 Epoch: 5 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:56:38,329-Speed 3365.54 samples/sec Loss 7.6067 LearningRate 0.0547 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:41,332-Speed 3411.87 samples/sec Loss 7.8271 LearningRate 0.0547 Epoch: 5 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:44,321-Speed 3426.63 samples/sec Loss 7.5850 LearningRate 0.0547 Epoch: 5 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:56:47,314-Speed 3421.36 samples/sec Loss 7.6538 LearningRate 0.0546 Epoch: 5 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:50,303-Speed 3427.13 samples/sec Loss 7.5386 LearningRate 0.0546 Epoch: 5 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:53,297-Speed 3420.89 samples/sec Loss 7.4753 LearningRate 0.0546 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:56,289-Speed 3424.23 samples/sec Loss 7.6863 LearningRate 0.0546 Epoch: 5 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:56:59,322-Speed 3375.95 samples/sec Loss 7.5095 LearningRate 0.0546 Epoch: 5 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:02,438-Speed 3287.84 samples/sec Loss 7.5814 LearningRate 0.0546 Epoch: 5 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:05,429-Speed 3423.93 samples/sec Loss 7.5560 LearningRate 0.0546 Epoch: 5 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:08,420-Speed 3425.01 samples/sec Loss 7.6255 LearningRate 0.0545 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:11,411-Speed 3424.07 samples/sec Loss 7.6326 LearningRate 0.0545 Epoch: 5 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:14,403-Speed 3422.88 samples/sec Loss 7.5170 LearningRate 0.0545 Epoch: 5 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:17,416-Speed 3400.24 samples/sec Loss 7.6299 LearningRate 0.0545 Epoch: 5 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:20,412-Speed 3419.14 samples/sec Loss 7.6883 LearningRate 0.0545 Epoch: 5 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:23,402-Speed 3424.87 samples/sec Loss 7.7712 
LearningRate 0.0545 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:26,396-Speed 3421.52 samples/sec Loss 7.6825 LearningRate 0.0545 Epoch: 5 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:29,387-Speed 3424.10 samples/sec Loss 7.6384 LearningRate 0.0544 Epoch: 5 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:32,379-Speed 3423.96 samples/sec Loss 7.7346 LearningRate 0.0544 Epoch: 5 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:35,368-Speed 3427.00 samples/sec Loss 7.5946 LearningRate 0.0544 Epoch: 5 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:38,341-Speed 3444.17 samples/sec Loss 7.6649 LearningRate 0.0544 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:41,332-Speed 3425.26 samples/sec Loss 7.4531 LearningRate 0.0544 Epoch: 5 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:44,328-Speed 3418.96 samples/sec Loss 7.7380 LearningRate 0.0544 Epoch: 5 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:47,319-Speed 3424.30 samples/sec Loss 7.6452 LearningRate 0.0544 Epoch: 5 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:57:50,301-Speed 3434.56 samples/sec Loss 7.5763 LearningRate 0.0543 Epoch: 5 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:53,296-Speed 3420.11 samples/sec Loss 7.7034 LearningRate 0.0543 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:56,290-Speed 3421.27 samples/sec Loss 7.7500 LearningRate 0.0543 Epoch: 5 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:57:59,285-Speed 3419.66 samples/sec Loss 7.7469 LearningRate 0.0543 Epoch: 5 Global Step: 26620 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:02,280-Speed 3419.72 samples/sec Loss 7.5980 LearningRate 0.0543 Epoch: 5 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:05,276-Speed 3418.69 samples/sec Loss 7.7072 LearningRate 0.0543 Epoch: 5 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:08,267-Speed 3424.02 samples/sec Loss 7.7464 LearningRate 0.0543 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:11,263-Speed 3419.61 samples/sec Loss 7.6570 LearningRate 0.0542 Epoch: 5 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:14,255-Speed 3423.72 samples/sec Loss 7.7021 LearningRate 0.0542 Epoch: 5 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:17,246-Speed 3424.43 samples/sec Loss 7.7862 LearningRate 0.0542 Epoch: 5 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:20,236-Speed 3425.51 samples/sec Loss 7.7220 LearningRate 0.0542 Epoch: 5 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:58:23,234-Speed 3416.72 samples/sec Loss 7.5915 LearningRate 0.0542 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:58:26,225-Speed 3423.58 samples/sec Loss 7.4586 LearningRate 0.0542 Epoch: 5 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:58:29,216-Speed 3424.36 samples/sec Loss 7.5288 LearningRate 0.0541 Epoch: 5 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:58:32,197-Speed 3435.85 samples/sec Loss 7.4810 LearningRate 0.0541 Epoch: 5 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:35,218-Speed 3391.21 samples/sec Loss 7.5506 LearningRate 0.0541 Epoch: 5 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
01:58:38,213-Speed 3419.36 samples/sec Loss 7.5843 LearningRate 0.0541 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:41,207-Speed 3421.91 samples/sec Loss 7.5036 LearningRate 0.0541 Epoch: 5 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:44,205-Speed 3416.22 samples/sec Loss 7.7655 LearningRate 0.0541 Epoch: 5 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:47,198-Speed 3422.02 samples/sec Loss 7.5448 LearningRate 0.0541 Epoch: 5 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:50,197-Speed 3415.90 samples/sec Loss 7.8315 LearningRate 0.0540 Epoch: 5 Global Step: 26790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:53,195-Speed 3415.41 samples/sec Loss 7.5259 LearningRate 0.0540 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:56,204-Speed 3405.10 samples/sec Loss 7.6723 LearningRate 0.0540 Epoch: 5 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:58:59,200-Speed 3417.50 samples/sec Loss 7.6938 LearningRate 0.0540 Epoch: 5 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:02,208-Speed 3405.69 samples/sec Loss 7.5911 LearningRate 0.0540 Epoch: 5 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:05,210-Speed 3411.44 samples/sec Loss 7.5920 LearningRate 0.0540 Epoch: 5 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:08,218-Speed 3405.70 samples/sec Loss 7.6419 LearningRate 0.0540 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:11,216-Speed 3416.89 samples/sec Loss 7.5977 LearningRate 0.0539 Epoch: 5 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:14,207-Speed 3423.79 samples/sec Loss 7.7385 LearningRate 
0.0539 Epoch: 5 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:17,203-Speed 3418.97 samples/sec Loss 7.7293 LearningRate 0.0539 Epoch: 5 Global Step: 26880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:20,201-Speed 3417.04 samples/sec Loss 7.6516 LearningRate 0.0539 Epoch: 5 Global Step: 26890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:23,197-Speed 3418.74 samples/sec Loss 7.6314 LearningRate 0.0539 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:26,201-Speed 3409.47 samples/sec Loss 7.7037 LearningRate 0.0539 Epoch: 5 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:29,208-Speed 3406.47 samples/sec Loss 7.6462 LearningRate 0.0539 Epoch: 5 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:32,194-Speed 3429.46 samples/sec Loss 7.5068 LearningRate 0.0538 Epoch: 5 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:35,187-Speed 3422.72 samples/sec Loss 7.6279 LearningRate 0.0538 Epoch: 5 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:38,189-Speed 3410.85 samples/sec Loss 7.6775 LearningRate 0.0538 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 01:59:41,181-Speed 3424.13 samples/sec Loss 7.5213 LearningRate 0.0538 Epoch: 5 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:44,177-Speed 3418.96 samples/sec Loss 7.4821 LearningRate 0.0538 Epoch: 5 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:47,186-Speed 3403.53 samples/sec Loss 7.5119 LearningRate 0.0538 Epoch: 5 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:50,180-Speed 3421.83 samples/sec Loss 7.5132 LearningRate 0.0538 Epoch: 5 Global Step: 26990 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 01:59:53,182-Speed 3411.59 samples/sec Loss 7.6166 LearningRate 0.0537 Epoch: 5 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:56,178-Speed 3418.82 samples/sec Loss 7.5429 LearningRate 0.0537 Epoch: 5 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 01:59:59,176-Speed 3416.48 samples/sec Loss 7.5429 LearningRate 0.0537 Epoch: 5 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:02,173-Speed 3416.93 samples/sec Loss 7.4819 LearningRate 0.0537 Epoch: 5 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:05,223-Speed 3358.80 samples/sec Loss 7.6019 LearningRate 0.0537 Epoch: 5 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:08,220-Speed 3417.61 samples/sec Loss 7.6951 LearningRate 0.0537 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:11,215-Speed 3419.65 samples/sec Loss 7.6940 LearningRate 0.0537 Epoch: 5 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:14,218-Speed 3410.76 samples/sec Loss 7.5173 LearningRate 0.0536 Epoch: 5 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:17,219-Speed 3413.05 samples/sec Loss 7.6966 LearningRate 0.0536 Epoch: 5 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:20,218-Speed 3416.14 samples/sec Loss 7.5447 LearningRate 0.0536 Epoch: 5 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:23,214-Speed 3417.90 samples/sec Loss 7.5284 LearningRate 0.0536 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:26,196-Speed 3434.81 samples/sec Loss 7.5718 LearningRate 0.0536 Epoch: 5 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:29,256-Speed 
3347.47 samples/sec Loss 7.8165 LearningRate 0.0536 Epoch: 5 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:32,264-Speed 3405.28 samples/sec Loss 7.7398 LearningRate 0.0536 Epoch: 5 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:35,259-Speed 3420.34 samples/sec Loss 7.7466 LearningRate 0.0535 Epoch: 5 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:38,258-Speed 3414.59 samples/sec Loss 7.5838 LearningRate 0.0535 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:41,255-Speed 3418.32 samples/sec Loss 7.8178 LearningRate 0.0535 Epoch: 5 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:44,255-Speed 3414.08 samples/sec Loss 7.6197 LearningRate 0.0535 Epoch: 5 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:47,257-Speed 3411.25 samples/sec Loss 7.7134 LearningRate 0.0535 Epoch: 5 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:50,266-Speed 3404.63 samples/sec Loss 7.4917 LearningRate 0.0535 Epoch: 5 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:53,268-Speed 3411.08 samples/sec Loss 7.5598 LearningRate 0.0535 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:00:56,272-Speed 3410.77 samples/sec Loss 7.4395 LearningRate 0.0534 Epoch: 5 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:00:59,268-Speed 3418.01 samples/sec Loss 7.6845 LearningRate 0.0534 Epoch: 5 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:02,278-Speed 3402.74 samples/sec Loss 7.5102 LearningRate 0.0534 Epoch: 5 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:05,276-Speed 3417.25 samples/sec Loss 7.6886 LearningRate 0.0534 Epoch: 5 
Global Step: 27240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:08,282-Speed 3406.99 samples/sec Loss 7.6868 LearningRate 0.0534 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:11,280-Speed 3417.37 samples/sec Loss 7.8530 LearningRate 0.0534 Epoch: 5 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:14,286-Speed 3406.91 samples/sec Loss 7.4379 LearningRate 0.0534 Epoch: 5 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:17,287-Speed 3412.29 samples/sec Loss 7.5006 LearningRate 0.0533 Epoch: 5 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:20,286-Speed 3415.52 samples/sec Loss 7.5749 LearningRate 0.0533 Epoch: 5 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:23,289-Speed 3410.26 samples/sec Loss 7.8077 LearningRate 0.0533 Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:26,271-Speed 3435.36 samples/sec Loss 7.6745 LearningRate 0.0533 Epoch: 5 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:29,270-Speed 3415.16 samples/sec Loss 7.6732 LearningRate 0.0533 Epoch: 5 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:32,267-Speed 3417.37 samples/sec Loss 7.5390 LearningRate 0.0533 Epoch: 5 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:35,281-Speed 3399.43 samples/sec Loss 7.5220 LearningRate 0.0533 Epoch: 5 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:38,281-Speed 3414.07 samples/sec Loss 7.6005 LearningRate 0.0532 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:01:41,277-Speed 3418.85 samples/sec Loss 7.6781 LearningRate 0.0532 Epoch: 5 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 02:01:44,262-Speed 3431.22 samples/sec Loss 7.6209 LearningRate 0.0532 Epoch: 5 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:01:47,257-Speed 3419.57 samples/sec Loss 7.5389 LearningRate 0.0532 Epoch: 5 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:01:50,256-Speed 3416.00 samples/sec Loss 7.4676 LearningRate 0.0532 Epoch: 5 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:01:53,259-Speed 3410.38 samples/sec Loss 7.6812 LearningRate 0.0532 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 02:01:56,262-Speed 3410.43 samples/sec Loss 7.5013 LearningRate 0.0532 Epoch: 5 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:01:59,265-Speed 3410.59 samples/sec Loss 7.5147 LearningRate 0.0531 Epoch: 5 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:02,264-Speed 3415.36 samples/sec Loss 7.5921 LearningRate 0.0531 Epoch: 5 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:05,269-Speed 3409.88 samples/sec Loss 7.5974 LearningRate 0.0531 Epoch: 5 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:08,268-Speed 3415.24 samples/sec Loss 7.4281 LearningRate 0.0531 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:11,269-Speed 3413.19 samples/sec Loss 7.5297 LearningRate 0.0531 Epoch: 5 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:14,270-Speed 3412.38 samples/sec Loss 7.6231 LearningRate 0.0531 Epoch: 5 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:02:17,276-Speed 3407.55 samples/sec Loss 7.5609 LearningRate 0.0530 Epoch: 5 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:02:20,314-Speed 3371.63 samples/sec 
Loss 7.6311 LearningRate 0.0530 Epoch: 5 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:02:23,302-Speed 3427.57 samples/sec Loss 7.3885 LearningRate 0.0530 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:26,308-Speed 3408.60 samples/sec Loss 7.6001 LearningRate 0.0530 Epoch: 5 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:29,312-Speed 3409.81 samples/sec Loss 7.6572 LearningRate 0.0530 Epoch: 5 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:32,316-Speed 3408.68 samples/sec Loss 7.5806 LearningRate 0.0530 Epoch: 5 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:35,327-Speed 3402.86 samples/sec Loss 7.4960 LearningRate 0.0530 Epoch: 5 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:38,331-Speed 3408.88 samples/sec Loss 7.4848 LearningRate 0.0529 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:41,343-Speed 3400.92 samples/sec Loss 7.5800 LearningRate 0.0529 Epoch: 5 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:44,343-Speed 3414.07 samples/sec Loss 7.5389 LearningRate 0.0529 Epoch: 5 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:47,343-Speed 3413.95 samples/sec Loss 7.3330 LearningRate 0.0529 Epoch: 5 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:50,343-Speed 3414.65 samples/sec Loss 7.5281 LearningRate 0.0529 Epoch: 5 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:02:53,368-Speed 3384.99 samples/sec Loss 7.6358 LearningRate 0.0529 Epoch: 5 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:02:56,378-Speed 3403.15 samples/sec Loss 7.5665 LearningRate 0.0529 Epoch: 5 Global Step: 27610 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:02:59,418-Speed 3369.40 samples/sec Loss 7.5649 LearningRate 0.0528 Epoch: 5 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:02,482-Speed 3343.06 samples/sec Loss 7.5445 LearningRate 0.0528 Epoch: 5 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:05,495-Speed 3400.05 samples/sec Loss 7.7302 LearningRate 0.0528 Epoch: 5 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:08,498-Speed 3410.86 samples/sec Loss 7.4120 LearningRate 0.0528 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:11,506-Speed 3405.36 samples/sec Loss 7.5382 LearningRate 0.0528 Epoch: 5 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:14,507-Speed 3412.85 samples/sec Loss 7.4046 LearningRate 0.0528 Epoch: 5 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:17,519-Speed 3399.36 samples/sec Loss 7.5903 LearningRate 0.0528 Epoch: 5 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:03:20,502-Speed 3434.57 samples/sec Loss 7.5748 LearningRate 0.0527 Epoch: 5 Global Step: 27690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:03:23,487-Speed 3431.19 samples/sec Loss 7.7678 LearningRate 0.0527 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:26,492-Speed 3408.69 samples/sec Loss 7.5558 LearningRate 0.0527 Epoch: 5 Global Step: 27710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:29,492-Speed 3414.34 samples/sec Loss 7.5726 LearningRate 0.0527 Epoch: 5 Global Step: 27720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:32,495-Speed 3410.30 samples/sec Loss 7.6581 LearningRate 0.0527 Epoch: 5 Global Step: 27730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
02:03:35,496-Speed 3413.63 samples/sec Loss 7.6419 LearningRate 0.0527 Epoch: 5 Global Step: 27740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:38,500-Speed 3409.54 samples/sec Loss 7.5809 LearningRate 0.0527 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:41,508-Speed 3404.85 samples/sec Loss 7.5171 LearningRate 0.0526 Epoch: 5 Global Step: 27760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:44,505-Speed 3418.08 samples/sec Loss 7.6495 LearningRate 0.0526 Epoch: 5 Global Step: 27770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:47,509-Speed 3409.53 samples/sec Loss 7.5962 LearningRate 0.0526 Epoch: 5 Global Step: 27780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:50,508-Speed 3414.66 samples/sec Loss 7.4522 LearningRate 0.0526 Epoch: 5 Global Step: 27790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:03:53,511-Speed 3411.90 samples/sec Loss 7.4960 LearningRate 0.0526 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:03:56,509-Speed 3416.37 samples/sec Loss 7.4926 LearningRate 0.0526 Epoch: 5 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:03:59,508-Speed 3415.57 samples/sec Loss 7.6032 LearningRate 0.0526 Epoch: 5 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:02,516-Speed 3404.38 samples/sec Loss 7.8116 LearningRate 0.0525 Epoch: 5 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:05,519-Speed 3411.35 samples/sec Loss 7.6072 LearningRate 0.0525 Epoch: 5 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:08,524-Speed 3407.69 samples/sec Loss 7.6506 LearningRate 0.0525 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:11,537-Speed 3399.70 samples/sec Loss 7.6094 LearningRate 
0.0525 Epoch: 5 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:14,548-Speed 3401.65 samples/sec Loss 7.6403 LearningRate 0.0525 Epoch: 5 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:17,547-Speed 3415.11 samples/sec Loss 7.5437 LearningRate 0.0525 Epoch: 5 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:20,547-Speed 3415.35 samples/sec Loss 7.5309 LearningRate 0.0525 Epoch: 5 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:04:23,543-Speed 3417.73 samples/sec Loss 7.6794 LearningRate 0.0524 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:26,544-Speed 3413.70 samples/sec Loss 7.5285 LearningRate 0.0524 Epoch: 5 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:29,544-Speed 3413.45 samples/sec Loss 7.5608 LearningRate 0.0524 Epoch: 5 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:32,545-Speed 3413.06 samples/sec Loss 7.6424 LearningRate 0.0524 Epoch: 5 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:35,548-Speed 3411.64 samples/sec Loss 7.5332 LearningRate 0.0524 Epoch: 5 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:38,550-Speed 3411.62 samples/sec Loss 7.4442 LearningRate 0.0524 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:41,548-Speed 3416.10 samples/sec Loss 7.8281 LearningRate 0.0524 Epoch: 5 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:44,562-Speed 3398.57 samples/sec Loss 7.6979 LearningRate 0.0523 Epoch: 5 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:47,563-Speed 3413.27 samples/sec Loss 7.5591 LearningRate 0.0523 Epoch: 5 Global Step: 27980 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 02:04:50,561-Speed 3416.32 samples/sec Loss 7.6775 LearningRate 0.0523 Epoch: 5 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:04:53,544-Speed 3433.31 samples/sec Loss 7.5931 LearningRate 0.0523 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:05:37,910-[lfw][28000]XNorm: 21.480269 Training: 2022-04-11 02:05:37,910-[lfw][28000]Accuracy-Flip: 0.99717+-0.00211 Training: 2022-04-11 02:05:37,911-[lfw][28000]Accuracy-Highest: 0.99717 Training: 2022-04-11 02:06:29,598-[cfp_fp][28000]XNorm: 18.856448 Training: 2022-04-11 02:06:29,599-[cfp_fp][28000]Accuracy-Flip: 0.95957+-0.01236 Training: 2022-04-11 02:06:29,599-[cfp_fp][28000]Accuracy-Highest: 0.95957 Training: 2022-04-11 02:07:13,656-[agedb_30][28000]XNorm: 21.347683 Training: 2022-04-11 02:07:13,656-[agedb_30][28000]Accuracy-Flip: 0.97567+-0.00655 Training: 2022-04-11 02:07:13,657-[agedb_30][28000]Accuracy-Highest: 0.97567 Training: 2022-04-11 02:07:16,657-Speed 71.55 samples/sec Loss 7.4783 LearningRate 0.0523 Epoch: 5 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:19,648-Speed 3424.19 samples/sec Loss 7.6376 LearningRate 0.0523 Epoch: 5 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:22,628-Speed 3436.60 samples/sec Loss 7.5862 LearningRate 0.0523 Epoch: 5 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:25,622-Speed 3421.57 samples/sec Loss 7.6655 LearningRate 0.0522 Epoch: 5 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:28,608-Speed 3430.18 samples/sec Loss 7.4957 LearningRate 0.0522 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:31,594-Speed 3430.90 samples/sec Loss 7.4441 LearningRate 0.0522 Epoch: 5 Global Step: 28060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
02:07:34,594-Speed 3413.55 samples/sec Loss 7.4630 LearningRate 0.0522 Epoch: 5 Global Step: 28070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:37,579-Speed 3431.47 samples/sec Loss 7.4904 LearningRate 0.0522 Epoch: 5 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 02:07:40,569-Speed 3425.62 samples/sec Loss 7.6452 LearningRate 0.0522 Epoch: 5 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:43,547-Speed 3439.49 samples/sec Loss 7.4881 LearningRate 0.0522 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:46,550-Speed 3410.68 samples/sec Loss 7.4578 LearningRate 0.0521 Epoch: 5 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:49,587-Speed 3372.52 samples/sec Loss 7.5565 LearningRate 0.0521 Epoch: 5 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:52,583-Speed 3418.80 samples/sec Loss 7.5536 LearningRate 0.0521 Epoch: 5 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:55,574-Speed 3424.16 samples/sec Loss 7.5144 LearningRate 0.0521 Epoch: 5 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:07:58,567-Speed 3422.87 samples/sec Loss 7.5238 LearningRate 0.0521 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:08:01,559-Speed 3423.16 samples/sec Loss 7.4874 LearningRate 0.0521 Epoch: 5 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:08:04,582-Speed 3388.77 samples/sec Loss 7.4048 LearningRate 0.0521 Epoch: 5 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:08:07,581-Speed 3414.13 samples/sec Loss 7.5066 LearningRate 0.0520 Epoch: 5 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:08:10,537-Speed 3465.63 samples/sec Loss 7.5309 
LearningRate 0.0520 Epoch: 5 Global Step: 28190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:13,537-Speed 3413.75 samples/sec Loss 7.6161 LearningRate 0.0520 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:16,576-Speed 3371.24 samples/sec Loss 7.5343 LearningRate 0.0520 Epoch: 5 Global Step: 28210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:19,570-Speed 3421.07 samples/sec Loss 7.5227 LearningRate 0.0520 Epoch: 5 Global Step: 28220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:22,561-Speed 3423.41 samples/sec Loss 7.5855 LearningRate 0.0520 Epoch: 5 Global Step: 28230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:25,555-Speed 3422.41 samples/sec Loss 7.5310 LearningRate 0.0520 Epoch: 5 Global Step: 28240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:28,551-Speed 3417.91 samples/sec Loss 7.5416 LearningRate 0.0519 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:31,543-Speed 3423.94 samples/sec Loss 7.7062 LearningRate 0.0519 Epoch: 5 Global Step: 28260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:34,539-Speed 3418.05 samples/sec Loss 7.5359 LearningRate 0.0519 Epoch: 5 Global Step: 28270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:37,615-Speed 3330.07 samples/sec Loss 7.4515 LearningRate 0.0519 Epoch: 5 Global Step: 28280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:08:40,611-Speed 3419.16 samples/sec Loss 7.4714 LearningRate 0.0519 Epoch: 5 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:08:43,604-Speed 3422.31 samples/sec Loss 7.4031 LearningRate 0.0519 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:08:46,603-Speed 3415.78 samples/sec Loss 7.5017 LearningRate 0.0519 Epoch: 5 Global Step: 28310 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 02:08:49,604-Speed 3411.93 samples/sec Loss 7.3855 LearningRate 0.0518 Epoch: 5 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:08:52,626-Speed 3389.36 samples/sec Loss 7.6031 LearningRate 0.0518 Epoch: 5 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:08:55,627-Speed 3413.52 samples/sec Loss 7.6553 LearningRate 0.0518 Epoch: 5 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:08:58,623-Speed 3418.80 samples/sec Loss 7.5509 LearningRate 0.0518 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:09:01,621-Speed 3416.74 samples/sec Loss 7.3643 LearningRate 0.0518 Epoch: 5 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:09:04,614-Speed 3421.75 samples/sec Loss 7.6499 LearningRate 0.0518 Epoch: 5 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:09:07,610-Speed 3418.55 samples/sec Loss 7.4525 LearningRate 0.0518 Epoch: 5 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:09:10,607-Speed 3417.99 samples/sec Loss 7.7112 LearningRate 0.0517 Epoch: 5 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:13,609-Speed 3411.63 samples/sec Loss 7.4382 LearningRate 0.0517 Epoch: 5 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:16,612-Speed 3411.22 samples/sec Loss 7.6049 LearningRate 0.0517 Epoch: 5 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:19,616-Speed 3409.87 samples/sec Loss 7.3621 LearningRate 0.0517 Epoch: 5 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:22,617-Speed 3413.77 samples/sec Loss 7.4725 LearningRate 0.0517 Epoch: 5 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:09:25,621-Speed 3410.00 samples/sec Loss 7.4326 LearningRate 0.0517 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:28,625-Speed 3408.82 samples/sec Loss 7.4709 LearningRate 0.0517 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:31,626-Speed 3413.78 samples/sec Loss 7.6272 LearningRate 0.0516 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:34,623-Speed 3417.59 samples/sec Loss 7.6563 LearningRate 0.0516 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:37,621-Speed 3415.73 samples/sec Loss 7.3805 LearningRate 0.0516 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:40,612-Speed 3425.16 samples/sec Loss 7.6108 LearningRate 0.0516 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:43,611-Speed 3415.32 samples/sec Loss 7.4282 LearningRate 0.0516 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:46,610-Speed 3414.96 samples/sec Loss 7.4202 LearningRate 0.0516 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:49,608-Speed 3417.00 samples/sec Loss 7.6234 LearningRate 0.0516 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:52,605-Speed 3417.09 samples/sec Loss 7.4859 LearningRate 0.0515 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:55,609-Speed 3410.33 samples/sec Loss 7.4984 LearningRate 0.0515 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:09:58,604-Speed 3419.67 samples/sec Loss 7.3700 LearningRate 0.0515 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:10:01,600-Speed 3418.31 samples/sec Loss 7.4708 
LearningRate 0.0515 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:04,618-Speed 3394.37 samples/sec Loss 7.5726 LearningRate 0.0515 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:07,616-Speed 3415.72 samples/sec Loss 7.4566 LearningRate 0.0515 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:10,615-Speed 3416.01 samples/sec Loss 7.4888 LearningRate 0.0515 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:13,612-Speed 3417.81 samples/sec Loss 7.5763 LearningRate 0.0514 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:16,611-Speed 3415.41 samples/sec Loss 7.4584 LearningRate 0.0514 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:19,613-Speed 3412.02 samples/sec Loss 7.5881 LearningRate 0.0514 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:22,609-Speed 3418.66 samples/sec Loss 7.4796 LearningRate 0.0514 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:25,609-Speed 3413.65 samples/sec Loss 7.4459 LearningRate 0.0514 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:28,618-Speed 3403.65 samples/sec Loss 7.3396 LearningRate 0.0514 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:31,600-Speed 3434.99 samples/sec Loss 7.5305 LearningRate 0.0514 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:34,602-Speed 3412.26 samples/sec Loss 7.6061 LearningRate 0.0513 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:37,628-Speed 3384.80 samples/sec Loss 7.6412 LearningRate 0.0513 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 02:10:40,634-Speed 3408.79 samples/sec Loss 7.4627 LearningRate 0.0513 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:43,632-Speed 3416.57 samples/sec Loss 7.3533 LearningRate 0.0513 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:46,630-Speed 3417.14 samples/sec Loss 7.4333 LearningRate 0.0513 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:49,626-Speed 3418.02 samples/sec Loss 7.4029 LearningRate 0.0513 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:52,623-Speed 3417.30 samples/sec Loss 7.4877 LearningRate 0.0513 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:55,622-Speed 3416.19 samples/sec Loss 7.4587 LearningRate 0.0513 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:10:58,631-Speed 3403.32 samples/sec Loss 7.4498 LearningRate 0.0512 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:01,639-Speed 3405.53 samples/sec Loss 7.4705 LearningRate 0.0512 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:04,649-Speed 3402.92 samples/sec Loss 7.3652 LearningRate 0.0512 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:07,648-Speed 3415.82 samples/sec Loss 7.5006 LearningRate 0.0512 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:10,645-Speed 3417.51 samples/sec Loss 7.3653 LearningRate 0.0512 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:13,660-Speed 3397.57 samples/sec Loss 7.4809 LearningRate 0.0512 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:11:16,698-Speed 3370.86 samples/sec Loss 7.4762 LearningRate 0.0512 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:19,697-Speed 3415.98 samples/sec Loss 7.4793 LearningRate 0.0511 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:22,698-Speed 3411.97 samples/sec Loss 7.4180 LearningRate 0.0511 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:25,755-Speed 3351.64 samples/sec Loss 7.5853 LearningRate 0.0511 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:28,828-Speed 3333.23 samples/sec Loss 7.3584 LearningRate 0.0511 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:31,809-Speed 3435.32 samples/sec Loss 7.5675 LearningRate 0.0511 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:11:34,788-Speed 3438.51 samples/sec Loss 7.4006 LearningRate 0.0511 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:37,795-Speed 3406.32 samples/sec Loss 7.6090 LearningRate 0.0511 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:40,796-Speed 3413.76 samples/sec Loss 7.4839 LearningRate 0.0510 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:43,799-Speed 3410.43 samples/sec Loss 7.4718 LearningRate 0.0510 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:46,801-Speed 3412.12 samples/sec Loss 7.5304 LearningRate 0.0510 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:49,816-Speed 3397.11 samples/sec Loss 7.4334 LearningRate 0.0510 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:52,822-Speed 3406.85 samples/sec Loss 7.5007 
LearningRate 0.0510 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:55,826-Speed 3409.92 samples/sec Loss 7.3768 LearningRate 0.0510 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:11:58,837-Speed 3401.79 samples/sec Loss 7.5722 LearningRate 0.0510 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:01,843-Speed 3407.63 samples/sec Loss 7.5209 LearningRate 0.0509 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:04,847-Speed 3409.24 samples/sec Loss 7.6181 LearningRate 0.0509 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:07,839-Speed 3423.23 samples/sec Loss 7.3641 LearningRate 0.0509 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:10,877-Speed 3372.43 samples/sec Loss 7.4174 LearningRate 0.0509 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:13,877-Speed 3413.89 samples/sec Loss 7.3918 LearningRate 0.0509 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:16,883-Speed 3407.06 samples/sec Loss 7.4871 LearningRate 0.0509 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:19,886-Speed 3410.14 samples/sec Loss 7.4654 LearningRate 0.0509 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:22,894-Speed 3405.02 samples/sec Loss 7.3066 LearningRate 0.0508 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:25,898-Speed 3410.87 samples/sec Loss 7.4233 LearningRate 0.0508 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:28,904-Speed 3407.09 samples/sec Loss 7.4627 LearningRate 0.0508 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 02:12:31,911-Speed 3406.33 samples/sec Loss 7.4591 LearningRate 0.0508 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:34,916-Speed 3408.90 samples/sec Loss 7.5005 LearningRate 0.0508 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:12:37,918-Speed 3411.32 samples/sec Loss 7.3919 LearningRate 0.0508 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:40,924-Speed 3407.36 samples/sec Loss 7.5376 LearningRate 0.0508 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:43,929-Speed 3408.15 samples/sec Loss 7.5011 LearningRate 0.0507 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:46,934-Speed 3408.90 samples/sec Loss 7.4278 LearningRate 0.0507 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:49,948-Speed 3398.25 samples/sec Loss 7.3353 LearningRate 0.0507 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:52,949-Speed 3413.15 samples/sec Loss 7.4788 LearningRate 0.0507 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:55,959-Speed 3403.69 samples/sec Loss 7.4083 LearningRate 0.0507 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:12:58,969-Speed 3403.01 samples/sec Loss 7.3395 LearningRate 0.0507 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:01,973-Speed 3409.08 samples/sec Loss 7.3947 LearningRate 0.0507 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:04,978-Speed 3408.04 samples/sec Loss 7.4151 LearningRate 0.0506 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:13:07,967-Speed 3427.52 samples/sec Loss 7.5552 LearningRate 0.0506 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:10,965-Speed 3415.77 samples/sec Loss 7.3256 LearningRate 0.0506 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:13,980-Speed 3397.05 samples/sec Loss 7.3088 LearningRate 0.0506 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:16,990-Speed 3403.69 samples/sec Loss 7.4179 LearningRate 0.0506 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:20,018-Speed 3382.80 samples/sec Loss 7.5850 LearningRate 0.0506 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:23,023-Speed 3408.43 samples/sec Loss 7.3046 LearningRate 0.0506 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:26,051-Speed 3382.40 samples/sec Loss 7.5540 LearningRate 0.0505 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:29,056-Speed 3408.89 samples/sec Loss 7.2499 LearningRate 0.0505 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:32,058-Speed 3411.97 samples/sec Loss 7.4100 LearningRate 0.0505 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:35,060-Speed 3411.82 samples/sec Loss 7.4107 LearningRate 0.0505 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:38,054-Speed 3420.75 samples/sec Loss 7.4590 LearningRate 0.0505 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:41,077-Speed 3388.49 samples/sec Loss 7.2975 LearningRate 0.0505 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:13:44,059-Speed 3434.60 samples/sec Loss 7.3621 
LearningRate 0.0505 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:13:47,066-Speed 3406.58 samples/sec Loss 7.5010 LearningRate 0.0504 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:13:50,083-Speed 3394.91 samples/sec Loss 7.3637 LearningRate 0.0504 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:13:53,095-Speed 3400.56 samples/sec Loss 7.3657 LearningRate 0.0504 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:13:56,108-Speed 3400.18 samples/sec Loss 7.3835 LearningRate 0.0504 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:13:59,111-Speed 3409.58 samples/sec Loss 7.3412 LearningRate 0.0504 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:02,127-Speed 3397.24 samples/sec Loss 7.3125 LearningRate 0.0504 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:05,143-Speed 3395.10 samples/sec Loss 7.2720 LearningRate 0.0504 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:08,148-Speed 3409.65 samples/sec Loss 7.5507 LearningRate 0.0503 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:11,147-Speed 3415.61 samples/sec Loss 7.4692 LearningRate 0.0503 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:14,151-Speed 3409.74 samples/sec Loss 7.3134 LearningRate 0.0503 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:14:17,149-Speed 3415.62 samples/sec Loss 7.4980 LearningRate 0.0503 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:14:20,134-Speed 3431.82 samples/sec Loss 7.5331 LearningRate 0.0503 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 02:14:23,138-Speed 3409.12 samples/sec Loss 7.5266 LearningRate 0.0503 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:26,155-Speed 3395.11 samples/sec Loss 7.2949 LearningRate 0.0503 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:29,171-Speed 3396.08 samples/sec Loss 7.3715 LearningRate 0.0503 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:32,172-Speed 3413.00 samples/sec Loss 7.3345 LearningRate 0.0502 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:35,193-Speed 3391.32 samples/sec Loss 7.1863 LearningRate 0.0502 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:38,202-Speed 3403.41 samples/sec Loss 7.4917 LearningRate 0.0502 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:41,205-Speed 3411.52 samples/sec Loss 7.3533 LearningRate 0.0502 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:44,211-Speed 3406.78 samples/sec Loss 7.6056 LearningRate 0.0502 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:47,219-Speed 3405.67 samples/sec Loss 7.3902 LearningRate 0.0502 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:14:50,225-Speed 3407.20 samples/sec Loss 7.4126 LearningRate 0.0502 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:14:53,238-Speed 3398.84 samples/sec Loss 7.5247 LearningRate 0.0501 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:14:56,276-Speed 3371.71 samples/sec Loss 7.5561 LearningRate 0.0501 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:14:59,296-Speed 
3392.37 samples/sec Loss 7.4625 LearningRate 0.0501 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:02,312-Speed 3395.80 samples/sec Loss 7.3687 LearningRate 0.0501 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:05,317-Speed 3409.22 samples/sec Loss 7.2313 LearningRate 0.0501 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:08,320-Speed 3409.70 samples/sec Loss 7.2421 LearningRate 0.0501 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:11,323-Speed 3411.79 samples/sec Loss 7.2869 LearningRate 0.0501 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:14,324-Speed 3411.93 samples/sec Loss 7.3727 LearningRate 0.0500 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:17,331-Speed 3406.58 samples/sec Loss 7.4531 LearningRate 0.0500 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:20,315-Speed 3432.98 samples/sec Loss 7.4479 LearningRate 0.0500 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:23,318-Speed 3410.10 samples/sec Loss 7.2815 LearningRate 0.0500 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:26,325-Speed 3407.14 samples/sec Loss 7.4237 LearningRate 0.0500 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:29,333-Speed 3405.38 samples/sec Loss 7.5176 LearningRate 0.0500 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:32,338-Speed 3408.79 samples/sec Loss 7.5310 LearningRate 0.0500 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:35,340-Speed 3410.86 samples/sec Loss 7.4518 LearningRate 0.0499 Epoch: 5 
Global Step: 29670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:38,348-Speed 3405.81 samples/sec Loss 7.5434 LearningRate 0.0499 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:41,347-Speed 3414.63 samples/sec Loss 7.2663 LearningRate 0.0499 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:44,355-Speed 3406.00 samples/sec Loss 7.2490 LearningRate 0.0499 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:47,361-Speed 3407.19 samples/sec Loss 7.3063 LearningRate 0.0499 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:15:50,363-Speed 3411.73 samples/sec Loss 7.4514 LearningRate 0.0499 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:53,369-Speed 3407.55 samples/sec Loss 7.3148 LearningRate 0.0499 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:56,373-Speed 3410.50 samples/sec Loss 7.3515 LearningRate 0.0498 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:15:59,370-Speed 3416.67 samples/sec Loss 7.3692 LearningRate 0.0498 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:02,376-Speed 3407.53 samples/sec Loss 7.2559 LearningRate 0.0498 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:05,382-Speed 3407.39 samples/sec Loss 7.4442 LearningRate 0.0498 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:08,396-Speed 3398.11 samples/sec Loss 7.5326 LearningRate 0.0498 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:11,410-Speed 3399.13 samples/sec Loss 7.3113 LearningRate 0.0498 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 02:16:14,411-Speed 3412.00 samples/sec Loss 7.4626 LearningRate 0.0498 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:17,428-Speed 3395.21 samples/sec Loss 7.4158 LearningRate 0.0497 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:20,429-Speed 3413.87 samples/sec Loss 7.2769 LearningRate 0.0497 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:23,435-Speed 3407.38 samples/sec Loss 7.3121 LearningRate 0.0497 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:26,460-Speed 3386.53 samples/sec Loss 7.2031 LearningRate 0.0497 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:29,465-Speed 3407.39 samples/sec Loss 7.4176 LearningRate 0.0497 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:16:32,475-Speed 3403.84 samples/sec Loss 7.2821 LearningRate 0.0497 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:16:35,474-Speed 3414.85 samples/sec Loss 7.3697 LearningRate 0.0497 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:16:38,478-Speed 3409.83 samples/sec Loss 7.3738 LearningRate 0.0496 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:16:41,468-Speed 3425.01 samples/sec Loss 7.2573 LearningRate 0.0496 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:44,478-Speed 3403.43 samples/sec Loss 7.2584 LearningRate 0.0496 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:47,488-Speed 3402.95 samples/sec Loss 7.2797 LearningRate 0.0496 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:50,490-Speed 3411.98 samples/sec 
Loss 7.3060 LearningRate 0.0496 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:53,504-Speed 3397.71 samples/sec Loss 7.4019 LearningRate 0.0496 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:56,511-Speed 3406.97 samples/sec Loss 7.5194 LearningRate 0.0496 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:16:59,519-Speed 3404.66 samples/sec Loss 7.2247 LearningRate 0.0496 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:17:02,530-Speed 3401.83 samples/sec Loss 7.3586 LearningRate 0.0495 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:17:05,543-Speed 3399.90 samples/sec Loss 7.4073 LearningRate 0.0495 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:17:08,545-Speed 3411.34 samples/sec Loss 7.3690 LearningRate 0.0495 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:17:11,554-Speed 3404.24 samples/sec Loss 7.2202 LearningRate 0.0495 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:17:14,563-Speed 3404.35 samples/sec Loss 7.3548 LearningRate 0.0495 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:17:58,978-[lfw][30000]XNorm: 24.189952 Training: 2022-04-11 02:17:58,979-[lfw][30000]Accuracy-Flip: 0.99717+-0.00325 Training: 2022-04-11 02:17:58,979-[lfw][30000]Accuracy-Highest: 0.99717 Training: 2022-04-11 02:18:50,448-[cfp_fp][30000]XNorm: 21.420047 Training: 2022-04-11 02:18:50,449-[cfp_fp][30000]Accuracy-Flip: 0.96443+-0.00839 Training: 2022-04-11 02:18:50,449-[cfp_fp][30000]Accuracy-Highest: 0.96443 Training: 2022-04-11 02:19:34,832-[agedb_30][30000]XNorm: 24.095577 Training: 2022-04-11 02:19:34,833-[agedb_30][30000]Accuracy-Flip: 0.97750+-0.00655 Training: 2022-04-11 
02:19:34,833-[agedb_30][30000]Accuracy-Highest: 0.97750 Training: 2022-04-11 02:19:37,823-Speed 71.48 samples/sec Loss 7.3324 LearningRate 0.0495 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:19:40,809-Speed 3430.38 samples/sec Loss 7.2771 LearningRate 0.0495 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:19:43,799-Speed 3424.73 samples/sec Loss 7.4157 LearningRate 0.0494 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:19:46,780-Speed 3437.24 samples/sec Loss 7.4154 LearningRate 0.0494 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:19:49,768-Speed 3428.00 samples/sec Loss 7.4594 LearningRate 0.0494 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:19:52,724-Speed 3465.02 samples/sec Loss 7.5247 LearningRate 0.0494 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:19:55,715-Speed 3424.03 samples/sec Loss 7.3614 LearningRate 0.0494 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:19:58,701-Speed 3429.43 samples/sec Loss 7.5177 LearningRate 0.0494 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:01,692-Speed 3425.63 samples/sec Loss 7.3512 LearningRate 0.0494 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:04,684-Speed 3422.82 samples/sec Loss 7.4419 LearningRate 0.0493 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:07,680-Speed 3418.73 samples/sec Loss 7.3610 LearningRate 0.0493 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:10,676-Speed 3419.04 samples/sec Loss 7.4625 LearningRate 0.0493 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 32768 Required: 7 hours 
Training: 2022-04-11 02:20:13,672-Speed 3419.05 samples/sec Loss 7.2953 LearningRate 0.0493 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:16,667-Speed 3421.10 samples/sec Loss 7.4333 LearningRate 0.0493 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:19,658-Speed 3424.11 samples/sec Loss 7.2979 LearningRate 0.0493 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:20:22,650-Speed 3423.64 samples/sec Loss 7.2686 LearningRate 0.0493 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:25,646-Speed 3418.74 samples/sec Loss 7.3315 LearningRate 0.0492 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:28,670-Speed 3386.41 samples/sec Loss 7.1286 LearningRate 0.0492 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:31,666-Speed 3419.40 samples/sec Loss 7.2953 LearningRate 0.0492 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:34,660-Speed 3421.10 samples/sec Loss 7.4072 LearningRate 0.0492 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:37,656-Speed 3419.10 samples/sec Loss 7.2211 LearningRate 0.0492 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:40,651-Speed 3419.19 samples/sec Loss 7.3729 LearningRate 0.0492 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:43,653-Speed 3412.48 samples/sec Loss 7.2379 LearningRate 0.0492 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:46,648-Speed 3419.89 samples/sec Loss 7.3036 LearningRate 0.0491 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:49,641-Speed 3421.75 samples/sec Loss 
7.3215 LearningRate 0.0491 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:20:52,662-Speed 3390.76 samples/sec Loss 7.4003 LearningRate 0.0491 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:20:55,655-Speed 3422.24 samples/sec Loss 7.2752 LearningRate 0.0491 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:20:58,654-Speed 3415.08 samples/sec Loss 7.2771 LearningRate 0.0491 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:01,656-Speed 3411.93 samples/sec Loss 7.3047 LearningRate 0.0491 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:04,655-Speed 3415.55 samples/sec Loss 7.2887 LearningRate 0.0491 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:07,652-Speed 3418.43 samples/sec Loss 7.3836 LearningRate 0.0491 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:10,642-Speed 3424.93 samples/sec Loss 7.2104 LearningRate 0.0490 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:13,633-Speed 3424.45 samples/sec Loss 7.5638 LearningRate 0.0490 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:16,699-Speed 3341.06 samples/sec Loss 7.5515 LearningRate 0.0490 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:29,244-Speed 816.31 samples/sec Loss 7.2153 LearningRate 0.0490 Epoch: 6 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:32,260-Speed 3396.78 samples/sec Loss 6.5246 LearningRate 0.0490 Epoch: 6 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:35,295-Speed 3375.42 samples/sec Loss 6.4626 LearningRate 0.0490 Epoch: 6 Global Step: 30370 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:38,308-Speed 3399.70 samples/sec Loss 6.4595 LearningRate 0.0490 Epoch: 6 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:21:41,330-Speed 3388.83 samples/sec Loss 6.4377 LearningRate 0.0489 Epoch: 6 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:44,348-Speed 3395.11 samples/sec Loss 6.4218 LearningRate 0.0489 Epoch: 6 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:47,367-Speed 3393.76 samples/sec Loss 6.5353 LearningRate 0.0489 Epoch: 6 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:50,385-Speed 3395.09 samples/sec Loss 6.4843 LearningRate 0.0489 Epoch: 6 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:53,412-Speed 3383.11 samples/sec Loss 6.5395 LearningRate 0.0489 Epoch: 6 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:56,415-Speed 3411.19 samples/sec Loss 6.6336 LearningRate 0.0489 Epoch: 6 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:21:59,417-Speed 3413.24 samples/sec Loss 6.6258 LearningRate 0.0489 Epoch: 6 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:02,440-Speed 3389.08 samples/sec Loss 6.6702 LearningRate 0.0488 Epoch: 6 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:05,450-Speed 3402.46 samples/sec Loss 6.6233 LearningRate 0.0488 Epoch: 6 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:08,511-Speed 3346.41 samples/sec Loss 6.6723 LearningRate 0.0488 Epoch: 6 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:11,536-Speed 3386.85 samples/sec Loss 6.6489 LearningRate 0.0488 Epoch: 6 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:22:14,551-Speed 3397.63 samples/sec Loss 6.7220 LearningRate 0.0488 Epoch: 6 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:17,579-Speed 3382.49 samples/sec Loss 6.7857 LearningRate 0.0488 Epoch: 6 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:20,590-Speed 3401.45 samples/sec Loss 6.6230 LearningRate 0.0488 Epoch: 6 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:23,586-Speed 3418.56 samples/sec Loss 6.7405 LearningRate 0.0487 Epoch: 6 Global Step: 30530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:26,594-Speed 3405.43 samples/sec Loss 6.7609 LearningRate 0.0487 Epoch: 6 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:22:29,583-Speed 3426.66 samples/sec Loss 6.5843 LearningRate 0.0487 Epoch: 6 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:32,605-Speed 3389.75 samples/sec Loss 6.8000 LearningRate 0.0487 Epoch: 6 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:35,620-Speed 3397.61 samples/sec Loss 6.7771 LearningRate 0.0487 Epoch: 6 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:38,627-Speed 3405.71 samples/sec Loss 6.6901 LearningRate 0.0487 Epoch: 6 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:41,633-Speed 3407.62 samples/sec Loss 6.7177 LearningRate 0.0487 Epoch: 6 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:44,642-Speed 3404.53 samples/sec Loss 6.7054 LearningRate 0.0487 Epoch: 6 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:47,656-Speed 3398.67 samples/sec Loss 6.6322 LearningRate 0.0486 Epoch: 6 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:50,763-Speed 3296.68 samples/sec Loss 6.6386 
LearningRate 0.0486 Epoch: 6 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:53,855-Speed 3312.22 samples/sec Loss 6.7880 LearningRate 0.0486 Epoch: 6 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:56,869-Speed 3399.04 samples/sec Loss 6.8539 LearningRate 0.0486 Epoch: 6 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:22:59,897-Speed 3382.50 samples/sec Loss 6.7852 LearningRate 0.0486 Epoch: 6 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:02,907-Speed 3403.19 samples/sec Loss 6.7805 LearningRate 0.0486 Epoch: 6 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:05,946-Speed 3370.38 samples/sec Loss 6.7954 LearningRate 0.0486 Epoch: 6 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:08,952-Speed 3406.59 samples/sec Loss 6.6738 LearningRate 0.0485 Epoch: 6 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:11,959-Speed 3406.83 samples/sec Loss 6.8363 LearningRate 0.0485 Epoch: 6 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:14,960-Speed 3413.27 samples/sec Loss 6.8920 LearningRate 0.0485 Epoch: 6 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:18,059-Speed 3305.51 samples/sec Loss 6.6756 LearningRate 0.0485 Epoch: 6 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:21,073-Speed 3397.61 samples/sec Loss 6.8646 LearningRate 0.0485 Epoch: 6 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:24,082-Speed 3405.01 samples/sec Loss 6.7057 LearningRate 0.0485 Epoch: 6 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:27,083-Speed 3412.60 samples/sec Loss 6.7691 LearningRate 0.0485 Epoch: 6 Global Step: 30740 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:30,072-Speed 3426.87 samples/sec Loss 6.7011 LearningRate 0.0484 Epoch: 6 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:33,072-Speed 3414.11 samples/sec Loss 6.8118 LearningRate 0.0484 Epoch: 6 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:36,086-Speed 3398.78 samples/sec Loss 6.9273 LearningRate 0.0484 Epoch: 6 Global Step: 30770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:39,111-Speed 3385.78 samples/sec Loss 6.9024 LearningRate 0.0484 Epoch: 6 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:42,158-Speed 3361.78 samples/sec Loss 6.8229 LearningRate 0.0484 Epoch: 6 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:45,162-Speed 3409.30 samples/sec Loss 6.8808 LearningRate 0.0484 Epoch: 6 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:48,169-Speed 3406.63 samples/sec Loss 6.9338 LearningRate 0.0484 Epoch: 6 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:51,224-Speed 3352.28 samples/sec Loss 6.9000 LearningRate 0.0483 Epoch: 6 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:54,245-Speed 3390.79 samples/sec Loss 6.8763 LearningRate 0.0483 Epoch: 6 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:23:57,250-Speed 3408.37 samples/sec Loss 6.9689 LearningRate 0.0483 Epoch: 6 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:24:00,243-Speed 3422.35 samples/sec Loss 6.9081 LearningRate 0.0483 Epoch: 6 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:24:03,245-Speed 3412.25 samples/sec Loss 6.8830 LearningRate 0.0483 Epoch: 6 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:24:06,274-Speed 3381.23 samples/sec Loss 6.8558 LearningRate 0.0483 Epoch: 6 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:24:09,277-Speed 3410.99 samples/sec Loss 7.0297 LearningRate 0.0483 Epoch: 6 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:24:12,285-Speed 3405.28 samples/sec Loss 7.0570 LearningRate 0.0483 Epoch: 6 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:24:15,271-Speed 3430.06 samples/sec Loss 6.8291 LearningRate 0.0482 Epoch: 6 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:24:18,259-Speed 3427.47 samples/sec Loss 6.8975 LearningRate 0.0482 Epoch: 6 Global Step: 30910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:21,272-Speed 3400.23 samples/sec Loss 7.0064 LearningRate 0.0482 Epoch: 6 Global Step: 30920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:24,277-Speed 3408.42 samples/sec Loss 6.7929 LearningRate 0.0482 Epoch: 6 Global Step: 30930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:27,294-Speed 3395.30 samples/sec Loss 6.9881 LearningRate 0.0482 Epoch: 6 Global Step: 30940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:30,368-Speed 3332.29 samples/sec Loss 6.9891 LearningRate 0.0482 Epoch: 6 Global Step: 30950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:33,394-Speed 3384.64 samples/sec Loss 6.8881 LearningRate 0.0482 Epoch: 6 Global Step: 30960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:36,399-Speed 3408.10 samples/sec Loss 6.8595 LearningRate 0.0481 Epoch: 6 Global Step: 30970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:39,402-Speed 3411.04 samples/sec Loss 6.9285 LearningRate 0.0481 Epoch: 6 Global Step: 30980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:42,411-Speed 3404.53 samples/sec Loss 6.8733 LearningRate 
0.0481 Epoch: 6 Global Step: 30990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:45,419-Speed 3404.68 samples/sec Loss 6.9214 LearningRate 0.0481 Epoch: 6 Global Step: 31000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:48,427-Speed 3405.33 samples/sec Loss 6.9361 LearningRate 0.0481 Epoch: 6 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:24:51,440-Speed 3400.63 samples/sec Loss 7.0905 LearningRate 0.0481 Epoch: 6 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:24:54,460-Speed 3390.51 samples/sec Loss 6.9062 LearningRate 0.0481 Epoch: 6 Global Step: 31030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:24:57,496-Speed 3374.59 samples/sec Loss 6.9328 LearningRate 0.0480 Epoch: 6 Global Step: 31040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:00,508-Speed 3401.15 samples/sec Loss 6.8551 LearningRate 0.0480 Epoch: 6 Global Step: 31050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:03,527-Speed 3392.78 samples/sec Loss 6.9707 LearningRate 0.0480 Epoch: 6 Global Step: 31060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:06,568-Speed 3367.70 samples/sec Loss 6.9692 LearningRate 0.0480 Epoch: 6 Global Step: 31070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:09,578-Speed 3402.79 samples/sec Loss 6.9204 LearningRate 0.0480 Epoch: 6 Global Step: 31080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:12,593-Speed 3397.46 samples/sec Loss 6.8929 LearningRate 0.0480 Epoch: 6 Global Step: 31090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:15,690-Speed 3307.31 samples/sec Loss 6.9155 LearningRate 0.0480 Epoch: 6 Global Step: 31100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:18,737-Speed 3362.25 samples/sec Loss 6.9157 LearningRate 0.0480 Epoch: 6 Global Step: 31110 Fp16 Grad Scale: 32768 Required: 
7 hours Training: 2022-04-11 02:25:21,789-Speed 3356.28 samples/sec Loss 6.8728 LearningRate 0.0479 Epoch: 6 Global Step: 31120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:25:24,810-Speed 3389.65 samples/sec Loss 7.0824 LearningRate 0.0479 Epoch: 6 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:27,822-Speed 3400.44 samples/sec Loss 6.9417 LearningRate 0.0479 Epoch: 6 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:30,823-Speed 3412.82 samples/sec Loss 6.8946 LearningRate 0.0479 Epoch: 6 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:33,833-Speed 3403.00 samples/sec Loss 7.0804 LearningRate 0.0479 Epoch: 6 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:36,852-Speed 3393.80 samples/sec Loss 7.1426 LearningRate 0.0479 Epoch: 6 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:39,865-Speed 3399.55 samples/sec Loss 7.0206 LearningRate 0.0479 Epoch: 6 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:42,915-Speed 3358.06 samples/sec Loss 7.1202 LearningRate 0.0478 Epoch: 6 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:45,923-Speed 3404.79 samples/sec Loss 6.9653 LearningRate 0.0478 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:48,928-Speed 3408.25 samples/sec Loss 6.9984 LearningRate 0.0478 Epoch: 6 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:51,940-Speed 3400.38 samples/sec Loss 7.0721 LearningRate 0.0478 Epoch: 6 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:25:54,945-Speed 3408.51 samples/sec Loss 7.0429 LearningRate 0.0478 Epoch: 6 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:25:57,953-Speed 3405.69 
samples/sec Loss 7.0528 LearningRate 0.0478 Epoch: 6 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:00,956-Speed 3410.45 samples/sec Loss 7.0532 LearningRate 0.0478 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:03,967-Speed 3402.34 samples/sec Loss 7.1095 LearningRate 0.0477 Epoch: 6 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:07,036-Speed 3337.48 samples/sec Loss 7.0245 LearningRate 0.0477 Epoch: 6 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:10,044-Speed 3404.51 samples/sec Loss 6.8883 LearningRate 0.0477 Epoch: 6 Global Step: 31280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:13,097-Speed 3355.47 samples/sec Loss 7.0201 LearningRate 0.0477 Epoch: 6 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:16,103-Speed 3407.09 samples/sec Loss 7.0214 LearningRate 0.0477 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:19,113-Speed 3402.77 samples/sec Loss 6.9982 LearningRate 0.0477 Epoch: 6 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:22,120-Speed 3407.01 samples/sec Loss 6.9790 LearningRate 0.0477 Epoch: 6 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:25,124-Speed 3409.54 samples/sec Loss 7.1586 LearningRate 0.0477 Epoch: 6 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:28,164-Speed 3368.98 samples/sec Loss 7.0158 LearningRate 0.0476 Epoch: 6 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:31,200-Speed 3373.88 samples/sec Loss 7.1747 LearningRate 0.0476 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:34,210-Speed 3403.19 samples/sec Loss 7.0833 LearningRate 0.0476 Epoch: 6 Global Step: 
31360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:37,224-Speed 3398.87 samples/sec Loss 6.9851 LearningRate 0.0476 Epoch: 6 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:40,237-Speed 3399.48 samples/sec Loss 6.9664 LearningRate 0.0476 Epoch: 6 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:43,247-Speed 3402.13 samples/sec Loss 7.0686 LearningRate 0.0476 Epoch: 6 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:46,252-Speed 3408.70 samples/sec Loss 6.9184 LearningRate 0.0476 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:49,264-Speed 3401.45 samples/sec Loss 6.8147 LearningRate 0.0475 Epoch: 6 Global Step: 31410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:52,284-Speed 3391.29 samples/sec Loss 7.0982 LearningRate 0.0475 Epoch: 6 Global Step: 31420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:26:55,282-Speed 3416.79 samples/sec Loss 6.9435 LearningRate 0.0475 Epoch: 6 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:26:58,295-Speed 3399.07 samples/sec Loss 6.9404 LearningRate 0.0475 Epoch: 6 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:01,309-Speed 3398.34 samples/sec Loss 6.9250 LearningRate 0.0475 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:04,318-Speed 3404.02 samples/sec Loss 7.0484 LearningRate 0.0475 Epoch: 6 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:07,333-Speed 3396.80 samples/sec Loss 7.0033 LearningRate 0.0475 Epoch: 6 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:10,346-Speed 3400.10 samples/sec Loss 6.9191 LearningRate 0.0474 Epoch: 6 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 02:27:13,351-Speed 3408.38 samples/sec Loss 7.0895 LearningRate 0.0474 Epoch: 6 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:16,381-Speed 3380.95 samples/sec Loss 6.7666 LearningRate 0.0474 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:19,400-Speed 3391.88 samples/sec Loss 7.2581 LearningRate 0.0474 Epoch: 6 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:22,419-Speed 3393.17 samples/sec Loss 7.0292 LearningRate 0.0474 Epoch: 6 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:25,432-Speed 3400.12 samples/sec Loss 7.0084 LearningRate 0.0474 Epoch: 6 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:27:28,447-Speed 3397.36 samples/sec Loss 6.9645 LearningRate 0.0474 Epoch: 6 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:27:31,450-Speed 3410.28 samples/sec Loss 7.0008 LearningRate 0.0474 Epoch: 6 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:27:34,461-Speed 3401.03 samples/sec Loss 7.0542 LearningRate 0.0473 Epoch: 6 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:27:37,461-Speed 3414.96 samples/sec Loss 7.1436 LearningRate 0.0473 Epoch: 6 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:40,472-Speed 3401.57 samples/sec Loss 7.1965 LearningRate 0.0473 Epoch: 6 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:43,524-Speed 3356.43 samples/sec Loss 7.0491 LearningRate 0.0473 Epoch: 6 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:46,544-Speed 3391.57 samples/sec Loss 6.8491 LearningRate 0.0473 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:49,552-Speed 3404.72 samples/sec Loss 6.9137 
LearningRate 0.0473 Epoch: 6 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:52,558-Speed 3408.29 samples/sec Loss 6.9426 LearningRate 0.0473 Epoch: 6 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:55,561-Speed 3410.44 samples/sec Loss 7.0448 LearningRate 0.0472 Epoch: 6 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:27:58,571-Speed 3402.64 samples/sec Loss 6.9637 LearningRate 0.0472 Epoch: 6 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:01,572-Speed 3414.04 samples/sec Loss 6.9625 LearningRate 0.0472 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:04,600-Speed 3381.82 samples/sec Loss 6.9797 LearningRate 0.0472 Epoch: 6 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:07,605-Speed 3408.69 samples/sec Loss 7.0620 LearningRate 0.0472 Epoch: 6 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:10,606-Speed 3413.80 samples/sec Loss 6.9900 LearningRate 0.0472 Epoch: 6 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:13,621-Speed 3397.29 samples/sec Loss 6.9874 LearningRate 0.0472 Epoch: 6 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:16,629-Speed 3404.62 samples/sec Loss 6.9195 LearningRate 0.0471 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:19,637-Speed 3405.16 samples/sec Loss 6.9188 LearningRate 0.0471 Epoch: 6 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:22,624-Speed 3429.93 samples/sec Loss 7.0118 LearningRate 0.0471 Epoch: 6 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:25,626-Speed 3411.03 samples/sec Loss 6.9325 LearningRate 0.0471 Epoch: 6 Global Step: 31730 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:28,631-Speed 3409.39 samples/sec Loss 7.0854 LearningRate 0.0471 Epoch: 6 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:31,650-Speed 3392.15 samples/sec Loss 6.9573 LearningRate 0.0471 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:34,673-Speed 3388.95 samples/sec Loss 7.1172 LearningRate 0.0471 Epoch: 6 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:37,742-Speed 3337.16 samples/sec Loss 7.0032 LearningRate 0.0471 Epoch: 6 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:40,782-Speed 3369.50 samples/sec Loss 7.0370 LearningRate 0.0470 Epoch: 6 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:43,791-Speed 3403.54 samples/sec Loss 7.1430 LearningRate 0.0470 Epoch: 6 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:46,795-Speed 3410.14 samples/sec Loss 6.9084 LearningRate 0.0470 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:49,838-Speed 3365.59 samples/sec Loss 7.1475 LearningRate 0.0470 Epoch: 6 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:28:52,866-Speed 3383.29 samples/sec Loss 7.0754 LearningRate 0.0470 Epoch: 6 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:55,885-Speed 3392.20 samples/sec Loss 6.9953 LearningRate 0.0470 Epoch: 6 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:28:58,887-Speed 3412.21 samples/sec Loss 7.0445 LearningRate 0.0470 Epoch: 6 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:01,889-Speed 3412.21 samples/sec Loss 6.9847 LearningRate 0.0469 Epoch: 6 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:29:04,919-Speed 3381.08 samples/sec Loss 7.0800 LearningRate 0.0469 Epoch: 6 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:07,921-Speed 3411.98 samples/sec Loss 7.1352 LearningRate 0.0469 Epoch: 6 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:10,927-Speed 3407.17 samples/sec Loss 7.1510 LearningRate 0.0469 Epoch: 6 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:13,929-Speed 3411.73 samples/sec Loss 7.0224 LearningRate 0.0469 Epoch: 6 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:16,944-Speed 3398.47 samples/sec Loss 6.9635 LearningRate 0.0469 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:19,954-Speed 3402.75 samples/sec Loss 7.0240 LearningRate 0.0469 Epoch: 6 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:22,974-Speed 3390.91 samples/sec Loss 7.1650 LearningRate 0.0468 Epoch: 6 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:25,983-Speed 3403.95 samples/sec Loss 7.0120 LearningRate 0.0468 Epoch: 6 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:29:28,991-Speed 3405.52 samples/sec Loss 6.9862 LearningRate 0.0468 Epoch: 6 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:32,036-Speed 3364.05 samples/sec Loss 7.1778 LearningRate 0.0468 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:35,054-Speed 3393.64 samples/sec Loss 6.9573 LearningRate 0.0468 Epoch: 6 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:38,058-Speed 3409.91 samples/sec Loss 7.0964 LearningRate 0.0468 Epoch: 6 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:41,061-Speed 3410.09 samples/sec Loss 6.9642 LearningRate 
0.0468 Epoch: 6 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:44,075-Speed 3398.29 samples/sec Loss 6.9673 LearningRate 0.0468 Epoch: 6 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:29:47,077-Speed 3412.27 samples/sec Loss 6.9834 LearningRate 0.0467 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:30:31,753-[lfw][32000]XNorm: 24.251113 Training: 2022-04-11 02:30:31,754-[lfw][32000]Accuracy-Flip: 0.99717+-0.00342 Training: 2022-04-11 02:30:31,754-[lfw][32000]Accuracy-Highest: 0.99717 Training: 2022-04-11 02:31:23,023-[cfp_fp][32000]XNorm: 21.559307 Training: 2022-04-11 02:31:23,024-[cfp_fp][32000]Accuracy-Flip: 0.97057+-0.01023 Training: 2022-04-11 02:31:23,025-[cfp_fp][32000]Accuracy-Highest: 0.97057 Training: 2022-04-11 02:32:07,986-[agedb_30][32000]XNorm: 23.669438 Training: 2022-04-11 02:32:07,986-[agedb_30][32000]Accuracy-Flip: 0.97567+-0.00797 Training: 2022-04-11 02:32:07,987-[agedb_30][32000]Accuracy-Highest: 0.97750 Training: 2022-04-11 02:32:10,976-Speed 71.16 samples/sec Loss 7.0603 LearningRate 0.0467 Epoch: 6 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:13,958-Speed 3434.48 samples/sec Loss 7.0901 LearningRate 0.0467 Epoch: 6 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:16,952-Speed 3421.46 samples/sec Loss 6.9353 LearningRate 0.0467 Epoch: 6 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:19,941-Speed 3426.76 samples/sec Loss 6.9643 LearningRate 0.0467 Epoch: 6 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:22,928-Speed 3429.63 samples/sec Loss 7.0314 LearningRate 0.0467 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:25,926-Speed 3416.36 samples/sec Loss 7.0558 LearningRate 0.0467 Epoch: 6 Global Step: 32060 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:28,915-Speed 3426.73 samples/sec Loss 7.2993 LearningRate 0.0466 Epoch: 6 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:31,902-Speed 3429.25 samples/sec Loss 6.9969 LearningRate 0.0466 Epoch: 6 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:34,887-Speed 3431.62 samples/sec Loss 6.9800 LearningRate 0.0466 Epoch: 6 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:37,882-Speed 3419.84 samples/sec Loss 6.9485 LearningRate 0.0466 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:40,871-Speed 3427.11 samples/sec Loss 7.0474 LearningRate 0.0466 Epoch: 6 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:32:43,876-Speed 3407.55 samples/sec Loss 6.9531 LearningRate 0.0466 Epoch: 6 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:32:46,876-Speed 3414.97 samples/sec Loss 6.9982 LearningRate 0.0466 Epoch: 6 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:32:49,869-Speed 3421.92 samples/sec Loss 7.1346 LearningRate 0.0466 Epoch: 6 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:32:52,889-Speed 3391.74 samples/sec Loss 7.0539 LearningRate 0.0465 Epoch: 6 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:32:55,893-Speed 3409.94 samples/sec Loss 6.9007 LearningRate 0.0465 Epoch: 6 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:32:58,891-Speed 3416.58 samples/sec Loss 7.0852 LearningRate 0.0465 Epoch: 6 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:01,888-Speed 3417.89 samples/sec Loss 6.9931 LearningRate 0.0465 Epoch: 6 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:33:04,899-Speed 3401.65 samples/sec Loss 7.1510 LearningRate 0.0465 Epoch: 6 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:07,901-Speed 3411.90 samples/sec Loss 7.1407 LearningRate 0.0465 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:10,899-Speed 3416.35 samples/sec Loss 7.0500 LearningRate 0.0465 Epoch: 6 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:13,901-Speed 3411.83 samples/sec Loss 6.9931 LearningRate 0.0464 Epoch: 6 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:16,909-Speed 3406.03 samples/sec Loss 7.0811 LearningRate 0.0464 Epoch: 6 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:19,917-Speed 3404.58 samples/sec Loss 7.1109 LearningRate 0.0464 Epoch: 6 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:22,929-Speed 3400.72 samples/sec Loss 7.1253 LearningRate 0.0464 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:25,980-Speed 3357.74 samples/sec Loss 7.0478 LearningRate 0.0464 Epoch: 6 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:28,989-Speed 3404.04 samples/sec Loss 6.8741 LearningRate 0.0464 Epoch: 6 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:31,992-Speed 3410.93 samples/sec Loss 7.0184 LearningRate 0.0464 Epoch: 6 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:34,996-Speed 3409.59 samples/sec Loss 7.0449 LearningRate 0.0463 Epoch: 6 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:37,997-Speed 3412.87 samples/sec Loss 6.9015 LearningRate 0.0463 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:41,014-Speed 3395.57 samples/sec Loss 7.1120 LearningRate 
0.0463 Epoch: 6 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:44,014-Speed 3413.50 samples/sec Loss 7.0569 LearningRate 0.0463 Epoch: 6 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:47,028-Speed 3399.56 samples/sec Loss 7.0043 LearningRate 0.0463 Epoch: 6 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:50,050-Speed 3388.87 samples/sec Loss 6.9778 LearningRate 0.0463 Epoch: 6 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:53,050-Speed 3414.29 samples/sec Loss 7.0867 LearningRate 0.0463 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:33:56,054-Speed 3409.78 samples/sec Loss 6.9946 LearningRate 0.0463 Epoch: 6 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:33:59,033-Speed 3438.57 samples/sec Loss 7.0807 LearningRate 0.0462 Epoch: 6 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:02,049-Speed 3395.80 samples/sec Loss 7.0877 LearningRate 0.0462 Epoch: 6 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:05,176-Speed 3276.53 samples/sec Loss 7.0309 LearningRate 0.0462 Epoch: 6 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:08,198-Speed 3389.60 samples/sec Loss 7.1399 LearningRate 0.0462 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:11,195-Speed 3417.71 samples/sec Loss 6.9550 LearningRate 0.0462 Epoch: 6 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:14,193-Speed 3416.62 samples/sec Loss 6.9338 LearningRate 0.0462 Epoch: 6 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:17,191-Speed 3416.29 samples/sec Loss 7.2092 LearningRate 0.0462 Epoch: 6 Global Step: 32430 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 02:34:20,190-Speed 3415.66 samples/sec Loss 6.9784 LearningRate 0.0461 Epoch: 6 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:23,192-Speed 3411.28 samples/sec Loss 7.0111 LearningRate 0.0461 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:26,189-Speed 3418.07 samples/sec Loss 7.0865 LearningRate 0.0461 Epoch: 6 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:29,207-Speed 3394.25 samples/sec Loss 7.0715 LearningRate 0.0461 Epoch: 6 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:34:32,206-Speed 3415.30 samples/sec Loss 7.0103 LearningRate 0.0461 Epoch: 6 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:35,205-Speed 3415.08 samples/sec Loss 7.1890 LearningRate 0.0461 Epoch: 6 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:38,210-Speed 3408.33 samples/sec Loss 7.0172 LearningRate 0.0461 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:41,225-Speed 3398.50 samples/sec Loss 7.0052 LearningRate 0.0461 Epoch: 6 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:44,221-Speed 3418.07 samples/sec Loss 7.0221 LearningRate 0.0460 Epoch: 6 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:47,217-Speed 3418.89 samples/sec Loss 7.1079 LearningRate 0.0460 Epoch: 6 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:50,213-Speed 3419.20 samples/sec Loss 7.0597 LearningRate 0.0460 Epoch: 6 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:53,252-Speed 3370.30 samples/sec Loss 6.9978 LearningRate 0.0460 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:56,338-Speed 3319.18 
samples/sec Loss 6.9725 LearningRate 0.0460 Epoch: 6 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:34:59,319-Speed 3435.97 samples/sec Loss 6.9244 LearningRate 0.0460 Epoch: 6 Global Step: 32570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:02,317-Speed 3417.00 samples/sec Loss 7.1396 LearningRate 0.0460 Epoch: 6 Global Step: 32580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:05,317-Speed 3414.12 samples/sec Loss 6.8904 LearningRate 0.0459 Epoch: 6 Global Step: 32590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:08,317-Speed 3414.56 samples/sec Loss 6.9766 LearningRate 0.0459 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:11,314-Speed 3417.04 samples/sec Loss 7.0390 LearningRate 0.0459 Epoch: 6 Global Step: 32610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:14,312-Speed 3416.73 samples/sec Loss 6.9422 LearningRate 0.0459 Epoch: 6 Global Step: 32620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:17,321-Speed 3403.71 samples/sec Loss 6.8198 LearningRate 0.0459 Epoch: 6 Global Step: 32630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:20,319-Speed 3417.22 samples/sec Loss 6.9699 LearningRate 0.0459 Epoch: 6 Global Step: 32640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:23,356-Speed 3373.08 samples/sec Loss 7.0855 LearningRate 0.0459 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:26,376-Speed 3391.74 samples/sec Loss 7.1178 LearningRate 0.0459 Epoch: 6 Global Step: 32660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:35:29,379-Speed 3410.74 samples/sec Loss 7.0608 LearningRate 0.0458 Epoch: 6 Global Step: 32670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:32,391-Speed 3399.91 samples/sec Loss 6.8310 LearningRate 0.0458 Epoch: 6 Global Step: 
32680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:35,393-Speed 3412.93 samples/sec Loss 7.1634 LearningRate 0.0458 Epoch: 6 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:38,395-Speed 3411.99 samples/sec Loss 7.0867 LearningRate 0.0458 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:41,396-Speed 3413.14 samples/sec Loss 7.0963 LearningRate 0.0458 Epoch: 6 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:44,396-Speed 3414.50 samples/sec Loss 7.0106 LearningRate 0.0458 Epoch: 6 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:47,396-Speed 3413.70 samples/sec Loss 7.0455 LearningRate 0.0458 Epoch: 6 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:50,409-Speed 3399.95 samples/sec Loss 6.9080 LearningRate 0.0457 Epoch: 6 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:53,412-Speed 3410.37 samples/sec Loss 7.0390 LearningRate 0.0457 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:56,413-Speed 3413.18 samples/sec Loss 6.9751 LearningRate 0.0457 Epoch: 6 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:35:59,401-Speed 3427.80 samples/sec Loss 7.0066 LearningRate 0.0457 Epoch: 6 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:02,404-Speed 3411.20 samples/sec Loss 7.2255 LearningRate 0.0457 Epoch: 6 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:05,407-Speed 3411.60 samples/sec Loss 6.9074 LearningRate 0.0457 Epoch: 6 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:08,405-Speed 3416.24 samples/sec Loss 7.0669 LearningRate 0.0457 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:36:11,419-Speed 3397.46 samples/sec Loss 6.9675 LearningRate 0.0457 Epoch: 6 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:14,424-Speed 3409.41 samples/sec Loss 7.0120 LearningRate 0.0456 Epoch: 6 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:17,426-Speed 3412.26 samples/sec Loss 7.1515 LearningRate 0.0456 Epoch: 6 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:20,423-Speed 3417.91 samples/sec Loss 6.9263 LearningRate 0.0456 Epoch: 6 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:23,421-Speed 3415.63 samples/sec Loss 6.9851 LearningRate 0.0456 Epoch: 6 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:26,429-Speed 3404.80 samples/sec Loss 7.0193 LearningRate 0.0456 Epoch: 6 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:29,453-Speed 3388.50 samples/sec Loss 7.1124 LearningRate 0.0456 Epoch: 6 Global Step: 32870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:36:32,461-Speed 3404.65 samples/sec Loss 7.2273 LearningRate 0.0456 Epoch: 6 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:36:35,444-Speed 3434.32 samples/sec Loss 7.0682 LearningRate 0.0455 Epoch: 6 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:38,445-Speed 3412.01 samples/sec Loss 7.1304 LearningRate 0.0455 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:41,448-Speed 3411.56 samples/sec Loss 6.8694 LearningRate 0.0455 Epoch: 6 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:44,449-Speed 3412.84 samples/sec Loss 7.0216 LearningRate 0.0455 Epoch: 6 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:47,447-Speed 3416.93 samples/sec Loss 6.9870 LearningRate 
0.0455 Epoch: 6 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:50,448-Speed 3413.30 samples/sec Loss 6.8657 LearningRate 0.0455 Epoch: 6 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:53,464-Speed 3395.12 samples/sec Loss 6.9753 LearningRate 0.0455 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:56,469-Speed 3408.65 samples/sec Loss 7.2357 LearningRate 0.0455 Epoch: 6 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:36:59,471-Speed 3412.51 samples/sec Loss 6.9005 LearningRate 0.0454 Epoch: 6 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:02,501-Speed 3379.89 samples/sec Loss 6.9470 LearningRate 0.0454 Epoch: 6 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:05,502-Speed 3413.25 samples/sec Loss 6.9887 LearningRate 0.0454 Epoch: 6 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:08,511-Speed 3403.95 samples/sec Loss 7.0278 LearningRate 0.0454 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:11,510-Speed 3415.90 samples/sec Loss 6.8820 LearningRate 0.0454 Epoch: 6 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:14,509-Speed 3415.47 samples/sec Loss 6.9603 LearningRate 0.0454 Epoch: 6 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:17,509-Speed 3413.70 samples/sec Loss 6.7644 LearningRate 0.0454 Epoch: 6 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:20,515-Speed 3408.25 samples/sec Loss 7.0835 LearningRate 0.0453 Epoch: 6 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:23,536-Speed 3389.95 samples/sec Loss 6.9687 LearningRate 0.0453 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 02:37:26,648-Speed 3291.40 samples/sec Loss 7.0444 LearningRate 0.0453 Epoch: 6 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:29,651-Speed 3410.95 samples/sec Loss 7.0598 LearningRate 0.0453 Epoch: 6 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:32,650-Speed 3414.60 samples/sec Loss 6.8764 LearningRate 0.0453 Epoch: 6 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:35,634-Speed 3433.17 samples/sec Loss 7.0009 LearningRate 0.0453 Epoch: 6 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:38,638-Speed 3409.28 samples/sec Loss 7.0919 LearningRate 0.0453 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:37:41,622-Speed 3432.88 samples/sec Loss 7.0442 LearningRate 0.0453 Epoch: 6 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:44,618-Speed 3418.92 samples/sec Loss 6.9576 LearningRate 0.0452 Epoch: 6 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:47,623-Speed 3408.51 samples/sec Loss 7.0320 LearningRate 0.0452 Epoch: 6 Global Step: 33130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:50,623-Speed 3414.64 samples/sec Loss 7.0364 LearningRate 0.0452 Epoch: 6 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:53,624-Speed 3412.42 samples/sec Loss 7.0022 LearningRate 0.0452 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:56,630-Speed 3407.59 samples/sec Loss 7.0403 LearningRate 0.0452 Epoch: 6 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:37:59,638-Speed 3405.15 samples/sec Loss 7.1427 LearningRate 0.0452 Epoch: 6 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:38:02,647-Speed 
3403.92 samples/sec Loss 7.0347 LearningRate 0.0452 Epoch: 6 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:38:05,665-Speed 3394.04 samples/sec Loss 7.0464 LearningRate 0.0451 Epoch: 6 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:38:08,670-Speed 3408.74 samples/sec Loss 6.9957 LearningRate 0.0451 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:38:11,666-Speed 3417.91 samples/sec Loss 7.1292 LearningRate 0.0451 Epoch: 6 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:14,667-Speed 3414.09 samples/sec Loss 7.0895 LearningRate 0.0451 Epoch: 6 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:17,700-Speed 3375.94 samples/sec Loss 7.0054 LearningRate 0.0451 Epoch: 6 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:20,701-Speed 3413.57 samples/sec Loss 6.8486 LearningRate 0.0451 Epoch: 6 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:23,710-Speed 3404.76 samples/sec Loss 6.9449 LearningRate 0.0451 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:26,716-Speed 3407.15 samples/sec Loss 6.9350 LearningRate 0.0451 Epoch: 6 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:29,716-Speed 3414.49 samples/sec Loss 7.1929 LearningRate 0.0450 Epoch: 6 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:32,730-Speed 3398.26 samples/sec Loss 6.8320 LearningRate 0.0450 Epoch: 6 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:35,731-Speed 3413.14 samples/sec Loss 7.0418 LearningRate 0.0450 Epoch: 6 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:38,731-Speed 3414.42 samples/sec Loss 7.2367 LearningRate 0.0450 Epoch: 
6 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:41,712-Speed 3435.21 samples/sec Loss 7.0376 LearningRate 0.0450 Epoch: 6 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:44,718-Speed 3408.12 samples/sec Loss 6.8260 LearningRate 0.0450 Epoch: 6 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:47,721-Speed 3410.33 samples/sec Loss 6.9688 LearningRate 0.0450 Epoch: 6 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:50,718-Speed 3418.09 samples/sec Loss 7.0408 LearningRate 0.0449 Epoch: 6 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:53,726-Speed 3405.80 samples/sec Loss 7.1424 LearningRate 0.0449 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:56,723-Speed 3417.72 samples/sec Loss 6.9759 LearningRate 0.0449 Epoch: 6 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:38:59,729-Speed 3406.87 samples/sec Loss 7.1042 LearningRate 0.0449 Epoch: 6 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:39:02,766-Speed 3372.64 samples/sec Loss 6.9330 LearningRate 0.0449 Epoch: 6 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:39:05,761-Speed 3420.81 samples/sec Loss 6.7784 LearningRate 0.0449 Epoch: 6 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:08,763-Speed 3412.20 samples/sec Loss 7.0634 LearningRate 0.0449 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:11,769-Speed 3407.10 samples/sec Loss 7.0007 LearningRate 0.0449 Epoch: 6 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:14,766-Speed 3417.34 samples/sec Loss 6.7962 LearningRate 0.0448 Epoch: 6 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 02:39:17,785-Speed 3392.28 samples/sec Loss 6.9160 LearningRate 0.0448 Epoch: 6 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:20,787-Speed 3412.20 samples/sec Loss 7.0928 LearningRate 0.0448 Epoch: 6 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:23,801-Speed 3399.16 samples/sec Loss 7.0606 LearningRate 0.0448 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:26,805-Speed 3408.63 samples/sec Loss 6.8696 LearningRate 0.0448 Epoch: 6 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:29,809-Speed 3409.67 samples/sec Loss 7.0196 LearningRate 0.0448 Epoch: 6 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:32,816-Speed 3406.35 samples/sec Loss 6.9083 LearningRate 0.0448 Epoch: 6 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:35,815-Speed 3415.42 samples/sec Loss 6.9869 LearningRate 0.0447 Epoch: 6 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:39:38,821-Speed 3407.66 samples/sec Loss 6.9363 LearningRate 0.0447 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:39:41,818-Speed 3416.81 samples/sec Loss 6.6461 LearningRate 0.0447 Epoch: 6 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:44,829-Speed 3402.59 samples/sec Loss 6.9412 LearningRate 0.0447 Epoch: 6 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:47,857-Speed 3382.71 samples/sec Loss 6.8683 LearningRate 0.0447 Epoch: 6 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:50,888-Speed 3379.84 samples/sec Loss 6.9104 LearningRate 0.0447 Epoch: 6 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:53,921-Speed 3375.78 samples/sec 
Loss 6.8435 LearningRate 0.0447 Epoch: 6 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:56,926-Speed 3409.01 samples/sec Loss 6.9069 LearningRate 0.0447 Epoch: 6 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:39:59,923-Speed 3417.37 samples/sec Loss 6.9309 LearningRate 0.0446 Epoch: 6 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:02,924-Speed 3413.32 samples/sec Loss 7.0541 LearningRate 0.0446 Epoch: 6 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:05,924-Speed 3414.56 samples/sec Loss 7.0744 LearningRate 0.0446 Epoch: 6 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:08,936-Speed 3400.43 samples/sec Loss 7.0243 LearningRate 0.0446 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:11,946-Speed 3402.92 samples/sec Loss 6.9988 LearningRate 0.0446 Epoch: 6 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:14,945-Speed 3415.64 samples/sec Loss 6.8876 LearningRate 0.0446 Epoch: 6 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:17,946-Speed 3412.65 samples/sec Loss 6.9989 LearningRate 0.0446 Epoch: 6 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:20,982-Speed 3374.43 samples/sec Loss 6.8537 LearningRate 0.0445 Epoch: 6 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:23,980-Speed 3416.18 samples/sec Loss 6.9997 LearningRate 0.0445 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:26,985-Speed 3408.62 samples/sec Loss 7.1859 LearningRate 0.0445 Epoch: 6 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:30,151-Speed 3235.81 samples/sec Loss 6.9085 LearningRate 0.0445 Epoch: 6 Global Step: 33670 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:33,192-Speed 3368.10 samples/sec Loss 6.9794 LearningRate 0.0445 Epoch: 6 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:36,193-Speed 3413.53 samples/sec Loss 6.8710 LearningRate 0.0445 Epoch: 6 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:39,195-Speed 3411.21 samples/sec Loss 6.9430 LearningRate 0.0445 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:42,202-Speed 3406.48 samples/sec Loss 6.9791 LearningRate 0.0445 Epoch: 6 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:40:45,183-Speed 3436.95 samples/sec Loss 6.7964 LearningRate 0.0444 Epoch: 6 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:48,183-Speed 3413.56 samples/sec Loss 7.0215 LearningRate 0.0444 Epoch: 6 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:51,197-Speed 3398.27 samples/sec Loss 6.9212 LearningRate 0.0444 Epoch: 6 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:54,201-Speed 3410.77 samples/sec Loss 6.9546 LearningRate 0.0444 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:40:57,201-Speed 3413.24 samples/sec Loss 6.8991 LearningRate 0.0444 Epoch: 6 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:00,218-Speed 3395.53 samples/sec Loss 6.9572 LearningRate 0.0444 Epoch: 6 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:03,241-Speed 3388.86 samples/sec Loss 7.0678 LearningRate 0.0444 Epoch: 6 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:06,244-Speed 3410.18 samples/sec Loss 6.9590 LearningRate 0.0444 Epoch: 6 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:41:09,250-Speed 3407.23 samples/sec Loss 7.0040 LearningRate 0.0443 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:12,261-Speed 3402.64 samples/sec Loss 6.9821 LearningRate 0.0443 Epoch: 6 Global Step: 33810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:15,270-Speed 3403.40 samples/sec Loss 6.9963 LearningRate 0.0443 Epoch: 6 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:41:18,340-Speed 3336.69 samples/sec Loss 6.7767 LearningRate 0.0443 Epoch: 6 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:21,369-Speed 3381.65 samples/sec Loss 6.8697 LearningRate 0.0443 Epoch: 6 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:24,368-Speed 3415.14 samples/sec Loss 6.7946 LearningRate 0.0443 Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:27,373-Speed 3408.59 samples/sec Loss 6.8765 LearningRate 0.0443 Epoch: 6 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:30,379-Speed 3407.88 samples/sec Loss 6.9247 LearningRate 0.0442 Epoch: 6 Global Step: 33870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:33,418-Speed 3370.25 samples/sec Loss 6.9150 LearningRate 0.0442 Epoch: 6 Global Step: 33880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:36,423-Speed 3409.03 samples/sec Loss 6.9565 LearningRate 0.0442 Epoch: 6 Global Step: 33890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:39,430-Speed 3405.29 samples/sec Loss 6.9089 LearningRate 0.0442 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:42,443-Speed 3400.18 samples/sec Loss 6.9171 LearningRate 0.0442 Epoch: 6 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:45,443-Speed 3414.46 samples/sec Loss 6.8557 LearningRate 
0.0442 Epoch: 6 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:41:48,445-Speed 3412.01 samples/sec Loss 6.9884 LearningRate 0.0442 Epoch: 6 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:41:51,444-Speed 3415.28 samples/sec Loss 6.8366 LearningRate 0.0442 Epoch: 6 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:41:54,514-Speed 3336.71 samples/sec Loss 6.9998 LearningRate 0.0441 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:41:57,513-Speed 3415.25 samples/sec Loss 6.7315 LearningRate 0.0441 Epoch: 6 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:42:00,515-Speed 3412.37 samples/sec Loss 6.8960 LearningRate 0.0441 Epoch: 6 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:42:03,540-Speed 3386.17 samples/sec Loss 6.8718 LearningRate 0.0441 Epoch: 6 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:42:06,542-Speed 3412.02 samples/sec Loss 7.0745 LearningRate 0.0441 Epoch: 6 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:42:09,544-Speed 3412.14 samples/sec Loss 6.9817 LearningRate 0.0441 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:42:54,245-[lfw][34000]XNorm: 21.288355 Training: 2022-04-11 02:42:54,245-[lfw][34000]Accuracy-Flip: 0.99733+-0.00186 Training: 2022-04-11 02:42:54,246-[lfw][34000]Accuracy-Highest: 0.99733 Training: 2022-04-11 02:43:45,950-[cfp_fp][34000]XNorm: 19.032494 Training: 2022-04-11 02:43:45,950-[cfp_fp][34000]Accuracy-Flip: 0.96771+-0.00865 Training: 2022-04-11 02:43:45,951-[cfp_fp][34000]Accuracy-Highest: 0.97057 Training: 2022-04-11 02:44:30,339-[agedb_30][34000]XNorm: 21.578682 Training: 2022-04-11 02:44:30,340-[agedb_30][34000]Accuracy-Flip: 0.97650+-0.00740 Training: 2022-04-11 
02:44:30,340-[agedb_30][34000]Accuracy-Highest: 0.97750 Training: 2022-04-11 02:44:33,359-Speed 71.20 samples/sec Loss 6.8706 LearningRate 0.0441 Epoch: 6 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:36,346-Speed 3428.63 samples/sec Loss 6.8656 LearningRate 0.0440 Epoch: 6 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:39,322-Speed 3442.17 samples/sec Loss 6.9866 LearningRate 0.0440 Epoch: 6 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:42,315-Speed 3421.80 samples/sec Loss 7.0167 LearningRate 0.0440 Epoch: 6 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:45,305-Speed 3425.56 samples/sec Loss 6.9783 LearningRate 0.0440 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:48,299-Speed 3422.09 samples/sec Loss 6.8672 LearningRate 0.0440 Epoch: 6 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:51,319-Speed 3390.49 samples/sec Loss 7.0306 LearningRate 0.0440 Epoch: 6 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:54,437-Speed 3285.51 samples/sec Loss 6.8644 LearningRate 0.0440 Epoch: 6 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:44:57,442-Speed 3408.73 samples/sec Loss 7.0510 LearningRate 0.0440 Epoch: 6 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:00,451-Speed 3403.77 samples/sec Loss 6.8768 LearningRate 0.0439 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:03,449-Speed 3416.83 samples/sec Loss 6.8815 LearningRate 0.0439 Epoch: 6 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:06,443-Speed 3420.99 samples/sec Loss 6.9854 LearningRate 0.0439 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 02:45:09,417-Speed 3444.28 samples/sec Loss 6.9022 LearningRate 0.0439 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:12,414-Speed 3418.14 samples/sec Loss 6.9303 LearningRate 0.0439 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:15,429-Speed 3396.38 samples/sec Loss 6.8531 LearningRate 0.0439 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:18,432-Speed 3411.10 samples/sec Loss 7.0344 LearningRate 0.0439 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:45:21,408-Speed 3442.31 samples/sec Loss 6.9760 LearningRate 0.0439 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:45:24,384-Speed 3441.15 samples/sec Loss 6.9399 LearningRate 0.0438 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:27,399-Speed 3397.37 samples/sec Loss 6.9242 LearningRate 0.0438 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:30,394-Speed 3419.96 samples/sec Loss 6.7526 LearningRate 0.0438 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:33,388-Speed 3421.74 samples/sec Loss 6.8754 LearningRate 0.0438 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:36,381-Speed 3422.17 samples/sec Loss 6.8086 LearningRate 0.0438 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:39,385-Speed 3409.16 samples/sec Loss 6.8958 LearningRate 0.0438 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:42,383-Speed 3417.22 samples/sec Loss 6.7048 LearningRate 0.0438 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:45,383-Speed 3413.76 samples/sec 
Loss 6.8463 LearningRate 0.0437 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:48,471-Speed 3317.12 samples/sec Loss 6.7791 LearningRate 0.0437 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:51,466-Speed 3420.58 samples/sec Loss 6.9527 LearningRate 0.0437 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:45:54,467-Speed 3412.93 samples/sec Loss 6.8384 LearningRate 0.0437 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:45:57,475-Speed 3405.62 samples/sec Loss 6.9797 LearningRate 0.0437 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:00,471-Speed 3418.97 samples/sec Loss 6.7500 LearningRate 0.0437 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:03,485-Speed 3398.36 samples/sec Loss 6.8950 LearningRate 0.0437 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:06,501-Speed 3396.76 samples/sec Loss 6.8918 LearningRate 0.0437 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:09,502-Speed 3412.16 samples/sec Loss 7.0267 LearningRate 0.0436 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:12,503-Speed 3413.94 samples/sec Loss 7.0018 LearningRate 0.0436 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:15,495-Speed 3423.02 samples/sec Loss 6.9221 LearningRate 0.0436 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:18,508-Speed 3400.43 samples/sec Loss 6.9588 LearningRate 0.0436 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:21,506-Speed 3416.13 samples/sec Loss 6.8129 LearningRate 0.0436 Epoch: 6 Global Step: 34370 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:24,506-Speed 3414.19 samples/sec Loss 7.0657 LearningRate 0.0436 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:27,553-Speed 3362.41 samples/sec Loss 6.9462 LearningRate 0.0436 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:30,562-Speed 3403.70 samples/sec Loss 6.9997 LearningRate 0.0436 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:33,556-Speed 3420.58 samples/sec Loss 7.0588 LearningRate 0.0435 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:36,553-Speed 3418.49 samples/sec Loss 6.8210 LearningRate 0.0435 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:39,551-Speed 3416.01 samples/sec Loss 6.9472 LearningRate 0.0435 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:42,548-Speed 3417.80 samples/sec Loss 6.8983 LearningRate 0.0435 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:45,548-Speed 3414.15 samples/sec Loss 6.8208 LearningRate 0.0435 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:48,555-Speed 3407.39 samples/sec Loss 6.9031 LearningRate 0.0435 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:51,561-Speed 3407.37 samples/sec Loss 6.8352 LearningRate 0.0435 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:46:54,571-Speed 3402.04 samples/sec Loss 6.9446 LearningRate 0.0434 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:46:57,548-Speed 3441.03 samples/sec Loss 6.7550 LearningRate 0.0434 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:47:00,544-Speed 3419.56 samples/sec Loss 6.7764 LearningRate 0.0434 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:03,543-Speed 3414.97 samples/sec Loss 6.7780 LearningRate 0.0434 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:06,550-Speed 3405.69 samples/sec Loss 6.8201 LearningRate 0.0434 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:09,548-Speed 3416.98 samples/sec Loss 6.8963 LearningRate 0.0434 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:12,553-Speed 3408.56 samples/sec Loss 6.8698 LearningRate 0.0434 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:15,554-Speed 3412.78 samples/sec Loss 6.8873 LearningRate 0.0434 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:18,609-Speed 3352.65 samples/sec Loss 6.8364 LearningRate 0.0433 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:21,611-Speed 3412.36 samples/sec Loss 6.7851 LearningRate 0.0433 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:24,613-Speed 3412.05 samples/sec Loss 6.8270 LearningRate 0.0433 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:27,617-Speed 3410.59 samples/sec Loss 6.8285 LearningRate 0.0433 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:47:30,599-Speed 3433.97 samples/sec Loss 6.8552 LearningRate 0.0433 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:33,600-Speed 3413.78 samples/sec Loss 6.8319 LearningRate 0.0433 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:36,602-Speed 3412.65 samples/sec Loss 6.8491 LearningRate 
0.0433 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:39,598-Speed 3418.05 samples/sec Loss 6.8673 LearningRate 0.0433 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:42,599-Speed 3412.76 samples/sec Loss 6.7977 LearningRate 0.0432 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:45,597-Speed 3416.89 samples/sec Loss 6.8729 LearningRate 0.0432 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:48,600-Speed 3411.24 samples/sec Loss 6.9714 LearningRate 0.0432 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:51,616-Speed 3395.62 samples/sec Loss 6.9780 LearningRate 0.0432 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:54,615-Speed 3415.60 samples/sec Loss 6.8577 LearningRate 0.0432 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:47:57,616-Speed 3412.69 samples/sec Loss 6.9390 LearningRate 0.0432 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:00,616-Speed 3413.91 samples/sec Loss 6.8856 LearningRate 0.0432 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:48:03,639-Speed 3388.83 samples/sec Loss 6.7580 LearningRate 0.0431 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:48:06,642-Speed 3410.83 samples/sec Loss 6.8326 LearningRate 0.0431 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:48:09,647-Speed 3408.96 samples/sec Loss 6.9340 LearningRate 0.0431 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:48:12,659-Speed 3400.41 samples/sec Loss 6.8034 LearningRate 0.0431 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 02:48:15,683-Speed 3387.43 samples/sec Loss 6.9499 LearningRate 0.0431 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:18,704-Speed 3390.63 samples/sec Loss 6.7653 LearningRate 0.0431 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:21,706-Speed 3412.31 samples/sec Loss 6.7568 LearningRate 0.0431 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:24,705-Speed 3415.16 samples/sec Loss 6.7396 LearningRate 0.0431 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:27,708-Speed 3410.38 samples/sec Loss 6.9691 LearningRate 0.0430 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:30,716-Speed 3405.58 samples/sec Loss 6.8111 LearningRate 0.0430 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:33,715-Speed 3415.32 samples/sec Loss 6.9167 LearningRate 0.0430 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:36,734-Speed 3392.35 samples/sec Loss 6.8547 LearningRate 0.0430 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:39,737-Speed 3410.84 samples/sec Loss 6.8562 LearningRate 0.0430 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:42,809-Speed 3334.37 samples/sec Loss 6.7185 LearningRate 0.0430 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:48:45,825-Speed 3395.86 samples/sec Loss 6.8812 LearningRate 0.0430 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:48:48,793-Speed 3450.81 samples/sec Loss 6.8887 LearningRate 0.0430 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:48:51,837-Speed 3365.76 
samples/sec Loss 6.8114 LearningRate 0.0429 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:48:55,011-Speed 3226.22 samples/sec Loss 6.8815 LearningRate 0.0429 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:48:58,012-Speed 3413.30 samples/sec Loss 6.7817 LearningRate 0.0429 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:01,027-Speed 3397.39 samples/sec Loss 6.7677 LearningRate 0.0429 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:04,158-Speed 3270.83 samples/sec Loss 6.8419 LearningRate 0.0429 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:07,178-Speed 3392.37 samples/sec Loss 6.8547 LearningRate 0.0429 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:10,179-Speed 3412.71 samples/sec Loss 6.8524 LearningRate 0.0429 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:13,179-Speed 3414.68 samples/sec Loss 6.9294 LearningRate 0.0429 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:16,228-Speed 3359.08 samples/sec Loss 6.9536 LearningRate 0.0428 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:49:19,234-Speed 3406.81 samples/sec Loss 6.7245 LearningRate 0.0428 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:22,236-Speed 3412.29 samples/sec Loss 6.9334 LearningRate 0.0428 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:25,245-Speed 3404.18 samples/sec Loss 6.9125 LearningRate 0.0428 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:28,250-Speed 3408.91 samples/sec Loss 6.7984 LearningRate 0.0428 Epoch: 6 Global Step: 
34990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:31,253-Speed 3410.53 samples/sec Loss 6.8732 LearningRate 0.0428 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:34,256-Speed 3410.09 samples/sec Loss 6.9140 LearningRate 0.0428 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:37,270-Speed 3399.68 samples/sec Loss 6.7049 LearningRate 0.0427 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:40,272-Speed 3411.50 samples/sec Loss 6.6675 LearningRate 0.0427 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:43,279-Speed 3406.51 samples/sec Loss 6.6535 LearningRate 0.0427 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:46,282-Speed 3410.16 samples/sec Loss 6.9461 LearningRate 0.0427 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:49:49,316-Speed 3375.84 samples/sec Loss 6.8942 LearningRate 0.0427 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:49:52,359-Speed 3366.27 samples/sec Loss 6.8236 LearningRate 0.0427 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:49:55,364-Speed 3408.28 samples/sec Loss 6.8984 LearningRate 0.0427 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:49:58,367-Speed 3410.64 samples/sec Loss 6.7898 LearningRate 0.0427 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:50:01,393-Speed 3385.18 samples/sec Loss 6.7952 LearningRate 0.0426 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:50:04,388-Speed 3420.45 samples/sec Loss 6.8491 LearningRate 0.0426 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 02:50:07,393-Speed 3408.05 samples/sec Loss 6.7908 LearningRate 0.0426 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:10,399-Speed 3407.87 samples/sec Loss 6.7881 LearningRate 0.0426 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:13,402-Speed 3410.99 samples/sec Loss 6.8028 LearningRate 0.0426 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:16,411-Speed 3403.19 samples/sec Loss 6.7990 LearningRate 0.0426 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:19,415-Speed 3409.98 samples/sec Loss 6.8962 LearningRate 0.0426 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:22,421-Speed 3406.51 samples/sec Loss 6.7527 LearningRate 0.0426 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:25,430-Speed 3403.91 samples/sec Loss 6.8950 LearningRate 0.0425 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:28,453-Speed 3389.27 samples/sec Loss 6.9354 LearningRate 0.0425 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:31,457-Speed 3409.60 samples/sec Loss 6.8088 LearningRate 0.0425 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:34,465-Speed 3404.20 samples/sec Loss 6.8298 LearningRate 0.0425 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:50:37,459-Speed 3421.58 samples/sec Loss 6.7806 LearningRate 0.0425 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:40,477-Speed 3394.16 samples/sec Loss 6.8443 LearningRate 0.0425 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:43,479-Speed 3411.96 samples/sec Loss 6.9284 
LearningRate 0.0425 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:46,484-Speed 3407.68 samples/sec Loss 6.7312 LearningRate 0.0425 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:49,493-Speed 3404.83 samples/sec Loss 6.8391 LearningRate 0.0424 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:52,530-Speed 3372.82 samples/sec Loss 6.9360 LearningRate 0.0424 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:55,542-Speed 3399.89 samples/sec Loss 6.8916 LearningRate 0.0424 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:50:58,568-Speed 3385.79 samples/sec Loss 6.9492 LearningRate 0.0424 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:01,587-Speed 3392.41 samples/sec Loss 6.8832 LearningRate 0.0424 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:04,621-Speed 3375.84 samples/sec Loss 6.6609 LearningRate 0.0424 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:07,633-Speed 3400.36 samples/sec Loss 6.7736 LearningRate 0.0424 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:10,636-Speed 3411.46 samples/sec Loss 6.6943 LearningRate 0.0423 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:13,639-Speed 3410.88 samples/sec Loss 6.7729 LearningRate 0.0423 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:16,645-Speed 3407.58 samples/sec Loss 6.8667 LearningRate 0.0423 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:19,654-Speed 3403.33 samples/sec Loss 6.6782 LearningRate 0.0423 Epoch: 6 Global Step: 35360 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:22,661-Speed 3406.30 samples/sec Loss 6.6471 LearningRate 0.0423 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:25,668-Speed 3406.64 samples/sec Loss 6.7487 LearningRate 0.0423 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:28,673-Speed 3408.84 samples/sec Loss 6.7338 LearningRate 0.0423 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:31,765-Speed 3313.08 samples/sec Loss 6.7722 LearningRate 0.0423 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:44,061-Speed 832.82 samples/sec Loss 6.5069 LearningRate 0.0422 Epoch: 7 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:51:47,067-Speed 3408.22 samples/sec Loss 6.0252 LearningRate 0.0422 Epoch: 7 Global Step: 35420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:50,109-Speed 3366.33 samples/sec Loss 5.9228 LearningRate 0.0422 Epoch: 7 Global Step: 35430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:53,203-Speed 3311.15 samples/sec Loss 6.1060 LearningRate 0.0422 Epoch: 7 Global Step: 35440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:56,287-Speed 3321.05 samples/sec Loss 6.0186 LearningRate 0.0422 Epoch: 7 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:51:59,308-Speed 3390.36 samples/sec Loss 6.0335 LearningRate 0.0422 Epoch: 7 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:02,372-Speed 3343.26 samples/sec Loss 5.9502 LearningRate 0.0422 Epoch: 7 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:05,444-Speed 3334.82 samples/sec Loss 6.0494 LearningRate 0.0422 Epoch: 7 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
02:52:08,462-Speed 3392.98 samples/sec Loss 5.9199 LearningRate 0.0421 Epoch: 7 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:11,476-Speed 3398.69 samples/sec Loss 6.0468 LearningRate 0.0421 Epoch: 7 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:14,493-Speed 3394.86 samples/sec Loss 6.0031 LearningRate 0.0421 Epoch: 7 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:17,518-Speed 3387.06 samples/sec Loss 6.1066 LearningRate 0.0421 Epoch: 7 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:20,519-Speed 3412.46 samples/sec Loss 6.1669 LearningRate 0.0421 Epoch: 7 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:23,542-Speed 3388.28 samples/sec Loss 6.0885 LearningRate 0.0421 Epoch: 7 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:26,562-Speed 3392.20 samples/sec Loss 6.2100 LearningRate 0.0421 Epoch: 7 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:29,579-Speed 3395.67 samples/sec Loss 6.1601 LearningRate 0.0421 Epoch: 7 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:32,611-Speed 3377.88 samples/sec Loss 6.1312 LearningRate 0.0420 Epoch: 7 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:35,653-Speed 3367.28 samples/sec Loss 5.9959 LearningRate 0.0420 Epoch: 7 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:38,811-Speed 3242.81 samples/sec Loss 6.1512 LearningRate 0.0420 Epoch: 7 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:52:41,830-Speed 3393.47 samples/sec Loss 6.1117 LearningRate 0.0420 Epoch: 7 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:44,837-Speed 3406.15 samples/sec Loss 6.1031 
LearningRate 0.0420 Epoch: 7 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:47,868-Speed 3379.15 samples/sec Loss 6.0558 LearningRate 0.0420 Epoch: 7 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:50,881-Speed 3398.94 samples/sec Loss 6.0205 LearningRate 0.0420 Epoch: 7 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:53,895-Speed 3399.00 samples/sec Loss 6.1123 LearningRate 0.0419 Epoch: 7 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:56,946-Speed 3357.10 samples/sec Loss 6.1856 LearningRate 0.0419 Epoch: 7 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:52:59,955-Speed 3404.44 samples/sec Loss 6.2337 LearningRate 0.0419 Epoch: 7 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:02,969-Speed 3398.55 samples/sec Loss 6.1321 LearningRate 0.0419 Epoch: 7 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:05,985-Speed 3395.57 samples/sec Loss 6.2103 LearningRate 0.0419 Epoch: 7 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:08,990-Speed 3409.56 samples/sec Loss 6.4112 LearningRate 0.0419 Epoch: 7 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:12,024-Speed 3375.94 samples/sec Loss 6.2907 LearningRate 0.0419 Epoch: 7 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:53:15,081-Speed 3349.96 samples/sec Loss 6.2468 LearningRate 0.0419 Epoch: 7 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:53:18,083-Speed 3412.20 samples/sec Loss 6.1430 LearningRate 0.0418 Epoch: 7 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:53:21,119-Speed 3373.35 samples/sec Loss 6.2654 LearningRate 0.0418 Epoch: 7 Global Step: 35730 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-11 02:53:24,136-Speed 3396.28 samples/sec Loss 6.2786 LearningRate 0.0418 Epoch: 7 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:53:27,170-Speed 3375.20 samples/sec Loss 6.2086 LearningRate 0.0418 Epoch: 7 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:30,209-Speed 3370.71 samples/sec Loss 6.1987 LearningRate 0.0418 Epoch: 7 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:33,231-Speed 3390.24 samples/sec Loss 6.2323 LearningRate 0.0418 Epoch: 7 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:36,237-Speed 3407.26 samples/sec Loss 6.3171 LearningRate 0.0418 Epoch: 7 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:39,246-Speed 3403.80 samples/sec Loss 6.2057 LearningRate 0.0418 Epoch: 7 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:42,247-Speed 3413.66 samples/sec Loss 6.3306 LearningRate 0.0417 Epoch: 7 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:45,260-Speed 3399.32 samples/sec Loss 6.2759 LearningRate 0.0417 Epoch: 7 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:48,263-Speed 3411.26 samples/sec Loss 6.1390 LearningRate 0.0417 Epoch: 7 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:53:51,281-Speed 3394.53 samples/sec Loss 6.3033 LearningRate 0.0417 Epoch: 7 Global Step: 35830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:53:54,293-Speed 3400.31 samples/sec Loss 6.3229 LearningRate 0.0417 Epoch: 7 Global Step: 35840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:53:57,302-Speed 3403.74 samples/sec Loss 6.3489 LearningRate 0.0417 Epoch: 7 Global Step: 35850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:00,310-Speed 
3405.44 samples/sec Loss 6.3835 LearningRate 0.0417 Epoch: 7 Global Step: 35860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:03,315-Speed 3408.87 samples/sec Loss 6.3590 LearningRate 0.0417 Epoch: 7 Global Step: 35870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:06,339-Speed 3386.54 samples/sec Loss 6.3654 LearningRate 0.0416 Epoch: 7 Global Step: 35880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:09,357-Speed 3394.25 samples/sec Loss 6.2944 LearningRate 0.0416 Epoch: 7 Global Step: 35890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:12,367-Speed 3402.58 samples/sec Loss 6.3397 LearningRate 0.0416 Epoch: 7 Global Step: 35900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:15,377-Speed 3403.00 samples/sec Loss 6.2565 LearningRate 0.0416 Epoch: 7 Global Step: 35910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:18,397-Speed 3391.48 samples/sec Loss 6.3273 LearningRate 0.0416 Epoch: 7 Global Step: 35920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 02:54:21,419-Speed 3389.34 samples/sec Loss 6.3876 LearningRate 0.0416 Epoch: 7 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:24,425-Speed 3406.84 samples/sec Loss 6.3249 LearningRate 0.0416 Epoch: 7 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:27,458-Speed 3377.64 samples/sec Loss 6.3180 LearningRate 0.0416 Epoch: 7 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:30,595-Speed 3265.08 samples/sec Loss 6.2536 LearningRate 0.0415 Epoch: 7 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:33,620-Speed 3386.80 samples/sec Loss 6.3569 LearningRate 0.0415 Epoch: 7 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:36,630-Speed 3401.86 samples/sec Loss 6.3139 LearningRate 0.0415 Epoch: 7 Global 
Step: 35980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:39,639-Speed 3404.37 samples/sec Loss 6.2115 LearningRate 0.0415 Epoch: 7 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:54:42,650-Speed 3402.25 samples/sec Loss 6.3805 LearningRate 0.0415 Epoch: 7 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:55:27,985-[lfw][36000]XNorm: 21.069878 Training: 2022-04-11 02:55:27,986-[lfw][36000]Accuracy-Flip: 0.99767+-0.00238 Training: 2022-04-11 02:55:27,986-[lfw][36000]Accuracy-Highest: 0.99767 Training: 2022-04-11 02:56:19,929-[cfp_fp][36000]XNorm: 18.998192 Training: 2022-04-11 02:56:19,929-[cfp_fp][36000]Accuracy-Flip: 0.96900+-0.00783 Training: 2022-04-11 02:56:19,930-[cfp_fp][36000]Accuracy-Highest: 0.97057 Training: 2022-04-11 02:57:04,873-[agedb_30][36000]XNorm: 21.024379 Training: 2022-04-11 02:57:04,874-[agedb_30][36000]Accuracy-Flip: 0.97583+-0.00834 Training: 2022-04-11 02:57:04,875-[agedb_30][36000]Accuracy-Highest: 0.97750 Training: 2022-04-11 02:57:07,905-Speed 70.50 samples/sec Loss 6.3683 LearningRate 0.0415 Epoch: 7 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:10,906-Speed 3413.09 samples/sec Loss 6.2565 LearningRate 0.0415 Epoch: 7 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:13,898-Speed 3422.49 samples/sec Loss 6.3893 LearningRate 0.0415 Epoch: 7 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:57:16,905-Speed 3406.57 samples/sec Loss 6.3538 LearningRate 0.0414 Epoch: 7 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:57:19,891-Speed 3430.27 samples/sec Loss 6.3588 LearningRate 0.0414 Epoch: 7 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:22,901-Speed 3402.66 samples/sec Loss 6.3161 LearningRate 0.0414 Epoch: 7 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 02:57:26,030-Speed 3273.62 samples/sec Loss 6.4180 LearningRate 0.0414 Epoch: 7 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:29,128-Speed 3306.84 samples/sec Loss 6.3650 LearningRate 0.0414 Epoch: 7 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:32,125-Speed 3417.58 samples/sec Loss 6.4551 LearningRate 0.0414 Epoch: 7 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:35,126-Speed 3412.44 samples/sec Loss 6.2726 LearningRate 0.0414 Epoch: 7 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:38,185-Speed 3348.43 samples/sec Loss 6.5275 LearningRate 0.0414 Epoch: 7 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:41,218-Speed 3377.16 samples/sec Loss 6.4226 LearningRate 0.0413 Epoch: 7 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:44,214-Speed 3419.49 samples/sec Loss 6.5557 LearningRate 0.0413 Epoch: 7 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:47,287-Speed 3332.55 samples/sec Loss 6.4045 LearningRate 0.0413 Epoch: 7 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:50,288-Speed 3413.86 samples/sec Loss 6.3963 LearningRate 0.0413 Epoch: 7 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:57:53,288-Speed 3413.81 samples/sec Loss 6.3469 LearningRate 0.0413 Epoch: 7 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:57:56,283-Speed 3420.51 samples/sec Loss 6.1915 LearningRate 0.0413 Epoch: 7 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:57:59,286-Speed 3410.96 samples/sec Loss 6.6212 LearningRate 0.0413 Epoch: 7 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:02,294-Speed 3405.76 samples/sec 
Loss 6.4738 LearningRate 0.0412 Epoch: 7 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:05,306-Speed 3400.01 samples/sec Loss 6.3622 LearningRate 0.0412 Epoch: 7 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:08,304-Speed 3416.61 samples/sec Loss 6.4286 LearningRate 0.0412 Epoch: 7 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:11,303-Speed 3415.69 samples/sec Loss 6.3539 LearningRate 0.0412 Epoch: 7 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:14,303-Speed 3414.50 samples/sec Loss 6.4488 LearningRate 0.0412 Epoch: 7 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:17,303-Speed 3414.42 samples/sec Loss 6.4771 LearningRate 0.0412 Epoch: 7 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:20,306-Speed 3410.03 samples/sec Loss 6.5489 LearningRate 0.0412 Epoch: 7 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:23,321-Speed 3397.61 samples/sec Loss 6.5759 LearningRate 0.0412 Epoch: 7 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:26,586-Speed 3137.04 samples/sec Loss 6.4558 LearningRate 0.0411 Epoch: 7 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:58:29,607-Speed 3390.27 samples/sec Loss 6.4535 LearningRate 0.0411 Epoch: 7 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:58:32,611-Speed 3410.19 samples/sec Loss 6.5383 LearningRate 0.0411 Epoch: 7 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:58:35,650-Speed 3370.74 samples/sec Loss 6.4197 LearningRate 0.0411 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:58:38,639-Speed 3426.86 samples/sec Loss 6.3989 LearningRate 0.0411 Epoch: 7 Global Step: 36310 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:41,643-Speed 3409.26 samples/sec Loss 6.3722 LearningRate 0.0411 Epoch: 7 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:44,695-Speed 3356.05 samples/sec Loss 6.5769 LearningRate 0.0411 Epoch: 7 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:47,697-Speed 3412.52 samples/sec Loss 6.3926 LearningRate 0.0411 Epoch: 7 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:50,783-Speed 3318.49 samples/sec Loss 6.2735 LearningRate 0.0410 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:53,781-Speed 3417.16 samples/sec Loss 6.5964 LearningRate 0.0410 Epoch: 7 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:56,792-Speed 3402.18 samples/sec Loss 6.4750 LearningRate 0.0410 Epoch: 7 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:58:59,789-Speed 3418.00 samples/sec Loss 6.4998 LearningRate 0.0410 Epoch: 7 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:02,793-Speed 3409.58 samples/sec Loss 6.4848 LearningRate 0.0410 Epoch: 7 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:05,815-Speed 3389.24 samples/sec Loss 6.3565 LearningRate 0.0410 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:08,814-Speed 3415.33 samples/sec Loss 6.4543 LearningRate 0.0410 Epoch: 7 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:11,811-Speed 3417.49 samples/sec Loss 6.4164 LearningRate 0.0410 Epoch: 7 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:14,807-Speed 3418.72 samples/sec Loss 6.3286 LearningRate 0.0409 Epoch: 7 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
02:59:17,818-Speed 3402.01 samples/sec Loss 6.4738 LearningRate 0.0409 Epoch: 7 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:20,826-Speed 3404.86 samples/sec Loss 6.4288 LearningRate 0.0409 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:23,836-Speed 3403.38 samples/sec Loss 6.2867 LearningRate 0.0409 Epoch: 7 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:26,843-Speed 3405.90 samples/sec Loss 6.5425 LearningRate 0.0409 Epoch: 7 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:29,852-Speed 3404.48 samples/sec Loss 6.4027 LearningRate 0.0409 Epoch: 7 Global Step: 36480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:32,853-Speed 3413.09 samples/sec Loss 6.2872 LearningRate 0.0409 Epoch: 7 Global Step: 36490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:35,855-Speed 3412.10 samples/sec Loss 6.4034 LearningRate 0.0409 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:38,853-Speed 3415.97 samples/sec Loss 6.4762 LearningRate 0.0408 Epoch: 7 Global Step: 36510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:41,854-Speed 3413.69 samples/sec Loss 6.5259 LearningRate 0.0408 Epoch: 7 Global Step: 36520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:44,861-Speed 3406.27 samples/sec Loss 6.4211 LearningRate 0.0408 Epoch: 7 Global Step: 36530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 02:59:47,865-Speed 3409.81 samples/sec Loss 6.5307 LearningRate 0.0408 Epoch: 7 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:50,864-Speed 3414.83 samples/sec Loss 6.5906 LearningRate 0.0408 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:53,875-Speed 3401.64 samples/sec Loss 6.4399 LearningRate 
0.0408 Epoch: 7 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:56,879-Speed 3410.69 samples/sec Loss 6.5541 LearningRate 0.0408 Epoch: 7 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 02:59:59,879-Speed 3414.02 samples/sec Loss 6.5201 LearningRate 0.0408 Epoch: 7 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:00:02,891-Speed 3400.85 samples/sec Loss 6.5007 LearningRate 0.0407 Epoch: 7 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:00:05,902-Speed 3401.59 samples/sec Loss 6.4682 LearningRate 0.0407 Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:08,883-Speed 3435.97 samples/sec Loss 6.5755 LearningRate 0.0407 Epoch: 7 Global Step: 36610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:11,884-Speed 3413.10 samples/sec Loss 6.4778 LearningRate 0.0407 Epoch: 7 Global Step: 36620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:14,888-Speed 3408.80 samples/sec Loss 6.4565 LearningRate 0.0407 Epoch: 7 Global Step: 36630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:17,900-Speed 3400.85 samples/sec Loss 6.5088 LearningRate 0.0407 Epoch: 7 Global Step: 36640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:20,907-Speed 3406.85 samples/sec Loss 6.4400 LearningRate 0.0407 Epoch: 7 Global Step: 36650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:23,933-Speed 3384.75 samples/sec Loss 6.3396 LearningRate 0.0407 Epoch: 7 Global Step: 36660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:26,938-Speed 3408.77 samples/sec Loss 6.4586 LearningRate 0.0406 Epoch: 7 Global Step: 36670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:30,051-Speed 3290.23 samples/sec Loss 6.5807 LearningRate 0.0406 Epoch: 7 Global Step: 36680 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-11 03:00:33,081-Speed 3380.69 samples/sec Loss 6.5113 LearningRate 0.0406 Epoch: 7 Global Step: 36690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:36,078-Speed 3417.20 samples/sec Loss 6.5223 LearningRate 0.0406 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:00:39,082-Speed 3409.10 samples/sec Loss 6.3503 LearningRate 0.0406 Epoch: 7 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:42,081-Speed 3416.34 samples/sec Loss 6.4242 LearningRate 0.0406 Epoch: 7 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:45,094-Speed 3399.02 samples/sec Loss 6.4939 LearningRate 0.0406 Epoch: 7 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:48,092-Speed 3417.49 samples/sec Loss 6.5036 LearningRate 0.0406 Epoch: 7 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:51,101-Speed 3403.18 samples/sec Loss 6.5186 LearningRate 0.0405 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:54,111-Speed 3402.95 samples/sec Loss 6.5226 LearningRate 0.0405 Epoch: 7 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:00:57,115-Speed 3409.29 samples/sec Loss 6.3608 LearningRate 0.0405 Epoch: 7 Global Step: 36770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:00,123-Speed 3404.90 samples/sec Loss 6.4623 LearningRate 0.0405 Epoch: 7 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:03,166-Speed 3366.88 samples/sec Loss 6.4239 LearningRate 0.0405 Epoch: 7 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:06,164-Speed 3415.99 samples/sec Loss 6.4659 LearningRate 0.0405 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:09,162-Speed 3416.30 
samples/sec Loss 6.4925 LearningRate 0.0405 Epoch: 7 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:12,173-Speed 3402.65 samples/sec Loss 6.5571 LearningRate 0.0405 Epoch: 7 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:15,173-Speed 3413.91 samples/sec Loss 6.4387 LearningRate 0.0404 Epoch: 7 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:18,177-Speed 3410.18 samples/sec Loss 6.4451 LearningRate 0.0404 Epoch: 7 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:21,162-Speed 3430.61 samples/sec Loss 6.3836 LearningRate 0.0404 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:24,169-Speed 3406.04 samples/sec Loss 6.4954 LearningRate 0.0404 Epoch: 7 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:27,169-Speed 3414.61 samples/sec Loss 6.5284 LearningRate 0.0404 Epoch: 7 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:30,180-Speed 3402.20 samples/sec Loss 6.5288 LearningRate 0.0404 Epoch: 7 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:33,183-Speed 3411.69 samples/sec Loss 6.4443 LearningRate 0.0404 Epoch: 7 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:36,184-Speed 3412.16 samples/sec Loss 6.4529 LearningRate 0.0404 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:39,247-Speed 3344.77 samples/sec Loss 6.5640 LearningRate 0.0403 Epoch: 7 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:42,270-Speed 3387.69 samples/sec Loss 6.4869 LearningRate 0.0403 Epoch: 7 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:45,271-Speed 3412.63 samples/sec Loss 6.5658 LearningRate 0.0403 Epoch: 7 Global 
Step: 36930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:48,278-Speed 3407.34 samples/sec Loss 6.6148 LearningRate 0.0403 Epoch: 7 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:01:51,293-Speed 3397.09 samples/sec Loss 6.6142 LearningRate 0.0403 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:54,293-Speed 3414.16 samples/sec Loss 6.5038 LearningRate 0.0403 Epoch: 7 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:01:57,293-Speed 3413.92 samples/sec Loss 6.5614 LearningRate 0.0403 Epoch: 7 Global Step: 36970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:00,298-Speed 3409.00 samples/sec Loss 6.4687 LearningRate 0.0403 Epoch: 7 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:03,300-Speed 3412.65 samples/sec Loss 6.3481 LearningRate 0.0402 Epoch: 7 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:06,331-Speed 3378.95 samples/sec Loss 6.4482 LearningRate 0.0402 Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:09,338-Speed 3406.31 samples/sec Loss 6.4743 LearningRate 0.0402 Epoch: 7 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:12,340-Speed 3412.42 samples/sec Loss 6.6451 LearningRate 0.0402 Epoch: 7 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:15,364-Speed 3387.32 samples/sec Loss 6.5407 LearningRate 0.0402 Epoch: 7 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:18,376-Speed 3400.58 samples/sec Loss 6.4192 LearningRate 0.0402 Epoch: 7 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:21,371-Speed 3420.41 samples/sec Loss 6.5411 LearningRate 0.0402 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 03:02:24,381-Speed 3402.82 samples/sec Loss 6.5641 LearningRate 0.0402 Epoch: 7 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:27,385-Speed 3409.58 samples/sec Loss 6.5832 LearningRate 0.0401 Epoch: 7 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:30,403-Speed 3394.04 samples/sec Loss 6.5720 LearningRate 0.0401 Epoch: 7 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:33,405-Speed 3412.17 samples/sec Loss 6.4838 LearningRate 0.0401 Epoch: 7 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:36,408-Speed 3410.58 samples/sec Loss 6.4450 LearningRate 0.0401 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:39,409-Speed 3413.61 samples/sec Loss 6.6765 LearningRate 0.0401 Epoch: 7 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:42,416-Speed 3406.09 samples/sec Loss 6.5802 LearningRate 0.0401 Epoch: 7 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:45,421-Speed 3409.17 samples/sec Loss 6.3836 LearningRate 0.0401 Epoch: 7 Global Step: 37130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:48,428-Speed 3406.37 samples/sec Loss 6.5182 LearningRate 0.0401 Epoch: 7 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:51,416-Speed 3428.47 samples/sec Loss 6.5014 LearningRate 0.0400 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:54,420-Speed 3409.20 samples/sec Loss 6.5014 LearningRate 0.0400 Epoch: 7 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:02:57,421-Speed 3412.41 samples/sec Loss 6.4974 LearningRate 0.0400 Epoch: 7 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:00,424-Speed 3411.75 
samples/sec Loss 6.4942 LearningRate 0.0400 Epoch: 7 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:03,435-Speed 3402.08 samples/sec Loss 6.4140 LearningRate 0.0400 Epoch: 7 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:06,451-Speed 3395.60 samples/sec Loss 6.5402 LearningRate 0.0400 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:09,449-Speed 3415.85 samples/sec Loss 6.5556 LearningRate 0.0400 Epoch: 7 Global Step: 37210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:12,515-Speed 3340.86 samples/sec Loss 6.4753 LearningRate 0.0400 Epoch: 7 Global Step: 37220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:15,518-Speed 3411.53 samples/sec Loss 6.5770 LearningRate 0.0399 Epoch: 7 Global Step: 37230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:18,642-Speed 3277.77 samples/sec Loss 6.5370 LearningRate 0.0399 Epoch: 7 Global Step: 37240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:21,642-Speed 3414.76 samples/sec Loss 6.3518 LearningRate 0.0399 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:24,645-Speed 3411.21 samples/sec Loss 6.3537 LearningRate 0.0399 Epoch: 7 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:27,649-Speed 3409.51 samples/sec Loss 6.4583 LearningRate 0.0399 Epoch: 7 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:30,649-Speed 3414.17 samples/sec Loss 6.4006 LearningRate 0.0399 Epoch: 7 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:03:33,651-Speed 3412.19 samples/sec Loss 6.6359 LearningRate 0.0399 Epoch: 7 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:36,658-Speed 3407.20 samples/sec Loss 6.4033 LearningRate 0.0399 Epoch: 7 Global Step: 
37300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:39,663-Speed 3408.62 samples/sec Loss 6.3766 LearningRate 0.0398 Epoch: 7 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:42,673-Speed 3402.41 samples/sec Loss 6.4946 LearningRate 0.0398 Epoch: 7 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:45,687-Speed 3398.72 samples/sec Loss 6.4498 LearningRate 0.0398 Epoch: 7 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:48,689-Speed 3411.71 samples/sec Loss 6.5468 LearningRate 0.0398 Epoch: 7 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:51,695-Speed 3407.77 samples/sec Loss 6.5356 LearningRate 0.0398 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:54,706-Speed 3401.72 samples/sec Loss 6.5236 LearningRate 0.0398 Epoch: 7 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:03:57,708-Speed 3411.67 samples/sec Loss 6.3313 LearningRate 0.0398 Epoch: 7 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:00,724-Speed 3396.82 samples/sec Loss 6.3961 LearningRate 0.0398 Epoch: 7 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:03,694-Speed 3448.08 samples/sec Loss 6.5427 LearningRate 0.0397 Epoch: 7 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:06,697-Speed 3410.86 samples/sec Loss 6.5644 LearningRate 0.0397 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:09,697-Speed 3413.98 samples/sec Loss 6.5133 LearningRate 0.0397 Epoch: 7 Global Step: 37410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:12,697-Speed 3414.59 samples/sec Loss 6.3688 LearningRate 0.0397 Epoch: 7 Global Step: 37420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 03:04:15,703-Speed 3407.32 samples/sec Loss 6.5047 LearningRate 0.0397 Epoch: 7 Global Step: 37430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:18,706-Speed 3411.12 samples/sec Loss 6.5654 LearningRate 0.0397 Epoch: 7 Global Step: 37440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:21,706-Speed 3414.40 samples/sec Loss 6.6054 LearningRate 0.0397 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:24,710-Speed 3409.63 samples/sec Loss 6.4385 LearningRate 0.0397 Epoch: 7 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:27,712-Speed 3411.97 samples/sec Loss 6.5077 LearningRate 0.0396 Epoch: 7 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:30,767-Speed 3353.07 samples/sec Loss 6.6186 LearningRate 0.0396 Epoch: 7 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 03:04:33,767-Speed 3414.29 samples/sec Loss 6.4183 LearningRate 0.0396 Epoch: 7 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:36,773-Speed 3407.95 samples/sec Loss 6.5796 LearningRate 0.0396 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:39,779-Speed 3406.77 samples/sec Loss 6.3822 LearningRate 0.0396 Epoch: 7 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:42,785-Speed 3408.88 samples/sec Loss 6.3991 LearningRate 0.0396 Epoch: 7 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:45,791-Speed 3406.76 samples/sec Loss 6.5092 LearningRate 0.0396 Epoch: 7 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 03:04:48,796-Speed 3408.21 samples/sec Loss 6.5468 LearningRate 0.0396 Epoch: 7 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:04:51,813-Speed 3395.67 samples/sec Loss 6.5230 
LearningRate 0.0395 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:04:54,820-Speed 3405.87 samples/sec Loss 6.5657 LearningRate 0.0395 Epoch: 7 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:04:57,828-Speed 3405.70 samples/sec Loss 6.5707 LearningRate 0.0395 Epoch: 7 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:00,839-Speed 3401.74 samples/sec Loss 6.5871 LearningRate 0.0395 Epoch: 7 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:03,851-Speed 3399.64 samples/sec Loss 6.5502 LearningRate 0.0395 Epoch: 7 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:06,880-Speed 3383.10 samples/sec Loss 6.4579 LearningRate 0.0395 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:09,882-Speed 3411.76 samples/sec Loss 6.4837 LearningRate 0.0395 Epoch: 7 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:12,889-Speed 3406.78 samples/sec Loss 6.5259 LearningRate 0.0395 Epoch: 7 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:15,902-Speed 3399.02 samples/sec Loss 6.5243 LearningRate 0.0394 Epoch: 7 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:18,943-Speed 3368.98 samples/sec Loss 6.4698 LearningRate 0.0394 Epoch: 7 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:21,945-Speed 3411.33 samples/sec Loss 6.5778 LearningRate 0.0394 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:24,954-Speed 3404.27 samples/sec Loss 6.5368 LearningRate 0.0394 Epoch: 7 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:28,012-Speed 3349.78 samples/sec Loss 6.5242 LearningRate 0.0394 Epoch: 7 Global Step: 37670 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:31,019-Speed 3406.52 samples/sec Loss 6.5080 LearningRate 0.0394 Epoch: 7 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:34,022-Speed 3410.27 samples/sec Loss 6.3997 LearningRate 0.0394 Epoch: 7 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:37,027-Speed 3409.24 samples/sec Loss 6.5537 LearningRate 0.0394 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:05:40,044-Speed 3395.09 samples/sec Loss 6.4523 LearningRate 0.0393 Epoch: 7 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:43,052-Speed 3405.73 samples/sec Loss 6.3957 LearningRate 0.0393 Epoch: 7 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:46,059-Speed 3405.59 samples/sec Loss 6.4617 LearningRate 0.0393 Epoch: 7 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:49,081-Speed 3389.01 samples/sec Loss 6.4103 LearningRate 0.0393 Epoch: 7 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:52,120-Speed 3370.73 samples/sec Loss 6.5583 LearningRate 0.0393 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:55,138-Speed 3394.05 samples/sec Loss 6.4349 LearningRate 0.0393 Epoch: 7 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:05:58,147-Speed 3404.68 samples/sec Loss 6.3687 LearningRate 0.0393 Epoch: 7 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:06:01,138-Speed 3423.95 samples/sec Loss 6.4390 LearningRate 0.0393 Epoch: 7 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:04,161-Speed 3387.91 samples/sec Loss 6.6255 LearningRate 0.0392 Epoch: 7 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:06:07,171-Speed 3403.78 samples/sec Loss 6.4330 LearningRate 0.0392 Epoch: 7 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:10,180-Speed 3403.71 samples/sec Loss 6.5508 LearningRate 0.0392 Epoch: 7 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:13,188-Speed 3404.80 samples/sec Loss 6.5339 LearningRate 0.0392 Epoch: 7 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:16,213-Speed 3386.73 samples/sec Loss 6.4561 LearningRate 0.0392 Epoch: 7 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:19,246-Speed 3377.03 samples/sec Loss 6.4801 LearningRate 0.0392 Epoch: 7 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:22,250-Speed 3409.78 samples/sec Loss 6.4362 LearningRate 0.0392 Epoch: 7 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:25,263-Speed 3399.38 samples/sec Loss 6.4446 LearningRate 0.0392 Epoch: 7 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:28,274-Speed 3401.04 samples/sec Loss 6.5212 LearningRate 0.0391 Epoch: 7 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:31,285-Speed 3401.91 samples/sec Loss 6.4834 LearningRate 0.0391 Epoch: 7 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:06:34,281-Speed 3419.06 samples/sec Loss 6.5670 LearningRate 0.0391 Epoch: 7 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:37,286-Speed 3408.91 samples/sec Loss 6.2445 LearningRate 0.0391 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:40,291-Speed 3408.23 samples/sec Loss 6.5860 LearningRate 0.0391 Epoch: 7 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:43,300-Speed 3404.16 samples/sec Loss 6.5799 LearningRate 
0.0391 Epoch: 7 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:46,303-Speed 3410.86 samples/sec Loss 6.6128 LearningRate 0.0391 Epoch: 7 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:49,317-Speed 3397.85 samples/sec Loss 6.5262 LearningRate 0.0391 Epoch: 7 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:52,345-Speed 3382.65 samples/sec Loss 6.3664 LearningRate 0.0390 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:55,380-Speed 3374.27 samples/sec Loss 6.4730 LearningRate 0.0390 Epoch: 7 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:06:58,383-Speed 3411.90 samples/sec Loss 6.4578 LearningRate 0.0390 Epoch: 7 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:07:01,390-Speed 3406.41 samples/sec Loss 6.5000 LearningRate 0.0390 Epoch: 7 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:07:04,419-Speed 3381.18 samples/sec Loss 6.4209 LearningRate 0.0390 Epoch: 7 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:07:07,392-Speed 3444.89 samples/sec Loss 6.3110 LearningRate 0.0390 Epoch: 7 Global Step: 38000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:07:51,398-[lfw][38000]XNorm: 25.049999 Training: 2022-04-11 03:07:51,399-[lfw][38000]Accuracy-Flip: 0.99800+-0.00233 Training: 2022-04-11 03:07:51,399-[lfw][38000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:08:43,032-[cfp_fp][38000]XNorm: 22.735850 Training: 2022-04-11 03:08:43,033-[cfp_fp][38000]Accuracy-Flip: 0.96857+-0.01141 Training: 2022-04-11 03:08:43,033-[cfp_fp][38000]Accuracy-Highest: 0.97057 Training: 2022-04-11 03:09:27,717-[agedb_30][38000]XNorm: 25.016551 Training: 2022-04-11 03:09:27,718-[agedb_30][38000]Accuracy-Flip: 0.97700+-0.00710 Training: 2022-04-11 
03:09:27,718-[agedb_30][38000]Accuracy-Highest: 0.97750 Training: 2022-04-11 03:09:30,728-Speed 71.44 samples/sec Loss 6.5550 LearningRate 0.0390 Epoch: 7 Global Step: 38010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:09:33,718-Speed 3425.54 samples/sec Loss 6.4811 LearningRate 0.0390 Epoch: 7 Global Step: 38020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 03:09:36,712-Speed 3420.78 samples/sec Loss 6.5204 LearningRate 0.0389 Epoch: 7 Global Step: 38030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:39,724-Speed 3401.88 samples/sec Loss 6.5348 LearningRate 0.0389 Epoch: 7 Global Step: 38040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:44,248-Speed 2264.12 samples/sec Loss 6.6622 LearningRate 0.0389 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:47,243-Speed 3419.30 samples/sec Loss 6.3676 LearningRate 0.0389 Epoch: 7 Global Step: 38060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:50,252-Speed 3404.34 samples/sec Loss 6.5471 LearningRate 0.0389 Epoch: 7 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:53,256-Speed 3409.90 samples/sec Loss 6.2943 LearningRate 0.0389 Epoch: 7 Global Step: 38080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:56,254-Speed 3416.99 samples/sec Loss 6.3620 LearningRate 0.0389 Epoch: 7 Global Step: 38090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:09:59,263-Speed 3403.48 samples/sec Loss 6.5326 LearningRate 0.0389 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:02,279-Speed 3396.47 samples/sec Loss 6.5535 LearningRate 0.0388 Epoch: 7 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:05,321-Speed 3367.96 samples/sec Loss 6.4976 LearningRate 0.0388 Epoch: 7 Global Step: 38120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 03:10:08,318-Speed 3417.25 samples/sec Loss 6.4575 LearningRate 0.0388 Epoch: 7 Global Step: 38130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:11,320-Speed 3411.67 samples/sec Loss 6.5237 LearningRate 0.0388 Epoch: 7 Global Step: 38140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:14,332-Speed 3400.99 samples/sec Loss 6.7567 LearningRate 0.0388 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:17,336-Speed 3409.80 samples/sec Loss 6.5053 LearningRate 0.0388 Epoch: 7 Global Step: 38160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:20,337-Speed 3412.90 samples/sec Loss 6.3025 LearningRate 0.0388 Epoch: 7 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:23,340-Speed 3410.47 samples/sec Loss 6.4280 LearningRate 0.0388 Epoch: 7 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:26,365-Speed 3386.25 samples/sec Loss 6.4061 LearningRate 0.0387 Epoch: 7 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:29,374-Speed 3404.29 samples/sec Loss 6.5185 LearningRate 0.0387 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:10:32,410-Speed 3373.16 samples/sec Loss 6.4507 LearningRate 0.0387 Epoch: 7 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:10:35,415-Speed 3408.70 samples/sec Loss 6.3921 LearningRate 0.0387 Epoch: 7 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:10:38,398-Speed 3434.20 samples/sec Loss 6.3872 LearningRate 0.0387 Epoch: 7 Global Step: 38230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:41,413-Speed 3396.39 samples/sec Loss 6.2970 LearningRate 0.0387 Epoch: 7 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:44,425-Speed 3402.07 samples/sec Loss 6.5112 
LearningRate 0.0387 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:47,435-Speed 3401.85 samples/sec Loss 6.4610 LearningRate 0.0387 Epoch: 7 Global Step: 38260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:50,441-Speed 3408.15 samples/sec Loss 6.5011 LearningRate 0.0386 Epoch: 7 Global Step: 38270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:53,474-Speed 3377.63 samples/sec Loss 6.4795 LearningRate 0.0386 Epoch: 7 Global Step: 38280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:56,474-Speed 3414.20 samples/sec Loss 6.4318 LearningRate 0.0386 Epoch: 7 Global Step: 38290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:10:59,472-Speed 3416.28 samples/sec Loss 6.4084 LearningRate 0.0386 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:11:02,563-Speed 3313.71 samples/sec Loss 6.4499 LearningRate 0.0386 Epoch: 7 Global Step: 38310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:11:05,662-Speed 3305.40 samples/sec Loss 6.5292 LearningRate 0.0386 Epoch: 7 Global Step: 38320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:11:08,684-Speed 3389.85 samples/sec Loss 6.6118 LearningRate 0.0386 Epoch: 7 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:11,684-Speed 3414.04 samples/sec Loss 6.4733 LearningRate 0.0386 Epoch: 7 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:14,683-Speed 3414.76 samples/sec Loss 6.5770 LearningRate 0.0386 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:17,688-Speed 3408.59 samples/sec Loss 6.3968 LearningRate 0.0385 Epoch: 7 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:20,688-Speed 3414.85 samples/sec Loss 6.6122 LearningRate 0.0385 Epoch: 7 Global Step: 38370 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 03:11:23,717-Speed 3381.59 samples/sec Loss 6.5323 LearningRate 0.0385 Epoch: 7 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:26,718-Speed 3412.36 samples/sec Loss 6.3626 LearningRate 0.0385 Epoch: 7 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:29,729-Speed 3402.21 samples/sec Loss 6.4318 LearningRate 0.0385 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:32,727-Speed 3416.07 samples/sec Loss 6.3649 LearningRate 0.0385 Epoch: 7 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:35,724-Speed 3418.44 samples/sec Loss 6.4162 LearningRate 0.0385 Epoch: 7 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:11:38,738-Speed 3397.82 samples/sec Loss 6.4196 LearningRate 0.0385 Epoch: 7 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:41,740-Speed 3411.83 samples/sec Loss 6.5673 LearningRate 0.0384 Epoch: 7 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:44,754-Speed 3398.79 samples/sec Loss 6.3651 LearningRate 0.0384 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:47,757-Speed 3410.51 samples/sec Loss 6.2653 LearningRate 0.0384 Epoch: 7 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:50,765-Speed 3405.23 samples/sec Loss 6.4185 LearningRate 0.0384 Epoch: 7 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:53,781-Speed 3396.58 samples/sec Loss 6.5409 LearningRate 0.0384 Epoch: 7 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:11:56,783-Speed 3411.15 samples/sec Loss 6.4308 LearningRate 0.0384 Epoch: 7 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
03:11:59,785-Speed 3411.76 samples/sec Loss 6.4411 LearningRate 0.0384 Epoch: 7 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:02,795-Speed 3402.75 samples/sec Loss 6.5246 LearningRate 0.0384 Epoch: 7 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:05,803-Speed 3405.86 samples/sec Loss 6.4865 LearningRate 0.0383 Epoch: 7 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:08,792-Speed 3426.06 samples/sec Loss 6.4261 LearningRate 0.0383 Epoch: 7 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:11,794-Speed 3411.88 samples/sec Loss 6.3740 LearningRate 0.0383 Epoch: 7 Global Step: 38540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:14,791-Speed 3418.62 samples/sec Loss 6.5059 LearningRate 0.0383 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:17,797-Speed 3406.47 samples/sec Loss 6.3639 LearningRate 0.0383 Epoch: 7 Global Step: 38560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:20,801-Speed 3409.99 samples/sec Loss 6.4756 LearningRate 0.0383 Epoch: 7 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:23,803-Speed 3411.80 samples/sec Loss 6.4023 LearningRate 0.0383 Epoch: 7 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:26,803-Speed 3414.37 samples/sec Loss 6.4705 LearningRate 0.0383 Epoch: 7 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:12:29,785-Speed 3434.78 samples/sec Loss 6.4438 LearningRate 0.0382 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:32,784-Speed 3415.08 samples/sec Loss 6.4446 LearningRate 0.0382 Epoch: 7 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:35,781-Speed 3418.33 samples/sec Loss 6.5748 
LearningRate 0.0382 Epoch: 7 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:38,780-Speed 3415.33 samples/sec Loss 6.5557 LearningRate 0.0382 Epoch: 7 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:41,791-Speed 3401.45 samples/sec Loss 6.6042 LearningRate 0.0382 Epoch: 7 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:44,795-Speed 3409.86 samples/sec Loss 6.4653 LearningRate 0.0382 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:47,797-Speed 3412.39 samples/sec Loss 6.3979 LearningRate 0.0382 Epoch: 7 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:50,797-Speed 3414.24 samples/sec Loss 6.3773 LearningRate 0.0382 Epoch: 7 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:53,800-Speed 3410.00 samples/sec Loss 6.5854 LearningRate 0.0381 Epoch: 7 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:56,804-Speed 3410.07 samples/sec Loss 6.3091 LearningRate 0.0381 Epoch: 7 Global Step: 38690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:12:59,824-Speed 3391.05 samples/sec Loss 6.3535 LearningRate 0.0381 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:13:02,817-Speed 3422.68 samples/sec Loss 6.4601 LearningRate 0.0381 Epoch: 7 Global Step: 38710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:05,825-Speed 3404.51 samples/sec Loss 6.5470 LearningRate 0.0381 Epoch: 7 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:08,811-Speed 3430.55 samples/sec Loss 6.6095 LearningRate 0.0381 Epoch: 7 Global Step: 38730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:11,831-Speed 3391.95 samples/sec Loss 6.5379 LearningRate 0.0381 Epoch: 7 Global Step: 38740 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-04-11 03:13:14,829-Speed 3416.42 samples/sec Loss 6.4614 LearningRate 0.0381 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:17,844-Speed 3396.46 samples/sec Loss 6.4757 LearningRate 0.0380 Epoch: 7 Global Step: 38760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:20,853-Speed 3404.18 samples/sec Loss 6.3834 LearningRate 0.0380 Epoch: 7 Global Step: 38770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:23,873-Speed 3391.13 samples/sec Loss 6.4174 LearningRate 0.0380 Epoch: 7 Global Step: 38780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:26,899-Speed 3385.43 samples/sec Loss 6.3748 LearningRate 0.0380 Epoch: 7 Global Step: 38790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:29,898-Speed 3415.35 samples/sec Loss 6.4477 LearningRate 0.0380 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:32,904-Speed 3407.59 samples/sec Loss 6.4690 LearningRate 0.0380 Epoch: 7 Global Step: 38810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:35,922-Speed 3394.64 samples/sec Loss 6.2371 LearningRate 0.0380 Epoch: 7 Global Step: 38820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:13:38,997-Speed 3331.20 samples/sec Loss 6.6379 LearningRate 0.0380 Epoch: 7 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:42,059-Speed 3344.91 samples/sec Loss 6.3609 LearningRate 0.0380 Epoch: 7 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:45,074-Speed 3397.55 samples/sec Loss 6.3998 LearningRate 0.0379 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:48,079-Speed 3408.17 samples/sec Loss 6.3240 LearningRate 0.0379 Epoch: 7 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:51,083-Speed 
3411.33 samples/sec Loss 6.4077 LearningRate 0.0379 Epoch: 7 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:54,098-Speed 3396.51 samples/sec Loss 6.5662 LearningRate 0.0379 Epoch: 7 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:13:57,098-Speed 3413.84 samples/sec Loss 6.5148 LearningRate 0.0379 Epoch: 7 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:00,105-Speed 3406.65 samples/sec Loss 6.2733 LearningRate 0.0379 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:03,140-Speed 3374.12 samples/sec Loss 6.3302 LearningRate 0.0379 Epoch: 7 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:06,155-Speed 3398.45 samples/sec Loss 6.2852 LearningRate 0.0379 Epoch: 7 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:09,155-Speed 3414.21 samples/sec Loss 6.4796 LearningRate 0.0378 Epoch: 7 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:12,172-Speed 3394.42 samples/sec Loss 6.2825 LearningRate 0.0378 Epoch: 7 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:15,177-Speed 3408.73 samples/sec Loss 6.6915 LearningRate 0.0378 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:18,178-Speed 3412.44 samples/sec Loss 6.4956 LearningRate 0.0378 Epoch: 7 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:21,184-Speed 3408.52 samples/sec Loss 6.4135 LearningRate 0.0378 Epoch: 7 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:24,190-Speed 3407.91 samples/sec Loss 6.4998 LearningRate 0.0378 Epoch: 7 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:27,232-Speed 3366.24 samples/sec Loss 6.4231 LearningRate 0.0378 Epoch: 7 
Global Step: 38990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:30,296-Speed 3343.08 samples/sec Loss 6.4934 LearningRate 0.0378 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:33,302-Speed 3407.46 samples/sec Loss 6.5588 LearningRate 0.0377 Epoch: 7 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:36,303-Speed 3413.61 samples/sec Loss 6.4760 LearningRate 0.0377 Epoch: 7 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:39,320-Speed 3394.11 samples/sec Loss 6.4131 LearningRate 0.0377 Epoch: 7 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:42,331-Speed 3401.80 samples/sec Loss 6.5480 LearningRate 0.0377 Epoch: 7 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:45,338-Speed 3407.22 samples/sec Loss 6.4396 LearningRate 0.0377 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:14:48,341-Speed 3410.40 samples/sec Loss 6.3685 LearningRate 0.0377 Epoch: 7 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:51,354-Speed 3399.24 samples/sec Loss 6.2895 LearningRate 0.0377 Epoch: 7 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:54,356-Speed 3411.79 samples/sec Loss 6.3361 LearningRate 0.0377 Epoch: 7 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:14:57,342-Speed 3430.73 samples/sec Loss 6.2570 LearningRate 0.0376 Epoch: 7 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:00,395-Speed 3354.41 samples/sec Loss 6.4587 LearningRate 0.0376 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:03,422-Speed 3383.88 samples/sec Loss 6.5282 LearningRate 0.0376 Epoch: 7 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 03:15:06,542-Speed 3282.82 samples/sec Loss 6.4072 LearningRate 0.0376 Epoch: 7 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:09,591-Speed 3359.20 samples/sec Loss 6.2459 LearningRate 0.0376 Epoch: 7 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:12,605-Speed 3398.67 samples/sec Loss 6.4116 LearningRate 0.0376 Epoch: 7 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:15,643-Speed 3371.29 samples/sec Loss 6.3729 LearningRate 0.0376 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:18,679-Speed 3373.59 samples/sec Loss 6.3837 LearningRate 0.0376 Epoch: 7 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:21,689-Speed 3402.87 samples/sec Loss 6.3344 LearningRate 0.0376 Epoch: 7 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:24,697-Speed 3405.57 samples/sec Loss 6.3772 LearningRate 0.0375 Epoch: 7 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:27,692-Speed 3420.33 samples/sec Loss 6.5403 LearningRate 0.0375 Epoch: 7 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:30,695-Speed 3411.12 samples/sec Loss 6.3051 LearningRate 0.0375 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:33,700-Speed 3408.28 samples/sec Loss 6.4125 LearningRate 0.0375 Epoch: 7 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:36,705-Speed 3408.33 samples/sec Loss 6.3041 LearningRate 0.0375 Epoch: 7 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:39,742-Speed 3372.16 samples/sec Loss 6.5720 LearningRate 0.0375 Epoch: 7 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:42,758-Speed 3396.52 samples/sec Loss 
6.4116 LearningRate 0.0375 Epoch: 7 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:45,761-Speed 3411.26 samples/sec Loss 6.3945 LearningRate 0.0375 Epoch: 7 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:48,764-Speed 3410.79 samples/sec Loss 6.5164 LearningRate 0.0374 Epoch: 7 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:51,778-Speed 3397.86 samples/sec Loss 6.3041 LearningRate 0.0374 Epoch: 7 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:54,797-Speed 3393.63 samples/sec Loss 6.3487 LearningRate 0.0374 Epoch: 7 Global Step: 39280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:15:57,798-Speed 3412.01 samples/sec Loss 6.3880 LearningRate 0.0374 Epoch: 7 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:00,799-Speed 3413.39 samples/sec Loss 6.4131 LearningRate 0.0374 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:03,806-Speed 3405.94 samples/sec Loss 6.3308 LearningRate 0.0374 Epoch: 7 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:06,791-Speed 3432.10 samples/sec Loss 6.4340 LearningRate 0.0374 Epoch: 7 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:09,793-Speed 3411.48 samples/sec Loss 6.3805 LearningRate 0.0374 Epoch: 7 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:12,804-Speed 3402.27 samples/sec Loss 6.4359 LearningRate 0.0373 Epoch: 7 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:15,809-Speed 3408.91 samples/sec Loss 6.2848 LearningRate 0.0373 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:18,817-Speed 3404.88 samples/sec Loss 6.3558 LearningRate 0.0373 Epoch: 7 Global Step: 39360 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:21,822-Speed 3408.72 samples/sec Loss 6.3922 LearningRate 0.0373 Epoch: 7 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:24,836-Speed 3399.23 samples/sec Loss 6.3619 LearningRate 0.0373 Epoch: 7 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:27,839-Speed 3410.69 samples/sec Loss 6.2652 LearningRate 0.0373 Epoch: 7 Global Step: 39390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:30,859-Speed 3391.86 samples/sec Loss 6.2957 LearningRate 0.0373 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:33,862-Speed 3409.79 samples/sec Loss 6.5135 LearningRate 0.0373 Epoch: 7 Global Step: 39410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:36,869-Speed 3406.65 samples/sec Loss 6.4195 LearningRate 0.0372 Epoch: 7 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:39,883-Speed 3398.59 samples/sec Loss 6.3131 LearningRate 0.0372 Epoch: 7 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:42,891-Speed 3405.58 samples/sec Loss 6.3829 LearningRate 0.0372 Epoch: 7 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:16:45,872-Speed 3435.10 samples/sec Loss 6.3969 LearningRate 0.0372 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:48,876-Speed 3410.00 samples/sec Loss 6.3498 LearningRate 0.0372 Epoch: 7 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:51,883-Speed 3406.83 samples/sec Loss 6.4230 LearningRate 0.0372 Epoch: 7 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:16:54,902-Speed 3392.54 samples/sec Loss 6.3264 LearningRate 0.0372 Epoch: 7 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:16:57,909-Speed 3405.72 samples/sec Loss 6.3296 LearningRate 0.0372 Epoch: 7 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:00,928-Speed 3393.80 samples/sec Loss 6.4189 LearningRate 0.0372 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:03,935-Speed 3406.38 samples/sec Loss 6.4264 LearningRate 0.0371 Epoch: 7 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:06,939-Speed 3409.16 samples/sec Loss 6.3130 LearningRate 0.0371 Epoch: 7 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:09,950-Speed 3401.92 samples/sec Loss 6.3387 LearningRate 0.0371 Epoch: 7 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:12,966-Speed 3395.38 samples/sec Loss 6.3009 LearningRate 0.0371 Epoch: 7 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:15,972-Speed 3408.24 samples/sec Loss 6.4534 LearningRate 0.0371 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:17:18,968-Speed 3418.17 samples/sec Loss 6.4791 LearningRate 0.0371 Epoch: 7 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:21,973-Speed 3408.85 samples/sec Loss 6.3999 LearningRate 0.0371 Epoch: 7 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:24,977-Speed 3409.53 samples/sec Loss 6.3527 LearningRate 0.0371 Epoch: 7 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:27,991-Speed 3399.00 samples/sec Loss 6.3305 LearningRate 0.0370 Epoch: 7 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:30,996-Speed 3408.13 samples/sec Loss 6.2955 LearningRate 0.0370 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:34,006-Speed 3403.56 samples/sec Loss 6.3154 LearningRate 
0.0370 Epoch: 7 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:37,006-Speed 3414.05 samples/sec Loss 6.3971 LearningRate 0.0370 Epoch: 7 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:40,031-Speed 3385.81 samples/sec Loss 6.3080 LearningRate 0.0370 Epoch: 7 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:43,034-Speed 3410.90 samples/sec Loss 6.3046 LearningRate 0.0370 Epoch: 7 Global Step: 39640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:46,034-Speed 3413.91 samples/sec Loss 6.3126 LearningRate 0.0370 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:49,027-Speed 3422.07 samples/sec Loss 6.3527 LearningRate 0.0370 Epoch: 7 Global Step: 39660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:52,062-Speed 3374.98 samples/sec Loss 6.4425 LearningRate 0.0369 Epoch: 7 Global Step: 39670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:55,138-Speed 3330.46 samples/sec Loss 6.3940 LearningRate 0.0369 Epoch: 7 Global Step: 39680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:17:58,140-Speed 3411.24 samples/sec Loss 6.2312 LearningRate 0.0369 Epoch: 7 Global Step: 39690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:01,151-Speed 3402.62 samples/sec Loss 6.4654 LearningRate 0.0369 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:04,164-Speed 3398.62 samples/sec Loss 6.2496 LearningRate 0.0369 Epoch: 7 Global Step: 39710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:07,168-Speed 3409.46 samples/sec Loss 6.3641 LearningRate 0.0369 Epoch: 7 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:10,173-Speed 3409.14 samples/sec Loss 6.3104 LearningRate 0.0369 Epoch: 7 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-11 03:18:13,189-Speed 3395.68 samples/sec Loss 6.3042 LearningRate 0.0369 Epoch: 7 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:16,199-Speed 3402.50 samples/sec Loss 6.2964 LearningRate 0.0369 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:19,215-Speed 3395.85 samples/sec Loss 6.3897 LearningRate 0.0368 Epoch: 7 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:18:22,216-Speed 3414.30 samples/sec Loss 6.5010 LearningRate 0.0368 Epoch: 7 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:18:25,226-Speed 3402.81 samples/sec Loss 6.2659 LearningRate 0.0368 Epoch: 7 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:18:28,236-Speed 3402.03 samples/sec Loss 6.3305 LearningRate 0.0368 Epoch: 7 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:18:31,234-Speed 3417.43 samples/sec Loss 6.4657 LearningRate 0.0368 Epoch: 7 Global Step: 39800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:34,242-Speed 3404.50 samples/sec Loss 6.3856 LearningRate 0.0368 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:37,244-Speed 3412.13 samples/sec Loss 6.4531 LearningRate 0.0368 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:40,262-Speed 3393.57 samples/sec Loss 6.2549 LearningRate 0.0368 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:43,286-Speed 3388.02 samples/sec Loss 6.4340 LearningRate 0.0367 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:46,288-Speed 3411.43 samples/sec Loss 6.2078 LearningRate 0.0367 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:49,292-Speed 3409.78 
samples/sec Loss 6.3951 LearningRate 0.0367 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:52,301-Speed 3404.25 samples/sec Loss 6.2680 LearningRate 0.0367 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:55,376-Speed 3330.65 samples/sec Loss 6.3087 LearningRate 0.0367 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:18:58,385-Speed 3404.06 samples/sec Loss 6.3964 LearningRate 0.0367 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:01,424-Speed 3370.66 samples/sec Loss 6.3305 LearningRate 0.0367 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:19:04,494-Speed 3336.51 samples/sec Loss 6.1879 LearningRate 0.0367 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:19:07,585-Speed 3313.78 samples/sec Loss 6.1422 LearningRate 0.0366 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:19:10,589-Speed 3409.59 samples/sec Loss 6.4092 LearningRate 0.0366 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:19:13,578-Speed 3426.72 samples/sec Loss 6.2859 LearningRate 0.0366 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:16,590-Speed 3401.28 samples/sec Loss 6.4721 LearningRate 0.0366 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:19,596-Speed 3406.37 samples/sec Loss 6.3819 LearningRate 0.0366 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:22,602-Speed 3407.57 samples/sec Loss 6.2279 LearningRate 0.0366 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:25,616-Speed 3398.45 samples/sec Loss 6.2636 LearningRate 0.0366 Epoch: 7 Global 
Step: 39980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:28,619-Speed 3410.43 samples/sec Loss 6.4423 LearningRate 0.0366 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:19:31,621-Speed 3412.27 samples/sec Loss 6.3194 LearningRate 0.0366 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:20:15,913-[lfw][40000]XNorm: 21.937800 Training: 2022-04-11 03:20:15,913-[lfw][40000]Accuracy-Flip: 0.99750+-0.00239 Training: 2022-04-11 03:20:15,914-[lfw][40000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:21:07,447-[cfp_fp][40000]XNorm: 19.839884 Training: 2022-04-11 03:21:07,448-[cfp_fp][40000]Accuracy-Flip: 0.97471+-0.00553 Training: 2022-04-11 03:21:07,448-[cfp_fp][40000]Accuracy-Highest: 0.97471 Training: 2022-04-11 03:21:51,808-[agedb_30][40000]XNorm: 21.920209 Training: 2022-04-11 03:21:51,808-[agedb_30][40000]Accuracy-Flip: 0.97967+-0.00795 Training: 2022-04-11 03:21:51,809-[agedb_30][40000]Accuracy-Highest: 0.97967 Training: 2022-04-11 03:21:54,803-Speed 71.52 samples/sec Loss 6.4074 LearningRate 0.0365 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:21:57,786-Speed 3433.28 samples/sec Loss 6.3019 LearningRate 0.0365 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:00,775-Speed 3426.27 samples/sec Loss 6.2963 LearningRate 0.0365 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:03,787-Speed 3400.95 samples/sec Loss 6.3540 LearningRate 0.0365 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:22:06,775-Speed 3427.15 samples/sec Loss 6.3858 LearningRate 0.0365 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:22:09,766-Speed 3425.65 samples/sec Loss 6.3182 LearningRate 0.0365 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 03:22:12,745-Speed 3437.39 samples/sec Loss 6.2947 LearningRate 0.0365 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:15,736-Speed 3425.19 samples/sec Loss 6.2720 LearningRate 0.0365 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:18,743-Speed 3406.37 samples/sec Loss 6.3699 LearningRate 0.0364 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:21,733-Speed 3425.29 samples/sec Loss 6.2907 LearningRate 0.0364 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:24,724-Speed 3424.11 samples/sec Loss 6.3289 LearningRate 0.0364 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:27,719-Speed 3419.74 samples/sec Loss 6.4349 LearningRate 0.0364 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:30,711-Speed 3423.35 samples/sec Loss 6.4235 LearningRate 0.0364 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:33,716-Speed 3408.68 samples/sec Loss 6.3353 LearningRate 0.0364 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:36,708-Speed 3423.83 samples/sec Loss 6.2066 LearningRate 0.0364 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:39,702-Speed 3421.56 samples/sec Loss 6.3762 LearningRate 0.0364 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:42,695-Speed 3421.97 samples/sec Loss 6.3634 LearningRate 0.0363 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:45,695-Speed 3413.73 samples/sec Loss 6.2661 LearningRate 0.0363 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:48,690-Speed 3420.21 samples/sec 
Loss 6.2241 LearningRate 0.0363 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:51,687-Speed 3417.49 samples/sec Loss 6.1981 LearningRate 0.0363 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:22:54,685-Speed 3417.12 samples/sec Loss 6.2660 LearningRate 0.0363 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:22:57,681-Speed 3418.83 samples/sec Loss 6.2505 LearningRate 0.0363 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:00,736-Speed 3352.61 samples/sec Loss 6.1688 LearningRate 0.0363 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:03,803-Speed 3340.21 samples/sec Loss 6.3047 LearningRate 0.0363 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:06,826-Speed 3387.89 samples/sec Loss 6.3046 LearningRate 0.0363 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:09,821-Speed 3419.63 samples/sec Loss 6.3114 LearningRate 0.0362 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:12,820-Speed 3415.16 samples/sec Loss 6.2667 LearningRate 0.0362 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:15,817-Speed 3418.21 samples/sec Loss 6.3334 LearningRate 0.0362 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:18,820-Speed 3411.17 samples/sec Loss 6.2126 LearningRate 0.0362 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:21,818-Speed 3416.01 samples/sec Loss 6.2342 LearningRate 0.0362 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:23:24,850-Speed 3379.17 samples/sec Loss 6.3084 LearningRate 0.0362 Epoch: 7 Global Step: 40310 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:27,888-Speed 3371.04 samples/sec Loss 6.2757 LearningRate 0.0362 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:30,884-Speed 3418.51 samples/sec Loss 6.2049 LearningRate 0.0362 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:33,884-Speed 3414.29 samples/sec Loss 6.3202 LearningRate 0.0361 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:36,878-Speed 3421.05 samples/sec Loss 6.3485 LearningRate 0.0361 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:39,875-Speed 3417.42 samples/sec Loss 6.1155 LearningRate 0.0361 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:42,888-Speed 3400.19 samples/sec Loss 6.2767 LearningRate 0.0361 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:45,897-Speed 3403.51 samples/sec Loss 6.2727 LearningRate 0.0361 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:48,893-Speed 3419.78 samples/sec Loss 6.2816 LearningRate 0.0361 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:51,899-Speed 3407.06 samples/sec Loss 6.0737 LearningRate 0.0361 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:54,878-Speed 3437.69 samples/sec Loss 6.5187 LearningRate 0.0361 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:23:57,875-Speed 3417.51 samples/sec Loss 6.2566 LearningRate 0.0361 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:00,871-Speed 3418.70 samples/sec Loss 6.2567 LearningRate 0.0360 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:24:03,869-Speed 3417.60 samples/sec Loss 6.2290 LearningRate 0.0360 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:06,867-Speed 3415.58 samples/sec Loss 6.1114 LearningRate 0.0360 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:09,995-Speed 3274.60 samples/sec Loss 6.3817 LearningRate 0.0360 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:22,652-Speed 809.15 samples/sec Loss 5.7625 LearningRate 0.0360 Epoch: 8 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:25,812-Speed 3240.62 samples/sec Loss 5.4454 LearningRate 0.0360 Epoch: 8 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:28,855-Speed 3367.64 samples/sec Loss 5.5011 LearningRate 0.0360 Epoch: 8 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:31,867-Speed 3400.24 samples/sec Loss 5.5135 LearningRate 0.0360 Epoch: 8 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:34,895-Speed 3382.67 samples/sec Loss 5.4095 LearningRate 0.0359 Epoch: 8 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:37,926-Speed 3379.71 samples/sec Loss 5.3218 LearningRate 0.0359 Epoch: 8 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:40,931-Speed 3408.34 samples/sec Loss 5.5385 LearningRate 0.0359 Epoch: 8 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:43,960-Speed 3380.87 samples/sec Loss 5.5487 LearningRate 0.0359 Epoch: 8 Global Step: 40540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:24:46,954-Speed 3420.99 samples/sec Loss 5.5051 LearningRate 0.0359 Epoch: 8 Global Step: 40550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:24:49,964-Speed 3403.10 samples/sec Loss 5.4694 LearningRate 
0.0359 Epoch: 8 Global Step: 40560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:24:53,005-Speed 3368.81 samples/sec Loss 5.6193 LearningRate 0.0359 Epoch: 8 Global Step: 40570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:24:56,048-Speed 3365.57 samples/sec Loss 5.5189 LearningRate 0.0359 Epoch: 8 Global Step: 40580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:24:59,069-Speed 3391.07 samples/sec Loss 5.5903 LearningRate 0.0359 Epoch: 8 Global Step: 40590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:02,097-Speed 3381.76 samples/sec Loss 5.6260 LearningRate 0.0358 Epoch: 8 Global Step: 40600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:05,172-Speed 3331.35 samples/sec Loss 5.5505 LearningRate 0.0358 Epoch: 8 Global Step: 40610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:08,185-Speed 3399.60 samples/sec Loss 5.7949 LearningRate 0.0358 Epoch: 8 Global Step: 40620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:11,191-Speed 3407.75 samples/sec Loss 5.5591 LearningRate 0.0358 Epoch: 8 Global Step: 40630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:14,195-Speed 3409.67 samples/sec Loss 5.6104 LearningRate 0.0358 Epoch: 8 Global Step: 40640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:25:17,198-Speed 3410.71 samples/sec Loss 5.6368 LearningRate 0.0358 Epoch: 8 Global Step: 40650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:20,200-Speed 3411.38 samples/sec Loss 5.6638 LearningRate 0.0358 Epoch: 8 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:23,205-Speed 3409.29 samples/sec Loss 5.7015 LearningRate 0.0358 Epoch: 8 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:26,230-Speed 3385.33 samples/sec Loss 5.6992 LearningRate 0.0357 Epoch: 8 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-11 03:25:29,243-Speed 3399.50 samples/sec Loss 5.4923 LearningRate 0.0357 Epoch: 8 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:32,242-Speed 3415.54 samples/sec Loss 5.7359 LearningRate 0.0357 Epoch: 8 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:35,251-Speed 3404.32 samples/sec Loss 5.8021 LearningRate 0.0357 Epoch: 8 Global Step: 40710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:38,256-Speed 3408.72 samples/sec Loss 5.6890 LearningRate 0.0357 Epoch: 8 Global Step: 40720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:41,285-Speed 3381.64 samples/sec Loss 5.6248 LearningRate 0.0357 Epoch: 8 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:44,316-Speed 3378.48 samples/sec Loss 5.6021 LearningRate 0.0357 Epoch: 8 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:47,327-Speed 3402.14 samples/sec Loss 5.6372 LearningRate 0.0357 Epoch: 8 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:25:50,341-Speed 3397.95 samples/sec Loss 5.7582 LearningRate 0.0356 Epoch: 8 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:25:53,334-Speed 3422.67 samples/sec Loss 5.8137 LearningRate 0.0356 Epoch: 8 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:56,339-Speed 3408.47 samples/sec Loss 5.5699 LearningRate 0.0356 Epoch: 8 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:25:59,355-Speed 3395.37 samples/sec Loss 5.7134 LearningRate 0.0356 Epoch: 8 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:02,411-Speed 3352.13 samples/sec Loss 5.6183 LearningRate 0.0356 Epoch: 8 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:05,427-Speed 3395.90 
samples/sec Loss 5.5627 LearningRate 0.0356 Epoch: 8 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:08,444-Speed 3395.39 samples/sec Loss 5.7311 LearningRate 0.0356 Epoch: 8 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:11,458-Speed 3398.36 samples/sec Loss 5.6572 LearningRate 0.0356 Epoch: 8 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:14,475-Speed 3395.26 samples/sec Loss 5.8720 LearningRate 0.0356 Epoch: 8 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:17,480-Speed 3407.37 samples/sec Loss 5.7088 LearningRate 0.0355 Epoch: 8 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:20,495-Speed 3397.33 samples/sec Loss 5.5921 LearningRate 0.0355 Epoch: 8 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:23,503-Speed 3404.96 samples/sec Loss 5.7116 LearningRate 0.0355 Epoch: 8 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:26:26,490-Speed 3429.36 samples/sec Loss 5.8606 LearningRate 0.0355 Epoch: 8 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:29,501-Speed 3402.11 samples/sec Loss 5.6694 LearningRate 0.0355 Epoch: 8 Global Step: 40890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:32,510-Speed 3404.40 samples/sec Loss 5.8211 LearningRate 0.0355 Epoch: 8 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:35,513-Speed 3410.73 samples/sec Loss 5.8632 LearningRate 0.0355 Epoch: 8 Global Step: 40910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:38,512-Speed 3414.74 samples/sec Loss 5.7095 LearningRate 0.0355 Epoch: 8 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:41,511-Speed 3414.95 samples/sec Loss 5.7537 LearningRate 0.0354 Epoch: 8 Global Step: 
40930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:44,512-Speed 3413.56 samples/sec Loss 5.8027 LearningRate 0.0354 Epoch: 8 Global Step: 40940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:47,515-Speed 3410.74 samples/sec Loss 5.7384 LearningRate 0.0354 Epoch: 8 Global Step: 40950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:50,521-Speed 3406.63 samples/sec Loss 5.8452 LearningRate 0.0354 Epoch: 8 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:53,576-Speed 3352.81 samples/sec Loss 5.7397 LearningRate 0.0354 Epoch: 8 Global Step: 40970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:56,562-Speed 3430.44 samples/sec Loss 5.9071 LearningRate 0.0354 Epoch: 8 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:26:59,563-Speed 3413.24 samples/sec Loss 5.8514 LearningRate 0.0354 Epoch: 8 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:02,578-Speed 3397.67 samples/sec Loss 5.5363 LearningRate 0.0354 Epoch: 8 Global Step: 41000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:05,594-Speed 3396.39 samples/sec Loss 5.9335 LearningRate 0.0354 Epoch: 8 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:08,596-Speed 3412.51 samples/sec Loss 5.7626 LearningRate 0.0353 Epoch: 8 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:11,594-Speed 3416.60 samples/sec Loss 5.6528 LearningRate 0.0353 Epoch: 8 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:14,599-Speed 3407.91 samples/sec Loss 5.7697 LearningRate 0.0353 Epoch: 8 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:17,603-Speed 3409.31 samples/sec Loss 5.6905 LearningRate 0.0353 Epoch: 8 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:27:20,605-Speed 3412.58 samples/sec Loss 5.8931 LearningRate 0.0353 Epoch: 8 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:23,607-Speed 3411.16 samples/sec Loss 5.7760 LearningRate 0.0353 Epoch: 8 Global Step: 41070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:26,625-Speed 3394.32 samples/sec Loss 5.7511 LearningRate 0.0353 Epoch: 8 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:27:29,646-Speed 3391.14 samples/sec Loss 6.0364 LearningRate 0.0353 Epoch: 8 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:27:32,648-Speed 3411.31 samples/sec Loss 5.8389 LearningRate 0.0352 Epoch: 8 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:27:35,631-Speed 3433.99 samples/sec Loss 5.9337 LearningRate 0.0352 Epoch: 8 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:38,633-Speed 3411.92 samples/sec Loss 5.6821 LearningRate 0.0352 Epoch: 8 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:41,634-Speed 3412.62 samples/sec Loss 5.7852 LearningRate 0.0352 Epoch: 8 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:44,636-Speed 3412.25 samples/sec Loss 5.9023 LearningRate 0.0352 Epoch: 8 Global Step: 41140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:47,676-Speed 3368.97 samples/sec Loss 5.7950 LearningRate 0.0352 Epoch: 8 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:50,695-Speed 3393.21 samples/sec Loss 5.7248 LearningRate 0.0352 Epoch: 8 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:53,708-Speed 3399.09 samples/sec Loss 5.7870 LearningRate 0.0352 Epoch: 8 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:56,721-Speed 3399.98 samples/sec Loss 5.9561 LearningRate 
0.0352 Epoch: 8 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:27:59,719-Speed 3416.03 samples/sec Loss 5.9351 LearningRate 0.0351 Epoch: 8 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:02,730-Speed 3401.76 samples/sec Loss 6.0214 LearningRate 0.0351 Epoch: 8 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:05,739-Speed 3404.54 samples/sec Loss 5.7844 LearningRate 0.0351 Epoch: 8 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:28:08,762-Speed 3387.60 samples/sec Loss 5.8440 LearningRate 0.0351 Epoch: 8 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:11,765-Speed 3411.06 samples/sec Loss 5.8077 LearningRate 0.0351 Epoch: 8 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:14,845-Speed 3325.73 samples/sec Loss 5.8336 LearningRate 0.0351 Epoch: 8 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:17,907-Speed 3344.80 samples/sec Loss 5.8676 LearningRate 0.0351 Epoch: 8 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:20,908-Speed 3412.75 samples/sec Loss 6.0014 LearningRate 0.0351 Epoch: 8 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:23,910-Speed 3412.64 samples/sec Loss 5.8309 LearningRate 0.0351 Epoch: 8 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:26,910-Speed 3413.73 samples/sec Loss 5.8745 LearningRate 0.0350 Epoch: 8 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:29,910-Speed 3414.69 samples/sec Loss 5.9095 LearningRate 0.0350 Epoch: 8 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:32,912-Speed 3411.51 samples/sec Loss 5.9929 LearningRate 0.0350 Epoch: 8 Global Step: 41300 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 03:28:35,913-Speed 3413.08 samples/sec Loss 5.9435 LearningRate 0.0350 Epoch: 8 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:38,959-Speed 3362.55 samples/sec Loss 5.9976 LearningRate 0.0350 Epoch: 8 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:28:41,963-Speed 3409.23 samples/sec Loss 5.9051 LearningRate 0.0350 Epoch: 8 Global Step: 41330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:28:44,968-Speed 3412.36 samples/sec Loss 5.9944 LearningRate 0.0350 Epoch: 8 Global Step: 41340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:28:47,977-Speed 3403.43 samples/sec Loss 5.9658 LearningRate 0.0350 Epoch: 8 Global Step: 41350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:28:50,964-Speed 3429.16 samples/sec Loss 5.9035 LearningRate 0.0349 Epoch: 8 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:53,989-Speed 3386.61 samples/sec Loss 5.9992 LearningRate 0.0349 Epoch: 8 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:56,988-Speed 3414.28 samples/sec Loss 5.9166 LearningRate 0.0349 Epoch: 8 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:28:59,987-Speed 3415.93 samples/sec Loss 6.0456 LearningRate 0.0349 Epoch: 8 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:02,992-Speed 3408.28 samples/sec Loss 5.9424 LearningRate 0.0349 Epoch: 8 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:06,045-Speed 3354.57 samples/sec Loss 6.0066 LearningRate 0.0349 Epoch: 8 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:09,046-Speed 3413.64 samples/sec Loss 5.9531 LearningRate 0.0349 Epoch: 8 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:12,052-Speed 
3407.47 samples/sec Loss 5.8947 LearningRate 0.0349 Epoch: 8 Global Step: 41430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:15,059-Speed 3406.43 samples/sec Loss 6.0509 LearningRate 0.0349 Epoch: 8 Global Step: 41440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:18,087-Speed 3382.61 samples/sec Loss 6.0270 LearningRate 0.0348 Epoch: 8 Global Step: 41450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:21,089-Speed 3411.69 samples/sec Loss 5.8494 LearningRate 0.0348 Epoch: 8 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:29:24,076-Speed 3429.13 samples/sec Loss 5.8994 LearningRate 0.0348 Epoch: 8 Global Step: 41470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:27,081-Speed 3408.77 samples/sec Loss 5.9369 LearningRate 0.0348 Epoch: 8 Global Step: 41480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:30,085-Speed 3409.64 samples/sec Loss 5.7324 LearningRate 0.0348 Epoch: 8 Global Step: 41490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:33,085-Speed 3413.44 samples/sec Loss 6.0750 LearningRate 0.0348 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:36,089-Speed 3410.15 samples/sec Loss 5.8752 LearningRate 0.0348 Epoch: 8 Global Step: 41510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:39,093-Speed 3408.98 samples/sec Loss 5.9805 LearningRate 0.0348 Epoch: 8 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:42,101-Speed 3405.63 samples/sec Loss 5.8669 LearningRate 0.0347 Epoch: 8 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:45,102-Speed 3412.54 samples/sec Loss 5.9490 LearningRate 0.0347 Epoch: 8 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:29:48,101-Speed 3414.99 samples/sec Loss 5.9452 LearningRate 0.0347 Epoch: 8 
Global Step: 41550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:29:51,102-Speed 3413.49 samples/sec Loss 6.0106 LearningRate 0.0347 Epoch: 8 Global Step: 41560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:29:54,112-Speed 3403.54 samples/sec Loss 5.7665 LearningRate 0.0347 Epoch: 8 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:29:57,122-Speed 3402.87 samples/sec Loss 5.8955 LearningRate 0.0347 Epoch: 8 Global Step: 41580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:00,125-Speed 3410.57 samples/sec Loss 5.9323 LearningRate 0.0347 Epoch: 8 Global Step: 41590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:03,133-Speed 3405.25 samples/sec Loss 5.9004 LearningRate 0.0347 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:06,154-Speed 3390.15 samples/sec Loss 5.9218 LearningRate 0.0347 Epoch: 8 Global Step: 41610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:09,189-Speed 3374.31 samples/sec Loss 6.0716 LearningRate 0.0346 Epoch: 8 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:12,197-Speed 3405.28 samples/sec Loss 6.0400 LearningRate 0.0346 Epoch: 8 Global Step: 41630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:15,204-Speed 3406.60 samples/sec Loss 5.9347 LearningRate 0.0346 Epoch: 8 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:30:18,203-Speed 3414.57 samples/sec Loss 5.9289 LearningRate 0.0346 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:21,213-Speed 3403.02 samples/sec Loss 5.9821 LearningRate 0.0346 Epoch: 8 Global Step: 41660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:24,225-Speed 3400.97 samples/sec Loss 5.8710 LearningRate 0.0346 Epoch: 8 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 03:30:27,242-Speed 3395.43 samples/sec Loss 6.0259 LearningRate 0.0346 Epoch: 8 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:30,296-Speed 3353.44 samples/sec Loss 5.8224 LearningRate 0.0346 Epoch: 8 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:33,308-Speed 3400.06 samples/sec Loss 5.8695 LearningRate 0.0345 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:36,321-Speed 3399.17 samples/sec Loss 6.0229 LearningRate 0.0345 Epoch: 8 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:39,344-Speed 3389.28 samples/sec Loss 6.0596 LearningRate 0.0345 Epoch: 8 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:42,358-Speed 3398.16 samples/sec Loss 5.9508 LearningRate 0.0345 Epoch: 8 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:45,378-Speed 3391.74 samples/sec Loss 6.0830 LearningRate 0.0345 Epoch: 8 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:48,386-Speed 3405.51 samples/sec Loss 6.0679 LearningRate 0.0345 Epoch: 8 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:30:51,375-Speed 3426.07 samples/sec Loss 5.8426 LearningRate 0.0345 Epoch: 8 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:54,385-Speed 3402.73 samples/sec Loss 6.2280 LearningRate 0.0345 Epoch: 8 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:30:57,388-Speed 3411.33 samples/sec Loss 5.9797 LearningRate 0.0345 Epoch: 8 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:00,372-Speed 3431.90 samples/sec Loss 6.0867 LearningRate 0.0344 Epoch: 8 Global Step: 41790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:03,398-Speed 3384.85 samples/sec Loss 
5.9858 LearningRate 0.0344 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:06,409-Speed 3402.18 samples/sec Loss 5.9794 LearningRate 0.0344 Epoch: 8 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:09,453-Speed 3364.86 samples/sec Loss 5.9048 LearningRate 0.0344 Epoch: 8 Global Step: 41820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:12,458-Speed 3407.93 samples/sec Loss 5.9772 LearningRate 0.0344 Epoch: 8 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:15,466-Speed 3406.16 samples/sec Loss 6.0586 LearningRate 0.0344 Epoch: 8 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:18,521-Speed 3352.76 samples/sec Loss 5.8836 LearningRate 0.0344 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:21,527-Speed 3407.06 samples/sec Loss 5.8953 LearningRate 0.0344 Epoch: 8 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:24,531-Speed 3409.13 samples/sec Loss 6.0616 LearningRate 0.0344 Epoch: 8 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:27,538-Speed 3405.75 samples/sec Loss 5.8699 LearningRate 0.0343 Epoch: 8 Global Step: 41880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:31:30,546-Speed 3405.95 samples/sec Loss 5.8946 LearningRate 0.0343 Epoch: 8 Global Step: 41890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:33,551-Speed 3407.77 samples/sec Loss 5.9371 LearningRate 0.0343 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:36,558-Speed 3406.32 samples/sec Loss 5.9740 LearningRate 0.0343 Epoch: 8 Global Step: 41910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:39,564-Speed 3408.04 samples/sec Loss 5.8919 LearningRate 0.0343 Epoch: 8 Global Step: 41920 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:42,565-Speed 3413.39 samples/sec Loss 6.0078 LearningRate 0.0343 Epoch: 8 Global Step: 41930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:45,583-Speed 3393.51 samples/sec Loss 6.0437 LearningRate 0.0343 Epoch: 8 Global Step: 41940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:48,584-Speed 3412.72 samples/sec Loss 5.9184 LearningRate 0.0343 Epoch: 8 Global Step: 41950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:51,588-Speed 3410.36 samples/sec Loss 6.0530 LearningRate 0.0342 Epoch: 8 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:54,592-Speed 3408.83 samples/sec Loss 6.0986 LearningRate 0.0342 Epoch: 8 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:31:57,597-Speed 3409.41 samples/sec Loss 6.1015 LearningRate 0.0342 Epoch: 8 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:32:00,616-Speed 3392.70 samples/sec Loss 6.1156 LearningRate 0.0342 Epoch: 8 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:32:03,628-Speed 3399.74 samples/sec Loss 5.9940 LearningRate 0.0342 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:32:48,004-[lfw][42000]XNorm: 23.488167 Training: 2022-04-11 03:32:48,005-[lfw][42000]Accuracy-Flip: 0.99733+-0.00327 Training: 2022-04-11 03:32:48,006-[lfw][42000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:33:39,534-[cfp_fp][42000]XNorm: 20.888780 Training: 2022-04-11 03:33:39,535-[cfp_fp][42000]Accuracy-Flip: 0.97186+-0.00712 Training: 2022-04-11 03:33:39,536-[cfp_fp][42000]Accuracy-Highest: 0.97471 Training: 2022-04-11 03:34:23,543-[agedb_30][42000]XNorm: 23.363319 Training: 2022-04-11 03:34:23,543-[agedb_30][42000]Accuracy-Flip: 0.97867+-0.00714 Training: 2022-04-11 03:34:23,544-[agedb_30][42000]Accuracy-Highest: 0.97967 Training: 
2022-04-11 03:34:26,549-Speed 71.65 samples/sec Loss 5.8776 LearningRate 0.0342 Epoch: 8 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:34:29,546-Speed 3417.18 samples/sec Loss 5.9734 LearningRate 0.0342 Epoch: 8 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:34:32,534-Speed 3427.76 samples/sec Loss 5.9890 LearningRate 0.0342 Epoch: 8 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:34:35,528-Speed 3420.70 samples/sec Loss 5.8902 LearningRate 0.0342 Epoch: 8 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:34:38,521-Speed 3422.22 samples/sec Loss 6.0810 LearningRate 0.0341 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:41,517-Speed 3418.77 samples/sec Loss 5.9304 LearningRate 0.0341 Epoch: 8 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:44,509-Speed 3423.58 samples/sec Loss 6.0536 LearningRate 0.0341 Epoch: 8 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:47,503-Speed 3421.14 samples/sec Loss 6.0425 LearningRate 0.0341 Epoch: 8 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:50,496-Speed 3421.85 samples/sec Loss 6.0338 LearningRate 0.0341 Epoch: 8 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:53,491-Speed 3419.46 samples/sec Loss 5.9242 LearningRate 0.0341 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:56,482-Speed 3426.82 samples/sec Loss 6.0475 LearningRate 0.0341 Epoch: 8 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:34:59,475-Speed 3421.53 samples/sec Loss 5.9205 LearningRate 0.0341 Epoch: 8 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:02,475-Speed 3414.45 samples/sec Loss 6.0461 
LearningRate 0.0341 Epoch: 8 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:05,476-Speed 3412.92 samples/sec Loss 5.8921 LearningRate 0.0340 Epoch: 8 Global Step: 42140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:08,473-Speed 3417.13 samples/sec Loss 5.9949 LearningRate 0.0340 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:35:11,454-Speed 3437.17 samples/sec Loss 6.0059 LearningRate 0.0340 Epoch: 8 Global Step: 42160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:14,478-Speed 3386.05 samples/sec Loss 5.7844 LearningRate 0.0340 Epoch: 8 Global Step: 42170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:17,532-Speed 3353.92 samples/sec Loss 5.8089 LearningRate 0.0340 Epoch: 8 Global Step: 42180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:20,548-Speed 3397.11 samples/sec Loss 5.9817 LearningRate 0.0340 Epoch: 8 Global Step: 42190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:23,546-Speed 3416.07 samples/sec Loss 5.8698 LearningRate 0.0340 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:26,546-Speed 3413.97 samples/sec Loss 6.0194 LearningRate 0.0340 Epoch: 8 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:29,562-Speed 3396.80 samples/sec Loss 5.8735 LearningRate 0.0339 Epoch: 8 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:32,566-Speed 3408.78 samples/sec Loss 6.0703 LearningRate 0.0339 Epoch: 8 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:35,572-Speed 3407.83 samples/sec Loss 5.9217 LearningRate 0.0339 Epoch: 8 Global Step: 42240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:38,580-Speed 3404.22 samples/sec Loss 5.9325 LearningRate 0.0339 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 03:35:41,590-Speed 3403.90 samples/sec Loss 5.9450 LearningRate 0.0339 Epoch: 8 Global Step: 42260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:44,610-Speed 3391.04 samples/sec Loss 6.0806 LearningRate 0.0339 Epoch: 8 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:47,620-Speed 3403.39 samples/sec Loss 5.9764 LearningRate 0.0339 Epoch: 8 Global Step: 42280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:50,644-Speed 3387.21 samples/sec Loss 6.0099 LearningRate 0.0339 Epoch: 8 Global Step: 42290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:53,657-Speed 3399.55 samples/sec Loss 5.9946 LearningRate 0.0339 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:56,664-Speed 3406.42 samples/sec Loss 6.0100 LearningRate 0.0338 Epoch: 8 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:35:59,674-Speed 3401.88 samples/sec Loss 5.9456 LearningRate 0.0338 Epoch: 8 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:02,687-Speed 3399.54 samples/sec Loss 5.8833 LearningRate 0.0338 Epoch: 8 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:05,762-Speed 3331.36 samples/sec Loss 5.8406 LearningRate 0.0338 Epoch: 8 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:08,773-Speed 3401.58 samples/sec Loss 6.0036 LearningRate 0.0338 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:11,787-Speed 3397.57 samples/sec Loss 6.0770 LearningRate 0.0338 Epoch: 8 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:14,806-Speed 3393.19 samples/sec Loss 6.0353 LearningRate 0.0338 Epoch: 8 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:17,828-Speed 
3389.88 samples/sec Loss 5.9290 LearningRate 0.0338 Epoch: 8 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:20,835-Speed 3405.75 samples/sec Loss 5.9656 LearningRate 0.0338 Epoch: 8 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:23,853-Speed 3393.82 samples/sec Loss 5.9640 LearningRate 0.0337 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:26,854-Speed 3412.80 samples/sec Loss 6.0834 LearningRate 0.0337 Epoch: 8 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:29,865-Speed 3402.55 samples/sec Loss 5.9115 LearningRate 0.0337 Epoch: 8 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:32,866-Speed 3412.19 samples/sec Loss 6.0400 LearningRate 0.0337 Epoch: 8 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:35,873-Speed 3407.11 samples/sec Loss 6.0427 LearningRate 0.0337 Epoch: 8 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:38,926-Speed 3354.50 samples/sec Loss 6.0450 LearningRate 0.0337 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:36:41,933-Speed 3405.65 samples/sec Loss 6.1675 LearningRate 0.0337 Epoch: 8 Global Step: 42460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 03:36:44,900-Speed 3452.90 samples/sec Loss 5.9129 LearningRate 0.0337 Epoch: 8 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:47,896-Speed 3418.58 samples/sec Loss 6.1179 LearningRate 0.0336 Epoch: 8 Global Step: 42480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:50,900-Speed 3410.05 samples/sec Loss 5.9347 LearningRate 0.0336 Epoch: 8 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:53,901-Speed 3412.81 samples/sec Loss 5.9350 LearningRate 0.0336 Epoch: 
8 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:56,906-Speed 3408.37 samples/sec Loss 5.8786 LearningRate 0.0336 Epoch: 8 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:36:59,905-Speed 3416.08 samples/sec Loss 5.9106 LearningRate 0.0336 Epoch: 8 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:02,933-Speed 3382.35 samples/sec Loss 6.0665 LearningRate 0.0336 Epoch: 8 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:05,957-Speed 3387.23 samples/sec Loss 5.9893 LearningRate 0.0336 Epoch: 8 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:08,958-Speed 3412.25 samples/sec Loss 5.9520 LearningRate 0.0336 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:11,958-Speed 3414.82 samples/sec Loss 5.9257 LearningRate 0.0336 Epoch: 8 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:14,977-Speed 3393.26 samples/sec Loss 5.9511 LearningRate 0.0335 Epoch: 8 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:17,976-Speed 3414.85 samples/sec Loss 5.9945 LearningRate 0.0335 Epoch: 8 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:20,974-Speed 3417.22 samples/sec Loss 5.7909 LearningRate 0.0335 Epoch: 8 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:23,971-Speed 3416.82 samples/sec Loss 5.8373 LearningRate 0.0335 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:26,977-Speed 3407.29 samples/sec Loss 6.2056 LearningRate 0.0335 Epoch: 8 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:29,979-Speed 3412.53 samples/sec Loss 5.9570 LearningRate 0.0335 Epoch: 8 Global Step: 42620 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 03:37:32,981-Speed 3411.60 samples/sec Loss 5.9350 LearningRate 0.0335 Epoch: 8 Global Step: 42630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:35,982-Speed 3412.89 samples/sec Loss 5.8539 LearningRate 0.0335 Epoch: 8 Global Step: 42640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:38,979-Speed 3417.51 samples/sec Loss 5.9356 LearningRate 0.0335 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:41,987-Speed 3404.64 samples/sec Loss 5.9252 LearningRate 0.0334 Epoch: 8 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:44,988-Speed 3414.08 samples/sec Loss 5.9856 LearningRate 0.0334 Epoch: 8 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:37:47,991-Speed 3410.53 samples/sec Loss 6.0497 LearningRate 0.0334 Epoch: 8 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:37:50,971-Speed 3437.34 samples/sec Loss 5.9264 LearningRate 0.0334 Epoch: 8 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:53,987-Speed 3396.21 samples/sec Loss 5.8846 LearningRate 0.0334 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:56,997-Speed 3402.55 samples/sec Loss 5.9293 LearningRate 0.0334 Epoch: 8 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:37:59,997-Speed 3414.00 samples/sec Loss 6.0264 LearningRate 0.0334 Epoch: 8 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:03,003-Speed 3407.43 samples/sec Loss 5.9904 LearningRate 0.0334 Epoch: 8 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:06,033-Speed 3381.19 samples/sec Loss 6.0221 LearningRate 0.0334 Epoch: 8 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:09,036-Speed 3410.12 samples/sec Loss 
6.1055 LearningRate 0.0333 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:12,040-Speed 3409.27 samples/sec Loss 5.8495 LearningRate 0.0333 Epoch: 8 Global Step: 42760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:15,048-Speed 3406.23 samples/sec Loss 6.0045 LearningRate 0.0333 Epoch: 8 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:18,048-Speed 3413.96 samples/sec Loss 5.9945 LearningRate 0.0333 Epoch: 8 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:21,051-Speed 3410.87 samples/sec Loss 5.9317 LearningRate 0.0333 Epoch: 8 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:38:24,033-Speed 3434.94 samples/sec Loss 5.9869 LearningRate 0.0333 Epoch: 8 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:27,047-Speed 3397.65 samples/sec Loss 6.0274 LearningRate 0.0333 Epoch: 8 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:30,052-Speed 3408.72 samples/sec Loss 6.0303 LearningRate 0.0333 Epoch: 8 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:33,053-Speed 3412.96 samples/sec Loss 5.9642 LearningRate 0.0332 Epoch: 8 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:36,058-Speed 3408.94 samples/sec Loss 5.9210 LearningRate 0.0332 Epoch: 8 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:39,067-Speed 3403.54 samples/sec Loss 6.0126 LearningRate 0.0332 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:42,074-Speed 3406.54 samples/sec Loss 5.9010 LearningRate 0.0332 Epoch: 8 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:45,076-Speed 3411.83 samples/sec Loss 5.9566 LearningRate 0.0332 Epoch: 8 Global Step: 42870 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:48,088-Speed 3401.09 samples/sec Loss 6.0398 LearningRate 0.0332 Epoch: 8 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:51,091-Speed 3410.29 samples/sec Loss 6.0631 LearningRate 0.0332 Epoch: 8 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:54,077-Speed 3430.48 samples/sec Loss 5.9051 LearningRate 0.0332 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:38:57,079-Speed 3412.16 samples/sec Loss 5.8920 LearningRate 0.0332 Epoch: 8 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:00,130-Speed 3357.26 samples/sec Loss 6.0107 LearningRate 0.0331 Epoch: 8 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:03,143-Speed 3398.32 samples/sec Loss 6.0762 LearningRate 0.0331 Epoch: 8 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:06,176-Speed 3377.45 samples/sec Loss 5.9115 LearningRate 0.0331 Epoch: 8 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:09,182-Speed 3407.92 samples/sec Loss 5.9412 LearningRate 0.0331 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:12,184-Speed 3411.92 samples/sec Loss 5.9785 LearningRate 0.0331 Epoch: 8 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:15,219-Speed 3374.27 samples/sec Loss 5.9530 LearningRate 0.0331 Epoch: 8 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:18,228-Speed 3404.71 samples/sec Loss 5.9675 LearningRate 0.0331 Epoch: 8 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:21,249-Speed 3390.51 samples/sec Loss 5.9724 LearningRate 0.0331 Epoch: 8 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:39:24,274-Speed 3385.79 samples/sec Loss 5.9559 LearningRate 0.0331 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:39:27,287-Speed 3399.39 samples/sec Loss 5.8840 LearningRate 0.0330 Epoch: 8 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:39:30,304-Speed 3394.85 samples/sec Loss 5.9683 LearningRate 0.0330 Epoch: 8 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:39:33,294-Speed 3425.38 samples/sec Loss 5.9916 LearningRate 0.0330 Epoch: 8 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:36,305-Speed 3401.75 samples/sec Loss 5.9060 LearningRate 0.0330 Epoch: 8 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:39,310-Speed 3409.38 samples/sec Loss 5.9818 LearningRate 0.0330 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:42,327-Speed 3394.89 samples/sec Loss 5.8100 LearningRate 0.0330 Epoch: 8 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:45,332-Speed 3407.58 samples/sec Loss 5.9545 LearningRate 0.0330 Epoch: 8 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:48,344-Speed 3400.94 samples/sec Loss 5.7517 LearningRate 0.0330 Epoch: 8 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:51,369-Speed 3386.30 samples/sec Loss 6.0863 LearningRate 0.0330 Epoch: 8 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:54,393-Speed 3387.13 samples/sec Loss 6.1060 LearningRate 0.0329 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:39:57,400-Speed 3406.08 samples/sec Loss 6.0384 LearningRate 0.0329 Epoch: 8 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:00,409-Speed 3404.13 samples/sec Loss 5.9573 LearningRate 
0.0329 Epoch: 8 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:03,418-Speed 3403.35 samples/sec Loss 5.8255 LearningRate 0.0329 Epoch: 8 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:40:06,417-Speed 3415.47 samples/sec Loss 5.9668 LearningRate 0.0329 Epoch: 8 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:09,434-Speed 3394.81 samples/sec Loss 5.9565 LearningRate 0.0329 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:12,440-Speed 3408.07 samples/sec Loss 5.9993 LearningRate 0.0329 Epoch: 8 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:15,440-Speed 3414.35 samples/sec Loss 6.1465 LearningRate 0.0329 Epoch: 8 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:18,446-Speed 3407.38 samples/sec Loss 5.9369 LearningRate 0.0329 Epoch: 8 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:21,451-Speed 3408.96 samples/sec Loss 6.0378 LearningRate 0.0328 Epoch: 8 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:24,457-Speed 3406.63 samples/sec Loss 5.9423 LearningRate 0.0328 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:27,462-Speed 3407.88 samples/sec Loss 5.8940 LearningRate 0.0328 Epoch: 8 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:30,474-Speed 3400.66 samples/sec Loss 5.8976 LearningRate 0.0328 Epoch: 8 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:33,487-Speed 3399.29 samples/sec Loss 5.9330 LearningRate 0.0328 Epoch: 8 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:36,484-Speed 3417.71 samples/sec Loss 5.8356 LearningRate 0.0328 Epoch: 8 Global Step: 43240 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 03:40:39,497-Speed 3399.68 samples/sec Loss 6.0882 LearningRate 0.0328 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:42,508-Speed 3402.64 samples/sec Loss 5.9033 LearningRate 0.0328 Epoch: 8 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:45,512-Speed 3409.74 samples/sec Loss 6.0392 LearningRate 0.0327 Epoch: 8 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:48,527-Speed 3396.93 samples/sec Loss 5.9329 LearningRate 0.0327 Epoch: 8 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:51,557-Speed 3380.71 samples/sec Loss 6.0404 LearningRate 0.0327 Epoch: 8 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:54,570-Speed 3399.04 samples/sec Loss 5.9040 LearningRate 0.0327 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:40:57,579-Speed 3403.51 samples/sec Loss 6.0133 LearningRate 0.0327 Epoch: 8 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:00,592-Speed 3400.60 samples/sec Loss 5.9819 LearningRate 0.0327 Epoch: 8 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:03,649-Speed 3350.19 samples/sec Loss 5.7361 LearningRate 0.0327 Epoch: 8 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:06,668-Speed 3393.20 samples/sec Loss 6.0659 LearningRate 0.0327 Epoch: 8 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:41:09,675-Speed 3406.29 samples/sec Loss 5.9487 LearningRate 0.0327 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:41:12,657-Speed 3434.42 samples/sec Loss 6.1088 LearningRate 0.0326 Epoch: 8 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:15,675-Speed 3393.91 
samples/sec Loss 5.9386 LearningRate 0.0326 Epoch: 8 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:18,690-Speed 3396.74 samples/sec Loss 5.7922 LearningRate 0.0326 Epoch: 8 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:21,699-Speed 3404.81 samples/sec Loss 5.9623 LearningRate 0.0326 Epoch: 8 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:24,710-Speed 3400.65 samples/sec Loss 5.9189 LearningRate 0.0326 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:27,716-Speed 3407.16 samples/sec Loss 5.8901 LearningRate 0.0326 Epoch: 8 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:30,736-Speed 3392.08 samples/sec Loss 5.9296 LearningRate 0.0326 Epoch: 8 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:33,742-Speed 3407.55 samples/sec Loss 6.0451 LearningRate 0.0326 Epoch: 8 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:36,748-Speed 3407.55 samples/sec Loss 5.9624 LearningRate 0.0326 Epoch: 8 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:39,757-Speed 3404.64 samples/sec Loss 6.0571 LearningRate 0.0325 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:42,762-Speed 3408.31 samples/sec Loss 5.8647 LearningRate 0.0325 Epoch: 8 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:41:45,746-Speed 3432.30 samples/sec Loss 6.0138 LearningRate 0.0325 Epoch: 8 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:48,768-Speed 3389.65 samples/sec Loss 6.0505 LearningRate 0.0325 Epoch: 8 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:51,777-Speed 3403.20 samples/sec Loss 5.9549 LearningRate 0.0325 Epoch: 8 Global Step: 
43490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:54,800-Speed 3388.81 samples/sec Loss 5.9719 LearningRate 0.0325 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:41:57,802-Speed 3410.98 samples/sec Loss 5.8734 LearningRate 0.0325 Epoch: 8 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:00,820-Speed 3394.88 samples/sec Loss 5.8942 LearningRate 0.0325 Epoch: 8 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:03,842-Speed 3389.55 samples/sec Loss 6.0033 LearningRate 0.0325 Epoch: 8 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:06,852-Speed 3403.03 samples/sec Loss 5.9510 LearningRate 0.0324 Epoch: 8 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:09,857-Speed 3407.50 samples/sec Loss 5.9753 LearningRate 0.0324 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:12,849-Speed 3423.88 samples/sec Loss 6.0441 LearningRate 0.0324 Epoch: 8 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:15,863-Speed 3397.94 samples/sec Loss 5.9076 LearningRate 0.0324 Epoch: 8 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:18,881-Speed 3393.72 samples/sec Loss 5.8707 LearningRate 0.0324 Epoch: 8 Global Step: 43580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:21,890-Speed 3404.91 samples/sec Loss 5.8946 LearningRate 0.0324 Epoch: 8 Global Step: 43590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:24,891-Speed 3412.89 samples/sec Loss 5.9089 LearningRate 0.0324 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:27,917-Speed 3383.81 samples/sec Loss 5.8755 LearningRate 0.0324 Epoch: 8 Global Step: 43610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 
03:42:30,937-Speed 3391.63 samples/sec Loss 5.8962 LearningRate 0.0324 Epoch: 8 Global Step: 43620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:33,959-Speed 3390.09 samples/sec Loss 5.8921 LearningRate 0.0323 Epoch: 8 Global Step: 43630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:36,965-Speed 3408.05 samples/sec Loss 5.9202 LearningRate 0.0323 Epoch: 8 Global Step: 43640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:39,971-Speed 3406.70 samples/sec Loss 5.9880 LearningRate 0.0323 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:42:42,987-Speed 3396.11 samples/sec Loss 5.9670 LearningRate 0.0323 Epoch: 8 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:45,991-Speed 3409.35 samples/sec Loss 6.0920 LearningRate 0.0323 Epoch: 8 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:48,997-Speed 3407.48 samples/sec Loss 5.9914 LearningRate 0.0323 Epoch: 8 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:52,004-Speed 3406.64 samples/sec Loss 5.9390 LearningRate 0.0323 Epoch: 8 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:55,032-Speed 3382.60 samples/sec Loss 6.0968 LearningRate 0.0323 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:42:58,042-Speed 3402.68 samples/sec Loss 6.0484 LearningRate 0.0323 Epoch: 8 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:01,057-Speed 3397.20 samples/sec Loss 5.9688 LearningRate 0.0322 Epoch: 8 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:04,063-Speed 3407.84 samples/sec Loss 5.9363 LearningRate 0.0322 Epoch: 8 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:07,075-Speed 3400.02 samples/sec Loss 6.0269 LearningRate 
0.0322 Epoch: 8 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:10,090-Speed 3397.35 samples/sec Loss 5.9640 LearningRate 0.0322 Epoch: 8 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:13,101-Speed 3401.80 samples/sec Loss 6.0469 LearningRate 0.0322 Epoch: 8 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:43:16,110-Speed 3403.51 samples/sec Loss 5.7705 LearningRate 0.0322 Epoch: 8 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:43:19,153-Speed 3366.70 samples/sec Loss 6.1076 LearningRate 0.0322 Epoch: 8 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:43:22,159-Speed 3407.36 samples/sec Loss 5.9616 LearningRate 0.0322 Epoch: 8 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:43:25,156-Speed 3417.23 samples/sec Loss 6.0070 LearningRate 0.0322 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:28,163-Speed 3405.88 samples/sec Loss 6.0166 LearningRate 0.0321 Epoch: 8 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:31,168-Speed 3409.75 samples/sec Loss 5.7943 LearningRate 0.0321 Epoch: 8 Global Step: 43820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:34,174-Speed 3407.07 samples/sec Loss 5.8695 LearningRate 0.0321 Epoch: 8 Global Step: 43830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:37,202-Speed 3382.34 samples/sec Loss 5.8078 LearningRate 0.0321 Epoch: 8 Global Step: 43840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:40,208-Speed 3407.57 samples/sec Loss 5.8550 LearningRate 0.0321 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:43,214-Speed 3407.10 samples/sec Loss 5.8308 LearningRate 0.0321 Epoch: 8 Global Step: 43860 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 03:43:46,218-Speed 3409.19 samples/sec Loss 5.7649 LearningRate 0.0321 Epoch: 8 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:49,225-Speed 3406.21 samples/sec Loss 5.8605 LearningRate 0.0321 Epoch: 8 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:52,231-Speed 3407.76 samples/sec Loss 5.9459 LearningRate 0.0321 Epoch: 8 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:43:55,243-Speed 3400.77 samples/sec Loss 5.8234 LearningRate 0.0320 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:43:58,229-Speed 3430.72 samples/sec Loss 5.8996 LearningRate 0.0320 Epoch: 8 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:01,235-Speed 3407.20 samples/sec Loss 5.8221 LearningRate 0.0320 Epoch: 8 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:04,244-Speed 3404.43 samples/sec Loss 5.8495 LearningRate 0.0320 Epoch: 8 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:07,251-Speed 3405.70 samples/sec Loss 6.0010 LearningRate 0.0320 Epoch: 8 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:10,258-Speed 3405.99 samples/sec Loss 5.9504 LearningRate 0.0320 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:13,262-Speed 3409.25 samples/sec Loss 5.8732 LearningRate 0.0320 Epoch: 8 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:16,281-Speed 3393.30 samples/sec Loss 5.8202 LearningRate 0.0320 Epoch: 8 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:19,300-Speed 3392.16 samples/sec Loss 5.9101 LearningRate 0.0319 Epoch: 8 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:22,313-Speed 3400.37 
samples/sec Loss 5.7898 LearningRate 0.0319 Epoch: 8 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:44:25,326-Speed 3399.66 samples/sec Loss 6.0712 LearningRate 0.0319 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:45:09,601-[lfw][44000]XNorm: 22.249712 Training: 2022-04-11 03:45:09,602-[lfw][44000]Accuracy-Flip: 0.99667+-0.00325 Training: 2022-04-11 03:45:09,602-[lfw][44000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:46:00,795-[cfp_fp][44000]XNorm: 20.131060 Training: 2022-04-11 03:46:00,795-[cfp_fp][44000]Accuracy-Flip: 0.97543+-0.00800 Training: 2022-04-11 03:46:00,796-[cfp_fp][44000]Accuracy-Highest: 0.97543 Training: 2022-04-11 03:46:44,814-[agedb_30][44000]XNorm: 22.260320 Training: 2022-04-11 03:46:44,814-[agedb_30][44000]Accuracy-Flip: 0.98083+-0.00668 Training: 2022-04-11 03:46:44,815-[agedb_30][44000]Accuracy-Highest: 0.98083 Training: 2022-04-11 03:46:47,819-Speed 71.86 samples/sec Loss 5.8642 LearningRate 0.0319 Epoch: 8 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:46:50,788-Speed 3448.83 samples/sec Loss 6.0052 LearningRate 0.0319 Epoch: 8 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:46:53,773-Speed 3431.65 samples/sec Loss 6.0919 LearningRate 0.0319 Epoch: 8 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:46:56,763-Speed 3426.34 samples/sec Loss 6.0614 LearningRate 0.0319 Epoch: 8 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:46:59,796-Speed 3375.97 samples/sec Loss 5.9309 LearningRate 0.0319 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:02,795-Speed 3416.48 samples/sec Loss 5.9333 LearningRate 0.0319 Epoch: 8 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:05,796-Speed 3412.18 samples/sec Loss 5.8921 LearningRate 0.0318 
Epoch: 8 Global Step: 44070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:08,797-Speed 3413.09 samples/sec Loss 5.9853 LearningRate 0.0318 Epoch: 8 Global Step: 44080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:11,802-Speed 3409.12 samples/sec Loss 5.8835 LearningRate 0.0318 Epoch: 8 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:14,800-Speed 3416.41 samples/sec Loss 5.9323 LearningRate 0.0318 Epoch: 8 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:17,883-Speed 3323.47 samples/sec Loss 6.0009 LearningRate 0.0318 Epoch: 8 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:20,878-Speed 3419.54 samples/sec Loss 5.8952 LearningRate 0.0318 Epoch: 8 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:47:23,861-Speed 3433.40 samples/sec Loss 5.8691 LearningRate 0.0318 Epoch: 8 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:26,862-Speed 3413.93 samples/sec Loss 5.9151 LearningRate 0.0318 Epoch: 8 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:29,870-Speed 3405.05 samples/sec Loss 6.0304 LearningRate 0.0318 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:32,871-Speed 3412.92 samples/sec Loss 5.8327 LearningRate 0.0317 Epoch: 8 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:35,908-Speed 3372.26 samples/sec Loss 5.8607 LearningRate 0.0317 Epoch: 8 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:38,913-Speed 3408.33 samples/sec Loss 5.9013 LearningRate 0.0317 Epoch: 8 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:41,921-Speed 3406.20 samples/sec Loss 5.8321 LearningRate 0.0317 Epoch: 8 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-11 03:47:44,923-Speed 3411.74 samples/sec Loss 5.8939 LearningRate 0.0317 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:47,934-Speed 3401.36 samples/sec Loss 5.9134 LearningRate 0.0317 Epoch: 8 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:50,939-Speed 3408.78 samples/sec Loss 5.9379 LearningRate 0.0317 Epoch: 8 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:47:53,939-Speed 3414.29 samples/sec Loss 6.0638 LearningRate 0.0317 Epoch: 8 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:47:56,940-Speed 3413.15 samples/sec Loss 5.9110 LearningRate 0.0317 Epoch: 8 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:47:59,949-Speed 3403.91 samples/sec Loss 6.0394 LearningRate 0.0316 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:02,950-Speed 3412.16 samples/sec Loss 5.8567 LearningRate 0.0316 Epoch: 8 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:05,978-Speed 3383.55 samples/sec Loss 5.9306 LearningRate 0.0316 Epoch: 8 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:08,975-Speed 3417.43 samples/sec Loss 5.9875 LearningRate 0.0316 Epoch: 8 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:11,972-Speed 3417.54 samples/sec Loss 5.8967 LearningRate 0.0316 Epoch: 8 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:14,969-Speed 3417.76 samples/sec Loss 5.8133 LearningRate 0.0316 Epoch: 8 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:17,985-Speed 3396.00 samples/sec Loss 5.8309 LearningRate 0.0316 Epoch: 8 Global Step: 44310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:20,983-Speed 3416.97 
samples/sec Loss 5.9498 LearningRate 0.0316 Epoch: 8 Global Step: 44320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:23,984-Speed 3412.20 samples/sec Loss 5.9930 LearningRate 0.0316 Epoch: 8 Global Step: 44330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:26,983-Speed 3415.66 samples/sec Loss 6.1430 LearningRate 0.0315 Epoch: 8 Global Step: 44340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:29,985-Speed 3411.34 samples/sec Loss 5.7792 LearningRate 0.0315 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:32,990-Speed 3409.19 samples/sec Loss 5.7650 LearningRate 0.0315 Epoch: 8 Global Step: 44360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:35,989-Speed 3415.86 samples/sec Loss 5.9152 LearningRate 0.0315 Epoch: 8 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:38,984-Speed 3420.41 samples/sec Loss 5.8830 LearningRate 0.0315 Epoch: 8 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:41,987-Speed 3410.33 samples/sec Loss 5.9630 LearningRate 0.0315 Epoch: 8 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:44,987-Speed 3414.54 samples/sec Loss 5.7671 LearningRate 0.0315 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:47,988-Speed 3413.30 samples/sec Loss 5.8966 LearningRate 0.0315 Epoch: 8 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:51,005-Speed 3394.80 samples/sec Loss 5.8849 LearningRate 0.0315 Epoch: 8 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:48:53,985-Speed 3437.30 samples/sec Loss 5.8746 LearningRate 0.0314 Epoch: 8 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:48:56,981-Speed 3418.09 samples/sec Loss 5.8087 LearningRate 0.0314 Epoch: 8 Global Step: 
44440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:00,005-Speed 3387.95 samples/sec Loss 5.9259 LearningRate 0.0314 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:03,087-Speed 3323.47 samples/sec Loss 5.8659 LearningRate 0.0314 Epoch: 8 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:06,106-Speed 3392.57 samples/sec Loss 6.0094 LearningRate 0.0314 Epoch: 8 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:09,104-Speed 3416.94 samples/sec Loss 5.9504 LearningRate 0.0314 Epoch: 8 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:12,104-Speed 3414.23 samples/sec Loss 5.8929 LearningRate 0.0314 Epoch: 8 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:15,110-Speed 3407.33 samples/sec Loss 5.8261 LearningRate 0.0314 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:18,112-Speed 3412.14 samples/sec Loss 5.7655 LearningRate 0.0314 Epoch: 8 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:21,113-Speed 3412.86 samples/sec Loss 5.8733 LearningRate 0.0313 Epoch: 8 Global Step: 44520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:24,123-Speed 3402.16 samples/sec Loss 5.8406 LearningRate 0.0313 Epoch: 8 Global Step: 44530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:27,135-Speed 3400.82 samples/sec Loss 5.9784 LearningRate 0.0313 Epoch: 8 Global Step: 44540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:30,134-Speed 3416.11 samples/sec Loss 5.8085 LearningRate 0.0313 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:33,139-Speed 3408.89 samples/sec Loss 5.9597 LearningRate 0.0313 Epoch: 8 Global Step: 44560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 
03:49:36,138-Speed 3414.53 samples/sec Loss 6.0057 LearningRate 0.0313 Epoch: 8 Global Step: 44570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:39,150-Speed 3400.40 samples/sec Loss 5.9029 LearningRate 0.0313 Epoch: 8 Global Step: 44580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:42,152-Speed 3412.55 samples/sec Loss 5.9865 LearningRate 0.0313 Epoch: 8 Global Step: 44590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:45,153-Speed 3412.82 samples/sec Loss 5.8722 LearningRate 0.0313 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:48,155-Speed 3412.31 samples/sec Loss 5.7369 LearningRate 0.0312 Epoch: 8 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:49:51,158-Speed 3411.39 samples/sec Loss 5.8822 LearningRate 0.0312 Epoch: 8 Global Step: 44620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:54,177-Speed 3391.90 samples/sec Loss 5.7651 LearningRate 0.0312 Epoch: 8 Global Step: 44630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:49:57,187-Speed 3404.01 samples/sec Loss 5.8891 LearningRate 0.0312 Epoch: 8 Global Step: 44640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:00,186-Speed 3414.42 samples/sec Loss 5.7525 LearningRate 0.0312 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:03,191-Speed 3408.52 samples/sec Loss 5.8136 LearningRate 0.0312 Epoch: 8 Global Step: 44660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:06,198-Speed 3406.44 samples/sec Loss 5.8173 LearningRate 0.0312 Epoch: 8 Global Step: 44670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:09,198-Speed 3414.54 samples/sec Loss 5.9325 LearningRate 0.0312 Epoch: 8 Global Step: 44680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:12,204-Speed 3407.67 samples/sec Loss 5.8249 LearningRate 
0.0312 Epoch: 8 Global Step: 44690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:15,205-Speed 3412.53 samples/sec Loss 5.8417 LearningRate 0.0312 Epoch: 8 Global Step: 44700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:18,208-Speed 3410.00 samples/sec Loss 5.7462 LearningRate 0.0311 Epoch: 8 Global Step: 44710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:21,211-Speed 3410.98 samples/sec Loss 5.8031 LearningRate 0.0311 Epoch: 8 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:50:24,207-Speed 3419.53 samples/sec Loss 5.9268 LearningRate 0.0311 Epoch: 8 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:27,207-Speed 3413.91 samples/sec Loss 5.8462 LearningRate 0.0311 Epoch: 8 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:30,211-Speed 3410.33 samples/sec Loss 5.8793 LearningRate 0.0311 Epoch: 8 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:33,235-Speed 3386.92 samples/sec Loss 5.8917 LearningRate 0.0311 Epoch: 8 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:36,260-Speed 3386.61 samples/sec Loss 5.8884 LearningRate 0.0311 Epoch: 8 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:39,259-Speed 3414.56 samples/sec Loss 5.9642 LearningRate 0.0311 Epoch: 8 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:42,258-Speed 3415.29 samples/sec Loss 5.7843 LearningRate 0.0311 Epoch: 8 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:45,265-Speed 3405.82 samples/sec Loss 5.8906 LearningRate 0.0310 Epoch: 8 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:48,272-Speed 3406.60 samples/sec Loss 5.8185 LearningRate 0.0310 Epoch: 8 Global Step: 44810 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 03:50:51,288-Speed 3396.72 samples/sec Loss 5.7824 LearningRate 0.0310 Epoch: 8 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:50:54,288-Speed 3413.64 samples/sec Loss 5.9317 LearningRate 0.0310 Epoch: 8 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:50:57,271-Speed 3434.70 samples/sec Loss 5.8832 LearningRate 0.0310 Epoch: 8 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:00,271-Speed 3413.86 samples/sec Loss 5.8933 LearningRate 0.0310 Epoch: 8 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:03,276-Speed 3407.76 samples/sec Loss 5.7833 LearningRate 0.0310 Epoch: 8 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:06,287-Speed 3402.50 samples/sec Loss 5.9044 LearningRate 0.0310 Epoch: 8 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:09,293-Speed 3406.97 samples/sec Loss 5.8716 LearningRate 0.0310 Epoch: 8 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:12,306-Speed 3399.13 samples/sec Loss 5.8888 LearningRate 0.0309 Epoch: 8 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:15,320-Speed 3399.12 samples/sec Loss 5.7594 LearningRate 0.0309 Epoch: 8 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:18,351-Speed 3378.83 samples/sec Loss 6.0182 LearningRate 0.0309 Epoch: 8 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:21,354-Speed 3410.84 samples/sec Loss 5.8751 LearningRate 0.0309 Epoch: 8 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:24,367-Speed 3400.29 samples/sec Loss 5.8265 LearningRate 0.0309 Epoch: 8 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:27,369-Speed 3411.96 
samples/sec Loss 5.9543 LearningRate 0.0309 Epoch: 8 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:51:30,351-Speed 3434.31 samples/sec Loss 5.8189 LearningRate 0.0309 Epoch: 8 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:33,359-Speed 3405.20 samples/sec Loss 5.8077 LearningRate 0.0309 Epoch: 8 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:36,360-Speed 3413.20 samples/sec Loss 5.8557 LearningRate 0.0309 Epoch: 8 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:39,361-Speed 3412.96 samples/sec Loss 5.9159 LearningRate 0.0308 Epoch: 8 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:42,369-Speed 3405.26 samples/sec Loss 5.8881 LearningRate 0.0308 Epoch: 8 Global Step: 44990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:45,375-Speed 3407.57 samples/sec Loss 5.7404 LearningRate 0.0308 Epoch: 8 Global Step: 45000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:48,376-Speed 3412.31 samples/sec Loss 5.8521 LearningRate 0.0308 Epoch: 8 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:51,385-Speed 3405.21 samples/sec Loss 5.8481 LearningRate 0.0308 Epoch: 8 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:54,388-Speed 3410.39 samples/sec Loss 5.8319 LearningRate 0.0308 Epoch: 8 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:51:57,390-Speed 3411.56 samples/sec Loss 5.9004 LearningRate 0.0308 Epoch: 8 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:00,393-Speed 3410.75 samples/sec Loss 5.8730 LearningRate 0.0308 Epoch: 8 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:52:03,414-Speed 3390.20 samples/sec Loss 5.6835 LearningRate 0.0308 Epoch: 8 Global Step: 
45060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:52:06,408-Speed 3421.55 samples/sec Loss 5.8523 LearningRate 0.0307 Epoch: 8 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:09,411-Speed 3410.28 samples/sec Loss 5.7407 LearningRate 0.0307 Epoch: 8 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:12,421-Speed 3402.72 samples/sec Loss 5.8076 LearningRate 0.0307 Epoch: 8 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:15,423-Speed 3412.41 samples/sec Loss 5.8482 LearningRate 0.0307 Epoch: 8 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:18,431-Speed 3405.81 samples/sec Loss 5.8633 LearningRate 0.0307 Epoch: 8 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:21,436-Speed 3408.45 samples/sec Loss 5.7839 LearningRate 0.0307 Epoch: 8 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:24,437-Speed 3412.98 samples/sec Loss 5.6228 LearningRate 0.0307 Epoch: 8 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:27,442-Speed 3408.67 samples/sec Loss 5.7659 LearningRate 0.0307 Epoch: 8 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:30,450-Speed 3404.73 samples/sec Loss 5.8436 LearningRate 0.0307 Epoch: 8 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:33,452-Speed 3411.34 samples/sec Loss 5.7869 LearningRate 0.0306 Epoch: 8 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:36,460-Speed 3405.61 samples/sec Loss 5.9243 LearningRate 0.0306 Epoch: 8 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:52:39,468-Speed 3404.65 samples/sec Loss 5.7647 LearningRate 0.0306 Epoch: 8 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 03:52:42,479-Speed 3402.08 samples/sec Loss 5.7988 LearningRate 0.0306 Epoch: 8 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:52:45,471-Speed 3423.70 samples/sec Loss 5.7626 LearningRate 0.0306 Epoch: 8 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:48,490-Speed 3392.29 samples/sec Loss 5.8395 LearningRate 0.0306 Epoch: 8 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:51,519-Speed 3381.85 samples/sec Loss 5.8752 LearningRate 0.0306 Epoch: 8 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:54,526-Speed 3405.63 samples/sec Loss 5.8552 LearningRate 0.0306 Epoch: 8 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:52:57,529-Speed 3411.81 samples/sec Loss 5.9451 LearningRate 0.0306 Epoch: 8 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:00,549-Speed 3391.25 samples/sec Loss 5.8765 LearningRate 0.0305 Epoch: 8 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:03,553-Speed 3408.84 samples/sec Loss 5.8148 LearningRate 0.0305 Epoch: 8 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:06,607-Speed 3353.98 samples/sec Loss 5.7817 LearningRate 0.0305 Epoch: 8 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:09,613-Speed 3407.25 samples/sec Loss 5.8922 LearningRate 0.0305 Epoch: 8 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:12,617-Speed 3410.40 samples/sec Loss 5.7686 LearningRate 0.0305 Epoch: 8 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:15,623-Speed 3407.49 samples/sec Loss 5.8107 LearningRate 0.0305 Epoch: 8 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:18,633-Speed 3403.06 samples/sec Loss 5.8570 
LearningRate 0.0305 Epoch: 8 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:21,644-Speed 3401.77 samples/sec Loss 5.7692 LearningRate 0.0305 Epoch: 8 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:24,664-Speed 3390.58 samples/sec Loss 5.9350 LearningRate 0.0305 Epoch: 8 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:27,668-Speed 3410.84 samples/sec Loss 5.7361 LearningRate 0.0304 Epoch: 8 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:30,674-Speed 3407.09 samples/sec Loss 5.7389 LearningRate 0.0304 Epoch: 8 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:33,681-Speed 3405.56 samples/sec Loss 5.7516 LearningRate 0.0304 Epoch: 8 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:36,685-Speed 3410.18 samples/sec Loss 5.8500 LearningRate 0.0304 Epoch: 8 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:39,690-Speed 3408.92 samples/sec Loss 5.8227 LearningRate 0.0304 Epoch: 8 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:42,708-Speed 3393.94 samples/sec Loss 5.9617 LearningRate 0.0304 Epoch: 8 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:45,709-Speed 3412.72 samples/sec Loss 5.8615 LearningRate 0.0304 Epoch: 8 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:48,713-Speed 3409.50 samples/sec Loss 5.7304 LearningRate 0.0304 Epoch: 8 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:51,714-Speed 3412.96 samples/sec Loss 5.7777 LearningRate 0.0304 Epoch: 8 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:53:54,721-Speed 3406.35 samples/sec Loss 5.8931 LearningRate 0.0304 Epoch: 8 Global Step: 45430 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 03:53:57,723-Speed 3412.27 samples/sec Loss 5.8304 LearningRate 0.0303 Epoch: 8 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:00,732-Speed 3403.04 samples/sec Loss 5.8099 LearningRate 0.0303 Epoch: 8 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:03,739-Speed 3406.16 samples/sec Loss 5.8861 LearningRate 0.0303 Epoch: 8 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:06,745-Speed 3408.76 samples/sec Loss 5.8137 LearningRate 0.0303 Epoch: 8 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:09,759-Speed 3397.28 samples/sec Loss 5.8916 LearningRate 0.0303 Epoch: 8 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:12,772-Speed 3399.76 samples/sec Loss 6.0053 LearningRate 0.0303 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:15,781-Speed 3404.09 samples/sec Loss 5.8987 LearningRate 0.0303 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:54:18,768-Speed 3429.25 samples/sec Loss 5.8117 LearningRate 0.0303 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:21,848-Speed 3325.05 samples/sec Loss 5.7765 LearningRate 0.0303 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:34,232-Speed 826.95 samples/sec Loss 5.2065 LearningRate 0.0302 Epoch: 9 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:37,252-Speed 3391.93 samples/sec Loss 4.8917 LearningRate 0.0302 Epoch: 9 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:40,280-Speed 3382.56 samples/sec Loss 5.0386 LearningRate 0.0302 Epoch: 9 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:43,304-Speed 
3387.39 samples/sec Loss 4.9489 LearningRate 0.0302 Epoch: 9 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:46,323-Speed 3392.42 samples/sec Loss 5.1061 LearningRate 0.0302 Epoch: 9 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:49,403-Speed 3326.43 samples/sec Loss 5.0126 LearningRate 0.0302 Epoch: 9 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:52,413-Speed 3402.54 samples/sec Loss 5.0469 LearningRate 0.0302 Epoch: 9 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:55,423-Speed 3402.41 samples/sec Loss 5.0692 LearningRate 0.0302 Epoch: 9 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:54:58,430-Speed 3406.84 samples/sec Loss 5.1437 LearningRate 0.0302 Epoch: 9 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:55:01,431-Speed 3413.37 samples/sec Loss 4.9158 LearningRate 0.0301 Epoch: 9 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:55:04,484-Speed 3355.24 samples/sec Loss 4.9515 LearningRate 0.0301 Epoch: 9 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:55:07,485-Speed 3411.85 samples/sec Loss 5.0875 LearningRate 0.0301 Epoch: 9 Global Step: 45640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:10,496-Speed 3402.47 samples/sec Loss 5.0600 LearningRate 0.0301 Epoch: 9 Global Step: 45650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:13,506-Speed 3402.87 samples/sec Loss 4.9215 LearningRate 0.0301 Epoch: 9 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:16,544-Speed 3370.90 samples/sec Loss 5.1093 LearningRate 0.0301 Epoch: 9 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:19,563-Speed 3393.31 samples/sec Loss 5.1684 LearningRate 0.0301 Epoch: 9 
Global Step: 45680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:22,580-Speed 3395.15 samples/sec Loss 5.0457 LearningRate 0.0301 Epoch: 9 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:25,592-Speed 3400.40 samples/sec Loss 5.0505 LearningRate 0.0301 Epoch: 9 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:28,593-Speed 3413.50 samples/sec Loss 4.9961 LearningRate 0.0300 Epoch: 9 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:31,601-Speed 3404.28 samples/sec Loss 5.1279 LearningRate 0.0300 Epoch: 9 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:34,605-Speed 3410.62 samples/sec Loss 4.9850 LearningRate 0.0300 Epoch: 9 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:37,611-Speed 3406.95 samples/sec Loss 5.0851 LearningRate 0.0300 Epoch: 9 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:55:40,622-Speed 3401.30 samples/sec Loss 5.0356 LearningRate 0.0300 Epoch: 9 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 03:55:43,616-Speed 3421.30 samples/sec Loss 5.2160 LearningRate 0.0300 Epoch: 9 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:46,616-Speed 3414.05 samples/sec Loss 5.1380 LearningRate 0.0300 Epoch: 9 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:49,638-Speed 3388.63 samples/sec Loss 5.0740 LearningRate 0.0300 Epoch: 9 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:52,646-Speed 3405.32 samples/sec Loss 5.3304 LearningRate 0.0300 Epoch: 9 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:55:55,656-Speed 3403.63 samples/sec Loss 5.0899 LearningRate 0.0299 Epoch: 9 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 03:55:58,672-Speed 3395.92 samples/sec Loss 4.9939 LearningRate 0.0299 Epoch: 9 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:01,680-Speed 3404.34 samples/sec Loss 5.1931 LearningRate 0.0299 Epoch: 9 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:04,689-Speed 3404.87 samples/sec Loss 5.1395 LearningRate 0.0299 Epoch: 9 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:07,697-Speed 3404.47 samples/sec Loss 5.1148 LearningRate 0.0299 Epoch: 9 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:10,703-Speed 3408.14 samples/sec Loss 5.1427 LearningRate 0.0299 Epoch: 9 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:13,686-Speed 3432.74 samples/sec Loss 5.1997 LearningRate 0.0299 Epoch: 9 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:16,696-Speed 3403.60 samples/sec Loss 5.2560 LearningRate 0.0299 Epoch: 9 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:19,683-Speed 3429.21 samples/sec Loss 5.1218 LearningRate 0.0299 Epoch: 9 Global Step: 45880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:22,700-Speed 3394.30 samples/sec Loss 5.1061 LearningRate 0.0299 Epoch: 9 Global Step: 45890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:25,704-Speed 3409.62 samples/sec Loss 5.2268 LearningRate 0.0298 Epoch: 9 Global Step: 45900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:28,827-Speed 3280.24 samples/sec Loss 5.2606 LearningRate 0.0298 Epoch: 9 Global Step: 45910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:31,835-Speed 3404.90 samples/sec Loss 5.2577 LearningRate 0.0298 Epoch: 9 Global Step: 45920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:34,839-Speed 3409.62 samples/sec Loss 
5.1279 LearningRate 0.0298 Epoch: 9 Global Step: 45930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:37,853-Speed 3398.59 samples/sec Loss 5.2932 LearningRate 0.0298 Epoch: 9 Global Step: 45940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:40,858-Speed 3408.65 samples/sec Loss 5.3948 LearningRate 0.0298 Epoch: 9 Global Step: 45950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:43,865-Speed 3406.30 samples/sec Loss 5.2828 LearningRate 0.0298 Epoch: 9 Global Step: 45960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:46,882-Speed 3393.90 samples/sec Loss 5.1796 LearningRate 0.0298 Epoch: 9 Global Step: 45970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:56:49,892-Speed 3403.30 samples/sec Loss 5.1869 LearningRate 0.0298 Epoch: 9 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:53,013-Speed 3282.15 samples/sec Loss 5.3348 LearningRate 0.0297 Epoch: 9 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:56:56,024-Speed 3401.27 samples/sec Loss 5.2796 LearningRate 0.0297 Epoch: 9 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:57:40,078-[lfw][46000]XNorm: 22.894358 Training: 2022-04-11 03:57:40,079-[lfw][46000]Accuracy-Flip: 0.99783+-0.00224 Training: 2022-04-11 03:57:40,079-[lfw][46000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:58:31,428-[cfp_fp][46000]XNorm: 20.601470 Training: 2022-04-11 03:58:31,428-[cfp_fp][46000]Accuracy-Flip: 0.97157+-0.00799 Training: 2022-04-11 03:58:31,429-[cfp_fp][46000]Accuracy-Highest: 0.97543 Training: 2022-04-11 03:59:15,673-[agedb_30][46000]XNorm: 22.624711 Training: 2022-04-11 03:59:15,674-[agedb_30][46000]Accuracy-Flip: 0.97967+-0.00653 Training: 2022-04-11 03:59:15,674-[agedb_30][46000]Accuracy-Highest: 0.98083 Training: 2022-04-11 03:59:18,671-Speed 71.79 samples/sec Loss 5.2783 LearningRate 0.0297 Epoch: 9 Global Step: 
46010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:21,657-Speed 3429.95 samples/sec Loss 5.2581 LearningRate 0.0297 Epoch: 9 Global Step: 46020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:24,658-Speed 3412.14 samples/sec Loss 5.3597 LearningRate 0.0297 Epoch: 9 Global Step: 46030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:27,650-Speed 3423.30 samples/sec Loss 5.2082 LearningRate 0.0297 Epoch: 9 Global Step: 46040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:30,639-Speed 3426.96 samples/sec Loss 5.3183 LearningRate 0.0297 Epoch: 9 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:33,626-Speed 3428.92 samples/sec Loss 5.3331 LearningRate 0.0297 Epoch: 9 Global Step: 46060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:36,628-Speed 3411.84 samples/sec Loss 5.3094 LearningRate 0.0297 Epoch: 9 Global Step: 46070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:39,629-Speed 3413.73 samples/sec Loss 5.2876 LearningRate 0.0296 Epoch: 9 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:42,619-Speed 3425.75 samples/sec Loss 5.3493 LearningRate 0.0296 Epoch: 9 Global Step: 46090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:45,609-Speed 3424.69 samples/sec Loss 5.3447 LearningRate 0.0296 Epoch: 9 Global Step: 46100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 03:59:48,609-Speed 3414.78 samples/sec Loss 5.3262 LearningRate 0.0296 Epoch: 9 Global Step: 46110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:59:51,599-Speed 3425.44 samples/sec Loss 5.3824 LearningRate 0.0296 Epoch: 9 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 03:59:54,603-Speed 3409.09 samples/sec Loss 5.3492 LearningRate 0.0296 Epoch: 9 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
03:59:57,598-Speed 3420.69 samples/sec Loss 5.3327 LearningRate 0.0296 Epoch: 9 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:00,577-Speed 3438.29 samples/sec Loss 5.1828 LearningRate 0.0296 Epoch: 9 Global Step: 46150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:03,582-Speed 3408.89 samples/sec Loss 5.4298 LearningRate 0.0296 Epoch: 9 Global Step: 46160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:06,578-Speed 3418.12 samples/sec Loss 5.2257 LearningRate 0.0295 Epoch: 9 Global Step: 46170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:09,573-Speed 3420.41 samples/sec Loss 5.2835 LearningRate 0.0295 Epoch: 9 Global Step: 46180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:12,572-Speed 3414.99 samples/sec Loss 5.3891 LearningRate 0.0295 Epoch: 9 Global Step: 46190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:15,571-Speed 3415.03 samples/sec Loss 5.3824 LearningRate 0.0295 Epoch: 9 Global Step: 46200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:18,587-Speed 3396.55 samples/sec Loss 5.3158 LearningRate 0.0295 Epoch: 9 Global Step: 46210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:21,587-Speed 3414.34 samples/sec Loss 5.2907 LearningRate 0.0295 Epoch: 9 Global Step: 46220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:24,594-Speed 3405.75 samples/sec Loss 5.3121 LearningRate 0.0295 Epoch: 9 Global Step: 46230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:27,618-Speed 3387.04 samples/sec Loss 5.3665 LearningRate 0.0295 Epoch: 9 Global Step: 46240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:00:30,624-Speed 3407.74 samples/sec Loss 5.3940 LearningRate 0.0295 Epoch: 9 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:33,629-Speed 3408.24 samples/sec Loss 5.4352 LearningRate 
0.0295 Epoch: 9 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:36,628-Speed 3415.73 samples/sec Loss 5.3078 LearningRate 0.0294 Epoch: 9 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:39,629-Speed 3412.38 samples/sec Loss 5.3276 LearningRate 0.0294 Epoch: 9 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:42,633-Speed 3410.04 samples/sec Loss 5.5195 LearningRate 0.0294 Epoch: 9 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:45,634-Speed 3412.84 samples/sec Loss 5.5237 LearningRate 0.0294 Epoch: 9 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:48,647-Speed 3399.67 samples/sec Loss 5.3673 LearningRate 0.0294 Epoch: 9 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:51,648-Speed 3413.10 samples/sec Loss 5.3944 LearningRate 0.0294 Epoch: 9 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:54,643-Speed 3419.62 samples/sec Loss 5.2891 LearningRate 0.0294 Epoch: 9 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:00:57,649-Speed 3407.15 samples/sec Loss 5.3847 LearningRate 0.0294 Epoch: 9 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:00,649-Speed 3414.60 samples/sec Loss 5.4336 LearningRate 0.0294 Epoch: 9 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:01:03,651-Speed 3412.57 samples/sec Loss 5.3958 LearningRate 0.0293 Epoch: 9 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:01:06,744-Speed 3310.51 samples/sec Loss 5.5512 LearningRate 0.0293 Epoch: 9 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:09,739-Speed 3420.27 samples/sec Loss 5.4958 LearningRate 0.0293 Epoch: 9 Global Step: 46380 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 04:01:12,737-Speed 3417.08 samples/sec Loss 5.5623 LearningRate 0.0293 Epoch: 9 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:15,742-Speed 3408.20 samples/sec Loss 5.2804 LearningRate 0.0293 Epoch: 9 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:18,748-Speed 3407.67 samples/sec Loss 5.2808 LearningRate 0.0293 Epoch: 9 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:21,756-Speed 3404.66 samples/sec Loss 5.2629 LearningRate 0.0293 Epoch: 9 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:24,752-Speed 3418.57 samples/sec Loss 5.3645 LearningRate 0.0293 Epoch: 9 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:27,754-Speed 3411.47 samples/sec Loss 5.4124 LearningRate 0.0293 Epoch: 9 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:30,758-Speed 3409.74 samples/sec Loss 5.3331 LearningRate 0.0292 Epoch: 9 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:33,753-Speed 3421.22 samples/sec Loss 5.5081 LearningRate 0.0292 Epoch: 9 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:36,754-Speed 3411.90 samples/sec Loss 5.3738 LearningRate 0.0292 Epoch: 9 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:01:39,755-Speed 3413.27 samples/sec Loss 5.2747 LearningRate 0.0292 Epoch: 9 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:01:42,737-Speed 3435.57 samples/sec Loss 5.4594 LearningRate 0.0292 Epoch: 9 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:45,734-Speed 3417.03 samples/sec Loss 5.4510 LearningRate 0.0292 Epoch: 9 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:48,739-Speed 3408.40 
samples/sec Loss 5.4793 LearningRate 0.0292 Epoch: 9 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:51,744-Speed 3408.79 samples/sec Loss 5.4671 LearningRate 0.0292 Epoch: 9 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:54,741-Speed 3416.94 samples/sec Loss 5.4706 LearningRate 0.0292 Epoch: 9 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:01:57,739-Speed 3416.39 samples/sec Loss 5.4302 LearningRate 0.0292 Epoch: 9 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:00,736-Speed 3417.74 samples/sec Loss 5.4615 LearningRate 0.0291 Epoch: 9 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:03,739-Speed 3411.54 samples/sec Loss 5.5114 LearningRate 0.0291 Epoch: 9 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:06,738-Speed 3414.54 samples/sec Loss 5.5233 LearningRate 0.0291 Epoch: 9 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:09,749-Speed 3401.80 samples/sec Loss 5.3292 LearningRate 0.0291 Epoch: 9 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:12,748-Speed 3415.53 samples/sec Loss 5.4721 LearningRate 0.0291 Epoch: 9 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:02:15,752-Speed 3409.28 samples/sec Loss 5.3204 LearningRate 0.0291 Epoch: 9 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:02:18,764-Speed 3400.81 samples/sec Loss 5.5185 LearningRate 0.0291 Epoch: 9 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:02:21,761-Speed 3417.42 samples/sec Loss 5.4185 LearningRate 0.0291 Epoch: 9 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:02:24,745-Speed 3432.47 samples/sec Loss 5.4870 LearningRate 0.0291 Epoch: 9 Global 
Step: 46630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:27,757-Speed 3400.70 samples/sec Loss 5.3830 LearningRate 0.0290 Epoch: 9 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:30,759-Speed 3412.60 samples/sec Loss 5.6241 LearningRate 0.0290 Epoch: 9 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:33,756-Speed 3417.36 samples/sec Loss 5.5085 LearningRate 0.0290 Epoch: 9 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:36,853-Speed 3307.67 samples/sec Loss 5.3946 LearningRate 0.0290 Epoch: 9 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:39,893-Speed 3368.44 samples/sec Loss 5.5283 LearningRate 0.0290 Epoch: 9 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:42,896-Speed 3411.39 samples/sec Loss 5.3179 LearningRate 0.0290 Epoch: 9 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:45,899-Speed 3410.67 samples/sec Loss 5.4737 LearningRate 0.0290 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:48,899-Speed 3414.45 samples/sec Loss 5.6211 LearningRate 0.0290 Epoch: 9 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:51,915-Speed 3395.00 samples/sec Loss 5.5703 LearningRate 0.0290 Epoch: 9 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:02:54,918-Speed 3410.89 samples/sec Loss 5.4867 LearningRate 0.0290 Epoch: 9 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:02:57,925-Speed 3406.93 samples/sec Loss 5.4721 LearningRate 0.0289 Epoch: 9 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:03:00,949-Speed 3386.51 samples/sec Loss 5.4367 LearningRate 0.0289 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 04:03:03,960-Speed 3402.25 samples/sec Loss 5.4975 LearningRate 0.0289 Epoch: 9 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:03:06,964-Speed 3409.28 samples/sec Loss 5.4201 LearningRate 0.0289 Epoch: 9 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:03:09,965-Speed 3412.84 samples/sec Loss 5.5425 LearningRate 0.0289 Epoch: 9 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:03:12,948-Speed 3434.99 samples/sec Loss 5.5305 LearningRate 0.0289 Epoch: 9 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:15,951-Speed 3410.87 samples/sec Loss 5.4381 LearningRate 0.0289 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:18,961-Speed 3403.14 samples/sec Loss 5.3316 LearningRate 0.0289 Epoch: 9 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:21,961-Speed 3413.57 samples/sec Loss 5.3470 LearningRate 0.0289 Epoch: 9 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:24,964-Speed 3411.23 samples/sec Loss 5.3926 LearningRate 0.0288 Epoch: 9 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:27,964-Speed 3414.44 samples/sec Loss 5.3832 LearningRate 0.0288 Epoch: 9 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:30,966-Speed 3411.61 samples/sec Loss 5.4977 LearningRate 0.0288 Epoch: 9 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:33,972-Speed 3407.53 samples/sec Loss 5.6001 LearningRate 0.0288 Epoch: 9 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:36,974-Speed 3411.93 samples/sec Loss 5.4998 LearningRate 0.0288 Epoch: 9 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:39,975-Speed 3413.01 samples/sec Loss 5.5329 
LearningRate 0.0288 Epoch: 9 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:42,966-Speed 3424.59 samples/sec Loss 5.3249 LearningRate 0.0288 Epoch: 9 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:45,968-Speed 3412.11 samples/sec Loss 5.4111 LearningRate 0.0288 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:48,976-Speed 3405.48 samples/sec Loss 5.3417 LearningRate 0.0288 Epoch: 9 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:51,995-Speed 3391.60 samples/sec Loss 5.4493 LearningRate 0.0287 Epoch: 9 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:55,005-Speed 3402.91 samples/sec Loss 5.2943 LearningRate 0.0287 Epoch: 9 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:03:58,008-Speed 3411.72 samples/sec Loss 5.4319 LearningRate 0.0287 Epoch: 9 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:01,016-Speed 3404.98 samples/sec Loss 5.4996 LearningRate 0.0287 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:04,078-Speed 3345.13 samples/sec Loss 5.3327 LearningRate 0.0287 Epoch: 9 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:07,091-Speed 3399.26 samples/sec Loss 5.4907 LearningRate 0.0287 Epoch: 9 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:10,094-Speed 3410.05 samples/sec Loss 5.5476 LearningRate 0.0287 Epoch: 9 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:13,097-Speed 3411.80 samples/sec Loss 5.4290 LearningRate 0.0287 Epoch: 9 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:04:16,090-Speed 3421.79 samples/sec Loss 5.5041 LearningRate 0.0287 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 04:04:19,093-Speed 3411.14 samples/sec Loss 5.4714 LearningRate 0.0287 Epoch: 9 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:22,145-Speed 3355.97 samples/sec Loss 5.5294 LearningRate 0.0286 Epoch: 9 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:25,188-Speed 3366.11 samples/sec Loss 5.2839 LearningRate 0.0286 Epoch: 9 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:28,196-Speed 3405.50 samples/sec Loss 5.3787 LearningRate 0.0286 Epoch: 9 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:31,212-Speed 3395.54 samples/sec Loss 5.3735 LearningRate 0.0286 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:34,214-Speed 3411.78 samples/sec Loss 5.5117 LearningRate 0.0286 Epoch: 9 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:37,221-Speed 3405.81 samples/sec Loss 5.5585 LearningRate 0.0286 Epoch: 9 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:40,229-Speed 3405.78 samples/sec Loss 5.7448 LearningRate 0.0286 Epoch: 9 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:43,292-Speed 3343.70 samples/sec Loss 5.6093 LearningRate 0.0286 Epoch: 9 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:04:46,280-Speed 3428.10 samples/sec Loss 5.5032 LearningRate 0.0286 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:04:49,299-Speed 3391.82 samples/sec Loss 5.3787 LearningRate 0.0285 Epoch: 9 Global Step: 47110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:04:52,309-Speed 3403.30 samples/sec Loss 5.6982 LearningRate 0.0285 Epoch: 9 Global Step: 47120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:04:55,313-Speed 
3410.04 samples/sec Loss 5.4271 LearningRate 0.0285 Epoch: 9 Global Step: 47130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:04:58,317-Speed 3409.55 samples/sec Loss 5.3607 LearningRate 0.0285 Epoch: 9 Global Step: 47140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:01,325-Speed 3405.62 samples/sec Loss 5.5383 LearningRate 0.0285 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:04,337-Speed 3399.28 samples/sec Loss 5.4425 LearningRate 0.0285 Epoch: 9 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:07,343-Speed 3408.33 samples/sec Loss 5.4831 LearningRate 0.0285 Epoch: 9 Global Step: 47170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:10,348-Speed 3407.55 samples/sec Loss 5.4661 LearningRate 0.0285 Epoch: 9 Global Step: 47180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:13,353-Speed 3409.31 samples/sec Loss 5.5436 LearningRate 0.0285 Epoch: 9 Global Step: 47190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 04:05:16,407-Speed 3353.19 samples/sec Loss 5.5051 LearningRate 0.0285 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:19,431-Speed 3387.17 samples/sec Loss 5.4730 LearningRate 0.0284 Epoch: 9 Global Step: 47210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:22,444-Speed 3400.01 samples/sec Loss 5.4450 LearningRate 0.0284 Epoch: 9 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:25,449-Speed 3408.95 samples/sec Loss 5.4261 LearningRate 0.0284 Epoch: 9 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:28,468-Speed 3392.78 samples/sec Loss 5.4290 LearningRate 0.0284 Epoch: 9 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:31,478-Speed 3401.84 samples/sec Loss 5.5208 LearningRate 0.0284 Epoch: 9 Global 
Step: 47250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:34,485-Speed 3406.51 samples/sec Loss 5.4529 LearningRate 0.0284 Epoch: 9 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:37,500-Speed 3396.82 samples/sec Loss 5.4388 LearningRate 0.0284 Epoch: 9 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:40,509-Speed 3403.69 samples/sec Loss 5.4476 LearningRate 0.0284 Epoch: 9 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:43,515-Speed 3408.37 samples/sec Loss 5.5065 LearningRate 0.0284 Epoch: 9 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:46,533-Speed 3393.38 samples/sec Loss 5.3900 LearningRate 0.0283 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 04:05:49,519-Speed 3429.76 samples/sec Loss 5.5682 LearningRate 0.0283 Epoch: 9 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:52,525-Speed 3408.05 samples/sec Loss 5.4730 LearningRate 0.0283 Epoch: 9 Global Step: 47320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:55,534-Speed 3403.72 samples/sec Loss 5.4974 LearningRate 0.0283 Epoch: 9 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:05:58,546-Speed 3401.47 samples/sec Loss 5.3747 LearningRate 0.0283 Epoch: 9 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:01,553-Speed 3405.89 samples/sec Loss 5.5311 LearningRate 0.0283 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:04,561-Speed 3404.95 samples/sec Loss 5.5149 LearningRate 0.0283 Epoch: 9 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:07,565-Speed 3409.29 samples/sec Loss 5.4444 LearningRate 0.0283 Epoch: 9 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 04:06:10,577-Speed 3400.71 samples/sec Loss 5.3772 LearningRate 0.0283 Epoch: 9 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:13,579-Speed 3412.14 samples/sec Loss 5.4922 LearningRate 0.0283 Epoch: 9 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:16,584-Speed 3407.55 samples/sec Loss 5.5477 LearningRate 0.0282 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:19,572-Speed 3428.56 samples/sec Loss 5.4713 LearningRate 0.0282 Epoch: 9 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:22,577-Speed 3408.82 samples/sec Loss 5.4986 LearningRate 0.0282 Epoch: 9 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:25,589-Speed 3400.50 samples/sec Loss 5.4414 LearningRate 0.0282 Epoch: 9 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:28,599-Speed 3402.71 samples/sec Loss 5.5150 LearningRate 0.0282 Epoch: 9 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:31,602-Speed 3411.01 samples/sec Loss 5.5595 LearningRate 0.0282 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:34,606-Speed 3409.26 samples/sec Loss 5.4664 LearningRate 0.0282 Epoch: 9 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:37,617-Speed 3401.53 samples/sec Loss 5.6317 LearningRate 0.0282 Epoch: 9 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:40,623-Speed 3407.39 samples/sec Loss 5.3990 LearningRate 0.0282 Epoch: 9 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:43,628-Speed 3408.44 samples/sec Loss 5.5219 LearningRate 0.0281 Epoch: 9 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 04:06:46,641-Speed 3399.42 samples/sec Loss 5.5370 
LearningRate 0.0281 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:06:49,633-Speed 3423.26 samples/sec Loss 5.4336 LearningRate 0.0281 Epoch: 9 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:06:52,787-Speed 3247.12 samples/sec Loss 5.5320 LearningRate 0.0281 Epoch: 9 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:06:55,790-Speed 3411.42 samples/sec Loss 5.4518 LearningRate 0.0281 Epoch: 9 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:06:58,793-Speed 3410.84 samples/sec Loss 5.5277 LearningRate 0.0281 Epoch: 9 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:01,801-Speed 3404.65 samples/sec Loss 5.5436 LearningRate 0.0281 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:04,817-Speed 3395.74 samples/sec Loss 5.5305 LearningRate 0.0281 Epoch: 9 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:07,824-Speed 3406.48 samples/sec Loss 5.4373 LearningRate 0.0281 Epoch: 9 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:10,838-Speed 3398.53 samples/sec Loss 5.4256 LearningRate 0.0281 Epoch: 9 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:13,844-Speed 3407.36 samples/sec Loss 5.3767 LearningRate 0.0280 Epoch: 9 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:16,856-Speed 3400.55 samples/sec Loss 5.4998 LearningRate 0.0280 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:19,858-Speed 3411.57 samples/sec Loss 5.5564 LearningRate 0.0280 Epoch: 9 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:07:22,849-Speed 3425.34 samples/sec Loss 5.5421 LearningRate 0.0280 Epoch: 9 Global Step: 47620 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 04:07:25,895-Speed 3361.98 samples/sec Loss 5.5926 LearningRate 0.0280 Epoch: 9 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:28,907-Speed 3400.76 samples/sec Loss 5.5777 LearningRate 0.0280 Epoch: 9 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:31,916-Speed 3404.66 samples/sec Loss 5.5195 LearningRate 0.0280 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:34,927-Speed 3401.13 samples/sec Loss 5.4675 LearningRate 0.0280 Epoch: 9 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:37,934-Speed 3406.15 samples/sec Loss 5.3233 LearningRate 0.0280 Epoch: 9 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:40,944-Speed 3402.50 samples/sec Loss 5.4438 LearningRate 0.0279 Epoch: 9 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:43,955-Speed 3401.88 samples/sec Loss 5.4691 LearningRate 0.0279 Epoch: 9 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:46,959-Speed 3409.31 samples/sec Loss 5.5525 LearningRate 0.0279 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:49,965-Speed 3407.77 samples/sec Loss 5.5828 LearningRate 0.0279 Epoch: 9 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:52,977-Speed 3400.59 samples/sec Loss 5.4482 LearningRate 0.0279 Epoch: 9 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:07:55,969-Speed 3423.19 samples/sec Loss 5.4827 LearningRate 0.0279 Epoch: 9 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:07:58,973-Speed 3410.28 samples/sec Loss 5.5970 LearningRate 0.0279 Epoch: 9 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:01,982-Speed 
3403.31 samples/sec Loss 5.4873 LearningRate 0.0279 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:04,985-Speed 3411.45 samples/sec Loss 5.4661 LearningRate 0.0279 Epoch: 9 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:08,078-Speed 3311.59 samples/sec Loss 5.3300 LearningRate 0.0279 Epoch: 9 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:11,091-Speed 3398.52 samples/sec Loss 5.4351 LearningRate 0.0278 Epoch: 9 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:14,098-Speed 3406.81 samples/sec Loss 5.3806 LearningRate 0.0278 Epoch: 9 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:17,106-Speed 3404.27 samples/sec Loss 5.4983 LearningRate 0.0278 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:20,117-Speed 3401.63 samples/sec Loss 5.4936 LearningRate 0.0278 Epoch: 9 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:23,131-Speed 3398.69 samples/sec Loss 5.6095 LearningRate 0.0278 Epoch: 9 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:26,146-Speed 3398.02 samples/sec Loss 5.3294 LearningRate 0.0278 Epoch: 9 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:08:29,144-Speed 3416.13 samples/sec Loss 5.4933 LearningRate 0.0278 Epoch: 9 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:32,161-Speed 3394.63 samples/sec Loss 5.5714 LearningRate 0.0278 Epoch: 9 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:35,181-Speed 3391.64 samples/sec Loss 5.4169 LearningRate 0.0278 Epoch: 9 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:38,192-Speed 3401.32 samples/sec Loss 5.5984 LearningRate 0.0278 Epoch: 9 
Global Step: 47870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:41,254-Speed 3345.43 samples/sec Loss 5.4013 LearningRate 0.0277 Epoch: 9 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:44,263-Speed 3403.57 samples/sec Loss 5.4412 LearningRate 0.0277 Epoch: 9 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:47,275-Speed 3400.32 samples/sec Loss 5.4591 LearningRate 0.0277 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:50,283-Speed 3405.13 samples/sec Loss 5.3411 LearningRate 0.0277 Epoch: 9 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:53,370-Speed 3317.92 samples/sec Loss 5.4917 LearningRate 0.0277 Epoch: 9 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:56,372-Speed 3413.20 samples/sec Loss 5.5456 LearningRate 0.0277 Epoch: 9 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:08:59,362-Speed 3425.76 samples/sec Loss 5.4907 LearningRate 0.0277 Epoch: 9 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:09:02,367-Speed 3407.75 samples/sec Loss 5.4387 LearningRate 0.0277 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:09:05,375-Speed 3405.43 samples/sec Loss 5.6267 LearningRate 0.0277 Epoch: 9 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:09:08,379-Speed 3409.10 samples/sec Loss 5.3642 LearningRate 0.0276 Epoch: 9 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:09:11,389-Speed 3402.49 samples/sec Loss 5.5626 LearningRate 0.0276 Epoch: 9 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:09:14,411-Speed 3390.01 samples/sec Loss 5.5128 LearningRate 0.0276 Epoch: 9 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 04:09:17,418-Speed 3405.93 samples/sec Loss 5.5355 LearningRate 0.0276 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:10:01,723-[lfw][48000]XNorm: 23.144163 Training: 2022-04-11 04:10:01,723-[lfw][48000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 04:10:01,724-[lfw][48000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:10:53,167-[cfp_fp][48000]XNorm: 21.078792 Training: 2022-04-11 04:10:53,168-[cfp_fp][48000]Accuracy-Flip: 0.97629+-0.00829 Training: 2022-04-11 04:10:53,168-[cfp_fp][48000]Accuracy-Highest: 0.97629 Training: 2022-04-11 04:11:37,562-[agedb_30][48000]XNorm: 22.959230 Training: 2022-04-11 04:11:37,563-[agedb_30][48000]Accuracy-Flip: 0.97883+-0.00671 Training: 2022-04-11 04:11:37,563-[agedb_30][48000]Accuracy-Highest: 0.98083 Training: 2022-04-11 04:11:40,593-Speed 71.52 samples/sec Loss 5.6525 LearningRate 0.0276 Epoch: 9 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:11:43,585-Speed 3423.29 samples/sec Loss 5.3786 LearningRate 0.0276 Epoch: 9 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:11:46,573-Speed 3427.94 samples/sec Loss 5.4885 LearningRate 0.0276 Epoch: 9 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:11:49,577-Speed 3409.78 samples/sec Loss 5.5358 LearningRate 0.0276 Epoch: 9 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:11:52,549-Speed 3446.92 samples/sec Loss 5.4781 LearningRate 0.0276 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:11:55,542-Speed 3422.39 samples/sec Loss 5.5342 LearningRate 0.0276 Epoch: 9 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:11:58,536-Speed 3420.90 samples/sec Loss 5.3398 LearningRate 0.0275 Epoch: 9 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:01,533-Speed 3416.62 
samples/sec Loss 5.4674 LearningRate 0.0275 Epoch: 9 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:04,541-Speed 3406.19 samples/sec Loss 5.5689 LearningRate 0.0275 Epoch: 9 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:07,548-Speed 3405.87 samples/sec Loss 5.4827 LearningRate 0.0275 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:10,545-Speed 3417.27 samples/sec Loss 5.4170 LearningRate 0.0275 Epoch: 9 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:13,542-Speed 3417.61 samples/sec Loss 5.7087 LearningRate 0.0275 Epoch: 9 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:16,551-Speed 3403.64 samples/sec Loss 5.4306 LearningRate 0.0275 Epoch: 9 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:19,550-Speed 3415.40 samples/sec Loss 5.5830 LearningRate 0.0275 Epoch: 9 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:22,528-Speed 3439.40 samples/sec Loss 5.6120 LearningRate 0.0275 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:25,585-Speed 3351.20 samples/sec Loss 5.4231 LearningRate 0.0274 Epoch: 9 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:28,584-Speed 3415.07 samples/sec Loss 5.4111 LearningRate 0.0274 Epoch: 9 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:31,582-Speed 3416.31 samples/sec Loss 5.5693 LearningRate 0.0274 Epoch: 9 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:34,578-Speed 3418.86 samples/sec Loss 5.5040 LearningRate 0.0274 Epoch: 9 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:12:37,554-Speed 3441.69 samples/sec Loss 5.6332 LearningRate 0.0274 Epoch: 9 Global Step: 
48200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:40,562-Speed 3405.29 samples/sec Loss 5.4593 LearningRate 0.0274 Epoch: 9 Global Step: 48210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:43,555-Speed 3421.84 samples/sec Loss 5.4716 LearningRate 0.0274 Epoch: 9 Global Step: 48220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:46,561-Speed 3407.71 samples/sec Loss 5.2709 LearningRate 0.0274 Epoch: 9 Global Step: 48230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:49,558-Speed 3417.99 samples/sec Loss 5.5409 LearningRate 0.0274 Epoch: 9 Global Step: 48240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:52,559-Speed 3412.64 samples/sec Loss 5.3980 LearningRate 0.0274 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:55,560-Speed 3413.07 samples/sec Loss 5.2875 LearningRate 0.0273 Epoch: 9 Global Step: 48260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:12:58,561-Speed 3413.29 samples/sec Loss 5.3233 LearningRate 0.0273 Epoch: 9 Global Step: 48270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:01,570-Speed 3403.09 samples/sec Loss 5.3303 LearningRate 0.0273 Epoch: 9 Global Step: 48280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:04,671-Speed 3303.00 samples/sec Loss 5.4491 LearningRate 0.0273 Epoch: 9 Global Step: 48290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:07,663-Speed 3423.46 samples/sec Loss 5.4425 LearningRate 0.0273 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:10,657-Speed 3421.81 samples/sec Loss 5.6298 LearningRate 0.0273 Epoch: 9 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:13,667-Speed 3402.10 samples/sec Loss 5.4931 LearningRate 0.0273 Epoch: 9 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
04:13:16,675-Speed 3405.90 samples/sec Loss 5.4282 LearningRate 0.0273 Epoch: 9 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:19,671-Speed 3418.55 samples/sec Loss 5.4110 LearningRate 0.0273 Epoch: 9 Global Step: 48340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:22,670-Speed 3414.81 samples/sec Loss 5.5373 LearningRate 0.0273 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:25,691-Speed 3390.78 samples/sec Loss 5.5117 LearningRate 0.0272 Epoch: 9 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:13:28,710-Speed 3393.03 samples/sec Loss 5.4921 LearningRate 0.0272 Epoch: 9 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:31,703-Speed 3421.59 samples/sec Loss 5.3574 LearningRate 0.0272 Epoch: 9 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:34,703-Speed 3414.69 samples/sec Loss 5.4001 LearningRate 0.0272 Epoch: 9 Global Step: 48390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:37,715-Speed 3400.02 samples/sec Loss 5.4773 LearningRate 0.0272 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:40,715-Speed 3415.54 samples/sec Loss 5.4933 LearningRate 0.0272 Epoch: 9 Global Step: 48410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:43,708-Speed 3420.98 samples/sec Loss 5.4454 LearningRate 0.0272 Epoch: 9 Global Step: 48420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:46,716-Speed 3405.68 samples/sec Loss 5.3321 LearningRate 0.0272 Epoch: 9 Global Step: 48430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:49,713-Speed 3418.19 samples/sec Loss 5.4678 LearningRate 0.0272 Epoch: 9 Global Step: 48440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:52,721-Speed 3405.26 samples/sec Loss 5.4105 LearningRate 
0.0271 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:55,719-Speed 3415.79 samples/sec Loss 5.5086 LearningRate 0.0271 Epoch: 9 Global Step: 48460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:13:58,872-Speed 3248.56 samples/sec Loss 5.4004 LearningRate 0.0271 Epoch: 9 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:01,877-Speed 3408.56 samples/sec Loss 5.4598 LearningRate 0.0271 Epoch: 9 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:04,862-Speed 3430.63 samples/sec Loss 5.5447 LearningRate 0.0271 Epoch: 9 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:07,872-Speed 3402.83 samples/sec Loss 5.3768 LearningRate 0.0271 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:10,887-Speed 3397.74 samples/sec Loss 5.4789 LearningRate 0.0271 Epoch: 9 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:13,893-Speed 3408.77 samples/sec Loss 5.4834 LearningRate 0.0271 Epoch: 9 Global Step: 48520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:16,897-Speed 3409.42 samples/sec Loss 5.5553 LearningRate 0.0271 Epoch: 9 Global Step: 48530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:19,897-Speed 3414.21 samples/sec Loss 5.5330 LearningRate 0.0271 Epoch: 9 Global Step: 48540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:22,897-Speed 3413.69 samples/sec Loss 5.3512 LearningRate 0.0270 Epoch: 9 Global Step: 48550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:25,901-Speed 3410.04 samples/sec Loss 5.5464 LearningRate 0.0270 Epoch: 9 Global Step: 48560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:28,901-Speed 3414.08 samples/sec Loss 5.4852 LearningRate 0.0270 Epoch: 9 Global Step: 48570 Fp16 Grad Scale: 32768 Required: 
5 hours Training: 2022-04-11 04:14:31,899-Speed 3416.19 samples/sec Loss 5.4487 LearningRate 0.0270 Epoch: 9 Global Step: 48580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:14:34,898-Speed 3416.17 samples/sec Loss 5.3666 LearningRate 0.0270 Epoch: 9 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:37,897-Speed 3414.74 samples/sec Loss 5.3951 LearningRate 0.0270 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:40,901-Speed 3409.31 samples/sec Loss 5.3093 LearningRate 0.0270 Epoch: 9 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:43,905-Speed 3409.76 samples/sec Loss 5.4230 LearningRate 0.0270 Epoch: 9 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:46,904-Speed 3415.16 samples/sec Loss 5.4567 LearningRate 0.0270 Epoch: 9 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:49,908-Speed 3410.74 samples/sec Loss 5.4672 LearningRate 0.0270 Epoch: 9 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:52,910-Speed 3411.49 samples/sec Loss 5.4018 LearningRate 0.0269 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:55,923-Speed 3398.80 samples/sec Loss 5.3945 LearningRate 0.0269 Epoch: 9 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:14:58,933-Speed 3402.92 samples/sec Loss 5.4281 LearningRate 0.0269 Epoch: 9 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:01,952-Speed 3392.78 samples/sec Loss 5.4769 LearningRate 0.0269 Epoch: 9 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:04,955-Speed 3410.73 samples/sec Loss 5.3589 LearningRate 0.0269 Epoch: 9 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:15:07,944-Speed 3426.20 
samples/sec Loss 5.4838 LearningRate 0.0269 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:10,958-Speed 3398.75 samples/sec Loss 5.4383 LearningRate 0.0269 Epoch: 9 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:13,976-Speed 3393.82 samples/sec Loss 5.3955 LearningRate 0.0269 Epoch: 9 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:16,984-Speed 3405.43 samples/sec Loss 5.4799 LearningRate 0.0269 Epoch: 9 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:20,000-Speed 3395.83 samples/sec Loss 5.3599 LearningRate 0.0269 Epoch: 9 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:23,017-Speed 3394.82 samples/sec Loss 5.5035 LearningRate 0.0268 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:26,113-Speed 3308.83 samples/sec Loss 5.4125 LearningRate 0.0268 Epoch: 9 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:29,112-Speed 3414.57 samples/sec Loss 5.4113 LearningRate 0.0268 Epoch: 9 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:15:32,147-Speed 3374.98 samples/sec Loss 5.4653 LearningRate 0.0268 Epoch: 9 Global Step: 48780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:35,153-Speed 3407.29 samples/sec Loss 5.5042 LearningRate 0.0268 Epoch: 9 Global Step: 48790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:38,164-Speed 3402.23 samples/sec Loss 5.3874 LearningRate 0.0268 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:41,168-Speed 3409.61 samples/sec Loss 5.4982 LearningRate 0.0268 Epoch: 9 Global Step: 48810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:44,180-Speed 3400.79 samples/sec Loss 5.3680 LearningRate 0.0268 Epoch: 9 Global Step: 
48820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:47,181-Speed 3412.99 samples/sec Loss 5.6280 LearningRate 0.0268 Epoch: 9 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:50,186-Speed 3408.71 samples/sec Loss 5.3405 LearningRate 0.0267 Epoch: 9 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:53,204-Speed 3394.02 samples/sec Loss 5.5152 LearningRate 0.0267 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:56,206-Speed 3411.74 samples/sec Loss 5.3615 LearningRate 0.0267 Epoch: 9 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:15:59,205-Speed 3414.59 samples/sec Loss 5.3684 LearningRate 0.0267 Epoch: 9 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:16:02,210-Speed 3408.47 samples/sec Loss 5.3377 LearningRate 0.0267 Epoch: 9 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:05,229-Speed 3393.48 samples/sec Loss 5.4125 LearningRate 0.0267 Epoch: 9 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:08,240-Speed 3401.23 samples/sec Loss 5.4213 LearningRate 0.0267 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:11,244-Speed 3410.42 samples/sec Loss 5.3876 LearningRate 0.0267 Epoch: 9 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:14,249-Speed 3408.36 samples/sec Loss 5.3732 LearningRate 0.0267 Epoch: 9 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:17,257-Speed 3404.74 samples/sec Loss 5.3918 LearningRate 0.0267 Epoch: 9 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:20,265-Speed 3405.94 samples/sec Loss 5.3551 LearningRate 0.0266 Epoch: 9 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
04:16:23,277-Speed 3399.53 samples/sec Loss 5.4850 LearningRate 0.0266 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:26,278-Speed 3413.47 samples/sec Loss 5.4487 LearningRate 0.0266 Epoch: 9 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:29,286-Speed 3404.63 samples/sec Loss 5.5324 LearningRate 0.0266 Epoch: 9 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:32,290-Speed 3410.27 samples/sec Loss 5.4465 LearningRate 0.0266 Epoch: 9 Global Step: 48980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:16:35,286-Speed 3419.11 samples/sec Loss 5.3989 LearningRate 0.0266 Epoch: 9 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:38,291-Speed 3408.89 samples/sec Loss 5.4697 LearningRate 0.0266 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:41,301-Speed 3402.49 samples/sec Loss 5.4512 LearningRate 0.0266 Epoch: 9 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:44,305-Speed 3409.50 samples/sec Loss 5.5436 LearningRate 0.0266 Epoch: 9 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:47,309-Speed 3409.42 samples/sec Loss 5.4153 LearningRate 0.0266 Epoch: 9 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:50,313-Speed 3409.96 samples/sec Loss 5.3618 LearningRate 0.0265 Epoch: 9 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:53,336-Speed 3388.16 samples/sec Loss 5.3836 LearningRate 0.0265 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:56,340-Speed 3409.66 samples/sec Loss 5.2974 LearningRate 0.0265 Epoch: 9 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:16:59,343-Speed 3410.78 samples/sec Loss 5.5083 LearningRate 
0.0265 Epoch: 9 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:02,360-Speed 3394.52 samples/sec Loss 5.5412 LearningRate 0.0265 Epoch: 9 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:05,368-Speed 3405.76 samples/sec Loss 5.2931 LearningRate 0.0265 Epoch: 9 Global Step: 49090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:17:08,363-Speed 3419.38 samples/sec Loss 5.4434 LearningRate 0.0265 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:11,378-Speed 3397.72 samples/sec Loss 5.4411 LearningRate 0.0265 Epoch: 9 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:14,387-Speed 3403.56 samples/sec Loss 5.3979 LearningRate 0.0265 Epoch: 9 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:17,396-Speed 3404.72 samples/sec Loss 5.3887 LearningRate 0.0265 Epoch: 9 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:20,405-Speed 3403.81 samples/sec Loss 5.3730 LearningRate 0.0264 Epoch: 9 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:23,409-Speed 3409.60 samples/sec Loss 5.5238 LearningRate 0.0264 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:26,435-Speed 3383.79 samples/sec Loss 5.3550 LearningRate 0.0264 Epoch: 9 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:29,443-Speed 3405.43 samples/sec Loss 5.5870 LearningRate 0.0264 Epoch: 9 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:32,458-Speed 3397.78 samples/sec Loss 5.3961 LearningRate 0.0264 Epoch: 9 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:35,462-Speed 3409.81 samples/sec Loss 5.3348 LearningRate 0.0264 Epoch: 9 Global Step: 49190 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 04:17:38,513-Speed 3357.46 samples/sec Loss 5.3482 LearningRate 0.0264 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:17:41,553-Speed 3369.24 samples/sec Loss 5.4252 LearningRate 0.0264 Epoch: 9 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:44,559-Speed 3407.17 samples/sec Loss 5.4053 LearningRate 0.0264 Epoch: 9 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:47,570-Speed 3401.15 samples/sec Loss 5.4571 LearningRate 0.0264 Epoch: 9 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:50,575-Speed 3408.87 samples/sec Loss 5.3647 LearningRate 0.0263 Epoch: 9 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:53,597-Speed 3389.12 samples/sec Loss 5.2743 LearningRate 0.0263 Epoch: 9 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:56,614-Speed 3394.40 samples/sec Loss 5.4077 LearningRate 0.0263 Epoch: 9 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:17:59,623-Speed 3405.04 samples/sec Loss 5.4302 LearningRate 0.0263 Epoch: 9 Global Step: 49270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:02,631-Speed 3404.77 samples/sec Loss 5.3595 LearningRate 0.0263 Epoch: 9 Global Step: 49280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:05,639-Speed 3405.51 samples/sec Loss 5.4806 LearningRate 0.0263 Epoch: 9 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:08,641-Speed 3411.34 samples/sec Loss 5.4977 LearningRate 0.0263 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:11,657-Speed 3395.92 samples/sec Loss 5.4081 LearningRate 0.0263 Epoch: 9 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:18:14,643-Speed 3430.77 
samples/sec Loss 5.4343 LearningRate 0.0263 Epoch: 9 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:17,654-Speed 3401.45 samples/sec Loss 5.3604 LearningRate 0.0263 Epoch: 9 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:20,658-Speed 3409.15 samples/sec Loss 5.2398 LearningRate 0.0262 Epoch: 9 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:23,661-Speed 3411.63 samples/sec Loss 5.4429 LearningRate 0.0262 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:26,667-Speed 3406.78 samples/sec Loss 5.2699 LearningRate 0.0262 Epoch: 9 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:29,678-Speed 3402.17 samples/sec Loss 5.4193 LearningRate 0.0262 Epoch: 9 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:32,682-Speed 3409.57 samples/sec Loss 5.3354 LearningRate 0.0262 Epoch: 9 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:35,685-Speed 3410.70 samples/sec Loss 5.3999 LearningRate 0.0262 Epoch: 9 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:38,696-Speed 3402.26 samples/sec Loss 5.4904 LearningRate 0.0262 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:41,702-Speed 3406.27 samples/sec Loss 5.4611 LearningRate 0.0262 Epoch: 9 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:44,695-Speed 3422.40 samples/sec Loss 5.2563 LearningRate 0.0262 Epoch: 9 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:47,727-Speed 3378.11 samples/sec Loss 5.2768 LearningRate 0.0261 Epoch: 9 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:50,747-Speed 3392.20 samples/sec Loss 5.4804 LearningRate 0.0261 Epoch: 9 Global Step: 
49440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:53,767-Speed 3391.95 samples/sec Loss 5.5131 LearningRate 0.0261 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:56,774-Speed 3406.25 samples/sec Loss 5.5220 LearningRate 0.0261 Epoch: 9 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:18:59,781-Speed 3405.71 samples/sec Loss 5.4739 LearningRate 0.0261 Epoch: 9 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:02,795-Speed 3398.64 samples/sec Loss 5.4169 LearningRate 0.0261 Epoch: 9 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:05,799-Speed 3409.39 samples/sec Loss 5.3181 LearningRate 0.0261 Epoch: 9 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:08,806-Speed 3406.38 samples/sec Loss 5.4682 LearningRate 0.0261 Epoch: 9 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:11,816-Speed 3402.58 samples/sec Loss 5.3564 LearningRate 0.0261 Epoch: 9 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:14,862-Speed 3362.59 samples/sec Loss 5.2977 LearningRate 0.0261 Epoch: 9 Global Step: 49520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:19:17,859-Speed 3417.86 samples/sec Loss 5.4034 LearningRate 0.0260 Epoch: 9 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:20,862-Speed 3411.65 samples/sec Loss 5.5335 LearningRate 0.0260 Epoch: 9 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:23,866-Speed 3408.74 samples/sec Loss 5.4223 LearningRate 0.0260 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:26,872-Speed 3407.45 samples/sec Loss 5.3081 LearningRate 0.0260 Epoch: 9 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:19:29,874-Speed 3412.56 samples/sec Loss 5.4435 LearningRate 0.0260 Epoch: 9 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:32,880-Speed 3406.97 samples/sec Loss 5.4551 LearningRate 0.0260 Epoch: 9 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:35,893-Speed 3399.48 samples/sec Loss 5.4246 LearningRate 0.0260 Epoch: 9 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:38,906-Speed 3399.64 samples/sec Loss 5.4431 LearningRate 0.0260 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:41,989-Speed 3321.54 samples/sec Loss 5.3675 LearningRate 0.0260 Epoch: 9 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:45,006-Speed 3395.68 samples/sec Loss 5.4586 LearningRate 0.0260 Epoch: 9 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:47,994-Speed 3427.87 samples/sec Loss 5.3827 LearningRate 0.0259 Epoch: 9 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:51,000-Speed 3407.96 samples/sec Loss 5.4098 LearningRate 0.0259 Epoch: 9 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:19:54,009-Speed 3403.98 samples/sec Loss 5.4026 LearningRate 0.0259 Epoch: 9 Global Step: 49650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:19:57,015-Speed 3406.40 samples/sec Loss 5.4204 LearningRate 0.0259 Epoch: 9 Global Step: 49660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:00,028-Speed 3399.72 samples/sec Loss 5.4439 LearningRate 0.0259 Epoch: 9 Global Step: 49670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:03,048-Speed 3391.92 samples/sec Loss 5.4220 LearningRate 0.0259 Epoch: 9 Global Step: 49680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:06,059-Speed 3401.12 samples/sec Loss 5.4492 
LearningRate 0.0259 Epoch: 9 Global Step: 49690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:09,081-Speed 3389.19 samples/sec Loss 5.4378 LearningRate 0.0259 Epoch: 9 Global Step: 49700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:12,087-Speed 3407.83 samples/sec Loss 5.3606 LearningRate 0.0259 Epoch: 9 Global Step: 49710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:15,092-Speed 3408.76 samples/sec Loss 5.4811 LearningRate 0.0259 Epoch: 9 Global Step: 49720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:18,099-Speed 3406.18 samples/sec Loss 5.4151 LearningRate 0.0258 Epoch: 9 Global Step: 49730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:21,106-Speed 3407.17 samples/sec Loss 5.2602 LearningRate 0.0258 Epoch: 9 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:20:24,119-Speed 3398.82 samples/sec Loss 5.3546 LearningRate 0.0258 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:27,134-Speed 3396.96 samples/sec Loss 5.3579 LearningRate 0.0258 Epoch: 9 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:30,144-Speed 3402.80 samples/sec Loss 5.4039 LearningRate 0.0258 Epoch: 9 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:33,151-Speed 3406.48 samples/sec Loss 5.3951 LearningRate 0.0258 Epoch: 9 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:36,256-Speed 3298.56 samples/sec Loss 5.4439 LearningRate 0.0258 Epoch: 9 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:39,270-Speed 3398.17 samples/sec Loss 5.4422 LearningRate 0.0258 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:42,279-Speed 3404.01 samples/sec Loss 5.3699 LearningRate 0.0258 Epoch: 9 Global Step: 49810 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 04:20:45,282-Speed 3411.19 samples/sec Loss 5.2513 LearningRate 0.0258 Epoch: 9 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:48,300-Speed 3394.22 samples/sec Loss 5.4511 LearningRate 0.0257 Epoch: 9 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:51,303-Speed 3410.97 samples/sec Loss 5.3459 LearningRate 0.0257 Epoch: 9 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:20:54,318-Speed 3396.94 samples/sec Loss 5.4647 LearningRate 0.0257 Epoch: 9 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:20:57,306-Speed 3427.80 samples/sec Loss 5.4937 LearningRate 0.0257 Epoch: 9 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:00,351-Speed 3363.40 samples/sec Loss 5.4848 LearningRate 0.0257 Epoch: 9 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:03,367-Speed 3396.85 samples/sec Loss 5.4467 LearningRate 0.0257 Epoch: 9 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:06,435-Speed 3337.76 samples/sec Loss 5.4190 LearningRate 0.0257 Epoch: 9 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:09,473-Speed 3372.62 samples/sec Loss 5.4680 LearningRate 0.0257 Epoch: 9 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:12,543-Speed 3335.84 samples/sec Loss 5.3978 LearningRate 0.0257 Epoch: 9 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:15,702-Speed 3242.28 samples/sec Loss 5.4148 LearningRate 0.0257 Epoch: 9 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:18,725-Speed 3388.14 samples/sec Loss 5.4662 LearningRate 0.0256 Epoch: 9 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:21,729-Speed 
3409.11 samples/sec Loss 5.2778 LearningRate 0.0256 Epoch: 9 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:24,735-Speed 3407.90 samples/sec Loss 5.2251 LearningRate 0.0256 Epoch: 9 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:27,742-Speed 3406.53 samples/sec Loss 5.3529 LearningRate 0.0256 Epoch: 9 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:21:30,755-Speed 3399.55 samples/sec Loss 5.4018 LearningRate 0.0256 Epoch: 9 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:21:33,741-Speed 3430.17 samples/sec Loss 5.4115 LearningRate 0.0256 Epoch: 9 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:36,747-Speed 3407.66 samples/sec Loss 5.2140 LearningRate 0.0256 Epoch: 9 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:21:39,774-Speed 3383.49 samples/sec Loss 5.3956 LearningRate 0.0256 Epoch: 9 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:22:24,204-[lfw][50000]XNorm: 22.149314 Training: 2022-04-11 04:22:24,205-[lfw][50000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-04-11 04:22:24,205-[lfw][50000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:23:15,398-[cfp_fp][50000]XNorm: 20.303504 Training: 2022-04-11 04:23:15,399-[cfp_fp][50000]Accuracy-Flip: 0.97500+-0.00721 Training: 2022-04-11 04:23:15,399-[cfp_fp][50000]Accuracy-Highest: 0.97629 Training: 2022-04-11 04:23:59,372-[agedb_30][50000]XNorm: 22.360195 Training: 2022-04-11 04:23:59,373-[agedb_30][50000]Accuracy-Flip: 0.98000+-0.00683 Training: 2022-04-11 04:23:59,373-[agedb_30][50000]Accuracy-Highest: 0.98083 Training: 2022-04-11 04:24:02,373-Speed 71.81 samples/sec Loss 5.2904 LearningRate 0.0256 Epoch: 9 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:05,366-Speed 3422.12 samples/sec Loss 5.2919 LearningRate 
0.0256 Epoch: 9 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:08,359-Speed 3422.14 samples/sec Loss 5.4169 LearningRate 0.0255 Epoch: 9 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:11,349-Speed 3425.43 samples/sec Loss 5.3682 LearningRate 0.0255 Epoch: 9 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:14,338-Speed 3426.17 samples/sec Loss 5.3187 LearningRate 0.0255 Epoch: 9 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:17,326-Speed 3428.31 samples/sec Loss 5.3778 LearningRate 0.0255 Epoch: 9 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:20,318-Speed 3423.48 samples/sec Loss 5.2729 LearningRate 0.0255 Epoch: 9 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:23,309-Speed 3424.07 samples/sec Loss 5.3535 LearningRate 0.0255 Epoch: 9 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:24:26,314-Speed 3409.13 samples/sec Loss 5.3742 LearningRate 0.0255 Epoch: 9 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:24:29,292-Speed 3439.03 samples/sec Loss 5.2005 LearningRate 0.0255 Epoch: 9 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:32,296-Speed 3410.23 samples/sec Loss 5.3642 LearningRate 0.0255 Epoch: 9 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:35,315-Speed 3392.57 samples/sec Loss 5.3935 LearningRate 0.0255 Epoch: 9 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:38,311-Speed 3418.68 samples/sec Loss 5.1176 LearningRate 0.0254 Epoch: 9 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:41,312-Speed 3413.19 samples/sec Loss 5.4613 LearningRate 0.0254 Epoch: 9 Global Step: 50140 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 04:24:44,309-Speed 3417.30 samples/sec Loss 5.4245 LearningRate 0.0254 Epoch: 9 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:47,307-Speed 3416.32 samples/sec Loss 5.4173 LearningRate 0.0254 Epoch: 9 Global Step: 50160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:50,334-Speed 3383.69 samples/sec Loss 5.3115 LearningRate 0.0254 Epoch: 9 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:24:53,333-Speed 3415.21 samples/sec Loss 5.4011 LearningRate 0.0254 Epoch: 9 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:24:56,339-Speed 3407.66 samples/sec Loss 5.3851 LearningRate 0.0254 Epoch: 9 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:24:59,347-Speed 3405.18 samples/sec Loss 5.4700 LearningRate 0.0254 Epoch: 9 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:02,351-Speed 3409.92 samples/sec Loss 5.4977 LearningRate 0.0254 Epoch: 9 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:05,357-Speed 3406.90 samples/sec Loss 5.2777 LearningRate 0.0254 Epoch: 9 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:08,367-Speed 3403.00 samples/sec Loss 5.3412 LearningRate 0.0253 Epoch: 9 Global Step: 50230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:11,372-Speed 3409.18 samples/sec Loss 5.5678 LearningRate 0.0253 Epoch: 9 Global Step: 50240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:14,387-Speed 3396.29 samples/sec Loss 5.4419 LearningRate 0.0253 Epoch: 9 Global Step: 50250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:17,395-Speed 3406.20 samples/sec Loss 5.3541 LearningRate 0.0253 Epoch: 9 Global Step: 50260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:20,399-Speed 3409.48 
samples/sec Loss 5.1490 LearningRate 0.0253 Epoch: 9 Global Step: 50270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:25:23,400-Speed 3412.93 samples/sec Loss 5.3456 LearningRate 0.0253 Epoch: 9 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:26,406-Speed 3407.63 samples/sec Loss 5.2531 LearningRate 0.0253 Epoch: 9 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:29,408-Speed 3411.13 samples/sec Loss 5.4998 LearningRate 0.0253 Epoch: 9 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:32,413-Speed 3409.47 samples/sec Loss 5.4090 LearningRate 0.0253 Epoch: 9 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:35,425-Speed 3400.20 samples/sec Loss 5.2976 LearningRate 0.0253 Epoch: 9 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:38,439-Speed 3397.48 samples/sec Loss 5.4053 LearningRate 0.0252 Epoch: 9 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:41,441-Speed 3412.12 samples/sec Loss 5.3057 LearningRate 0.0252 Epoch: 9 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:44,449-Speed 3405.63 samples/sec Loss 5.4173 LearningRate 0.0252 Epoch: 9 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:47,456-Speed 3406.91 samples/sec Loss 5.4147 LearningRate 0.0252 Epoch: 9 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:50,465-Speed 3403.38 samples/sec Loss 5.3603 LearningRate 0.0252 Epoch: 9 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:53,446-Speed 3435.23 samples/sec Loss 5.2670 LearningRate 0.0252 Epoch: 9 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:56,444-Speed 3417.73 samples/sec Loss 5.5536 LearningRate 0.0252 Epoch: 9 Global Step: 
50390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:25:59,450-Speed 3407.06 samples/sec Loss 5.4204 LearningRate 0.0252 Epoch: 9 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:02,450-Speed 3413.98 samples/sec Loss 5.2032 LearningRate 0.0252 Epoch: 9 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:05,456-Speed 3406.97 samples/sec Loss 5.3780 LearningRate 0.0252 Epoch: 9 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:08,456-Speed 3413.86 samples/sec Loss 5.3191 LearningRate 0.0251 Epoch: 9 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:11,469-Speed 3400.61 samples/sec Loss 5.3982 LearningRate 0.0251 Epoch: 9 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:14,470-Speed 3413.17 samples/sec Loss 5.3313 LearningRate 0.0251 Epoch: 9 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:17,500-Speed 3380.31 samples/sec Loss 5.2509 LearningRate 0.0251 Epoch: 9 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:20,542-Speed 3366.47 samples/sec Loss 5.4481 LearningRate 0.0251 Epoch: 9 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:23,544-Speed 3412.31 samples/sec Loss 5.2691 LearningRate 0.0251 Epoch: 9 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:26,549-Speed 3408.63 samples/sec Loss 5.2010 LearningRate 0.0251 Epoch: 9 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:29,566-Speed 3393.99 samples/sec Loss 5.3184 LearningRate 0.0251 Epoch: 9 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:32,580-Speed 3399.00 samples/sec Loss 5.2706 LearningRate 0.0251 Epoch: 9 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
04:26:35,583-Speed 3410.90 samples/sec Loss 5.4198 LearningRate 0.0251 Epoch: 9 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:38,583-Speed 3414.04 samples/sec Loss 5.2386 LearningRate 0.0250 Epoch: 9 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:41,588-Speed 3408.20 samples/sec Loss 5.1735 LearningRate 0.0250 Epoch: 9 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:44,602-Speed 3398.61 samples/sec Loss 5.3122 LearningRate 0.0250 Epoch: 9 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:47,618-Speed 3396.11 samples/sec Loss 5.3725 LearningRate 0.0250 Epoch: 9 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:50,690-Speed 3333.99 samples/sec Loss 5.3586 LearningRate 0.0250 Epoch: 9 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:26:53,668-Speed 3439.89 samples/sec Loss 5.3492 LearningRate 0.0250 Epoch: 9 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:05,847-Speed 840.83 samples/sec Loss 4.3911 LearningRate 0.0250 Epoch: 10 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:09,020-Speed 3228.86 samples/sec Loss 4.5265 LearningRate 0.0250 Epoch: 10 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:12,026-Speed 3406.99 samples/sec Loss 4.5089 LearningRate 0.0250 Epoch: 10 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:15,023-Speed 3417.98 samples/sec Loss 4.5717 LearningRate 0.0250 Epoch: 10 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:18,027-Speed 3409.57 samples/sec Loss 4.4469 LearningRate 0.0250 Epoch: 10 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:21,025-Speed 3416.21 samples/sec Loss 4.4848 LearningRate 
0.0249 Epoch: 10 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:24,031-Speed 3406.63 samples/sec Loss 4.5594 LearningRate 0.0249 Epoch: 10 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:27,099-Speed 3338.77 samples/sec Loss 4.5153 LearningRate 0.0249 Epoch: 10 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:30,143-Speed 3364.63 samples/sec Loss 4.5759 LearningRate 0.0249 Epoch: 10 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:33,183-Speed 3369.96 samples/sec Loss 4.5789 LearningRate 0.0249 Epoch: 10 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:27:36,178-Speed 3419.34 samples/sec Loss 4.4890 LearningRate 0.0249 Epoch: 10 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:39,185-Speed 3406.58 samples/sec Loss 4.4361 LearningRate 0.0249 Epoch: 10 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:42,192-Speed 3407.13 samples/sec Loss 4.4783 LearningRate 0.0249 Epoch: 10 Global Step: 50710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:45,201-Speed 3403.09 samples/sec Loss 4.6652 LearningRate 0.0249 Epoch: 10 Global Step: 50720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:48,202-Speed 3413.69 samples/sec Loss 4.5655 LearningRate 0.0249 Epoch: 10 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:51,206-Speed 3409.77 samples/sec Loss 4.6321 LearningRate 0.0248 Epoch: 10 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:54,211-Speed 3407.99 samples/sec Loss 4.6734 LearningRate 0.0248 Epoch: 10 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:27:57,229-Speed 3394.42 samples/sec Loss 4.5394 LearningRate 0.0248 Epoch: 10 Global Step: 50760 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 04:28:00,237-Speed 3404.82 samples/sec Loss 4.5541 LearningRate 0.0248 Epoch: 10 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:03,248-Speed 3402.07 samples/sec Loss 4.5297 LearningRate 0.0248 Epoch: 10 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:06,249-Speed 3412.31 samples/sec Loss 4.6819 LearningRate 0.0248 Epoch: 10 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:28:09,236-Speed 3429.03 samples/sec Loss 4.7165 LearningRate 0.0248 Epoch: 10 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:12,241-Speed 3408.97 samples/sec Loss 4.7414 LearningRate 0.0248 Epoch: 10 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:15,256-Speed 3397.70 samples/sec Loss 4.8183 LearningRate 0.0248 Epoch: 10 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:18,259-Speed 3409.85 samples/sec Loss 4.7285 LearningRate 0.0248 Epoch: 10 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:21,269-Speed 3403.01 samples/sec Loss 4.6553 LearningRate 0.0247 Epoch: 10 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:24,269-Speed 3414.44 samples/sec Loss 4.5583 LearningRate 0.0247 Epoch: 10 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:27,274-Speed 3408.77 samples/sec Loss 4.5726 LearningRate 0.0247 Epoch: 10 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:30,282-Speed 3404.78 samples/sec Loss 4.6446 LearningRate 0.0247 Epoch: 10 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:33,301-Speed 3392.19 samples/sec Loss 4.7572 LearningRate 0.0247 Epoch: 10 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
04:28:36,400-Speed 3306.58 samples/sec Loss 4.7753 LearningRate 0.0247 Epoch: 10 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:39,434-Speed 3374.94 samples/sec Loss 4.7068 LearningRate 0.0247 Epoch: 10 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:42,577-Speed 3258.95 samples/sec Loss 4.5844 LearningRate 0.0247 Epoch: 10 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:45,583-Speed 3407.58 samples/sec Loss 4.6641 LearningRate 0.0247 Epoch: 10 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:48,587-Speed 3409.71 samples/sec Loss 4.5873 LearningRate 0.0247 Epoch: 10 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:51,597-Speed 3403.17 samples/sec Loss 4.7562 LearningRate 0.0246 Epoch: 10 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:54,604-Speed 3405.46 samples/sec Loss 4.6922 LearningRate 0.0246 Epoch: 10 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:28:57,616-Speed 3400.47 samples/sec Loss 4.7052 LearningRate 0.0246 Epoch: 10 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:00,634-Speed 3393.98 samples/sec Loss 4.5666 LearningRate 0.0246 Epoch: 10 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:03,651-Speed 3395.09 samples/sec Loss 4.6304 LearningRate 0.0246 Epoch: 10 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:06,659-Speed 3405.68 samples/sec Loss 4.9220 LearningRate 0.0246 Epoch: 10 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:09,697-Speed 3371.31 samples/sec Loss 4.6721 LearningRate 0.0246 Epoch: 10 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:12,699-Speed 3411.63 samples/sec Loss 4.9397 
LearningRate 0.0246 Epoch: 10 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:15,715-Speed 3396.36 samples/sec Loss 4.7873 LearningRate 0.0246 Epoch: 10 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:18,725-Speed 3402.76 samples/sec Loss 4.8450 LearningRate 0.0246 Epoch: 10 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:21,715-Speed 3425.78 samples/sec Loss 4.7051 LearningRate 0.0245 Epoch: 10 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:24,717-Speed 3410.84 samples/sec Loss 4.7002 LearningRate 0.0245 Epoch: 10 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:27,724-Speed 3406.49 samples/sec Loss 4.8884 LearningRate 0.0245 Epoch: 10 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:30,740-Speed 3395.92 samples/sec Loss 4.6426 LearningRate 0.0245 Epoch: 10 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:33,750-Speed 3403.21 samples/sec Loss 4.7894 LearningRate 0.0245 Epoch: 10 Global Step: 51080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:36,753-Speed 3411.25 samples/sec Loss 4.7163 LearningRate 0.0245 Epoch: 10 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:39,768-Speed 3396.65 samples/sec Loss 4.7413 LearningRate 0.0245 Epoch: 10 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:42,800-Speed 3378.08 samples/sec Loss 4.6514 LearningRate 0.0245 Epoch: 10 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:45,804-Speed 3410.12 samples/sec Loss 4.6740 LearningRate 0.0245 Epoch: 10 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:48,813-Speed 3403.34 samples/sec Loss 4.8036 LearningRate 0.0245 Epoch: 10 Global Step: 51130 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:29:51,833-Speed 3392.39 samples/sec Loss 4.9902 LearningRate 0.0244 Epoch: 10 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:54,851-Speed 3393.43 samples/sec Loss 4.6933 LearningRate 0.0244 Epoch: 10 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:29:57,855-Speed 3409.66 samples/sec Loss 4.8313 LearningRate 0.0244 Epoch: 10 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:30:00,854-Speed 3415.37 samples/sec Loss 4.8594 LearningRate 0.0244 Epoch: 10 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:03,876-Speed 3389.60 samples/sec Loss 4.8507 LearningRate 0.0244 Epoch: 10 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:06,889-Speed 3399.20 samples/sec Loss 4.9297 LearningRate 0.0244 Epoch: 10 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:09,900-Speed 3402.26 samples/sec Loss 4.9214 LearningRate 0.0244 Epoch: 10 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:12,925-Speed 3386.17 samples/sec Loss 4.7442 LearningRate 0.0244 Epoch: 10 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:15,934-Speed 3403.40 samples/sec Loss 4.9323 LearningRate 0.0244 Epoch: 10 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:18,943-Speed 3403.61 samples/sec Loss 4.9237 LearningRate 0.0244 Epoch: 10 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:21,950-Speed 3406.63 samples/sec Loss 5.0296 LearningRate 0.0244 Epoch: 10 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:24,962-Speed 3400.31 samples/sec Loss 4.9743 LearningRate 0.0243 Epoch: 10 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:30:27,975-Speed 3400.29 samples/sec Loss 4.9308 LearningRate 0.0243 Epoch: 10 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:30,999-Speed 3386.69 samples/sec Loss 4.8398 LearningRate 0.0243 Epoch: 10 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:30:34,007-Speed 3404.76 samples/sec Loss 4.7871 LearningRate 0.0243 Epoch: 10 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:30:36,996-Speed 3427.37 samples/sec Loss 4.7228 LearningRate 0.0243 Epoch: 10 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:40,016-Speed 3390.86 samples/sec Loss 4.7939 LearningRate 0.0243 Epoch: 10 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:43,025-Speed 3404.26 samples/sec Loss 4.7165 LearningRate 0.0243 Epoch: 10 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:46,035-Speed 3403.67 samples/sec Loss 4.7652 LearningRate 0.0243 Epoch: 10 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:49,055-Speed 3390.61 samples/sec Loss 4.8407 LearningRate 0.0243 Epoch: 10 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:52,071-Speed 3397.49 samples/sec Loss 4.8693 LearningRate 0.0243 Epoch: 10 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:30:55,060-Speed 3425.75 samples/sec Loss 5.0067 LearningRate 0.0242 Epoch: 10 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:30:58,069-Speed 3404.43 samples/sec Loss 4.8763 LearningRate 0.0242 Epoch: 10 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:01,075-Speed 3407.76 samples/sec Loss 4.8710 LearningRate 0.0242 Epoch: 10 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:04,084-Speed 3403.99 samples/sec 
Loss 4.8019 LearningRate 0.0242 Epoch: 10 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:07,092-Speed 3404.45 samples/sec Loss 4.9777 LearningRate 0.0242 Epoch: 10 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:10,099-Speed 3406.92 samples/sec Loss 4.8559 LearningRate 0.0242 Epoch: 10 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:13,109-Speed 3402.14 samples/sec Loss 4.7787 LearningRate 0.0242 Epoch: 10 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:16,129-Speed 3392.05 samples/sec Loss 4.8659 LearningRate 0.0242 Epoch: 10 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:19,133-Speed 3409.12 samples/sec Loss 4.9659 LearningRate 0.0242 Epoch: 10 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:22,136-Speed 3410.89 samples/sec Loss 4.8534 LearningRate 0.0242 Epoch: 10 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:31:25,147-Speed 3402.18 samples/sec Loss 4.9107 LearningRate 0.0241 Epoch: 10 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:28,152-Speed 3408.29 samples/sec Loss 4.9545 LearningRate 0.0241 Epoch: 10 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:31,168-Speed 3396.60 samples/sec Loss 5.0224 LearningRate 0.0241 Epoch: 10 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:34,172-Speed 3409.06 samples/sec Loss 4.9207 LearningRate 0.0241 Epoch: 10 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:37,186-Speed 3398.40 samples/sec Loss 4.9317 LearningRate 0.0241 Epoch: 10 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:40,202-Speed 3395.84 samples/sec Loss 4.8408 LearningRate 0.0241 Epoch: 10 Global Step: 
51500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:43,221-Speed 3392.18 samples/sec Loss 4.9211 LearningRate 0.0241 Epoch: 10 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:46,235-Speed 3399.37 samples/sec Loss 4.9877 LearningRate 0.0241 Epoch: 10 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:49,249-Speed 3397.82 samples/sec Loss 4.9022 LearningRate 0.0241 Epoch: 10 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:52,261-Speed 3401.61 samples/sec Loss 4.8978 LearningRate 0.0241 Epoch: 10 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:31:55,266-Speed 3407.64 samples/sec Loss 4.9418 LearningRate 0.0241 Epoch: 10 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:31:58,256-Speed 3425.22 samples/sec Loss 4.8729 LearningRate 0.0240 Epoch: 10 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:01,267-Speed 3402.67 samples/sec Loss 4.7624 LearningRate 0.0240 Epoch: 10 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:04,331-Speed 3342.57 samples/sec Loss 4.9444 LearningRate 0.0240 Epoch: 10 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:07,326-Speed 3419.54 samples/sec Loss 4.9435 LearningRate 0.0240 Epoch: 10 Global Step: 51590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:10,336-Speed 3402.62 samples/sec Loss 4.8691 LearningRate 0.0240 Epoch: 10 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:13,353-Speed 3395.12 samples/sec Loss 4.8096 LearningRate 0.0240 Epoch: 10 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:16,419-Speed 3340.76 samples/sec Loss 4.8166 LearningRate 0.0240 Epoch: 10 Global Step: 51620 Fp16 Grad Scale: 32768 Required: 5 hours 
Training: 2022-04-11 04:32:19,430-Speed 3402.09 samples/sec Loss 4.9253 LearningRate 0.0240 Epoch: 10 Global Step: 51630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:22,439-Speed 3404.18 samples/sec Loss 4.9480 LearningRate 0.0240 Epoch: 10 Global Step: 51640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:25,445-Speed 3407.29 samples/sec Loss 4.8409 LearningRate 0.0240 Epoch: 10 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:28,452-Speed 3406.01 samples/sec Loss 4.7084 LearningRate 0.0239 Epoch: 10 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:31,464-Speed 3401.40 samples/sec Loss 4.8450 LearningRate 0.0239 Epoch: 10 Global Step: 51670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:34,469-Speed 3407.72 samples/sec Loss 5.0178 LearningRate 0.0239 Epoch: 10 Global Step: 51680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:32:37,479-Speed 3403.12 samples/sec Loss 4.9346 LearningRate 0.0239 Epoch: 10 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:40,494-Speed 3397.21 samples/sec Loss 4.9917 LearningRate 0.0239 Epoch: 10 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:43,509-Speed 3396.73 samples/sec Loss 4.9674 LearningRate 0.0239 Epoch: 10 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:46,523-Speed 3399.24 samples/sec Loss 4.9044 LearningRate 0.0239 Epoch: 10 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:49,533-Speed 3402.40 samples/sec Loss 4.9659 LearningRate 0.0239 Epoch: 10 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:52,540-Speed 3407.04 samples/sec Loss 4.9882 LearningRate 0.0239 Epoch: 10 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:55,555-Speed 3396.39 
samples/sec Loss 5.0071 LearningRate 0.0239 Epoch: 10 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:32:58,560-Speed 3408.69 samples/sec Loss 5.0316 LearningRate 0.0238 Epoch: 10 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:01,568-Speed 3404.97 samples/sec Loss 5.0152 LearningRate 0.0238 Epoch: 10 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:04,571-Speed 3410.91 samples/sec Loss 4.9560 LearningRate 0.0238 Epoch: 10 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:07,582-Speed 3401.95 samples/sec Loss 5.0500 LearningRate 0.0238 Epoch: 10 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:33:10,843-Speed 3140.59 samples/sec Loss 4.9678 LearningRate 0.0238 Epoch: 10 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:33:13,838-Speed 3420.11 samples/sec Loss 4.9112 LearningRate 0.0238 Epoch: 10 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:16,864-Speed 3385.15 samples/sec Loss 4.9387 LearningRate 0.0238 Epoch: 10 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:19,873-Speed 3404.19 samples/sec Loss 4.9505 LearningRate 0.0238 Epoch: 10 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:22,903-Speed 3380.37 samples/sec Loss 4.8537 LearningRate 0.0238 Epoch: 10 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:25,939-Speed 3373.51 samples/sec Loss 4.9053 LearningRate 0.0238 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:28,956-Speed 3395.08 samples/sec Loss 5.0498 LearningRate 0.0238 Epoch: 10 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:31,963-Speed 3405.97 samples/sec Loss 4.8918 LearningRate 0.0237 Epoch: 10 
Global Step: 51870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:34,978-Speed 3397.64 samples/sec Loss 5.0438 LearningRate 0.0237 Epoch: 10 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:37,990-Speed 3399.82 samples/sec Loss 4.9349 LearningRate 0.0237 Epoch: 10 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:41,019-Speed 3381.18 samples/sec Loss 4.9412 LearningRate 0.0237 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:44,031-Speed 3401.38 samples/sec Loss 4.9088 LearningRate 0.0237 Epoch: 10 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:33:47,021-Speed 3425.95 samples/sec Loss 4.9450 LearningRate 0.0237 Epoch: 10 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:50,027-Speed 3406.50 samples/sec Loss 4.9074 LearningRate 0.0237 Epoch: 10 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:53,054-Speed 3384.11 samples/sec Loss 4.9801 LearningRate 0.0237 Epoch: 10 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:56,079-Speed 3385.61 samples/sec Loss 5.0356 LearningRate 0.0237 Epoch: 10 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:33:59,083-Speed 3410.03 samples/sec Loss 5.1268 LearningRate 0.0237 Epoch: 10 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:34:02,095-Speed 3400.79 samples/sec Loss 5.0340 LearningRate 0.0236 Epoch: 10 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:34:05,112-Speed 3394.80 samples/sec Loss 5.0117 LearningRate 0.0236 Epoch: 10 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:34:08,136-Speed 3386.83 samples/sec Loss 4.8604 LearningRate 0.0236 Epoch: 10 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 04:34:11,151-Speed 3397.51 samples/sec Loss 4.8939 LearningRate 0.0236 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:34:55,241-[lfw][52000]XNorm: 20.781405 Training: 2022-04-11 04:34:55,242-[lfw][52000]Accuracy-Flip: 0.99767+-0.00249 Training: 2022-04-11 04:34:55,243-[lfw][52000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:35:46,284-[cfp_fp][52000]XNorm: 18.933418 Training: 2022-04-11 04:35:46,284-[cfp_fp][52000]Accuracy-Flip: 0.97357+-0.00788 Training: 2022-04-11 04:35:46,285-[cfp_fp][52000]Accuracy-Highest: 0.97629 Training: 2022-04-11 04:36:30,279-[agedb_30][52000]XNorm: 20.852038 Training: 2022-04-11 04:36:30,280-[agedb_30][52000]Accuracy-Flip: 0.97883+-0.00563 Training: 2022-04-11 04:36:30,280-[agedb_30][52000]Accuracy-Highest: 0.98083 Training: 2022-04-11 04:36:33,277-Speed 72.05 samples/sec Loss 4.9141 LearningRate 0.0236 Epoch: 10 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:36,271-Speed 3420.57 samples/sec Loss 5.0383 LearningRate 0.0236 Epoch: 10 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:36:39,250-Speed 3437.85 samples/sec Loss 4.8852 LearningRate 0.0236 Epoch: 10 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:42,259-Speed 3405.08 samples/sec Loss 4.9775 LearningRate 0.0236 Epoch: 10 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:45,249-Speed 3425.45 samples/sec Loss 4.9375 LearningRate 0.0236 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:48,253-Speed 3409.77 samples/sec Loss 5.0643 LearningRate 0.0236 Epoch: 10 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:51,248-Speed 3419.17 samples/sec Loss 4.9804 LearningRate 0.0235 Epoch: 10 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
04:36:54,247-Speed 3415.10 samples/sec Loss 5.0190 LearningRate 0.0235 Epoch: 10 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:36:57,239-Speed 3424.31 samples/sec Loss 4.9211 LearningRate 0.0235 Epoch: 10 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:00,233-Speed 3421.04 samples/sec Loss 4.9420 LearningRate 0.0235 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:03,229-Speed 3417.99 samples/sec Loss 4.9644 LearningRate 0.0235 Epoch: 10 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:06,228-Speed 3416.03 samples/sec Loss 4.8959 LearningRate 0.0235 Epoch: 10 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:09,230-Speed 3411.30 samples/sec Loss 4.9467 LearningRate 0.0235 Epoch: 10 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:37:12,223-Speed 3422.16 samples/sec Loss 5.0376 LearningRate 0.0235 Epoch: 10 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:37:15,210-Speed 3429.18 samples/sec Loss 4.9469 LearningRate 0.0235 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:18,214-Speed 3410.59 samples/sec Loss 4.8939 LearningRate 0.0235 Epoch: 10 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:21,214-Speed 3413.24 samples/sec Loss 4.8986 LearningRate 0.0235 Epoch: 10 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:24,216-Speed 3412.08 samples/sec Loss 4.8898 LearningRate 0.0234 Epoch: 10 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:27,215-Speed 3415.37 samples/sec Loss 5.0414 LearningRate 0.0234 Epoch: 10 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:30,233-Speed 3393.20 samples/sec Loss 5.0126 
LearningRate 0.0234 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:33,234-Speed 3414.15 samples/sec Loss 4.9036 LearningRate 0.0234 Epoch: 10 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:36,227-Speed 3421.34 samples/sec Loss 4.9310 LearningRate 0.0234 Epoch: 10 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:39,252-Speed 3386.25 samples/sec Loss 5.0805 LearningRate 0.0234 Epoch: 10 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:42,249-Speed 3418.00 samples/sec Loss 4.9761 LearningRate 0.0234 Epoch: 10 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:45,229-Speed 3437.19 samples/sec Loss 4.8789 LearningRate 0.0234 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:48,230-Speed 3413.41 samples/sec Loss 4.8951 LearningRate 0.0234 Epoch: 10 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:51,231-Speed 3412.78 samples/sec Loss 5.0235 LearningRate 0.0234 Epoch: 10 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:54,240-Speed 3403.33 samples/sec Loss 4.9104 LearningRate 0.0233 Epoch: 10 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:37:57,245-Speed 3408.87 samples/sec Loss 5.0145 LearningRate 0.0233 Epoch: 10 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:00,244-Speed 3415.10 samples/sec Loss 5.0103 LearningRate 0.0233 Epoch: 10 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:03,268-Speed 3387.00 samples/sec Loss 4.8769 LearningRate 0.0233 Epoch: 10 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:06,277-Speed 3403.98 samples/sec Loss 4.9590 LearningRate 0.0233 Epoch: 10 Global Step: 52320 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:09,276-Speed 3414.60 samples/sec Loss 5.0087 LearningRate 0.0233 Epoch: 10 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:12,276-Speed 3415.30 samples/sec Loss 4.9326 LearningRate 0.0233 Epoch: 10 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:15,260-Speed 3432.12 samples/sec Loss 4.9625 LearningRate 0.0233 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:18,265-Speed 3409.22 samples/sec Loss 5.0866 LearningRate 0.0233 Epoch: 10 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:21,261-Speed 3418.09 samples/sec Loss 4.9350 LearningRate 0.0233 Epoch: 10 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:24,266-Speed 3408.54 samples/sec Loss 4.9281 LearningRate 0.0233 Epoch: 10 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:27,266-Speed 3414.30 samples/sec Loss 4.8073 LearningRate 0.0232 Epoch: 10 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:30,273-Speed 3405.57 samples/sec Loss 4.9847 LearningRate 0.0232 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:33,273-Speed 3415.25 samples/sec Loss 4.9745 LearningRate 0.0232 Epoch: 10 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:36,272-Speed 3414.46 samples/sec Loss 5.0615 LearningRate 0.0232 Epoch: 10 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:39,273-Speed 3413.95 samples/sec Loss 4.9935 LearningRate 0.0232 Epoch: 10 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:42,269-Speed 3418.75 samples/sec Loss 4.9407 LearningRate 0.0232 Epoch: 10 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:38:45,247-Speed 3439.22 samples/sec Loss 5.0673 LearningRate 0.0232 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:48,246-Speed 3415.12 samples/sec Loss 5.0000 LearningRate 0.0232 Epoch: 10 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:51,244-Speed 3416.76 samples/sec Loss 4.9511 LearningRate 0.0232 Epoch: 10 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:54,243-Speed 3414.90 samples/sec Loss 4.9782 LearningRate 0.0232 Epoch: 10 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:38:57,247-Speed 3410.16 samples/sec Loss 4.9547 LearningRate 0.0231 Epoch: 10 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:00,246-Speed 3414.49 samples/sec Loss 4.8809 LearningRate 0.0231 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:03,307-Speed 3347.04 samples/sec Loss 4.8960 LearningRate 0.0231 Epoch: 10 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:06,353-Speed 3361.79 samples/sec Loss 5.0811 LearningRate 0.0231 Epoch: 10 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:09,357-Speed 3410.15 samples/sec Loss 5.0577 LearningRate 0.0231 Epoch: 10 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:12,364-Speed 3406.62 samples/sec Loss 5.0434 LearningRate 0.0231 Epoch: 10 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:15,365-Speed 3412.47 samples/sec Loss 5.0207 LearningRate 0.0231 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:39:18,367-Speed 3411.82 samples/sec Loss 4.9497 LearningRate 0.0231 Epoch: 10 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:39:21,348-Speed 3436.16 samples/sec 
Loss 4.9837 LearningRate 0.0231 Epoch: 10 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:24,347-Speed 3415.17 samples/sec Loss 4.9597 LearningRate 0.0231 Epoch: 10 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:27,347-Speed 3414.35 samples/sec Loss 4.9031 LearningRate 0.0231 Epoch: 10 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:30,353-Speed 3406.80 samples/sec Loss 4.9426 LearningRate 0.0230 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:33,358-Speed 3408.43 samples/sec Loss 4.9936 LearningRate 0.0230 Epoch: 10 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:36,360-Speed 3412.56 samples/sec Loss 4.9797 LearningRate 0.0230 Epoch: 10 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:39,360-Speed 3413.84 samples/sec Loss 4.9241 LearningRate 0.0230 Epoch: 10 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:42,390-Speed 3380.50 samples/sec Loss 5.0645 LearningRate 0.0230 Epoch: 10 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:45,395-Speed 3408.74 samples/sec Loss 4.9337 LearningRate 0.0230 Epoch: 10 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:48,400-Speed 3408.04 samples/sec Loss 4.9649 LearningRate 0.0230 Epoch: 10 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:51,397-Speed 3417.13 samples/sec Loss 4.8893 LearningRate 0.0230 Epoch: 10 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:54,400-Speed 3410.91 samples/sec Loss 5.1479 LearningRate 0.0230 Epoch: 10 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:39:57,401-Speed 3413.64 samples/sec Loss 4.8527 LearningRate 0.0230 Epoch: 10 Global Step: 
52690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:00,402-Speed 3412.30 samples/sec Loss 5.0722 LearningRate 0.0229 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:03,405-Speed 3410.86 samples/sec Loss 4.9488 LearningRate 0.0229 Epoch: 10 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:06,410-Speed 3408.37 samples/sec Loss 4.8653 LearningRate 0.0229 Epoch: 10 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:09,408-Speed 3416.97 samples/sec Loss 4.9467 LearningRate 0.0229 Epoch: 10 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:12,415-Speed 3406.62 samples/sec Loss 4.9132 LearningRate 0.0229 Epoch: 10 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:15,426-Speed 3401.86 samples/sec Loss 4.8547 LearningRate 0.0229 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:18,427-Speed 3412.37 samples/sec Loss 5.0348 LearningRate 0.0229 Epoch: 10 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:21,428-Speed 3413.23 samples/sec Loss 4.9302 LearningRate 0.0229 Epoch: 10 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:40:24,420-Speed 3422.78 samples/sec Loss 4.8929 LearningRate 0.0229 Epoch: 10 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:27,442-Speed 3389.50 samples/sec Loss 4.9828 LearningRate 0.0229 Epoch: 10 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:30,447-Speed 3408.93 samples/sec Loss 5.0531 LearningRate 0.0229 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:33,460-Speed 3399.46 samples/sec Loss 4.9461 LearningRate 0.0228 Epoch: 10 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 04:40:36,467-Speed 3405.81 samples/sec Loss 5.0507 LearningRate 0.0228 Epoch: 10 Global Step: 52820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:39,476-Speed 3404.16 samples/sec Loss 5.0430 LearningRate 0.0228 Epoch: 10 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:42,477-Speed 3413.51 samples/sec Loss 5.0624 LearningRate 0.0228 Epoch: 10 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:45,478-Speed 3412.93 samples/sec Loss 5.0590 LearningRate 0.0228 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:48,501-Speed 3388.27 samples/sec Loss 5.0686 LearningRate 0.0228 Epoch: 10 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:51,727-Speed 3174.68 samples/sec Loss 4.9287 LearningRate 0.0228 Epoch: 10 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:54,717-Speed 3425.62 samples/sec Loss 5.0021 LearningRate 0.0228 Epoch: 10 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:40:57,721-Speed 3410.00 samples/sec Loss 5.0470 LearningRate 0.0228 Epoch: 10 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:00,728-Speed 3405.50 samples/sec Loss 5.1343 LearningRate 0.0228 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:03,737-Speed 3403.83 samples/sec Loss 4.8304 LearningRate 0.0227 Epoch: 10 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:06,749-Speed 3401.30 samples/sec Loss 4.9633 LearningRate 0.0227 Epoch: 10 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:09,768-Speed 3392.44 samples/sec Loss 4.9725 LearningRate 0.0227 Epoch: 10 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:12,773-Speed 3409.24 
samples/sec Loss 4.9285 LearningRate 0.0227 Epoch: 10 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:15,787-Speed 3397.76 samples/sec Loss 5.0767 LearningRate 0.0227 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:18,792-Speed 3407.86 samples/sec Loss 5.0870 LearningRate 0.0227 Epoch: 10 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:21,793-Speed 3413.38 samples/sec Loss 4.8788 LearningRate 0.0227 Epoch: 10 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:24,797-Speed 3410.00 samples/sec Loss 5.0608 LearningRate 0.0227 Epoch: 10 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:41:27,782-Speed 3431.58 samples/sec Loss 5.0850 LearningRate 0.0227 Epoch: 10 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:30,791-Speed 3403.75 samples/sec Loss 4.9097 LearningRate 0.0227 Epoch: 10 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:33,798-Speed 3406.29 samples/sec Loss 5.0726 LearningRate 0.0227 Epoch: 10 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:36,806-Speed 3404.87 samples/sec Loss 5.0306 LearningRate 0.0226 Epoch: 10 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:41:39,802-Speed 3418.72 samples/sec Loss 5.0932 LearningRate 0.0226 Epoch: 10 Global Step: 53030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:42,819-Speed 3395.49 samples/sec Loss 4.9456 LearningRate 0.0226 Epoch: 10 Global Step: 53040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:45,819-Speed 3413.28 samples/sec Loss 4.9806 LearningRate 0.0226 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:48,838-Speed 3392.59 samples/sec Loss 4.9601 LearningRate 0.0226 Epoch: 10 
Global Step: 53060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:51,847-Speed 3404.08 samples/sec Loss 4.9745 LearningRate 0.0226 Epoch: 10 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:54,853-Speed 3407.21 samples/sec Loss 5.0104 LearningRate 0.0226 Epoch: 10 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:41:57,857-Speed 3410.98 samples/sec Loss 4.9820 LearningRate 0.0226 Epoch: 10 Global Step: 53090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:42:00,872-Speed 3397.00 samples/sec Loss 4.9046 LearningRate 0.0226 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:42:03,954-Speed 3322.99 samples/sec Loss 4.9482 LearningRate 0.0226 Epoch: 10 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:42:06,956-Speed 3411.87 samples/sec Loss 5.0468 LearningRate 0.0226 Epoch: 10 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:42:09,964-Speed 3405.45 samples/sec Loss 5.0075 LearningRate 0.0225 Epoch: 10 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:12,970-Speed 3407.88 samples/sec Loss 4.9478 LearningRate 0.0225 Epoch: 10 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:15,975-Speed 3407.49 samples/sec Loss 5.0380 LearningRate 0.0225 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:18,982-Speed 3406.23 samples/sec Loss 4.9823 LearningRate 0.0225 Epoch: 10 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:21,990-Speed 3404.87 samples/sec Loss 5.1045 LearningRate 0.0225 Epoch: 10 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:25,045-Speed 3352.46 samples/sec Loss 4.8707 LearningRate 0.0225 Epoch: 10 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 04:42:28,061-Speed 3397.04 samples/sec Loss 4.9446 LearningRate 0.0225 Epoch: 10 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:31,081-Speed 3391.11 samples/sec Loss 4.9993 LearningRate 0.0225 Epoch: 10 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:34,082-Speed 3413.15 samples/sec Loss 4.9400 LearningRate 0.0225 Epoch: 10 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:37,085-Speed 3411.00 samples/sec Loss 5.0104 LearningRate 0.0225 Epoch: 10 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:40,110-Speed 3385.40 samples/sec Loss 5.0056 LearningRate 0.0224 Epoch: 10 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:42:43,107-Speed 3418.12 samples/sec Loss 4.9401 LearningRate 0.0224 Epoch: 10 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:46,121-Speed 3398.55 samples/sec Loss 4.9972 LearningRate 0.0224 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:49,126-Speed 3408.16 samples/sec Loss 4.9534 LearningRate 0.0224 Epoch: 10 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:52,134-Speed 3405.23 samples/sec Loss 5.0239 LearningRate 0.0224 Epoch: 10 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:55,144-Speed 3402.09 samples/sec Loss 4.9254 LearningRate 0.0224 Epoch: 10 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:42:58,161-Speed 3395.76 samples/sec Loss 4.9456 LearningRate 0.0224 Epoch: 10 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:01,165-Speed 3409.21 samples/sec Loss 4.9764 LearningRate 0.0224 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:04,176-Speed 3402.31 
samples/sec Loss 5.0811 LearningRate 0.0224 Epoch: 10 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:07,193-Speed 3394.85 samples/sec Loss 4.7841 LearningRate 0.0224 Epoch: 10 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:10,194-Speed 3412.47 samples/sec Loss 4.9930 LearningRate 0.0224 Epoch: 10 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:13,177-Speed 3433.71 samples/sec Loss 4.9492 LearningRate 0.0223 Epoch: 10 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:16,180-Speed 3410.57 samples/sec Loss 5.0410 LearningRate 0.0223 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:19,198-Speed 3393.66 samples/sec Loss 4.9984 LearningRate 0.0223 Epoch: 10 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:22,201-Speed 3410.79 samples/sec Loss 4.9225 LearningRate 0.0223 Epoch: 10 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:25,213-Speed 3401.17 samples/sec Loss 4.9152 LearningRate 0.0223 Epoch: 10 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:28,248-Speed 3374.39 samples/sec Loss 4.9206 LearningRate 0.0223 Epoch: 10 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:31,258-Speed 3402.69 samples/sec Loss 4.9335 LearningRate 0.0223 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:34,292-Speed 3376.39 samples/sec Loss 4.7935 LearningRate 0.0223 Epoch: 10 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:37,321-Speed 3382.34 samples/sec Loss 5.0939 LearningRate 0.0223 Epoch: 10 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:40,329-Speed 3404.40 samples/sec Loss 5.0486 LearningRate 0.0223 Epoch: 10 
Global Step: 53430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:43,358-Speed 3381.65 samples/sec Loss 5.0393 LearningRate 0.0223 Epoch: 10 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:46,365-Speed 3406.32 samples/sec Loss 5.1093 LearningRate 0.0222 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:49,393-Speed 3382.61 samples/sec Loss 4.8274 LearningRate 0.0222 Epoch: 10 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:52,411-Speed 3393.97 samples/sec Loss 4.9452 LearningRate 0.0222 Epoch: 10 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:43:55,399-Speed 3427.73 samples/sec Loss 5.0494 LearningRate 0.0222 Epoch: 10 Global Step: 53480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:43:58,409-Speed 3402.95 samples/sec Loss 5.1245 LearningRate 0.0222 Epoch: 10 Global Step: 53490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:01,414-Speed 3408.67 samples/sec Loss 5.0537 LearningRate 0.0222 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:04,455-Speed 3368.03 samples/sec Loss 5.0036 LearningRate 0.0222 Epoch: 10 Global Step: 53510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:07,460-Speed 3408.35 samples/sec Loss 4.9213 LearningRate 0.0222 Epoch: 10 Global Step: 53520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:10,461-Speed 3412.70 samples/sec Loss 5.1556 LearningRate 0.0222 Epoch: 10 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:13,464-Speed 3410.81 samples/sec Loss 4.8925 LearningRate 0.0222 Epoch: 10 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:16,468-Speed 3410.10 samples/sec Loss 5.0564 LearningRate 0.0222 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-11 04:44:19,472-Speed 3409.07 samples/sec Loss 4.9589 LearningRate 0.0221 Epoch: 10 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:22,480-Speed 3405.86 samples/sec Loss 5.0470 LearningRate 0.0221 Epoch: 10 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:25,482-Speed 3410.90 samples/sec Loss 4.9270 LearningRate 0.0221 Epoch: 10 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:28,493-Speed 3402.86 samples/sec Loss 5.0261 LearningRate 0.0221 Epoch: 10 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:31,498-Speed 3408.04 samples/sec Loss 4.9262 LearningRate 0.0221 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:34,514-Speed 3395.83 samples/sec Loss 4.8647 LearningRate 0.0221 Epoch: 10 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:37,536-Speed 3389.43 samples/sec Loss 4.9429 LearningRate 0.0221 Epoch: 10 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:40,565-Speed 3381.51 samples/sec Loss 4.9172 LearningRate 0.0221 Epoch: 10 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:43,572-Speed 3405.80 samples/sec Loss 5.0106 LearningRate 0.0221 Epoch: 10 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:44:46,560-Speed 3427.61 samples/sec Loss 5.1227 LearningRate 0.0221 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:49,594-Speed 3375.72 samples/sec Loss 4.9951 LearningRate 0.0220 Epoch: 10 Global Step: 53660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:52,604-Speed 3403.92 samples/sec Loss 5.0481 LearningRate 0.0220 Epoch: 10 Global Step: 53670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:55,609-Speed 3407.93 
samples/sec Loss 5.0842 LearningRate 0.0220 Epoch: 10 Global Step: 53680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:44:58,621-Speed 3404.69 samples/sec Loss 4.8997 LearningRate 0.0220 Epoch: 10 Global Step: 53690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:01,633-Speed 3400.75 samples/sec Loss 4.9605 LearningRate 0.0220 Epoch: 10 Global Step: 53700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:04,644-Speed 3402.22 samples/sec Loss 4.9400 LearningRate 0.0220 Epoch: 10 Global Step: 53710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:07,665-Speed 3390.02 samples/sec Loss 4.8760 LearningRate 0.0220 Epoch: 10 Global Step: 53720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:10,667-Speed 3411.62 samples/sec Loss 4.9400 LearningRate 0.0220 Epoch: 10 Global Step: 53730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:13,674-Speed 3406.10 samples/sec Loss 4.8907 LearningRate 0.0220 Epoch: 10 Global Step: 53740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:16,680-Speed 3407.39 samples/sec Loss 4.9564 LearningRate 0.0220 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:19,684-Speed 3409.12 samples/sec Loss 4.9564 LearningRate 0.0220 Epoch: 10 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:22,692-Speed 3406.09 samples/sec Loss 4.9281 LearningRate 0.0219 Epoch: 10 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:25,744-Speed 3355.31 samples/sec Loss 5.0374 LearningRate 0.0219 Epoch: 10 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:28,790-Speed 3362.90 samples/sec Loss 4.9916 LearningRate 0.0219 Epoch: 10 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:31,800-Speed 3403.41 samples/sec Loss 4.9444 LearningRate 0.0219 Epoch: 10 
Global Step: 53800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:34,804-Speed 3409.23 samples/sec Loss 4.9298 LearningRate 0.0219 Epoch: 10 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:37,823-Speed 3392.94 samples/sec Loss 4.8925 LearningRate 0.0219 Epoch: 10 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:40,840-Speed 3394.91 samples/sec Loss 5.0406 LearningRate 0.0219 Epoch: 10 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:43,862-Speed 3388.78 samples/sec Loss 4.9749 LearningRate 0.0219 Epoch: 10 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:46,854-Speed 3423.64 samples/sec Loss 4.9372 LearningRate 0.0219 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:49,861-Speed 3406.43 samples/sec Loss 5.0096 LearningRate 0.0219 Epoch: 10 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:52,867-Speed 3406.83 samples/sec Loss 5.0402 LearningRate 0.0219 Epoch: 10 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:45:55,864-Speed 3417.83 samples/sec Loss 4.9642 LearningRate 0.0218 Epoch: 10 Global Step: 53880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:45:58,867-Speed 3411.51 samples/sec Loss 4.9921 LearningRate 0.0218 Epoch: 10 Global Step: 53890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:01,868-Speed 3413.14 samples/sec Loss 5.1308 LearningRate 0.0218 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:04,879-Speed 3400.57 samples/sec Loss 4.8895 LearningRate 0.0218 Epoch: 10 Global Step: 53910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:07,909-Speed 3380.78 samples/sec Loss 5.0967 LearningRate 0.0218 Epoch: 10 Global Step: 53920 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-11 04:46:10,913-Speed 3410.04 samples/sec Loss 4.9876 LearningRate 0.0218 Epoch: 10 Global Step: 53930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:13,925-Speed 3400.84 samples/sec Loss 5.0278 LearningRate 0.0218 Epoch: 10 Global Step: 53940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:16,941-Speed 3395.80 samples/sec Loss 4.9184 LearningRate 0.0218 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:19,961-Speed 3391.15 samples/sec Loss 5.1242 LearningRate 0.0218 Epoch: 10 Global Step: 53960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:22,965-Speed 3410.02 samples/sec Loss 5.0101 LearningRate 0.0218 Epoch: 10 Global Step: 53970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:46:25,985-Speed 3390.92 samples/sec Loss 4.8933 LearningRate 0.0218 Epoch: 10 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:46:29,001-Speed 3396.78 samples/sec Loss 5.0171 LearningRate 0.0217 Epoch: 10 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:46:32,002-Speed 3412.62 samples/sec Loss 5.0299 LearningRate 0.0217 Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:47:16,183-[lfw][54000]XNorm: 21.819049 Training: 2022-04-11 04:47:16,184-[lfw][54000]Accuracy-Flip: 0.99783+-0.00248 Training: 2022-04-11 04:47:16,184-[lfw][54000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:48:07,548-[cfp_fp][54000]XNorm: 19.907261 Training: 2022-04-11 04:48:07,549-[cfp_fp][54000]Accuracy-Flip: 0.97757+-0.00836 Training: 2022-04-11 04:48:07,549-[cfp_fp][54000]Accuracy-Highest: 0.97757 Training: 2022-04-11 04:48:51,737-[agedb_30][54000]XNorm: 21.736796 Training: 2022-04-11 04:48:51,738-[agedb_30][54000]Accuracy-Flip: 0.97967+-0.00586 Training: 2022-04-11 04:48:51,739-[agedb_30][54000]Accuracy-Highest: 0.98083 Training: 2022-04-11 
04:48:54,742-Speed 71.74 samples/sec Loss 5.1063 LearningRate 0.0217 Epoch: 10 Global Step: 54010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:48:57,725-Speed 3433.97 samples/sec Loss 4.9365 LearningRate 0.0217 Epoch: 10 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:00,692-Speed 3452.21 samples/sec Loss 4.7887 LearningRate 0.0217 Epoch: 10 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:03,681-Speed 3426.88 samples/sec Loss 4.9741 LearningRate 0.0217 Epoch: 10 Global Step: 54040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:07,976-Speed 2384.42 samples/sec Loss 4.8590 LearningRate 0.0217 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:10,962-Speed 3430.46 samples/sec Loss 5.0115 LearningRate 0.0217 Epoch: 10 Global Step: 54060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:13,960-Speed 3416.62 samples/sec Loss 4.9356 LearningRate 0.0217 Epoch: 10 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:16,949-Speed 3426.61 samples/sec Loss 4.8896 LearningRate 0.0217 Epoch: 10 Global Step: 54080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:19,948-Speed 3414.65 samples/sec Loss 4.8757 LearningRate 0.0217 Epoch: 10 Global Step: 54090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:22,940-Speed 3423.71 samples/sec Loss 5.0494 LearningRate 0.0216 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:25,964-Speed 3386.63 samples/sec Loss 4.8276 LearningRate 0.0216 Epoch: 10 Global Step: 54110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:28,969-Speed 3408.57 samples/sec Loss 4.9249 LearningRate 0.0216 Epoch: 10 Global Step: 54120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:49:31,982-Speed 3399.38 samples/sec Loss 4.9601 
LearningRate 0.0216 Epoch: 10 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:34,999-Speed 3395.23 samples/sec Loss 4.8796 LearningRate 0.0216 Epoch: 10 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:38,031-Speed 3377.97 samples/sec Loss 4.8898 LearningRate 0.0216 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:41,089-Speed 3350.13 samples/sec Loss 4.8057 LearningRate 0.0216 Epoch: 10 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:44,090-Speed 3413.10 samples/sec Loss 4.9586 LearningRate 0.0216 Epoch: 10 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:47,092-Speed 3411.75 samples/sec Loss 4.9412 LearningRate 0.0216 Epoch: 10 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:50,097-Speed 3408.43 samples/sec Loss 4.9345 LearningRate 0.0216 Epoch: 10 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:53,100-Speed 3410.52 samples/sec Loss 4.9143 LearningRate 0.0215 Epoch: 10 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:56,101-Speed 3413.36 samples/sec Loss 4.9091 LearningRate 0.0215 Epoch: 10 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:49:59,102-Speed 3413.03 samples/sec Loss 4.9994 LearningRate 0.0215 Epoch: 10 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:02,120-Speed 3392.98 samples/sec Loss 5.0859 LearningRate 0.0215 Epoch: 10 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:50:05,108-Speed 3428.56 samples/sec Loss 4.8523 LearningRate 0.0215 Epoch: 10 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:08,119-Speed 3402.35 samples/sec Loss 4.9021 LearningRate 0.0215 Epoch: 10 Global Step: 54250 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:11,169-Speed 3358.09 samples/sec Loss 4.9629 LearningRate 0.0215 Epoch: 10 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:14,325-Speed 3245.47 samples/sec Loss 4.9102 LearningRate 0.0215 Epoch: 10 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:17,326-Speed 3412.75 samples/sec Loss 5.0231 LearningRate 0.0215 Epoch: 10 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:20,337-Speed 3401.13 samples/sec Loss 4.8584 LearningRate 0.0215 Epoch: 10 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:23,335-Speed 3417.06 samples/sec Loss 4.9292 LearningRate 0.0215 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:26,348-Speed 3398.97 samples/sec Loss 4.9293 LearningRate 0.0214 Epoch: 10 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:29,354-Speed 3407.38 samples/sec Loss 5.0342 LearningRate 0.0214 Epoch: 10 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:32,351-Speed 3417.95 samples/sec Loss 5.0755 LearningRate 0.0214 Epoch: 10 Global Step: 54330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:35,354-Speed 3410.51 samples/sec Loss 4.8958 LearningRate 0.0214 Epoch: 10 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:38,373-Speed 3393.37 samples/sec Loss 4.7450 LearningRate 0.0214 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:41,399-Speed 3385.23 samples/sec Loss 5.0894 LearningRate 0.0214 Epoch: 10 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:44,396-Speed 3417.70 samples/sec Loss 5.0690 LearningRate 0.0214 Epoch: 10 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:50:47,408-Speed 3400.41 samples/sec Loss 4.9703 LearningRate 0.0214 Epoch: 10 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:50,408-Speed 3413.66 samples/sec Loss 4.9072 LearningRate 0.0214 Epoch: 10 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:53,410-Speed 3411.90 samples/sec Loss 5.0641 LearningRate 0.0214 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:50:56,413-Speed 3410.59 samples/sec Loss 4.9589 LearningRate 0.0214 Epoch: 10 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:00,309-Speed 2628.97 samples/sec Loss 4.9490 LearningRate 0.0213 Epoch: 10 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:03,310-Speed 3413.26 samples/sec Loss 4.8673 LearningRate 0.0213 Epoch: 10 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:06,309-Speed 3416.07 samples/sec Loss 4.9429 LearningRate 0.0213 Epoch: 10 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:51:09,289-Speed 3437.16 samples/sec Loss 4.8971 LearningRate 0.0213 Epoch: 10 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:12,274-Speed 3430.55 samples/sec Loss 4.8773 LearningRate 0.0213 Epoch: 10 Global Step: 54460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:15,267-Speed 3421.62 samples/sec Loss 4.9648 LearningRate 0.0213 Epoch: 10 Global Step: 54470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:18,269-Speed 3412.94 samples/sec Loss 4.9691 LearningRate 0.0213 Epoch: 10 Global Step: 54480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:21,266-Speed 3417.08 samples/sec Loss 4.8448 LearningRate 0.0213 Epoch: 10 Global Step: 54490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:24,281-Speed 3397.21 samples/sec Loss 
4.9700 LearningRate 0.0213 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:27,279-Speed 3416.93 samples/sec Loss 4.9958 LearningRate 0.0213 Epoch: 10 Global Step: 54510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:30,282-Speed 3410.84 samples/sec Loss 4.8696 LearningRate 0.0213 Epoch: 10 Global Step: 54520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:33,279-Speed 3417.43 samples/sec Loss 4.7803 LearningRate 0.0212 Epoch: 10 Global Step: 54530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:36,276-Speed 3417.82 samples/sec Loss 4.9532 LearningRate 0.0212 Epoch: 10 Global Step: 54540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:39,275-Speed 3415.06 samples/sec Loss 4.9365 LearningRate 0.0212 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:42,274-Speed 3415.46 samples/sec Loss 4.8754 LearningRate 0.0212 Epoch: 10 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:45,273-Speed 3415.64 samples/sec Loss 4.8800 LearningRate 0.0212 Epoch: 10 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:48,279-Speed 3407.40 samples/sec Loss 5.0436 LearningRate 0.0212 Epoch: 10 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:51,283-Speed 3408.98 samples/sec Loss 4.9230 LearningRate 0.0212 Epoch: 10 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:51:54,268-Speed 3431.75 samples/sec Loss 4.9240 LearningRate 0.0212 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:51:57,285-Speed 3394.92 samples/sec Loss 4.9866 LearningRate 0.0212 Epoch: 10 Global Step: 54610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:00,281-Speed 3418.20 samples/sec Loss 4.8433 LearningRate 0.0212 Epoch: 10 Global Step: 54620 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:03,295-Speed 3399.51 samples/sec Loss 4.9637 LearningRate 0.0212 Epoch: 10 Global Step: 54630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:06,300-Speed 3407.59 samples/sec Loss 4.8943 LearningRate 0.0211 Epoch: 10 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:09,317-Speed 3394.91 samples/sec Loss 4.6922 LearningRate 0.0211 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:12,313-Speed 3418.30 samples/sec Loss 4.9847 LearningRate 0.0211 Epoch: 10 Global Step: 54660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:15,308-Speed 3420.28 samples/sec Loss 4.8643 LearningRate 0.0211 Epoch: 10 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:18,312-Speed 3409.73 samples/sec Loss 5.0499 LearningRate 0.0211 Epoch: 10 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:21,325-Speed 3398.76 samples/sec Loss 4.9068 LearningRate 0.0211 Epoch: 10 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:24,349-Speed 3388.01 samples/sec Loss 4.7960 LearningRate 0.0211 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:52:27,333-Speed 3432.52 samples/sec Loss 4.8608 LearningRate 0.0211 Epoch: 10 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:30,345-Speed 3400.90 samples/sec Loss 5.0128 LearningRate 0.0211 Epoch: 10 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:33,343-Speed 3415.85 samples/sec Loss 4.8979 LearningRate 0.0211 Epoch: 10 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:36,341-Speed 3416.68 samples/sec Loss 4.8639 LearningRate 0.0211 Epoch: 10 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-11 04:52:39,362-Speed 3390.43 samples/sec Loss 5.0897 LearningRate 0.0210 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:42,384-Speed 3388.90 samples/sec Loss 4.9104 LearningRate 0.0210 Epoch: 10 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:45,388-Speed 3409.84 samples/sec Loss 4.9123 LearningRate 0.0210 Epoch: 10 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:48,391-Speed 3410.60 samples/sec Loss 4.9150 LearningRate 0.0210 Epoch: 10 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:51,404-Speed 3400.30 samples/sec Loss 4.8086 LearningRate 0.0210 Epoch: 10 Global Step: 54790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:54,404-Speed 3413.98 samples/sec Loss 4.9578 LearningRate 0.0210 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:52:57,402-Speed 3416.37 samples/sec Loss 4.9343 LearningRate 0.0210 Epoch: 10 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:53:00,406-Speed 3409.80 samples/sec Loss 4.9645 LearningRate 0.0210 Epoch: 10 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:53:03,429-Speed 3387.72 samples/sec Loss 4.8923 LearningRate 0.0210 Epoch: 10 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:53:06,432-Speed 3410.87 samples/sec Loss 4.9287 LearningRate 0.0210 Epoch: 10 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:53:09,416-Speed 3432.38 samples/sec Loss 4.9575 LearningRate 0.0210 Epoch: 10 Global Step: 54850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:12,416-Speed 3414.09 samples/sec Loss 5.0175 LearningRate 0.0209 Epoch: 10 Global Step: 54860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:15,420-Speed 3409.82 samples/sec Loss 
4.8565 LearningRate 0.0209 Epoch: 10 Global Step: 54870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:18,423-Speed 3410.92 samples/sec Loss 4.9019 LearningRate 0.0209 Epoch: 10 Global Step: 54880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:21,420-Speed 3417.65 samples/sec Loss 4.8946 LearningRate 0.0209 Epoch: 10 Global Step: 54890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:24,424-Speed 3410.15 samples/sec Loss 4.9844 LearningRate 0.0209 Epoch: 10 Global Step: 54900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:27,469-Speed 3363.52 samples/sec Loss 4.8330 LearningRate 0.0209 Epoch: 10 Global Step: 54910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:30,492-Speed 3387.83 samples/sec Loss 4.8186 LearningRate 0.0209 Epoch: 10 Global Step: 54920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:33,496-Speed 3409.79 samples/sec Loss 4.8968 LearningRate 0.0209 Epoch: 10 Global Step: 54930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:36,500-Speed 3409.57 samples/sec Loss 4.8959 LearningRate 0.0209 Epoch: 10 Global Step: 54940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:39,506-Speed 3407.88 samples/sec Loss 4.9282 LearningRate 0.0209 Epoch: 10 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:53:42,498-Speed 3422.33 samples/sec Loss 4.9689 LearningRate 0.0209 Epoch: 10 Global Step: 54960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:45,504-Speed 3407.85 samples/sec Loss 4.9189 LearningRate 0.0208 Epoch: 10 Global Step: 54970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:48,504-Speed 3414.83 samples/sec Loss 4.9289 LearningRate 0.0208 Epoch: 10 Global Step: 54980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:51,632-Speed 3273.73 samples/sec Loss 4.8481 LearningRate 0.0208 Epoch: 10 Global Step: 54990 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:54,740-Speed 3295.66 samples/sec Loss 4.9542 LearningRate 0.0208 Epoch: 10 Global Step: 55000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:53:57,740-Speed 3414.70 samples/sec Loss 5.0042 LearningRate 0.0208 Epoch: 10 Global Step: 55010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:54:00,764-Speed 3386.61 samples/sec Loss 4.8910 LearningRate 0.0208 Epoch: 10 Global Step: 55020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:54:03,803-Speed 3370.81 samples/sec Loss 4.9466 LearningRate 0.0208 Epoch: 10 Global Step: 55030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:54:06,829-Speed 3384.43 samples/sec Loss 4.8579 LearningRate 0.0208 Epoch: 10 Global Step: 55040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:54:09,835-Speed 3408.08 samples/sec Loss 4.8612 LearningRate 0.0208 Epoch: 10 Global Step: 55050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:54:12,835-Speed 3413.21 samples/sec Loss 4.8955 LearningRate 0.0208 Epoch: 10 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:15,842-Speed 3407.01 samples/sec Loss 4.8117 LearningRate 0.0208 Epoch: 10 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:18,845-Speed 3410.60 samples/sec Loss 4.9069 LearningRate 0.0207 Epoch: 10 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:21,852-Speed 3406.58 samples/sec Loss 4.9205 LearningRate 0.0207 Epoch: 10 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:24,852-Speed 3414.61 samples/sec Loss 4.8632 LearningRate 0.0207 Epoch: 10 Global Step: 55100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:27,853-Speed 3411.91 samples/sec Loss 4.9245 LearningRate 0.0207 Epoch: 10 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:54:30,866-Speed 3399.91 samples/sec Loss 4.8847 LearningRate 0.0207 Epoch: 10 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:33,865-Speed 3415.37 samples/sec Loss 4.7136 LearningRate 0.0207 Epoch: 10 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:36,866-Speed 3413.24 samples/sec Loss 4.8232 LearningRate 0.0207 Epoch: 10 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:39,866-Speed 3413.89 samples/sec Loss 4.9712 LearningRate 0.0207 Epoch: 10 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:42,851-Speed 3431.26 samples/sec Loss 4.8719 LearningRate 0.0207 Epoch: 10 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:45,855-Speed 3410.12 samples/sec Loss 4.9338 LearningRate 0.0207 Epoch: 10 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:48,860-Speed 3407.78 samples/sec Loss 4.7833 LearningRate 0.0207 Epoch: 10 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:51,901-Speed 3368.76 samples/sec Loss 4.8900 LearningRate 0.0207 Epoch: 10 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:54,957-Speed 3351.86 samples/sec Loss 4.8128 LearningRate 0.0206 Epoch: 10 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:54:57,960-Speed 3409.76 samples/sec Loss 4.9896 LearningRate 0.0206 Epoch: 10 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:00,964-Speed 3409.75 samples/sec Loss 4.8681 LearningRate 0.0206 Epoch: 10 Global Step: 55220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:03,969-Speed 3408.40 samples/sec Loss 4.8019 LearningRate 0.0206 Epoch: 10 Global Step: 55230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:06,978-Speed 3403.81 samples/sec Loss 
4.7456 LearningRate 0.0206 Epoch: 10 Global Step: 55240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:09,985-Speed 3407.12 samples/sec Loss 4.8916 LearningRate 0.0206 Epoch: 10 Global Step: 55250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:12,991-Speed 3407.09 samples/sec Loss 4.8679 LearningRate 0.0206 Epoch: 10 Global Step: 55260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:15,996-Speed 3408.45 samples/sec Loss 4.8985 LearningRate 0.0206 Epoch: 10 Global Step: 55270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:19,001-Speed 3409.11 samples/sec Loss 4.8496 LearningRate 0.0206 Epoch: 10 Global Step: 55280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:22,011-Speed 3402.13 samples/sec Loss 4.8295 LearningRate 0.0206 Epoch: 10 Global Step: 55290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:25,013-Speed 3411.93 samples/sec Loss 4.9026 LearningRate 0.0206 Epoch: 10 Global Step: 55300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:28,026-Speed 3399.98 samples/sec Loss 4.8760 LearningRate 0.0205 Epoch: 10 Global Step: 55310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:55:31,038-Speed 3399.98 samples/sec Loss 4.7819 LearningRate 0.0205 Epoch: 10 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:34,048-Speed 3402.81 samples/sec Loss 4.8855 LearningRate 0.0205 Epoch: 10 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:37,053-Speed 3408.80 samples/sec Loss 4.6846 LearningRate 0.0205 Epoch: 10 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:40,068-Speed 3397.01 samples/sec Loss 4.8291 LearningRate 0.0205 Epoch: 10 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:43,070-Speed 3411.81 samples/sec Loss 4.8128 LearningRate 0.0205 Epoch: 10 Global Step: 55360 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:46,075-Speed 3408.44 samples/sec Loss 4.8288 LearningRate 0.0205 Epoch: 10 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:49,082-Speed 3406.81 samples/sec Loss 4.8678 LearningRate 0.0205 Epoch: 10 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:52,083-Speed 3413.52 samples/sec Loss 4.7364 LearningRate 0.0205 Epoch: 10 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:55,088-Speed 3407.75 samples/sec Loss 4.9947 LearningRate 0.0205 Epoch: 10 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:55:58,105-Speed 3395.72 samples/sec Loss 4.8960 LearningRate 0.0205 Epoch: 10 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:01,114-Speed 3402.93 samples/sec Loss 4.7707 LearningRate 0.0204 Epoch: 10 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:56:04,124-Speed 3403.77 samples/sec Loss 4.9292 LearningRate 0.0204 Epoch: 10 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:07,137-Speed 3399.34 samples/sec Loss 4.8450 LearningRate 0.0204 Epoch: 10 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:10,137-Speed 3414.16 samples/sec Loss 4.7418 LearningRate 0.0204 Epoch: 10 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:13,138-Speed 3413.50 samples/sec Loss 5.0357 LearningRate 0.0204 Epoch: 10 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:16,167-Speed 3381.00 samples/sec Loss 4.8518 LearningRate 0.0204 Epoch: 10 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:19,173-Speed 3407.62 samples/sec Loss 4.7117 LearningRate 0.0204 Epoch: 10 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:56:22,174-Speed 3412.52 samples/sec Loss 4.7688 LearningRate 0.0204 Epoch: 10 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:25,177-Speed 3410.62 samples/sec Loss 4.7740 LearningRate 0.0204 Epoch: 10 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:28,182-Speed 3408.71 samples/sec Loss 4.9218 LearningRate 0.0204 Epoch: 10 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:31,213-Speed 3379.57 samples/sec Loss 4.9136 LearningRate 0.0204 Epoch: 10 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:34,233-Speed 3391.02 samples/sec Loss 4.8606 LearningRate 0.0203 Epoch: 10 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:56:37,233-Speed 3415.16 samples/sec Loss 4.8291 LearningRate 0.0203 Epoch: 10 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:40,238-Speed 3407.70 samples/sec Loss 4.7792 LearningRate 0.0203 Epoch: 10 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:43,244-Speed 3407.85 samples/sec Loss 4.8718 LearningRate 0.0203 Epoch: 10 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:56:46,240-Speed 3418.73 samples/sec Loss 4.9328 LearningRate 0.0203 Epoch: 10 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:56:49,256-Speed 3396.61 samples/sec Loss 4.8264 LearningRate 0.0203 Epoch: 10 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:56:52,278-Speed 3388.64 samples/sec Loss 4.9036 LearningRate 0.0203 Epoch: 10 Global Step: 55590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:56:55,288-Speed 3403.57 samples/sec Loss 4.7523 LearningRate 0.0203 Epoch: 10 Global Step: 55600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:56:58,299-Speed 3401.30 samples/sec Loss 
4.8458 LearningRate 0.0203 Epoch: 10 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:01,304-Speed 3407.68 samples/sec Loss 4.6797 LearningRate 0.0203 Epoch: 10 Global Step: 55620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:04,423-Speed 3284.17 samples/sec Loss 4.8438 LearningRate 0.0203 Epoch: 10 Global Step: 55630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:16,898-Speed 820.95 samples/sec Loss 4.6885 LearningRate 0.0202 Epoch: 11 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:19,978-Speed 3325.46 samples/sec Loss 4.1221 LearningRate 0.0202 Epoch: 11 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:22,989-Speed 3402.21 samples/sec Loss 4.0773 LearningRate 0.0202 Epoch: 11 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 04:57:26,015-Speed 3384.65 samples/sec Loss 4.0264 LearningRate 0.0202 Epoch: 11 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:29,046-Speed 3379.32 samples/sec Loss 3.9786 LearningRate 0.0202 Epoch: 11 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:32,084-Speed 3371.17 samples/sec Loss 4.1270 LearningRate 0.0202 Epoch: 11 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:35,086-Speed 3411.81 samples/sec Loss 4.0034 LearningRate 0.0202 Epoch: 11 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:38,117-Speed 3379.07 samples/sec Loss 4.0125 LearningRate 0.0202 Epoch: 11 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:41,168-Speed 3356.56 samples/sec Loss 4.1319 LearningRate 0.0202 Epoch: 11 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:44,172-Speed 3410.32 samples/sec Loss 3.9433 LearningRate 0.0202 Epoch: 11 Global Step: 55730 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:47,179-Speed 3406.51 samples/sec Loss 4.0760 LearningRate 0.0202 Epoch: 11 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:50,218-Speed 3371.03 samples/sec Loss 4.1344 LearningRate 0.0202 Epoch: 11 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:53,258-Speed 3369.26 samples/sec Loss 4.0797 LearningRate 0.0201 Epoch: 11 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:57:56,276-Speed 3393.66 samples/sec Loss 4.0003 LearningRate 0.0201 Epoch: 11 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 04:57:59,268-Speed 3424.08 samples/sec Loss 4.2533 LearningRate 0.0201 Epoch: 11 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:02,452-Speed 3216.41 samples/sec Loss 4.1312 LearningRate 0.0201 Epoch: 11 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:05,461-Speed 3404.49 samples/sec Loss 3.9752 LearningRate 0.0201 Epoch: 11 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:08,484-Speed 3388.08 samples/sec Loss 4.0308 LearningRate 0.0201 Epoch: 11 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:11,502-Speed 3393.25 samples/sec Loss 4.1331 LearningRate 0.0201 Epoch: 11 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:14,506-Speed 3410.57 samples/sec Loss 4.1713 LearningRate 0.0201 Epoch: 11 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:17,519-Speed 3398.51 samples/sec Loss 4.1685 LearningRate 0.0201 Epoch: 11 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:20,524-Speed 3409.36 samples/sec Loss 4.0565 LearningRate 0.0201 Epoch: 11 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 04:58:23,552-Speed 3381.89 samples/sec Loss 4.0859 LearningRate 0.0201 Epoch: 11 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:26,579-Speed 3384.47 samples/sec Loss 4.1603 LearningRate 0.0200 Epoch: 11 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:29,568-Speed 3426.86 samples/sec Loss 4.0539 LearningRate 0.0200 Epoch: 11 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:32,590-Speed 3388.90 samples/sec Loss 4.1081 LearningRate 0.0200 Epoch: 11 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:35,603-Speed 3399.84 samples/sec Loss 4.1405 LearningRate 0.0200 Epoch: 11 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:38,646-Speed 3366.19 samples/sec Loss 4.1047 LearningRate 0.0200 Epoch: 11 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:41,672-Speed 3383.97 samples/sec Loss 4.1438 LearningRate 0.0200 Epoch: 11 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:44,699-Speed 3384.28 samples/sec Loss 4.1317 LearningRate 0.0200 Epoch: 11 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:47,712-Speed 3399.25 samples/sec Loss 4.1339 LearningRate 0.0200 Epoch: 11 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:50,722-Speed 3403.30 samples/sec Loss 4.1639 LearningRate 0.0200 Epoch: 11 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:53,732-Speed 3402.60 samples/sec Loss 4.3655 LearningRate 0.0200 Epoch: 11 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:56,778-Speed 3362.81 samples/sec Loss 4.1231 LearningRate 0.0200 Epoch: 11 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 04:58:59,776-Speed 3416.65 samples/sec Loss 
4.2203 LearningRate 0.0199 Epoch: 11 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 04:59:02,790-Speed 3397.20 samples/sec Loss 4.2684 LearningRate 0.0199 Epoch: 11 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 04:59:05,798-Speed 3405.60 samples/sec Loss 4.2431 LearningRate 0.0199 Epoch: 11 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 04:59:50,321-[lfw][56000]XNorm: 21.719909
Training: 2022-04-11 04:59:50,322-[lfw][56000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-11 04:59:50,322-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-04-11 05:00:42,019-[cfp_fp][56000]XNorm: 20.101255
Training: 2022-04-11 05:00:42,020-[cfp_fp][56000]Accuracy-Flip: 0.97671+-0.00878
Training: 2022-04-11 05:00:42,020-[cfp_fp][56000]Accuracy-Highest: 0.97757
Training: 2022-04-11 05:01:26,358-[agedb_30][56000]XNorm: 21.878597
Training: 2022-04-11 05:01:26,358-[agedb_30][56000]Accuracy-Flip: 0.97833+-0.00745
Training: 2022-04-11 05:01:26,359-[agedb_30][56000]Accuracy-Highest: 0.98083
Training: 2022-04-11 05:01:29,379-Speed 71.32 samples/sec Loss 4.2396 LearningRate 0.0199 Epoch: 11 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 05:01:32,393-Speed 3398.37 samples/sec Loss 4.2381 LearningRate 0.0199 Epoch: 11 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 05:01:35,394-Speed 3412.82 samples/sec Loss 4.1980 LearningRate 0.0199 Epoch: 11 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 05:01:38,477-Speed 3322.56 samples/sec Loss 4.3040 LearningRate 0.0199 Epoch: 11 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 05:01:41,486-Speed 3403.73 samples/sec Loss 4.2613 LearningRate 0.0199 Epoch: 11 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 05:01:44,477-Speed 3424.45 samples/sec Loss 4.2558 LearningRate 0.0199 Epoch: 11
Global Step: 56060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:01:47,473-Speed 3418.56 samples/sec Loss 4.2568 LearningRate 0.0199 Epoch: 11 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:01:50,481-Speed 3405.39 samples/sec Loss 4.2845 LearningRate 0.0199 Epoch: 11 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 05:01:53,471-Speed 3425.59 samples/sec Loss 4.1836 LearningRate 0.0198 Epoch: 11 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:01:56,470-Speed 3415.66 samples/sec Loss 4.3173 LearningRate 0.0198 Epoch: 11 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:01:59,478-Speed 3404.68 samples/sec Loss 4.2291 LearningRate 0.0198 Epoch: 11 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:02,489-Speed 3401.42 samples/sec Loss 4.2700 LearningRate 0.0198 Epoch: 11 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:05,489-Speed 3414.45 samples/sec Loss 4.2655 LearningRate 0.0198 Epoch: 11 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:08,535-Speed 3362.99 samples/sec Loss 4.2271 LearningRate 0.0198 Epoch: 11 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:11,522-Speed 3428.87 samples/sec Loss 4.1310 LearningRate 0.0198 Epoch: 11 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:14,533-Speed 3401.23 samples/sec Loss 4.2395 LearningRate 0.0198 Epoch: 11 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:17,555-Speed 3389.49 samples/sec Loss 4.2886 LearningRate 0.0198 Epoch: 11 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:20,560-Speed 3408.99 samples/sec Loss 4.2615 LearningRate 0.0198 Epoch: 11 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-11 05:02:23,627-Speed 3339.70 samples/sec Loss 4.2756 LearningRate 0.0198 Epoch: 11 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:26,667-Speed 3369.54 samples/sec Loss 4.3767 LearningRate 0.0198 Epoch: 11 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:29,677-Speed 3402.31 samples/sec Loss 4.3654 LearningRate 0.0197 Epoch: 11 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:32,687-Speed 3403.46 samples/sec Loss 4.3217 LearningRate 0.0197 Epoch: 11 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:35,701-Speed 3398.18 samples/sec Loss 4.2855 LearningRate 0.0197 Epoch: 11 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:38,710-Speed 3403.34 samples/sec Loss 4.1744 LearningRate 0.0197 Epoch: 11 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:02:41,715-Speed 3408.19 samples/sec Loss 4.1882 LearningRate 0.0197 Epoch: 11 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:44,723-Speed 3404.99 samples/sec Loss 4.3331 LearningRate 0.0197 Epoch: 11 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:47,753-Speed 3381.76 samples/sec Loss 4.3084 LearningRate 0.0197 Epoch: 11 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:50,765-Speed 3399.96 samples/sec Loss 4.1891 LearningRate 0.0197 Epoch: 11 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:53,772-Speed 3407.08 samples/sec Loss 4.2287 LearningRate 0.0197 Epoch: 11 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:56,777-Speed 3407.78 samples/sec Loss 4.3983 LearningRate 0.0197 Epoch: 11 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:02:59,789-Speed 3400.08 
samples/sec Loss 4.4246 LearningRate 0.0197 Epoch: 11 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:03:02,811-Speed 3390.26 samples/sec Loss 4.2697 LearningRate 0.0196 Epoch: 11 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:03:05,818-Speed 3405.25 samples/sec Loss 4.2145 LearningRate 0.0196 Epoch: 11 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:03:08,810-Speed 3423.52 samples/sec Loss 4.2014 LearningRate 0.0196 Epoch: 11 Global Step: 56340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:11,817-Speed 3406.07 samples/sec Loss 4.2385 LearningRate 0.0196 Epoch: 11 Global Step: 56350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:14,823-Speed 3407.60 samples/sec Loss 4.2950 LearningRate 0.0196 Epoch: 11 Global Step: 56360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:17,950-Speed 3275.69 samples/sec Loss 4.3712 LearningRate 0.0196 Epoch: 11 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:20,953-Speed 3411.26 samples/sec Loss 4.4364 LearningRate 0.0196 Epoch: 11 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:23,952-Speed 3415.15 samples/sec Loss 4.3008 LearningRate 0.0196 Epoch: 11 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:26,963-Speed 3402.22 samples/sec Loss 4.2950 LearningRate 0.0196 Epoch: 11 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:29,969-Speed 3407.28 samples/sec Loss 4.1714 LearningRate 0.0196 Epoch: 11 Global Step: 56410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:32,966-Speed 3417.03 samples/sec Loss 4.3942 LearningRate 0.0196 Epoch: 11 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:35,983-Speed 3395.09 samples/sec Loss 4.2936 LearningRate 0.0196 Epoch: 11 
Global Step: 56430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:39,006-Speed 3387.85 samples/sec Loss 4.2772 LearningRate 0.0195 Epoch: 11 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:03:41,990-Speed 3432.27 samples/sec Loss 4.3985 LearningRate 0.0195 Epoch: 11 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:44,988-Speed 3417.22 samples/sec Loss 4.5634 LearningRate 0.0195 Epoch: 11 Global Step: 56460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:47,992-Speed 3410.26 samples/sec Loss 4.3801 LearningRate 0.0195 Epoch: 11 Global Step: 56470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:51,005-Speed 3399.28 samples/sec Loss 4.3480 LearningRate 0.0195 Epoch: 11 Global Step: 56480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:54,005-Speed 3414.47 samples/sec Loss 4.2054 LearningRate 0.0195 Epoch: 11 Global Step: 56490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:03:57,008-Speed 3409.87 samples/sec Loss 4.2949 LearningRate 0.0195 Epoch: 11 Global Step: 56500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:00,011-Speed 3411.13 samples/sec Loss 4.4257 LearningRate 0.0195 Epoch: 11 Global Step: 56510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:03,073-Speed 3344.76 samples/sec Loss 4.4419 LearningRate 0.0195 Epoch: 11 Global Step: 56520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:06,164-Speed 3313.95 samples/sec Loss 4.3958 LearningRate 0.0195 Epoch: 11 Global Step: 56530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:09,169-Speed 3408.70 samples/sec Loss 4.3965 LearningRate 0.0195 Epoch: 11 Global Step: 56540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:12,182-Speed 3399.94 samples/sec Loss 4.3161 LearningRate 0.0194 Epoch: 11 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 05:04:15,197-Speed 3397.11 samples/sec Loss 4.4084 LearningRate 0.0194 Epoch: 11 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:04:18,213-Speed 3396.34 samples/sec Loss 4.4466 LearningRate 0.0194 Epoch: 11 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:04:21,216-Speed 3410.28 samples/sec Loss 4.4210 LearningRate 0.0194 Epoch: 11 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:04:24,217-Speed 3413.21 samples/sec Loss 4.5139 LearningRate 0.0194 Epoch: 11 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:04:27,211-Speed 3420.22 samples/sec Loss 4.5054 LearningRate 0.0194 Epoch: 11 Global Step: 56600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:30,213-Speed 3412.57 samples/sec Loss 4.3029 LearningRate 0.0194 Epoch: 11 Global Step: 56610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:33,227-Speed 3398.60 samples/sec Loss 4.3584 LearningRate 0.0194 Epoch: 11 Global Step: 56620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:36,273-Speed 3362.58 samples/sec Loss 4.3813 LearningRate 0.0194 Epoch: 11 Global Step: 56630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:39,295-Speed 3389.49 samples/sec Loss 4.3952 LearningRate 0.0194 Epoch: 11 Global Step: 56640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:42,299-Speed 3409.49 samples/sec Loss 4.4782 LearningRate 0.0194 Epoch: 11 Global Step: 56650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:45,304-Speed 3407.88 samples/sec Loss 4.2554 LearningRate 0.0194 Epoch: 11 Global Step: 56660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:48,322-Speed 3394.38 samples/sec Loss 4.4543 LearningRate 0.0193 Epoch: 11 Global Step: 56670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:51,335-Speed 3399.42 
samples/sec Loss 4.3623 LearningRate 0.0193 Epoch: 11 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:54,336-Speed 3413.71 samples/sec Loss 4.3190 LearningRate 0.0193 Epoch: 11 Global Step: 56690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:04:57,347-Speed 3402.04 samples/sec Loss 4.3282 LearningRate 0.0193 Epoch: 11 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:00,355-Speed 3404.99 samples/sec Loss 4.3701 LearningRate 0.0193 Epoch: 11 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:03,359-Speed 3409.17 samples/sec Loss 4.3732 LearningRate 0.0193 Epoch: 11 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:06,369-Speed 3403.25 samples/sec Loss 4.5143 LearningRate 0.0193 Epoch: 11 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:09,375-Speed 3406.85 samples/sec Loss 4.3155 LearningRate 0.0193 Epoch: 11 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:12,363-Speed 3428.42 samples/sec Loss 4.3528 LearningRate 0.0193 Epoch: 11 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:15,364-Speed 3412.53 samples/sec Loss 4.3621 LearningRate 0.0193 Epoch: 11 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:18,366-Speed 3412.40 samples/sec Loss 4.4782 LearningRate 0.0193 Epoch: 11 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:21,365-Speed 3415.11 samples/sec Loss 4.4053 LearningRate 0.0192 Epoch: 11 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:24,367-Speed 3412.35 samples/sec Loss 4.3950 LearningRate 0.0192 Epoch: 11 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:27,374-Speed 3406.16 samples/sec Loss 4.3841 LearningRate 0.0192 Epoch: 11 
Global Step: 56800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:30,420-Speed 3363.03 samples/sec Loss 4.5437 LearningRate 0.0192 Epoch: 11 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:33,439-Speed 3392.46 samples/sec Loss 4.4345 LearningRate 0.0192 Epoch: 11 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:36,450-Speed 3401.17 samples/sec Loss 4.4270 LearningRate 0.0192 Epoch: 11 Global Step: 56830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:39,467-Speed 3395.75 samples/sec Loss 4.5089 LearningRate 0.0192 Epoch: 11 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:42,495-Speed 3382.27 samples/sec Loss 4.3224 LearningRate 0.0192 Epoch: 11 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:05:45,488-Speed 3422.81 samples/sec Loss 4.3907 LearningRate 0.0192 Epoch: 11 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:48,490-Speed 3412.15 samples/sec Loss 4.4893 LearningRate 0.0192 Epoch: 11 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:51,499-Speed 3403.63 samples/sec Loss 4.4472 LearningRate 0.0192 Epoch: 11 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:54,507-Speed 3404.54 samples/sec Loss 4.3078 LearningRate 0.0192 Epoch: 11 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:05:57,512-Speed 3408.86 samples/sec Loss 4.4905 LearningRate 0.0191 Epoch: 11 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:06:00,538-Speed 3384.69 samples/sec Loss 4.4118 LearningRate 0.0191 Epoch: 11 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:06:03,544-Speed 3407.04 samples/sec Loss 4.3694 LearningRate 0.0191 Epoch: 11 Global Step: 56920 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-11 05:06:06,551-Speed 3406.31 samples/sec Loss 4.4792 LearningRate 0.0191 Epoch: 11 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:06:09,553-Speed 3412.96 samples/sec Loss 4.4489 LearningRate 0.0191 Epoch: 11 Global Step: 56940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:06:12,558-Speed 3407.90 samples/sec Loss 4.3845 LearningRate 0.0191 Epoch: 11 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:06:15,569-Speed 3401.20 samples/sec Loss 4.4541 LearningRate 0.0191 Epoch: 11 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:18,581-Speed 3401.34 samples/sec Loss 4.2806 LearningRate 0.0191 Epoch: 11 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:21,595-Speed 3397.73 samples/sec Loss 4.4246 LearningRate 0.0191 Epoch: 11 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:24,602-Speed 3406.54 samples/sec Loss 4.3246 LearningRate 0.0191 Epoch: 11 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:27,604-Speed 3411.42 samples/sec Loss 4.4539 LearningRate 0.0191 Epoch: 11 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:30,615-Speed 3401.65 samples/sec Loss 4.5609 LearningRate 0.0190 Epoch: 11 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:33,620-Speed 3408.59 samples/sec Loss 4.4812 LearningRate 0.0190 Epoch: 11 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:36,641-Speed 3391.26 samples/sec Loss 4.4086 LearningRate 0.0190 Epoch: 11 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:39,649-Speed 3405.36 samples/sec Loss 4.5055 LearningRate 0.0190 Epoch: 11 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:42,735-Speed 3318.18 
samples/sec Loss 4.4462 LearningRate 0.0190 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:45,722-Speed 3429.65 samples/sec Loss 4.4254 LearningRate 0.0190 Epoch: 11 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:48,729-Speed 3406.47 samples/sec Loss 4.4084 LearningRate 0.0190 Epoch: 11 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:51,734-Speed 3407.47 samples/sec Loss 4.3905 LearningRate 0.0190 Epoch: 11 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:54,821-Speed 3319.01 samples/sec Loss 4.4866 LearningRate 0.0190 Epoch: 11 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:06:57,828-Speed 3406.01 samples/sec Loss 4.4860 LearningRate 0.0190 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:00,841-Speed 3399.06 samples/sec Loss 4.3599 LearningRate 0.0190 Epoch: 11 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:03,848-Speed 3406.58 samples/sec Loss 4.5689 LearningRate 0.0190 Epoch: 11 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:06,867-Speed 3392.52 samples/sec Loss 4.3848 LearningRate 0.0189 Epoch: 11 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:09,855-Speed 3428.64 samples/sec Loss 4.4509 LearningRate 0.0189 Epoch: 11 Global Step: 57140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:12,894-Speed 3369.62 samples/sec Loss 4.3924 LearningRate 0.0189 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:15,899-Speed 3408.86 samples/sec Loss 4.3734 LearningRate 0.0189 Epoch: 11 Global Step: 57160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:18,918-Speed 3393.17 samples/sec Loss 4.5187 LearningRate 0.0189 Epoch: 11 
Global Step: 57170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:21,925-Speed 3405.23 samples/sec Loss 4.4403 LearningRate 0.0189 Epoch: 11 Global Step: 57180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:24,936-Speed 3401.78 samples/sec Loss 4.4377 LearningRate 0.0189 Epoch: 11 Global Step: 57190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:27,960-Speed 3388.09 samples/sec Loss 4.5336 LearningRate 0.0189 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:30,972-Speed 3399.58 samples/sec Loss 4.4923 LearningRate 0.0189 Epoch: 11 Global Step: 57210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:33,978-Speed 3408.16 samples/sec Loss 4.4126 LearningRate 0.0189 Epoch: 11 Global Step: 57220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:36,991-Speed 3399.80 samples/sec Loss 4.4540 LearningRate 0.0189 Epoch: 11 Global Step: 57230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:07:40,004-Speed 3399.12 samples/sec Loss 4.3353 LearningRate 0.0188 Epoch: 11 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:43,009-Speed 3408.22 samples/sec Loss 4.5009 LearningRate 0.0188 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:46,026-Speed 3395.37 samples/sec Loss 4.5063 LearningRate 0.0188 Epoch: 11 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:49,033-Speed 3406.72 samples/sec Loss 4.4586 LearningRate 0.0188 Epoch: 11 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:52,061-Speed 3381.77 samples/sec Loss 4.2834 LearningRate 0.0188 Epoch: 11 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:07:55,066-Speed 3408.60 samples/sec Loss 4.4610 LearningRate 0.0188 Epoch: 11 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 05:07:58,076-Speed 3402.76 samples/sec Loss 4.4843 LearningRate 0.0188 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 05:08:01,065-Speed 3426.95 samples/sec Loss 4.3609 LearningRate 0.0188 Epoch: 11 Global Step: 57310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:08:04,072-Speed 3406.13 samples/sec Loss 4.4516 LearningRate 0.0188 Epoch: 11 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:08:07,082-Speed 3403.80 samples/sec Loss 4.4454 LearningRate 0.0188 Epoch: 11 Global Step: 57330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:08:10,086-Speed 3409.22 samples/sec Loss 4.3890 LearningRate 0.0188 Epoch: 11 Global Step: 57340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:08:13,090-Speed 3409.31 samples/sec Loss 4.6032 LearningRate 0.0188 Epoch: 11 Global Step: 57350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 05:08:16,093-Speed 3411.38 samples/sec Loss 4.5290 LearningRate 0.0187 Epoch: 11 Global Step: 57360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:08:19,118-Speed 3385.39 samples/sec Loss 4.5299 LearningRate 0.0187 Epoch: 11 Global Step: 57370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:08:22,124-Speed 3408.05 samples/sec Loss 4.3301 LearningRate 0.0187 Epoch: 11 Global Step: 57380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:08:25,132-Speed 3404.88 samples/sec Loss 4.5225 LearningRate 0.0187 Epoch: 11 Global Step: 57390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:08:28,161-Speed 3381.18 samples/sec Loss 4.5029 LearningRate 0.0187 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:08:31,166-Speed 3407.92 samples/sec Loss 4.4157 LearningRate 0.0187 Epoch: 11 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:34,168-Speed 3412.19 
samples/sec Loss 4.4551 LearningRate 0.0187 Epoch: 11 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:37,172-Speed 3410.07 samples/sec Loss 4.4745 LearningRate 0.0187 Epoch: 11 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:40,175-Speed 3411.41 samples/sec Loss 4.5707 LearningRate 0.0187 Epoch: 11 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:43,182-Speed 3406.21 samples/sec Loss 4.4927 LearningRate 0.0187 Epoch: 11 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:46,187-Speed 3407.88 samples/sec Loss 4.4812 LearningRate 0.0187 Epoch: 11 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:49,197-Speed 3402.67 samples/sec Loss 4.4224 LearningRate 0.0187 Epoch: 11 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:52,201-Speed 3409.83 samples/sec Loss 4.5773 LearningRate 0.0186 Epoch: 11 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:55,209-Speed 3404.60 samples/sec Loss 4.4614 LearningRate 0.0186 Epoch: 11 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:08:58,214-Speed 3409.45 samples/sec Loss 4.5686 LearningRate 0.0186 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:01,203-Speed 3426.27 samples/sec Loss 4.4939 LearningRate 0.0186 Epoch: 11 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:04,208-Speed 3409.61 samples/sec Loss 4.5030 LearningRate 0.0186 Epoch: 11 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:07,214-Speed 3406.48 samples/sec Loss 4.5574 LearningRate 0.0186 Epoch: 11 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:10,216-Speed 3411.58 samples/sec Loss 4.3043 LearningRate 0.0186 Epoch: 11 
Global Step: 57540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:13,223-Speed 3406.78 samples/sec Loss 4.4808 LearningRate 0.0186 Epoch: 11 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:16,235-Speed 3400.08 samples/sec Loss 4.3531 LearningRate 0.0186 Epoch: 11 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:19,253-Speed 3394.22 samples/sec Loss 4.4620 LearningRate 0.0186 Epoch: 11 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:22,258-Speed 3408.28 samples/sec Loss 4.5113 LearningRate 0.0186 Epoch: 11 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:25,271-Speed 3399.66 samples/sec Loss 4.5156 LearningRate 0.0186 Epoch: 11 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:28,371-Speed 3303.71 samples/sec Loss 4.5323 LearningRate 0.0185 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:31,362-Speed 3424.62 samples/sec Loss 4.3789 LearningRate 0.0185 Epoch: 11 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:34,377-Speed 3397.86 samples/sec Loss 4.4469 LearningRate 0.0185 Epoch: 11 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:37,386-Speed 3404.07 samples/sec Loss 4.5326 LearningRate 0.0185 Epoch: 11 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:40,394-Speed 3404.81 samples/sec Loss 4.4514 LearningRate 0.0185 Epoch: 11 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:43,396-Speed 3412.56 samples/sec Loss 4.5952 LearningRate 0.0185 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:46,412-Speed 3395.87 samples/sec Loss 4.4392 LearningRate 0.0185 Epoch: 11 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:09:49,425-Speed 3399.43 samples/sec Loss 4.5251 LearningRate 0.0185 Epoch: 11 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:52,430-Speed 3408.43 samples/sec Loss 4.5268 LearningRate 0.0185 Epoch: 11 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:55,439-Speed 3403.42 samples/sec Loss 4.4197 LearningRate 0.0185 Epoch: 11 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:09:58,468-Speed 3381.62 samples/sec Loss 4.4531 LearningRate 0.0185 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:01,452-Speed 3433.19 samples/sec Loss 4.5501 LearningRate 0.0184 Epoch: 11 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:04,469-Speed 3394.98 samples/sec Loss 4.5284 LearningRate 0.0184 Epoch: 11 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:07,483-Speed 3397.71 samples/sec Loss 4.5132 LearningRate 0.0184 Epoch: 11 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:10,488-Speed 3409.00 samples/sec Loss 4.5968 LearningRate 0.0184 Epoch: 11 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:13,494-Speed 3407.21 samples/sec Loss 4.4718 LearningRate 0.0184 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:16,501-Speed 3405.92 samples/sec Loss 4.4806 LearningRate 0.0184 Epoch: 11 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:19,516-Speed 3397.12 samples/sec Loss 4.5717 LearningRate 0.0184 Epoch: 11 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:22,502-Speed 3430.73 samples/sec Loss 4.4682 LearningRate 0.0184 Epoch: 11 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:25,525-Speed 3387.94 
samples/sec Loss 4.4632 LearningRate 0.0184 Epoch: 11 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:28,536-Speed 3402.42 samples/sec Loss 4.4963 LearningRate 0.0184 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:31,541-Speed 3407.73 samples/sec Loss 4.4629 LearningRate 0.0184 Epoch: 11 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:34,549-Speed 3406.34 samples/sec Loss 4.3940 LearningRate 0.0184 Epoch: 11 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:37,560-Speed 3400.49 samples/sec Loss 4.4972 LearningRate 0.0183 Epoch: 11 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:40,576-Speed 3396.25 samples/sec Loss 4.6557 LearningRate 0.0183 Epoch: 11 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:43,579-Speed 3411.09 samples/sec Loss 4.4602 LearningRate 0.0183 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:46,584-Speed 3408.57 samples/sec Loss 4.4146 LearningRate 0.0183 Epoch: 11 Global Step: 57860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:49,603-Speed 3392.20 samples/sec Loss 4.4835 LearningRate 0.0183 Epoch: 11 Global Step: 57870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:10:52,621-Speed 3394.16 samples/sec Loss 4.4803 LearningRate 0.0183 Epoch: 11 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:55,637-Speed 3396.31 samples/sec Loss 4.4999 LearningRate 0.0183 Epoch: 11 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:10:58,642-Speed 3408.07 samples/sec Loss 4.6006 LearningRate 0.0183 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:01,653-Speed 3401.72 samples/sec Loss 4.4201 LearningRate 0.0183 Epoch: 11 
Global Step: 57910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:04,683-Speed 3380.99 samples/sec Loss 4.4859 LearningRate 0.0183 Epoch: 11 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:07,694-Speed 3401.69 samples/sec Loss 4.5690 LearningRate 0.0183 Epoch: 11 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:10,699-Speed 3408.21 samples/sec Loss 4.5675 LearningRate 0.0183 Epoch: 11 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:13,711-Speed 3400.42 samples/sec Loss 4.5181 LearningRate 0.0182 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:16,807-Speed 3308.13 samples/sec Loss 4.5300 LearningRate 0.0182 Epoch: 11 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:19,818-Speed 3402.06 samples/sec Loss 4.5744 LearningRate 0.0182 Epoch: 11 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:11:22,825-Speed 3406.06 samples/sec Loss 4.4946 LearningRate 0.0182 Epoch: 11 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:11:25,827-Speed 3411.72 samples/sec Loss 4.5741 LearningRate 0.0182 Epoch: 11 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:11:28,818-Speed 3425.49 samples/sec Loss 4.5162 LearningRate 0.0182 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:12:13,340-[lfw][58000]XNorm: 23.376417 Training: 2022-04-11 05:12:13,340-[lfw][58000]Accuracy-Flip: 0.99850+-0.00203 Training: 2022-04-11 05:12:13,341-[lfw][58000]Accuracy-Highest: 0.99850 Training: 2022-04-11 05:13:04,896-[cfp_fp][58000]XNorm: 21.504341 Training: 2022-04-11 05:13:04,897-[cfp_fp][58000]Accuracy-Flip: 0.97900+-0.00661 Training: 2022-04-11 05:13:04,897-[cfp_fp][58000]Accuracy-Highest: 0.97900 Training: 2022-04-11 
05:13:49,111-[agedb_30][58000]XNorm: 23.291820 Training: 2022-04-11 05:13:49,112-[agedb_30][58000]Accuracy-Flip: 0.98083+-0.00534 Training: 2022-04-11 05:13:49,112-[agedb_30][58000]Accuracy-Highest: 0.98083 Training: 2022-04-11 05:13:52,127-Speed 71.45 samples/sec Loss 4.4707 LearningRate 0.0182 Epoch: 11 Global Step: 58010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:13:55,115-Speed 3427.34 samples/sec Loss 4.6467 LearningRate 0.0182 Epoch: 11 Global Step: 58020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:13:58,101-Speed 3429.72 samples/sec Loss 4.5720 LearningRate 0.0182 Epoch: 11 Global Step: 58030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:01,090-Speed 3427.56 samples/sec Loss 4.4535 LearningRate 0.0182 Epoch: 11 Global Step: 58040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:04,199-Speed 3294.29 samples/sec Loss 4.4768 LearningRate 0.0182 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:07,196-Speed 3417.63 samples/sec Loss 4.5642 LearningRate 0.0182 Epoch: 11 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:10,209-Speed 3399.06 samples/sec Loss 4.5565 LearningRate 0.0181 Epoch: 11 Global Step: 58070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:13,202-Speed 3422.07 samples/sec Loss 4.4774 LearningRate 0.0181 Epoch: 11 Global Step: 58080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:16,192-Speed 3426.37 samples/sec Loss 4.6013 LearningRate 0.0181 Epoch: 11 Global Step: 58090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:19,200-Speed 3405.47 samples/sec Loss 4.3911 LearningRate 0.0181 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:22,193-Speed 3421.46 samples/sec Loss 4.4952 LearningRate 0.0181 Epoch: 11 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:14:25,191-Speed 3417.29 samples/sec Loss 4.5046 LearningRate 0.0181 Epoch: 11 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:14:28,171-Speed 3436.79 samples/sec Loss 4.6381 LearningRate 0.0181 Epoch: 11 Global Step: 58130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:31,178-Speed 3405.98 samples/sec Loss 4.5160 LearningRate 0.0181 Epoch: 11 Global Step: 58140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:34,181-Speed 3410.17 samples/sec Loss 4.4921 LearningRate 0.0181 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:37,187-Speed 3407.44 samples/sec Loss 4.5084 LearningRate 0.0181 Epoch: 11 Global Step: 58160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:40,197-Speed 3402.73 samples/sec Loss 4.4902 LearningRate 0.0181 Epoch: 11 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:43,212-Speed 3397.44 samples/sec Loss 4.4080 LearningRate 0.0181 Epoch: 11 Global Step: 58180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:46,210-Speed 3417.24 samples/sec Loss 4.4981 LearningRate 0.0180 Epoch: 11 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:49,267-Speed 3350.38 samples/sec Loss 4.5575 LearningRate 0.0180 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:52,334-Speed 3339.19 samples/sec Loss 4.5019 LearningRate 0.0180 Epoch: 11 Global Step: 58210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:55,328-Speed 3421.35 samples/sec Loss 4.3233 LearningRate 0.0180 Epoch: 11 Global Step: 58220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:14:58,338-Speed 3402.81 samples/sec Loss 4.4221 LearningRate 0.0180 Epoch: 11 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:01,331-Speed 3422.12 samples/sec Loss 
4.4388 LearningRate 0.0180 Epoch: 11 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:04,328-Speed 3417.75 samples/sec Loss 4.3897 LearningRate 0.0180 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:07,334-Speed 3407.48 samples/sec Loss 4.5543 LearningRate 0.0180 Epoch: 11 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:10,321-Speed 3428.89 samples/sec Loss 4.5632 LearningRate 0.0180 Epoch: 11 Global Step: 58270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:13,397-Speed 3330.46 samples/sec Loss 4.5405 LearningRate 0.0180 Epoch: 11 Global Step: 58280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:16,393-Speed 3418.23 samples/sec Loss 4.4583 LearningRate 0.0180 Epoch: 11 Global Step: 58290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:19,390-Speed 3418.12 samples/sec Loss 4.5360 LearningRate 0.0180 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:22,396-Speed 3407.31 samples/sec Loss 4.3775 LearningRate 0.0179 Epoch: 11 Global Step: 58310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:25,396-Speed 3414.26 samples/sec Loss 4.5017 LearningRate 0.0179 Epoch: 11 Global Step: 58320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:28,416-Speed 3391.06 samples/sec Loss 4.5576 LearningRate 0.0179 Epoch: 11 Global Step: 58330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:31,422-Speed 3408.20 samples/sec Loss 4.4149 LearningRate 0.0179 Epoch: 11 Global Step: 58340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:34,417-Speed 3419.77 samples/sec Loss 4.5746 LearningRate 0.0179 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:37,420-Speed 3409.62 samples/sec Loss 4.3813 LearningRate 0.0179 Epoch: 11 Global Step: 58360 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:15:40,440-Speed 3392.29 samples/sec Loss 4.5475 LearningRate 0.0179 Epoch: 11 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:43,438-Speed 3416.28 samples/sec Loss 4.5386 LearningRate 0.0179 Epoch: 11 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:46,442-Speed 3410.55 samples/sec Loss 4.4063 LearningRate 0.0179 Epoch: 11 Global Step: 58390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:49,480-Speed 3370.75 samples/sec Loss 4.6105 LearningRate 0.0179 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:52,502-Speed 3389.40 samples/sec Loss 4.5691 LearningRate 0.0179 Epoch: 11 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:55,503-Speed 3412.81 samples/sec Loss 4.3332 LearningRate 0.0179 Epoch: 11 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:15:58,500-Speed 3417.91 samples/sec Loss 4.3927 LearningRate 0.0178 Epoch: 11 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:01,502-Speed 3411.96 samples/sec Loss 4.5007 LearningRate 0.0178 Epoch: 11 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:04,506-Speed 3409.21 samples/sec Loss 4.5195 LearningRate 0.0178 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:07,512-Speed 3407.69 samples/sec Loss 4.5083 LearningRate 0.0178 Epoch: 11 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:10,513-Speed 3413.10 samples/sec Loss 4.5127 LearningRate 0.0178 Epoch: 11 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:16:13,497-Speed 3432.97 samples/sec Loss 4.3756 LearningRate 0.0178 Epoch: 11 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:16:16,494-Speed 3417.53 samples/sec Loss 4.3981 LearningRate 0.0178 Epoch: 11 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:19,500-Speed 3407.07 samples/sec Loss 4.4992 LearningRate 0.0178 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:22,519-Speed 3392.92 samples/sec Loss 4.5389 LearningRate 0.0178 Epoch: 11 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:25,565-Speed 3362.62 samples/sec Loss 4.5459 LearningRate 0.0178 Epoch: 11 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:28,572-Speed 3405.94 samples/sec Loss 4.4498 LearningRate 0.0178 Epoch: 11 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:31,589-Speed 3395.95 samples/sec Loss 4.4682 LearningRate 0.0178 Epoch: 11 Global Step: 58540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:34,591-Speed 3411.71 samples/sec Loss 4.4160 LearningRate 0.0177 Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:37,589-Speed 3415.69 samples/sec Loss 4.6078 LearningRate 0.0177 Epoch: 11 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:40,588-Speed 3416.10 samples/sec Loss 4.4933 LearningRate 0.0177 Epoch: 11 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:43,572-Speed 3432.38 samples/sec Loss 4.7138 LearningRate 0.0177 Epoch: 11 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:46,571-Speed 3415.64 samples/sec Loss 4.4965 LearningRate 0.0177 Epoch: 11 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:49,575-Speed 3408.80 samples/sec Loss 4.3255 LearningRate 0.0177 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:52,577-Speed 3412.79 samples/sec Loss 
4.5195 LearningRate 0.0177 Epoch: 11 Global Step: 58610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:55,576-Speed 3415.54 samples/sec Loss 4.4067 LearningRate 0.0177 Epoch: 11 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:16:58,573-Speed 3416.85 samples/sec Loss 4.5022 LearningRate 0.0177 Epoch: 11 Global Step: 58630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:01,586-Speed 3399.57 samples/sec Loss 4.4430 LearningRate 0.0177 Epoch: 11 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:04,594-Speed 3405.55 samples/sec Loss 4.5982 LearningRate 0.0177 Epoch: 11 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:07,599-Speed 3408.12 samples/sec Loss 4.5647 LearningRate 0.0177 Epoch: 11 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:10,602-Speed 3410.74 samples/sec Loss 4.4948 LearningRate 0.0176 Epoch: 11 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:13,605-Speed 3410.80 samples/sec Loss 4.4416 LearningRate 0.0176 Epoch: 11 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:16,601-Speed 3419.07 samples/sec Loss 4.5770 LearningRate 0.0176 Epoch: 11 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:19,628-Speed 3383.29 samples/sec Loss 4.5141 LearningRate 0.0176 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:22,626-Speed 3416.87 samples/sec Loss 4.4526 LearningRate 0.0176 Epoch: 11 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:25,639-Speed 3398.84 samples/sec Loss 4.5660 LearningRate 0.0176 Epoch: 11 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:28,642-Speed 3411.00 samples/sec Loss 4.3930 LearningRate 0.0176 Epoch: 11 Global Step: 58730 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:31,653-Speed 3402.42 samples/sec Loss 4.4280 LearningRate 0.0176 Epoch: 11 Global Step: 58740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:34,721-Speed 3338.19 samples/sec Loss 4.3825 LearningRate 0.0176 Epoch: 11 Global Step: 58750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:37,732-Speed 3401.96 samples/sec Loss 4.4308 LearningRate 0.0176 Epoch: 11 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:40,844-Speed 3291.76 samples/sec Loss 4.4977 LearningRate 0.0176 Epoch: 11 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:43,844-Speed 3413.97 samples/sec Loss 4.3774 LearningRate 0.0176 Epoch: 11 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:17:46,827-Speed 3433.76 samples/sec Loss 4.4751 LearningRate 0.0175 Epoch: 11 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:49,827-Speed 3414.11 samples/sec Loss 4.4497 LearningRate 0.0175 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:52,826-Speed 3414.95 samples/sec Loss 4.4097 LearningRate 0.0175 Epoch: 11 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:55,831-Speed 3408.85 samples/sec Loss 4.5385 LearningRate 0.0175 Epoch: 11 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:17:58,830-Speed 3415.25 samples/sec Loss 4.4172 LearningRate 0.0175 Epoch: 11 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:01,869-Speed 3369.52 samples/sec Loss 4.4490 LearningRate 0.0175 Epoch: 11 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:04,888-Speed 3393.14 samples/sec Loss 4.3943 LearningRate 0.0175 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:18:07,890-Speed 3411.86 samples/sec Loss 4.5378 LearningRate 0.0175 Epoch: 11 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:10,903-Speed 3399.69 samples/sec Loss 4.4154 LearningRate 0.0175 Epoch: 11 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:13,904-Speed 3413.17 samples/sec Loss 4.4237 LearningRate 0.0175 Epoch: 11 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:16,895-Speed 3424.48 samples/sec Loss 4.5563 LearningRate 0.0175 Epoch: 11 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:19,902-Speed 3405.90 samples/sec Loss 4.4540 LearningRate 0.0175 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:22,956-Speed 3354.42 samples/sec Loss 4.4384 LearningRate 0.0174 Epoch: 11 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:25,971-Speed 3396.35 samples/sec Loss 4.5232 LearningRate 0.0174 Epoch: 11 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:28,976-Speed 3408.66 samples/sec Loss 4.3119 LearningRate 0.0174 Epoch: 11 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:31,976-Speed 3414.27 samples/sec Loss 4.5456 LearningRate 0.0174 Epoch: 11 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:34,982-Speed 3407.60 samples/sec Loss 4.3921 LearningRate 0.0174 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:37,983-Speed 3412.74 samples/sec Loss 4.4589 LearningRate 0.0174 Epoch: 11 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:40,983-Speed 3415.18 samples/sec Loss 4.4114 LearningRate 0.0174 Epoch: 11 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:44,007-Speed 3386.79 samples/sec Loss 
4.3394 LearningRate 0.0174 Epoch: 11 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:46,997-Speed 3425.13 samples/sec Loss 4.3347 LearningRate 0.0174 Epoch: 11 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:50,030-Speed 3377.48 samples/sec Loss 4.5346 LearningRate 0.0174 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:53,028-Speed 3416.05 samples/sec Loss 4.4734 LearningRate 0.0174 Epoch: 11 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:56,025-Speed 3418.07 samples/sec Loss 4.4943 LearningRate 0.0174 Epoch: 11 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:18:59,022-Speed 3417.49 samples/sec Loss 4.5178 LearningRate 0.0173 Epoch: 11 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:02,039-Speed 3395.63 samples/sec Loss 4.4141 LearningRate 0.0173 Epoch: 11 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:05,041-Speed 3411.18 samples/sec Loss 4.5531 LearningRate 0.0173 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:08,042-Speed 3413.17 samples/sec Loss 4.3943 LearningRate 0.0173 Epoch: 11 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:11,027-Speed 3432.00 samples/sec Loss 4.6341 LearningRate 0.0173 Epoch: 11 Global Step: 59070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:14,023-Speed 3417.71 samples/sec Loss 4.5414 LearningRate 0.0173 Epoch: 11 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:17,023-Speed 3415.35 samples/sec Loss 4.3415 LearningRate 0.0173 Epoch: 11 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:20,024-Speed 3412.36 samples/sec Loss 4.4553 LearningRate 0.0173 Epoch: 11 Global Step: 59100 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:23,024-Speed 3413.42 samples/sec Loss 4.5107 LearningRate 0.0173 Epoch: 11 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:26,031-Speed 3407.11 samples/sec Loss 4.5862 LearningRate 0.0173 Epoch: 11 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:29,041-Speed 3402.00 samples/sec Loss 4.2805 LearningRate 0.0173 Epoch: 11 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:32,055-Speed 3399.12 samples/sec Loss 4.4640 LearningRate 0.0173 Epoch: 11 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:35,074-Speed 3393.15 samples/sec Loss 4.4334 LearningRate 0.0172 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:38,078-Speed 3408.58 samples/sec Loss 4.4291 LearningRate 0.0172 Epoch: 11 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:19:41,079-Speed 3413.41 samples/sec Loss 4.4046 LearningRate 0.0172 Epoch: 11 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:44,091-Speed 3400.41 samples/sec Loss 4.4467 LearningRate 0.0172 Epoch: 11 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:47,090-Speed 3415.93 samples/sec Loss 4.3756 LearningRate 0.0172 Epoch: 11 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:50,098-Speed 3404.91 samples/sec Loss 4.5401 LearningRate 0.0172 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:53,096-Speed 3416.63 samples/sec Loss 4.3949 LearningRate 0.0172 Epoch: 11 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:19:56,101-Speed 3408.71 samples/sec Loss 4.4940 LearningRate 0.0172 Epoch: 11 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:19:59,101-Speed 3413.79 samples/sec Loss 4.5102 LearningRate 0.0172 Epoch: 11 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:20:02,099-Speed 3417.11 samples/sec Loss 4.4333 LearningRate 0.0172 Epoch: 11 Global Step: 59240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:05,103-Speed 3409.68 samples/sec Loss 4.4942 LearningRate 0.0172 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:08,106-Speed 3409.72 samples/sec Loss 4.5075 LearningRate 0.0172 Epoch: 11 Global Step: 59260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:11,103-Speed 3418.09 samples/sec Loss 4.4748 LearningRate 0.0171 Epoch: 11 Global Step: 59270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:14,109-Speed 3406.85 samples/sec Loss 4.4951 LearningRate 0.0171 Epoch: 11 Global Step: 59280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:17,147-Speed 3372.15 samples/sec Loss 4.3757 LearningRate 0.0171 Epoch: 11 Global Step: 59290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:20,153-Speed 3407.38 samples/sec Loss 4.5438 LearningRate 0.0171 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:23,206-Speed 3354.34 samples/sec Loss 4.5495 LearningRate 0.0171 Epoch: 11 Global Step: 59310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:26,243-Speed 3373.26 samples/sec Loss 4.5701 LearningRate 0.0171 Epoch: 11 Global Step: 59320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:29,242-Speed 3414.87 samples/sec Loss 4.5984 LearningRate 0.0171 Epoch: 11 Global Step: 59330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:32,245-Speed 3411.39 samples/sec Loss 4.4442 LearningRate 0.0171 Epoch: 11 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:20:35,245-Speed 3413.90 samples/sec Loss 
4.4232 LearningRate 0.0171 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:20:38,227-Speed 3434.78 samples/sec Loss 4.4811 LearningRate 0.0171 Epoch: 11 Global Step: 59360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:41,227-Speed 3413.84 samples/sec Loss 4.6000 LearningRate 0.0171 Epoch: 11 Global Step: 59370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:44,228-Speed 3412.99 samples/sec Loss 4.4548 LearningRate 0.0171 Epoch: 11 Global Step: 59380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:47,229-Speed 3413.86 samples/sec Loss 4.5446 LearningRate 0.0170 Epoch: 11 Global Step: 59390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:50,227-Speed 3416.00 samples/sec Loss 4.3092 LearningRate 0.0170 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:53,286-Speed 3348.96 samples/sec Loss 4.3578 LearningRate 0.0170 Epoch: 11 Global Step: 59410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:56,283-Speed 3417.14 samples/sec Loss 4.3979 LearningRate 0.0170 Epoch: 11 Global Step: 59420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:20:59,289-Speed 3407.68 samples/sec Loss 4.4355 LearningRate 0.0170 Epoch: 11 Global Step: 59430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:21:02,307-Speed 3394.00 samples/sec Loss 4.4040 LearningRate 0.0170 Epoch: 11 Global Step: 59440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:21:05,327-Speed 3391.49 samples/sec Loss 4.4173 LearningRate 0.0170 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:21:08,332-Speed 3408.24 samples/sec Loss 4.6200 LearningRate 0.0170 Epoch: 11 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:11,351-Speed 3393.23 samples/sec Loss 4.3081 LearningRate 0.0170 Epoch: 11 Global Step: 59470 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:14,367-Speed 3395.56 samples/sec Loss 4.5016 LearningRate 0.0170 Epoch: 11 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:17,379-Speed 3400.82 samples/sec Loss 4.2655 LearningRate 0.0170 Epoch: 11 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:20,383-Speed 3409.55 samples/sec Loss 4.3938 LearningRate 0.0170 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:23,392-Speed 3403.64 samples/sec Loss 4.3089 LearningRate 0.0170 Epoch: 11 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:26,408-Speed 3396.78 samples/sec Loss 4.4687 LearningRate 0.0169 Epoch: 11 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:29,411-Speed 3411.00 samples/sec Loss 4.3665 LearningRate 0.0169 Epoch: 11 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:32,417-Speed 3407.41 samples/sec Loss 4.3784 LearningRate 0.0169 Epoch: 11 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:35,460-Speed 3365.45 samples/sec Loss 4.4751 LearningRate 0.0169 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:38,446-Speed 3430.02 samples/sec Loss 4.5359 LearningRate 0.0169 Epoch: 11 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:41,497-Speed 3357.31 samples/sec Loss 4.4129 LearningRate 0.0169 Epoch: 11 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:44,501-Speed 3409.89 samples/sec Loss 4.5288 LearningRate 0.0169 Epoch: 11 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:47,506-Speed 3408.88 samples/sec Loss 4.3432 LearningRate 0.0169 Epoch: 11 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:21:50,508-Speed 3411.61 samples/sec Loss 4.4513 LearningRate 0.0169 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:53,509-Speed 3412.67 samples/sec Loss 4.4296 LearningRate 0.0169 Epoch: 11 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:56,506-Speed 3417.58 samples/sec Loss 4.2895 LearningRate 0.0169 Epoch: 11 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:21:59,489-Speed 3433.30 samples/sec Loss 4.3399 LearningRate 0.0169 Epoch: 11 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:02,492-Speed 3411.12 samples/sec Loss 4.4542 LearningRate 0.0168 Epoch: 11 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:05,494-Speed 3412.11 samples/sec Loss 4.4671 LearningRate 0.0168 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:08,500-Speed 3407.24 samples/sec Loss 4.4021 LearningRate 0.0168 Epoch: 11 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:11,507-Speed 3405.97 samples/sec Loss 4.6109 LearningRate 0.0168 Epoch: 11 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:14,531-Speed 3387.83 samples/sec Loss 4.3292 LearningRate 0.0168 Epoch: 11 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:17,534-Speed 3410.58 samples/sec Loss 4.3539 LearningRate 0.0168 Epoch: 11 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:20,539-Speed 3408.90 samples/sec Loss 4.5298 LearningRate 0.0168 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:23,571-Speed 3378.11 samples/sec Loss 4.3413 LearningRate 0.0168 Epoch: 11 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:26,575-Speed 3408.74 samples/sec Loss 
4.2632 LearningRate 0.0168 Epoch: 11 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:22:29,587-Speed 3400.71 samples/sec Loss 4.3858 LearningRate 0.0168 Epoch: 11 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:32,587-Speed 3414.22 samples/sec Loss 4.4137 LearningRate 0.0168 Epoch: 11 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:35,597-Speed 3402.57 samples/sec Loss 4.4406 LearningRate 0.0168 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:38,613-Speed 3396.16 samples/sec Loss 4.3914 LearningRate 0.0167 Epoch: 11 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:41,622-Speed 3404.90 samples/sec Loss 4.3090 LearningRate 0.0167 Epoch: 11 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:44,629-Speed 3406.05 samples/sec Loss 4.3177 LearningRate 0.0167 Epoch: 11 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:47,715-Speed 3318.99 samples/sec Loss 4.5367 LearningRate 0.0167 Epoch: 11 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:50,719-Speed 3409.66 samples/sec Loss 4.4334 LearningRate 0.0167 Epoch: 11 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:53,721-Speed 3411.58 samples/sec Loss 4.4195 LearningRate 0.0167 Epoch: 11 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:56,723-Speed 3411.70 samples/sec Loss 4.3412 LearningRate 0.0167 Epoch: 11 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:22:59,705-Speed 3435.53 samples/sec Loss 4.3914 LearningRate 0.0167 Epoch: 11 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:02,713-Speed 3405.39 samples/sec Loss 4.4925 LearningRate 0.0167 Epoch: 11 Global Step: 59840 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:05,728-Speed 3396.16 samples/sec Loss 4.3338 LearningRate 0.0167 Epoch: 11 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:08,743-Speed 3397.34 samples/sec Loss 4.2903 LearningRate 0.0167 Epoch: 11 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:11,744-Speed 3414.51 samples/sec Loss 4.3792 LearningRate 0.0167 Epoch: 11 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:14,743-Speed 3415.36 samples/sec Loss 4.3852 LearningRate 0.0167 Epoch: 11 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:17,756-Speed 3399.58 samples/sec Loss 4.3357 LearningRate 0.0166 Epoch: 11 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:20,759-Speed 3410.87 samples/sec Loss 4.5066 LearningRate 0.0166 Epoch: 11 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:23:23,770-Speed 3400.83 samples/sec Loss 4.4399 LearningRate 0.0166 Epoch: 11 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:26,773-Speed 3411.49 samples/sec Loss 4.4372 LearningRate 0.0166 Epoch: 11 Global Step: 59920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:29,795-Speed 3389.25 samples/sec Loss 4.2816 LearningRate 0.0166 Epoch: 11 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:32,849-Speed 3353.71 samples/sec Loss 4.4249 LearningRate 0.0166 Epoch: 11 Global Step: 59940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:35,847-Speed 3416.54 samples/sec Loss 4.4441 LearningRate 0.0166 Epoch: 11 Global Step: 59950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:38,851-Speed 3409.90 samples/sec Loss 4.4958 LearningRate 0.0166 Epoch: 11 Global Step: 59960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-11 05:23:41,852-Speed 3412.54 samples/sec Loss 4.4792 LearningRate 0.0166 Epoch: 11 Global Step: 59970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:44,856-Speed 3409.70 samples/sec Loss 4.4931 LearningRate 0.0166 Epoch: 11 Global Step: 59980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:47,870-Speed 3398.94 samples/sec Loss 4.5210 LearningRate 0.0166 Epoch: 11 Global Step: 59990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:23:50,873-Speed 3410.83 samples/sec Loss 4.3455 LearningRate 0.0166 Epoch: 11 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:24:35,628-[lfw][60000]XNorm: 22.425556 Training: 2022-04-11 05:24:35,629-[lfw][60000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 05:24:35,629-[lfw][60000]Accuracy-Highest: 0.99850 Training: 2022-04-11 05:25:27,484-[cfp_fp][60000]XNorm: 21.019339 Training: 2022-04-11 05:25:27,485-[cfp_fp][60000]Accuracy-Flip: 0.97986+-0.00757 Training: 2022-04-11 05:25:27,485-[cfp_fp][60000]Accuracy-Highest: 0.97986 Training: 2022-04-11 05:26:11,817-[agedb_30][60000]XNorm: 22.608731 Training: 2022-04-11 05:26:11,818-[agedb_30][60000]Accuracy-Flip: 0.98083+-0.00720 Training: 2022-04-11 05:26:11,819-[agedb_30][60000]Accuracy-Highest: 0.98083 Training: 2022-04-11 05:26:14,817-Speed 71.14 samples/sec Loss 4.3787 LearningRate 0.0165 Epoch: 11 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:17,801-Speed 3432.99 samples/sec Loss 4.5122 LearningRate 0.0165 Epoch: 11 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:20,815-Speed 3397.27 samples/sec Loss 4.4110 LearningRate 0.0165 Epoch: 11 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:23,844-Speed 3381.93 samples/sec Loss 4.3699 LearningRate 0.0165 Epoch: 11 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:26,906-Speed 3344.93 
samples/sec Loss 4.4498 LearningRate 0.0165 Epoch: 11 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:30,047-Speed 3261.22 samples/sec Loss 4.3965 LearningRate 0.0165 Epoch: 11 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:33,033-Speed 3430.84 samples/sec Loss 4.3591 LearningRate 0.0165 Epoch: 11 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:36,057-Speed 3386.64 samples/sec Loss 4.4501 LearningRate 0.0165 Epoch: 11 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:39,088-Speed 3378.73 samples/sec Loss 4.3141 LearningRate 0.0165 Epoch: 11 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:42,087-Speed 3415.25 samples/sec Loss 4.3010 LearningRate 0.0165 Epoch: 11 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:45,065-Speed 3439.74 samples/sec Loss 4.6966 LearningRate 0.0165 Epoch: 11 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:48,131-Speed 3340.57 samples/sec Loss 4.3839 LearningRate 0.0165 Epoch: 11 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:51,261-Speed 3272.39 samples/sec Loss 4.4660 LearningRate 0.0165 Epoch: 11 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:54,253-Speed 3423.72 samples/sec Loss 4.4459 LearningRate 0.0164 Epoch: 11 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:26:57,248-Speed 3420.35 samples/sec Loss 4.4283 LearningRate 0.0164 Epoch: 11 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:00,247-Speed 3414.27 samples/sec Loss 4.5005 LearningRate 0.0164 Epoch: 11 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:03,248-Speed 3413.08 samples/sec Loss 4.3088 LearningRate 0.0164 Epoch: 11 
Global Step: 60170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:06,249-Speed 3412.88 samples/sec Loss 4.5157 LearningRate 0.0164 Epoch: 11 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:09,249-Speed 3414.69 samples/sec Loss 4.3148 LearningRate 0.0164 Epoch: 11 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:12,269-Speed 3391.84 samples/sec Loss 4.2456 LearningRate 0.0164 Epoch: 11 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:15,248-Speed 3437.38 samples/sec Loss 4.3058 LearningRate 0.0164 Epoch: 11 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:18,255-Speed 3406.94 samples/sec Loss 4.3517 LearningRate 0.0164 Epoch: 11 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:21,256-Speed 3413.31 samples/sec Loss 4.4456 LearningRate 0.0164 Epoch: 11 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:24,279-Speed 3388.18 samples/sec Loss 4.4129 LearningRate 0.0164 Epoch: 11 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:27,284-Speed 3408.56 samples/sec Loss 4.3426 LearningRate 0.0164 Epoch: 11 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:27:30,295-Speed 3401.95 samples/sec Loss 4.2259 LearningRate 0.0163 Epoch: 11 Global Step: 60260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:33,292-Speed 3417.24 samples/sec Loss 4.3787 LearningRate 0.0163 Epoch: 11 Global Step: 60270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:36,318-Speed 3384.94 samples/sec Loss 4.3963 LearningRate 0.0163 Epoch: 11 Global Step: 60280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:39,330-Speed 3401.20 samples/sec Loss 4.5088 LearningRate 0.0163 Epoch: 11 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 05:27:42,339-Speed 3402.86 samples/sec Loss 4.3729 LearningRate 0.0163 Epoch: 11 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:45,338-Speed 3415.50 samples/sec Loss 4.3412 LearningRate 0.0163 Epoch: 11 Global Step: 60310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:48,415-Speed 3329.01 samples/sec Loss 4.4547 LearningRate 0.0163 Epoch: 11 Global Step: 60320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:51,421-Speed 3407.19 samples/sec Loss 4.4211 LearningRate 0.0163 Epoch: 11 Global Step: 60330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:54,420-Speed 3415.52 samples/sec Loss 4.4116 LearningRate 0.0163 Epoch: 11 Global Step: 60340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:27:57,417-Speed 3417.11 samples/sec Loss 4.4150 LearningRate 0.0163 Epoch: 11 Global Step: 60350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:00,440-Speed 3388.37 samples/sec Loss 4.3406 LearningRate 0.0163 Epoch: 11 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:03,422-Speed 3435.17 samples/sec Loss 4.4421 LearningRate 0.0163 Epoch: 11 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:06,431-Speed 3404.04 samples/sec Loss 4.4574 LearningRate 0.0163 Epoch: 11 Global Step: 60380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:09,426-Speed 3420.05 samples/sec Loss 4.3920 LearningRate 0.0162 Epoch: 11 Global Step: 60390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:12,429-Speed 3410.39 samples/sec Loss 4.3053 LearningRate 0.0162 Epoch: 11 Global Step: 60400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:15,434-Speed 3408.82 samples/sec Loss 4.4942 LearningRate 0.0162 Epoch: 11 Global Step: 60410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:18,430-Speed 3419.12 
samples/sec Loss 4.3946 LearningRate 0.0162 Epoch: 11 Global Step: 60420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:21,424-Speed 3420.56 samples/sec Loss 4.5309 LearningRate 0.0162 Epoch: 11 Global Step: 60430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:24,418-Speed 3421.41 samples/sec Loss 4.4029 LearningRate 0.0162 Epoch: 11 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:27,414-Speed 3418.69 samples/sec Loss 4.3770 LearningRate 0.0162 Epoch: 11 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:30,410-Speed 3418.80 samples/sec Loss 4.4691 LearningRate 0.0162 Epoch: 11 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:28:33,403-Speed 3421.96 samples/sec Loss 4.4252 LearningRate 0.0162 Epoch: 11 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:36,406-Speed 3410.38 samples/sec Loss 4.3839 LearningRate 0.0162 Epoch: 11 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:39,401-Speed 3420.41 samples/sec Loss 4.2273 LearningRate 0.0162 Epoch: 11 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:42,398-Speed 3418.04 samples/sec Loss 4.3997 LearningRate 0.0162 Epoch: 11 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:45,391-Speed 3421.34 samples/sec Loss 4.3156 LearningRate 0.0161 Epoch: 11 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:48,390-Speed 3415.96 samples/sec Loss 4.4161 LearningRate 0.0161 Epoch: 11 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:51,387-Speed 3417.27 samples/sec Loss 4.2972 LearningRate 0.0161 Epoch: 11 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:54,385-Speed 3416.84 samples/sec Loss 4.3370 LearningRate 0.0161 Epoch: 11 
Global Step: 60540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:28:57,381-Speed 3418.29 samples/sec Loss 4.3764 LearningRate 0.0161 Epoch: 11 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:00,377-Speed 3418.05 samples/sec Loss 4.4280 LearningRate 0.0161 Epoch: 11 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:03,366-Speed 3427.63 samples/sec Loss 4.3982 LearningRate 0.0161 Epoch: 11 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:06,375-Speed 3403.37 samples/sec Loss 4.3734 LearningRate 0.0161 Epoch: 11 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:09,373-Speed 3417.73 samples/sec Loss 4.4784 LearningRate 0.0161 Epoch: 11 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:12,392-Speed 3392.57 samples/sec Loss 4.3059 LearningRate 0.0161 Epoch: 11 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:15,387-Speed 3419.47 samples/sec Loss 4.3471 LearningRate 0.0161 Epoch: 11 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:18,388-Speed 3413.12 samples/sec Loss 4.3727 LearningRate 0.0161 Epoch: 11 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:21,388-Speed 3413.54 samples/sec Loss 4.4465 LearningRate 0.0161 Epoch: 11 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:24,469-Speed 3324.45 samples/sec Loss 4.4764 LearningRate 0.0160 Epoch: 11 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:27,484-Speed 3397.48 samples/sec Loss 4.3801 LearningRate 0.0160 Epoch: 11 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:30,492-Speed 3404.72 samples/sec Loss 4.5286 LearningRate 0.0160 Epoch: 11 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:29:33,519-Speed 3383.66 samples/sec Loss 4.3144 LearningRate 0.0160 Epoch: 11 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:29:36,538-Speed 3393.40 samples/sec Loss 4.3455 LearningRate 0.0160 Epoch: 11 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:39,641-Speed 3301.31 samples/sec Loss 4.4246 LearningRate 0.0160 Epoch: 11 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:51,928-Speed 833.45 samples/sec Loss 4.0392 LearningRate 0.0160 Epoch: 12 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:54,966-Speed 3371.61 samples/sec Loss 3.4326 LearningRate 0.0160 Epoch: 12 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:29:58,014-Speed 3360.81 samples/sec Loss 3.5261 LearningRate 0.0160 Epoch: 12 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:01,116-Speed 3302.31 samples/sec Loss 3.4776 LearningRate 0.0160 Epoch: 12 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:04,125-Speed 3403.57 samples/sec Loss 3.5472 LearningRate 0.0160 Epoch: 12 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:07,173-Speed 3360.45 samples/sec Loss 3.5525 LearningRate 0.0160 Epoch: 12 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:10,173-Speed 3414.13 samples/sec Loss 3.4688 LearningRate 0.0159 Epoch: 12 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:13,194-Speed 3390.34 samples/sec Loss 3.6518 LearningRate 0.0159 Epoch: 12 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:16,215-Speed 3390.45 samples/sec Loss 3.5426 LearningRate 0.0159 Epoch: 12 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:30:19,217-Speed 3411.95 
samples/sec Loss 3.5861 LearningRate 0.0159 Epoch: 12 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:22,216-Speed 3415.17 samples/sec Loss 3.6482 LearningRate 0.0159 Epoch: 12 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:25,214-Speed 3416.89 samples/sec Loss 3.5596 LearningRate 0.0159 Epoch: 12 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:28,230-Speed 3396.25 samples/sec Loss 3.5994 LearningRate 0.0159 Epoch: 12 Global Step: 60820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:31,231-Speed 3412.86 samples/sec Loss 3.6085 LearningRate 0.0159 Epoch: 12 Global Step: 60830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:34,232-Speed 3413.10 samples/sec Loss 3.5897 LearningRate 0.0159 Epoch: 12 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:37,248-Speed 3396.14 samples/sec Loss 3.7131 LearningRate 0.0159 Epoch: 12 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:40,252-Speed 3409.58 samples/sec Loss 3.6403 LearningRate 0.0159 Epoch: 12 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:43,258-Speed 3407.23 samples/sec Loss 3.5657 LearningRate 0.0159 Epoch: 12 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:46,259-Speed 3413.24 samples/sec Loss 3.5701 LearningRate 0.0159 Epoch: 12 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:49,282-Speed 3388.37 samples/sec Loss 3.5349 LearningRate 0.0158 Epoch: 12 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:52,318-Speed 3373.38 samples/sec Loss 3.6301 LearningRate 0.0158 Epoch: 12 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:55,376-Speed 3349.53 samples/sec Loss 3.7512 LearningRate 0.0158 Epoch: 12 
Global Step: 60910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:30:58,383-Speed 3406.27 samples/sec Loss 3.5441 LearningRate 0.0158 Epoch: 12 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:01,388-Speed 3408.28 samples/sec Loss 3.6582 LearningRate 0.0158 Epoch: 12 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:04,404-Speed 3396.08 samples/sec Loss 3.5764 LearningRate 0.0158 Epoch: 12 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:07,414-Speed 3402.58 samples/sec Loss 3.6904 LearningRate 0.0158 Epoch: 12 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:10,412-Speed 3416.60 samples/sec Loss 3.6250 LearningRate 0.0158 Epoch: 12 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:13,413-Speed 3412.71 samples/sec Loss 3.6510 LearningRate 0.0158 Epoch: 12 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:16,441-Speed 3382.62 samples/sec Loss 3.7929 LearningRate 0.0158 Epoch: 12 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:19,425-Speed 3432.62 samples/sec Loss 3.6643 LearningRate 0.0158 Epoch: 12 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:22,427-Speed 3412.36 samples/sec Loss 3.6491 LearningRate 0.0158 Epoch: 12 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:25,493-Speed 3340.23 samples/sec Loss 3.8409 LearningRate 0.0158 Epoch: 12 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:31:28,484-Speed 3424.48 samples/sec Loss 3.7631 LearningRate 0.0157 Epoch: 12 Global Step: 61020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:31,488-Speed 3410.03 samples/sec Loss 3.7253 LearningRate 0.0157 Epoch: 12 Global Step: 61030 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 05:31:34,492-Speed 3409.14 samples/sec Loss 3.7315 LearningRate 0.0157 Epoch: 12 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:37,531-Speed 3370.52 samples/sec Loss 3.6460 LearningRate 0.0157 Epoch: 12 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:40,546-Speed 3396.64 samples/sec Loss 3.6176 LearningRate 0.0157 Epoch: 12 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:43,565-Speed 3393.09 samples/sec Loss 3.8004 LearningRate 0.0157 Epoch: 12 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:46,571-Speed 3407.69 samples/sec Loss 3.6678 LearningRate 0.0157 Epoch: 12 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:49,586-Speed 3397.36 samples/sec Loss 3.7940 LearningRate 0.0157 Epoch: 12 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:52,591-Speed 3408.57 samples/sec Loss 3.7433 LearningRate 0.0157 Epoch: 12 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:55,614-Speed 3387.84 samples/sec Loss 3.5928 LearningRate 0.0157 Epoch: 12 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:31:58,619-Speed 3408.72 samples/sec Loss 3.6591 LearningRate 0.0157 Epoch: 12 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:01,627-Speed 3404.62 samples/sec Loss 3.8275 LearningRate 0.0157 Epoch: 12 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:04,656-Speed 3382.37 samples/sec Loss 3.7171 LearningRate 0.0157 Epoch: 12 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:07,683-Speed 3382.45 samples/sec Loss 3.7106 LearningRate 0.0156 Epoch: 12 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:10,697-Speed 3398.95 
samples/sec Loss 3.7358 LearningRate 0.0156 Epoch: 12 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:13,718-Speed 3389.73 samples/sec Loss 3.7729 LearningRate 0.0156 Epoch: 12 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:16,727-Speed 3404.13 samples/sec Loss 3.8528 LearningRate 0.0156 Epoch: 12 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:19,730-Speed 3412.23 samples/sec Loss 3.8132 LearningRate 0.0156 Epoch: 12 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:22,769-Speed 3370.47 samples/sec Loss 3.8436 LearningRate 0.0156 Epoch: 12 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:25,795-Speed 3384.54 samples/sec Loss 3.6131 LearningRate 0.0156 Epoch: 12 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:28,899-Speed 3300.26 samples/sec Loss 3.8821 LearningRate 0.0156 Epoch: 12 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:32:31,896-Speed 3418.08 samples/sec Loss 3.7111 LearningRate 0.0156 Epoch: 12 Global Step: 61230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:34,939-Speed 3366.19 samples/sec Loss 3.5701 LearningRate 0.0156 Epoch: 12 Global Step: 61240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:37,970-Speed 3378.90 samples/sec Loss 3.7766 LearningRate 0.0156 Epoch: 12 Global Step: 61250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:40,995-Speed 3384.97 samples/sec Loss 3.6128 LearningRate 0.0156 Epoch: 12 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:43,999-Speed 3410.24 samples/sec Loss 3.7962 LearningRate 0.0155 Epoch: 12 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:47,003-Speed 3409.99 samples/sec Loss 3.8669 LearningRate 0.0155 Epoch: 12 
Global Step: 61280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:50,031-Speed 3383.37 samples/sec Loss 3.5976 LearningRate 0.0155 Epoch: 12 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:53,088-Speed 3350.15 samples/sec Loss 3.8454 LearningRate 0.0155 Epoch: 12 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:56,091-Speed 3410.83 samples/sec Loss 3.5678 LearningRate 0.0155 Epoch: 12 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:32:59,106-Speed 3396.35 samples/sec Loss 3.8169 LearningRate 0.0155 Epoch: 12 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:33:02,134-Speed 3382.39 samples/sec Loss 3.6619 LearningRate 0.0155 Epoch: 12 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:05,153-Speed 3393.88 samples/sec Loss 3.8403 LearningRate 0.0155 Epoch: 12 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:08,161-Speed 3404.72 samples/sec Loss 3.7045 LearningRate 0.0155 Epoch: 12 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:11,162-Speed 3412.73 samples/sec Loss 3.8863 LearningRate 0.0155 Epoch: 12 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:14,182-Speed 3391.37 samples/sec Loss 3.8972 LearningRate 0.0155 Epoch: 12 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:17,224-Speed 3367.58 samples/sec Loss 3.7989 LearningRate 0.0155 Epoch: 12 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:20,242-Speed 3393.98 samples/sec Loss 3.8825 LearningRate 0.0155 Epoch: 12 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:23,252-Speed 3402.65 samples/sec Loss 3.9698 LearningRate 0.0154 Epoch: 12 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:33:26,262-Speed 3402.58 samples/sec Loss 3.8601 LearningRate 0.0154 Epoch: 12 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:29,266-Speed 3409.47 samples/sec Loss 3.6243 LearningRate 0.0154 Epoch: 12 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:32,247-Speed 3435.80 samples/sec Loss 3.7172 LearningRate 0.0154 Epoch: 12 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:35,254-Speed 3406.08 samples/sec Loss 3.9871 LearningRate 0.0154 Epoch: 12 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:38,257-Speed 3410.89 samples/sec Loss 3.8878 LearningRate 0.0154 Epoch: 12 Global Step: 61450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:41,265-Speed 3404.53 samples/sec Loss 3.9305 LearningRate 0.0154 Epoch: 12 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:44,277-Speed 3401.89 samples/sec Loss 3.8025 LearningRate 0.0154 Epoch: 12 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:47,277-Speed 3413.77 samples/sec Loss 3.7243 LearningRate 0.0154 Epoch: 12 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:50,285-Speed 3404.80 samples/sec Loss 3.8265 LearningRate 0.0154 Epoch: 12 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:53,292-Speed 3406.00 samples/sec Loss 3.7679 LearningRate 0.0154 Epoch: 12 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:33:56,277-Speed 3431.63 samples/sec Loss 3.9831 LearningRate 0.0154 Epoch: 12 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:33:59,280-Speed 3410.64 samples/sec Loss 3.8860 LearningRate 0.0154 Epoch: 12 Global Step: 61520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:02,439-Speed 3241.96 
samples/sec Loss 3.8320 LearningRate 0.0153 Epoch: 12 Global Step: 61530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:05,581-Speed 3260.37 samples/sec Loss 3.9101 LearningRate 0.0153 Epoch: 12 Global Step: 61540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:08,588-Speed 3406.65 samples/sec Loss 3.8860 LearningRate 0.0153 Epoch: 12 Global Step: 61550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:11,590-Speed 3411.93 samples/sec Loss 3.7908 LearningRate 0.0153 Epoch: 12 Global Step: 61560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:14,600-Speed 3403.66 samples/sec Loss 3.7880 LearningRate 0.0153 Epoch: 12 Global Step: 61570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:17,604-Speed 3408.58 samples/sec Loss 3.8543 LearningRate 0.0153 Epoch: 12 Global Step: 61580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:20,604-Speed 3414.65 samples/sec Loss 3.9137 LearningRate 0.0153 Epoch: 12 Global Step: 61590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:23,607-Speed 3410.50 samples/sec Loss 3.9488 LearningRate 0.0153 Epoch: 12 Global Step: 61600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:26,641-Speed 3376.37 samples/sec Loss 3.8442 LearningRate 0.0153 Epoch: 12 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:29,650-Speed 3403.39 samples/sec Loss 3.7820 LearningRate 0.0153 Epoch: 12 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:32,653-Speed 3410.72 samples/sec Loss 3.8942 LearningRate 0.0153 Epoch: 12 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:35,662-Speed 3404.00 samples/sec Loss 3.7668 LearningRate 0.0153 Epoch: 12 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:38,716-Speed 3354.91 samples/sec Loss 3.9193 LearningRate 0.0153 Epoch: 12 
Global Step: 61650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:41,780-Speed 3342.11 samples/sec Loss 3.9938 LearningRate 0.0152 Epoch: 12 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:44,785-Speed 3408.82 samples/sec Loss 3.8533 LearningRate 0.0152 Epoch: 12 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:34:47,794-Speed 3403.92 samples/sec Loss 4.0577 LearningRate 0.0152 Epoch: 12 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:50,819-Speed 3386.17 samples/sec Loss 3.8607 LearningRate 0.0152 Epoch: 12 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:53,827-Speed 3405.11 samples/sec Loss 3.9628 LearningRate 0.0152 Epoch: 12 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:56,839-Speed 3399.54 samples/sec Loss 3.9268 LearningRate 0.0152 Epoch: 12 Global Step: 61710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:34:59,847-Speed 3405.69 samples/sec Loss 3.8184 LearningRate 0.0152 Epoch: 12 Global Step: 61720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:02,857-Speed 3402.80 samples/sec Loss 3.7811 LearningRate 0.0152 Epoch: 12 Global Step: 61730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:05,921-Speed 3342.80 samples/sec Loss 3.9630 LearningRate 0.0152 Epoch: 12 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:08,924-Speed 3411.51 samples/sec Loss 3.6831 LearningRate 0.0152 Epoch: 12 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:11,933-Speed 3404.22 samples/sec Loss 3.8330 LearningRate 0.0152 Epoch: 12 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:14,950-Speed 3394.11 samples/sec Loss 3.8880 LearningRate 0.0152 Epoch: 12 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 05:35:17,979-Speed 3381.82 samples/sec Loss 3.7945 LearningRate 0.0152 Epoch: 12 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:35:20,984-Speed 3407.83 samples/sec Loss 3.8911 LearningRate 0.0151 Epoch: 12 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:35:23,987-Speed 3411.42 samples/sec Loss 3.8672 LearningRate 0.0151 Epoch: 12 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:35:26,974-Speed 3429.04 samples/sec Loss 3.9665 LearningRate 0.0151 Epoch: 12 Global Step: 61810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:29,981-Speed 3405.89 samples/sec Loss 3.9613 LearningRate 0.0151 Epoch: 12 Global Step: 61820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:32,987-Speed 3407.26 samples/sec Loss 3.9231 LearningRate 0.0151 Epoch: 12 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:36,001-Speed 3398.92 samples/sec Loss 3.8934 LearningRate 0.0151 Epoch: 12 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:39,007-Speed 3407.75 samples/sec Loss 3.9672 LearningRate 0.0151 Epoch: 12 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:42,020-Speed 3399.55 samples/sec Loss 3.8655 LearningRate 0.0151 Epoch: 12 Global Step: 61860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:45,028-Speed 3404.58 samples/sec Loss 4.0376 LearningRate 0.0151 Epoch: 12 Global Step: 61870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:48,052-Speed 3387.23 samples/sec Loss 3.9328 LearningRate 0.0151 Epoch: 12 Global Step: 61880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:51,077-Speed 3386.05 samples/sec Loss 3.9669 LearningRate 0.0151 Epoch: 12 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:54,082-Speed 3408.65 
samples/sec Loss 4.0872 LearningRate 0.0151 Epoch: 12 Global Step: 61900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:35:57,081-Speed 3414.38 samples/sec Loss 3.8983 LearningRate 0.0151 Epoch: 12 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:36:00,090-Speed 3404.07 samples/sec Loss 3.9252 LearningRate 0.0150 Epoch: 12 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:36:03,128-Speed 3372.02 samples/sec Loss 3.9725 LearningRate 0.0150 Epoch: 12 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:06,213-Speed 3319.23 samples/sec Loss 4.0069 LearningRate 0.0150 Epoch: 12 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:09,225-Speed 3401.74 samples/sec Loss 3.9786 LearningRate 0.0150 Epoch: 12 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:12,228-Speed 3410.15 samples/sec Loss 3.9705 LearningRate 0.0150 Epoch: 12 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:15,256-Speed 3383.54 samples/sec Loss 4.0081 LearningRate 0.0150 Epoch: 12 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:18,322-Speed 3340.59 samples/sec Loss 3.8883 LearningRate 0.0150 Epoch: 12 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:21,327-Speed 3407.41 samples/sec Loss 3.9159 LearningRate 0.0150 Epoch: 12 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:36:24,374-Speed 3361.65 samples/sec Loss 3.9287 LearningRate 0.0150 Epoch: 12 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:37:08,681-[lfw][62000]XNorm: 22.459812 Training: 2022-04-11 05:37:08,682-[lfw][62000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-04-11 05:37:08,682-[lfw][62000]Accuracy-Highest: 0.99850 Training: 2022-04-11 
05:38:00,155-[cfp_fp][62000]XNorm: 20.965085 Training: 2022-04-11 05:38:00,156-[cfp_fp][62000]Accuracy-Flip: 0.98100+-0.00658 Training: 2022-04-11 05:38:00,156-[cfp_fp][62000]Accuracy-Highest: 0.98100 Training: 2022-04-11 05:38:44,047-[agedb_30][62000]XNorm: 22.390025 Training: 2022-04-11 05:38:44,048-[agedb_30][62000]Accuracy-Flip: 0.98267+-0.00786 Training: 2022-04-11 05:38:44,048-[agedb_30][62000]Accuracy-Highest: 0.98267 Training: 2022-04-11 05:38:47,044-Speed 71.77 samples/sec Loss 4.0041 LearningRate 0.0150 Epoch: 12 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:38:50,028-Speed 3432.12 samples/sec Loss 3.9698 LearningRate 0.0150 Epoch: 12 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:38:53,016-Speed 3426.97 samples/sec Loss 3.9531 LearningRate 0.0150 Epoch: 12 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:38:56,006-Speed 3426.68 samples/sec Loss 3.9416 LearningRate 0.0150 Epoch: 12 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:38:59,013-Speed 3406.16 samples/sec Loss 3.7979 LearningRate 0.0149 Epoch: 12 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:02,001-Speed 3427.61 samples/sec Loss 3.7785 LearningRate 0.0149 Epoch: 12 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:04,992-Speed 3424.42 samples/sec Loss 3.9109 LearningRate 0.0149 Epoch: 12 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:07,983-Speed 3424.48 samples/sec Loss 4.0140 LearningRate 0.0149 Epoch: 12 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:10,983-Speed 3414.79 samples/sec Loss 3.9645 LearningRate 0.0149 Epoch: 12 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:13,997-Speed 3397.75 samples/sec Loss 3.8514 LearningRate 0.0149 Epoch: 12 Global Step: 
62100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:16,993-Speed 3419.19 samples/sec Loss 3.8779 LearningRate 0.0149 Epoch: 12 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:19,990-Speed 3417.49 samples/sec Loss 3.9255 LearningRate 0.0149 Epoch: 12 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:22,964-Speed 3442.91 samples/sec Loss 3.8906 LearningRate 0.0149 Epoch: 12 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:25,965-Speed 3414.02 samples/sec Loss 3.9984 LearningRate 0.0149 Epoch: 12 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:28,973-Speed 3404.40 samples/sec Loss 3.8942 LearningRate 0.0149 Epoch: 12 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:31,969-Speed 3419.74 samples/sec Loss 3.8947 LearningRate 0.0149 Epoch: 12 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:34,958-Speed 3426.35 samples/sec Loss 3.9240 LearningRate 0.0149 Epoch: 12 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:37,969-Speed 3401.59 samples/sec Loss 3.8551 LearningRate 0.0148 Epoch: 12 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:40,971-Speed 3412.64 samples/sec Loss 3.9235 LearningRate 0.0148 Epoch: 12 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:43,968-Speed 3416.95 samples/sec Loss 3.9792 LearningRate 0.0148 Epoch: 12 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:46,961-Speed 3422.56 samples/sec Loss 3.9609 LearningRate 0.0148 Epoch: 12 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:50,015-Speed 3353.97 samples/sec Loss 3.8779 LearningRate 0.0148 Epoch: 12 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 05:39:53,093-Speed 3327.20 samples/sec Loss 3.7856 LearningRate 0.0148 Epoch: 12 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:39:56,072-Speed 3438.99 samples/sec Loss 4.0555 LearningRate 0.0148 Epoch: 12 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:39:59,069-Speed 3417.46 samples/sec Loss 3.9135 LearningRate 0.0148 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:02,108-Speed 3371.19 samples/sec Loss 4.1471 LearningRate 0.0148 Epoch: 12 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:05,095-Speed 3428.65 samples/sec Loss 3.9050 LearningRate 0.0148 Epoch: 12 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:08,098-Speed 3410.06 samples/sec Loss 3.9930 LearningRate 0.0148 Epoch: 12 Global Step: 62280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:11,106-Speed 3405.39 samples/sec Loss 3.9619 LearningRate 0.0148 Epoch: 12 Global Step: 62290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:14,115-Speed 3404.17 samples/sec Loss 3.9602 LearningRate 0.0148 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:17,113-Speed 3416.82 samples/sec Loss 3.9471 LearningRate 0.0147 Epoch: 12 Global Step: 62310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:20,115-Speed 3411.53 samples/sec Loss 3.8995 LearningRate 0.0147 Epoch: 12 Global Step: 62320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:23,130-Speed 3397.10 samples/sec Loss 3.9397 LearningRate 0.0147 Epoch: 12 Global Step: 62330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:26,188-Speed 3349.14 samples/sec Loss 4.0233 LearningRate 0.0147 Epoch: 12 Global Step: 62340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:29,205-Speed 3394.53 
samples/sec Loss 3.9925 LearningRate 0.0147 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:32,203-Speed 3416.79 samples/sec Loss 3.8828 LearningRate 0.0147 Epoch: 12 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:40:35,200-Speed 3417.31 samples/sec Loss 4.0434 LearningRate 0.0147 Epoch: 12 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:38,201-Speed 3413.78 samples/sec Loss 3.9863 LearningRate 0.0147 Epoch: 12 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:41,199-Speed 3415.80 samples/sec Loss 3.9690 LearningRate 0.0147 Epoch: 12 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:44,209-Speed 3402.83 samples/sec Loss 3.8509 LearningRate 0.0147 Epoch: 12 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:47,207-Speed 3416.31 samples/sec Loss 3.9224 LearningRate 0.0147 Epoch: 12 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:50,205-Speed 3416.10 samples/sec Loss 3.9930 LearningRate 0.0147 Epoch: 12 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:53,210-Speed 3408.84 samples/sec Loss 3.9107 LearningRate 0.0147 Epoch: 12 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:56,209-Speed 3415.53 samples/sec Loss 3.9189 LearningRate 0.0147 Epoch: 12 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:40:59,216-Speed 3406.91 samples/sec Loss 3.9906 LearningRate 0.0146 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:02,219-Speed 3410.60 samples/sec Loss 3.9350 LearningRate 0.0146 Epoch: 12 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:05,205-Speed 3430.46 samples/sec Loss 3.9748 LearningRate 0.0146 Epoch: 12 
Global Step: 62470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:08,224-Speed 3392.19 samples/sec Loss 4.0699 LearningRate 0.0146 Epoch: 12 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:11,251-Speed 3383.46 samples/sec Loss 4.0096 LearningRate 0.0146 Epoch: 12 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:14,292-Speed 3368.63 samples/sec Loss 4.0159 LearningRate 0.0146 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:17,448-Speed 3245.24 samples/sec Loss 3.9487 LearningRate 0.0146 Epoch: 12 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:20,451-Speed 3410.82 samples/sec Loss 3.9393 LearningRate 0.0146 Epoch: 12 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:23,472-Speed 3389.84 samples/sec Loss 3.9987 LearningRate 0.0146 Epoch: 12 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:26,480-Speed 3406.57 samples/sec Loss 3.9560 LearningRate 0.0146 Epoch: 12 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:29,483-Speed 3410.20 samples/sec Loss 4.0276 LearningRate 0.0146 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:32,481-Speed 3416.37 samples/sec Loss 3.9337 LearningRate 0.0146 Epoch: 12 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:35,484-Speed 3411.50 samples/sec Loss 3.8466 LearningRate 0.0146 Epoch: 12 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:41:38,484-Speed 3413.55 samples/sec Loss 4.0486 LearningRate 0.0145 Epoch: 12 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:41:41,494-Speed 3402.38 samples/sec Loss 4.0194 LearningRate 0.0145 Epoch: 12 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:41:44,499-Speed 3409.31 samples/sec Loss 4.0004 LearningRate 0.0145 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:41:47,512-Speed 3398.84 samples/sec Loss 4.0664 LearningRate 0.0145 Epoch: 12 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:41:50,517-Speed 3408.75 samples/sec Loss 3.9148 LearningRate 0.0145 Epoch: 12 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:41:53,527-Speed 3402.26 samples/sec Loss 3.9220 LearningRate 0.0145 Epoch: 12 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:41:56,537-Speed 3404.28 samples/sec Loss 3.9575 LearningRate 0.0145 Epoch: 12 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:41:59,539-Speed 3410.98 samples/sec Loss 3.9950 LearningRate 0.0145 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:02,562-Speed 3388.56 samples/sec Loss 3.9607 LearningRate 0.0145 Epoch: 12 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:05,562-Speed 3414.13 samples/sec Loss 3.9737 LearningRate 0.0145 Epoch: 12 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:08,573-Speed 3401.13 samples/sec Loss 3.8248 LearningRate 0.0145 Epoch: 12 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:11,576-Speed 3411.87 samples/sec Loss 3.9361 LearningRate 0.0145 Epoch: 12 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:14,592-Speed 3395.25 samples/sec Loss 3.9501 LearningRate 0.0145 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:17,615-Speed 3388.78 samples/sec Loss 4.0859 LearningRate 0.0144 Epoch: 12 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:20,621-Speed 3406.48 
samples/sec Loss 4.0438 LearningRate 0.0144 Epoch: 12 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:23,671-Speed 3359.18 samples/sec Loss 3.9731 LearningRate 0.0144 Epoch: 12 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:26,871-Speed 3200.69 samples/sec Loss 4.0825 LearningRate 0.0144 Epoch: 12 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:29,908-Speed 3373.34 samples/sec Loss 3.9150 LearningRate 0.0144 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:42:32,892-Speed 3431.76 samples/sec Loss 4.1014 LearningRate 0.0144 Epoch: 12 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:35,896-Speed 3410.73 samples/sec Loss 4.0458 LearningRate 0.0144 Epoch: 12 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:38,918-Speed 3388.20 samples/sec Loss 4.0395 LearningRate 0.0144 Epoch: 12 Global Step: 62780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:41,924-Speed 3407.64 samples/sec Loss 3.9814 LearningRate 0.0144 Epoch: 12 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:44,925-Speed 3413.35 samples/sec Loss 3.9716 LearningRate 0.0144 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:47,933-Speed 3404.60 samples/sec Loss 3.9494 LearningRate 0.0144 Epoch: 12 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:50,951-Speed 3394.11 samples/sec Loss 4.0392 LearningRate 0.0144 Epoch: 12 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:54,030-Speed 3326.14 samples/sec Loss 3.8946 LearningRate 0.0144 Epoch: 12 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:42:57,032-Speed 3413.33 samples/sec Loss 3.9789 LearningRate 0.0143 Epoch: 12 
Global Step: 62840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:43:00,038-Speed 3406.28 samples/sec Loss 4.0212 LearningRate 0.0143 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:43:03,044-Speed 3407.33 samples/sec Loss 3.8559 LearningRate 0.0143 Epoch: 12 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:06,061-Speed 3395.45 samples/sec Loss 3.9656 LearningRate 0.0143 Epoch: 12 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:09,066-Speed 3408.42 samples/sec Loss 3.9322 LearningRate 0.0143 Epoch: 12 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:12,078-Speed 3400.32 samples/sec Loss 3.9391 LearningRate 0.0143 Epoch: 12 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:15,096-Speed 3393.97 samples/sec Loss 3.9729 LearningRate 0.0143 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:18,121-Speed 3386.36 samples/sec Loss 3.9599 LearningRate 0.0143 Epoch: 12 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:21,133-Speed 3400.22 samples/sec Loss 4.0136 LearningRate 0.0143 Epoch: 12 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:24,142-Speed 3404.52 samples/sec Loss 3.8409 LearningRate 0.0143 Epoch: 12 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:27,226-Speed 3321.59 samples/sec Loss 3.9831 LearningRate 0.0143 Epoch: 12 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:30,320-Speed 3310.15 samples/sec Loss 3.9881 LearningRate 0.0143 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:33,310-Speed 3425.23 samples/sec Loss 4.0676 LearningRate 0.0143 Epoch: 12 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:43:36,317-Speed 3405.81 samples/sec Loss 3.9477 LearningRate 0.0143 Epoch: 12 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:39,416-Speed 3305.57 samples/sec Loss 3.9573 LearningRate 0.0142 Epoch: 12 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:42,435-Speed 3392.96 samples/sec Loss 3.9724 LearningRate 0.0142 Epoch: 12 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:45,440-Speed 3407.54 samples/sec Loss 3.9074 LearningRate 0.0142 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:48,444-Speed 3410.19 samples/sec Loss 3.9417 LearningRate 0.0142 Epoch: 12 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:51,450-Speed 3408.02 samples/sec Loss 4.0218 LearningRate 0.0142 Epoch: 12 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:54,456-Speed 3407.08 samples/sec Loss 3.9034 LearningRate 0.0142 Epoch: 12 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:43:57,456-Speed 3414.11 samples/sec Loss 4.0594 LearningRate 0.0142 Epoch: 12 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:00,528-Speed 3334.89 samples/sec Loss 3.9471 LearningRate 0.0142 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:03,522-Speed 3420.66 samples/sec Loss 3.9934 LearningRate 0.0142 Epoch: 12 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:06,549-Speed 3383.61 samples/sec Loss 4.0095 LearningRate 0.0142 Epoch: 12 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:09,559-Speed 3403.05 samples/sec Loss 3.9578 LearningRate 0.0142 Epoch: 12 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:12,591-Speed 3377.79 
samples/sec Loss 3.9436 LearningRate 0.0142 Epoch: 12 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:15,601-Speed 3403.86 samples/sec Loss 4.0252 LearningRate 0.0142 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:18,619-Speed 3393.07 samples/sec Loss 3.9276 LearningRate 0.0141 Epoch: 12 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:21,634-Speed 3397.23 samples/sec Loss 4.0416 LearningRate 0.0141 Epoch: 12 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:24,670-Speed 3373.63 samples/sec Loss 3.9034 LearningRate 0.0141 Epoch: 12 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:27,732-Speed 3345.02 samples/sec Loss 3.9594 LearningRate 0.0141 Epoch: 12 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:30,763-Speed 3380.38 samples/sec Loss 3.9974 LearningRate 0.0141 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:33,769-Speed 3407.27 samples/sec Loss 3.9967 LearningRate 0.0141 Epoch: 12 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:36,774-Speed 3407.97 samples/sec Loss 3.9989 LearningRate 0.0141 Epoch: 12 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:39,787-Speed 3399.45 samples/sec Loss 3.9546 LearningRate 0.0141 Epoch: 12 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:42,799-Speed 3400.72 samples/sec Loss 3.9716 LearningRate 0.0141 Epoch: 12 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:44:45,804-Speed 3408.18 samples/sec Loss 4.0367 LearningRate 0.0141 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:48,811-Speed 3406.38 samples/sec Loss 4.0875 LearningRate 0.0141 Epoch: 12 
Global Step: 63210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:51,820-Speed 3404.14 samples/sec Loss 3.9537 LearningRate 0.0141 Epoch: 12 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:54,843-Speed 3388.69 samples/sec Loss 4.0505 LearningRate 0.0141 Epoch: 12 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:44:57,856-Speed 3399.45 samples/sec Loss 3.9814 LearningRate 0.0141 Epoch: 12 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:00,879-Speed 3388.37 samples/sec Loss 4.0445 LearningRate 0.0140 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:03,895-Speed 3395.99 samples/sec Loss 3.8646 LearningRate 0.0140 Epoch: 12 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:06,915-Speed 3391.59 samples/sec Loss 3.9549 LearningRate 0.0140 Epoch: 12 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:09,918-Speed 3410.25 samples/sec Loss 3.9426 LearningRate 0.0140 Epoch: 12 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:12,937-Speed 3392.80 samples/sec Loss 4.0667 LearningRate 0.0140 Epoch: 12 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:15,952-Speed 3397.31 samples/sec Loss 3.9731 LearningRate 0.0140 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:18,980-Speed 3382.84 samples/sec Loss 4.0127 LearningRate 0.0140 Epoch: 12 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:21,980-Speed 3413.92 samples/sec Loss 3.9057 LearningRate 0.0140 Epoch: 12 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:24,984-Speed 3409.85 samples/sec Loss 3.9284 LearningRate 0.0140 Epoch: 12 Global Step: 63330 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 05:45:28,004-Speed 3391.76 samples/sec Loss 3.9874 LearningRate 0.0140 Epoch: 12 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:45:31,053-Speed 3358.78 samples/sec Loss 4.0564 LearningRate 0.0140 Epoch: 12 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:34,058-Speed 3409.40 samples/sec Loss 4.0025 LearningRate 0.0140 Epoch: 12 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:37,102-Speed 3364.10 samples/sec Loss 3.8962 LearningRate 0.0140 Epoch: 12 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:40,115-Speed 3399.41 samples/sec Loss 3.9568 LearningRate 0.0139 Epoch: 12 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:43,129-Speed 3398.22 samples/sec Loss 3.9321 LearningRate 0.0139 Epoch: 12 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:46,134-Speed 3409.16 samples/sec Loss 4.0063 LearningRate 0.0139 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:49,146-Speed 3399.83 samples/sec Loss 4.0085 LearningRate 0.0139 Epoch: 12 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:52,194-Speed 3361.14 samples/sec Loss 4.1521 LearningRate 0.0139 Epoch: 12 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:55,201-Speed 3406.05 samples/sec Loss 4.0194 LearningRate 0.0139 Epoch: 12 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:45:58,223-Speed 3389.61 samples/sec Loss 3.9435 LearningRate 0.0139 Epoch: 12 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:46:01,191-Speed 3450.69 samples/sec Loss 4.0538 LearningRate 0.0139 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:04,215-Speed 3387.10 
samples/sec Loss 4.0911 LearningRate 0.0139 Epoch: 12 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:07,223-Speed 3405.66 samples/sec Loss 4.0019 LearningRate 0.0139 Epoch: 12 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:10,231-Speed 3404.53 samples/sec Loss 3.9934 LearningRate 0.0139 Epoch: 12 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:13,251-Speed 3391.79 samples/sec Loss 4.0913 LearningRate 0.0139 Epoch: 12 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:16,261-Speed 3402.42 samples/sec Loss 3.8657 LearningRate 0.0139 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:19,274-Speed 3399.89 samples/sec Loss 4.0023 LearningRate 0.0139 Epoch: 12 Global Step: 63510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:22,277-Speed 3410.81 samples/sec Loss 3.9689 LearningRate 0.0138 Epoch: 12 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:25,382-Speed 3298.72 samples/sec Loss 3.8671 LearningRate 0.0138 Epoch: 12 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:28,420-Speed 3371.45 samples/sec Loss 4.0924 LearningRate 0.0138 Epoch: 12 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:31,446-Speed 3385.21 samples/sec Loss 4.0277 LearningRate 0.0138 Epoch: 12 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:46:34,427-Speed 3435.27 samples/sec Loss 4.0018 LearningRate 0.0138 Epoch: 12 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:37,444-Speed 3395.18 samples/sec Loss 4.0151 LearningRate 0.0138 Epoch: 12 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:40,448-Speed 3409.58 samples/sec Loss 3.9167 LearningRate 0.0138 Epoch: 12 
Global Step: 63580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:43,451-Speed 3410.09 samples/sec Loss 4.1203 LearningRate 0.0138 Epoch: 12 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:46,461-Speed 3403.19 samples/sec Loss 3.9457 LearningRate 0.0138 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:49,482-Speed 3390.55 samples/sec Loss 3.9988 LearningRate 0.0138 Epoch: 12 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:52,492-Speed 3403.95 samples/sec Loss 3.8405 LearningRate 0.0138 Epoch: 12 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:55,595-Speed 3300.47 samples/sec Loss 3.9045 LearningRate 0.0138 Epoch: 12 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:46:58,604-Speed 3403.28 samples/sec Loss 4.0384 LearningRate 0.0138 Epoch: 12 Global Step: 63640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:47:01,659-Speed 3353.12 samples/sec Loss 3.9898 LearningRate 0.0137 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:47:04,670-Speed 3401.16 samples/sec Loss 3.8627 LearningRate 0.0137 Epoch: 12 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:07,674-Speed 3410.73 samples/sec Loss 3.8747 LearningRate 0.0137 Epoch: 12 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:10,679-Speed 3408.44 samples/sec Loss 3.9999 LearningRate 0.0137 Epoch: 12 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:13,688-Speed 3402.83 samples/sec Loss 3.9317 LearningRate 0.0137 Epoch: 12 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:16,750-Speed 3345.89 samples/sec Loss 4.1185 LearningRate 0.0137 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 05:47:19,755-Speed 3408.24 samples/sec Loss 4.0185 LearningRate 0.0137 Epoch: 12 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:22,760-Speed 3408.53 samples/sec Loss 3.9144 LearningRate 0.0137 Epoch: 12 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:25,765-Speed 3408.85 samples/sec Loss 3.9255 LearningRate 0.0137 Epoch: 12 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:28,788-Speed 3387.97 samples/sec Loss 3.8417 LearningRate 0.0137 Epoch: 12 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:31,795-Speed 3406.85 samples/sec Loss 3.9421 LearningRate 0.0137 Epoch: 12 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:34,798-Speed 3409.81 samples/sec Loss 3.9252 LearningRate 0.0137 Epoch: 12 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:47:37,784-Speed 3430.39 samples/sec Loss 3.9034 LearningRate 0.0137 Epoch: 12 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:40,790-Speed 3407.07 samples/sec Loss 3.8953 LearningRate 0.0137 Epoch: 12 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:43,799-Speed 3404.24 samples/sec Loss 4.0462 LearningRate 0.0136 Epoch: 12 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:46,817-Speed 3394.07 samples/sec Loss 4.1024 LearningRate 0.0136 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:49,823-Speed 3407.82 samples/sec Loss 4.1194 LearningRate 0.0136 Epoch: 12 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:52,832-Speed 3403.92 samples/sec Loss 3.9524 LearningRate 0.0136 Epoch: 12 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:55,839-Speed 3406.08 
samples/sec Loss 3.8803 LearningRate 0.0136 Epoch: 12 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:47:58,843-Speed 3409.88 samples/sec Loss 3.9700 LearningRate 0.0136 Epoch: 12 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:01,851-Speed 3405.53 samples/sec Loss 3.9549 LearningRate 0.0136 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:04,859-Speed 3404.15 samples/sec Loss 4.0080 LearningRate 0.0136 Epoch: 12 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:07,850-Speed 3425.24 samples/sec Loss 3.9891 LearningRate 0.0136 Epoch: 12 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:10,856-Speed 3406.92 samples/sec Loss 4.0665 LearningRate 0.0136 Epoch: 12 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:13,861-Speed 3408.30 samples/sec Loss 4.0028 LearningRate 0.0136 Epoch: 12 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:16,875-Speed 3398.13 samples/sec Loss 4.0676 LearningRate 0.0136 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:19,882-Speed 3406.47 samples/sec Loss 3.9594 LearningRate 0.0136 Epoch: 12 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:48:22,872-Speed 3425.55 samples/sec Loss 3.9288 LearningRate 0.0136 Epoch: 12 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:48:25,898-Speed 3384.85 samples/sec Loss 3.9050 LearningRate 0.0135 Epoch: 12 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:48:28,910-Speed 3400.74 samples/sec Loss 3.9058 LearningRate 0.0135 Epoch: 12 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:48:31,992-Speed 3324.36 samples/sec Loss 3.7861 LearningRate 0.0135 Epoch: 12 
Global Step: 63950 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:48:35,008-Speed 3395.69 samples/sec Loss 4.0114 LearningRate 0.0135 Epoch: 12 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:48:38,022-Speed 3398.06 samples/sec Loss 3.9950 LearningRate 0.0135 Epoch: 12 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:48:41,046-Speed 3387.26 samples/sec Loss 3.8993 LearningRate 0.0135 Epoch: 12 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:48:44,048-Speed 3412.25 samples/sec Loss 3.9958 LearningRate 0.0135 Epoch: 12 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:48:47,051-Speed 3411.34 samples/sec Loss 3.8641 LearningRate 0.0135 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:49:31,327-[lfw][64000]XNorm: 24.006045
Training: 2022-04-11 05:49:31,328-[lfw][64000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 05:49:31,328-[lfw][64000]Accuracy-Highest: 0.99850
Training: 2022-04-11 05:50:22,728-[cfp_fp][64000]XNorm: 22.506030
Training: 2022-04-11 05:50:22,729-[cfp_fp][64000]Accuracy-Flip: 0.98300+-0.00618
Training: 2022-04-11 05:50:22,729-[cfp_fp][64000]Accuracy-Highest: 0.98300
Training: 2022-04-11 05:51:07,240-[agedb_30][64000]XNorm: 23.641323
Training: 2022-04-11 05:51:07,240-[agedb_30][64000]Accuracy-Flip: 0.98233+-0.00704
Training: 2022-04-11 05:51:07,241-[agedb_30][64000]Accuracy-Highest: 0.98267
Training: 2022-04-11 05:51:10,242-Speed 71.51 samples/sec Loss 3.9861 LearningRate 0.0135 Epoch: 12 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-11 05:51:13,231-Speed 3426.83 samples/sec Loss 3.9602 LearningRate 0.0135 Epoch: 12 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 05:51:16,253-Speed 3389.40 samples/sec Loss 3.9343 LearningRate 0.0135 Epoch: 12 Global Step: 64030 Fp16 Grad Scale: 65536
Required: 4 hours Training: 2022-04-11 05:51:19,223-Speed 3448.66 samples/sec Loss 4.0770 LearningRate 0.0135 Epoch: 12 Global Step: 64040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:22,213-Speed 3425.29 samples/sec Loss 3.9784 LearningRate 0.0135 Epoch: 12 Global Step: 64050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:25,204-Speed 3425.57 samples/sec Loss 3.9765 LearningRate 0.0135 Epoch: 12 Global Step: 64060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:28,201-Speed 3417.40 samples/sec Loss 3.9607 LearningRate 0.0134 Epoch: 12 Global Step: 64070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:31,191-Speed 3424.60 samples/sec Loss 3.8879 LearningRate 0.0134 Epoch: 12 Global Step: 64080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:34,184-Speed 3423.13 samples/sec Loss 4.0579 LearningRate 0.0134 Epoch: 12 Global Step: 64090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:37,181-Speed 3416.80 samples/sec Loss 3.9531 LearningRate 0.0134 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:40,176-Speed 3420.69 samples/sec Loss 3.8770 LearningRate 0.0134 Epoch: 12 Global Step: 64110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:43,168-Speed 3423.56 samples/sec Loss 3.8897 LearningRate 0.0134 Epoch: 12 Global Step: 64120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:46,162-Speed 3420.41 samples/sec Loss 4.0155 LearningRate 0.0134 Epoch: 12 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:51:49,165-Speed 3410.95 samples/sec Loss 4.0399 LearningRate 0.0134 Epoch: 12 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:51:52,175-Speed 3403.28 samples/sec Loss 3.9364 LearningRate 0.0134 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
05:51:55,205-Speed 3379.87 samples/sec Loss 3.9935 LearningRate 0.0134 Epoch: 12 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:51:58,206-Speed 3412.86 samples/sec Loss 3.9885 LearningRate 0.0134 Epoch: 12 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:01,205-Speed 3415.18 samples/sec Loss 3.9370 LearningRate 0.0134 Epoch: 12 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:04,204-Speed 3415.86 samples/sec Loss 3.9893 LearningRate 0.0134 Epoch: 12 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:07,214-Speed 3402.34 samples/sec Loss 3.8788 LearningRate 0.0133 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:10,225-Speed 3402.93 samples/sec Loss 3.8480 LearningRate 0.0133 Epoch: 12 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:13,271-Speed 3362.24 samples/sec Loss 4.0462 LearningRate 0.0133 Epoch: 12 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:16,311-Speed 3368.35 samples/sec Loss 3.8693 LearningRate 0.0133 Epoch: 12 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:19,307-Speed 3419.32 samples/sec Loss 3.9103 LearningRate 0.0133 Epoch: 12 Global Step: 64240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 05:52:22,286-Speed 3438.36 samples/sec Loss 3.9475 LearningRate 0.0133 Epoch: 12 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:25,281-Speed 3419.57 samples/sec Loss 3.9107 LearningRate 0.0133 Epoch: 12 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:28,284-Speed 3410.41 samples/sec Loss 3.9630 LearningRate 0.0133 Epoch: 12 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:31,334-Speed 3358.11 samples/sec Loss 3.9507 
LearningRate 0.0133 Epoch: 12 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:34,340-Speed 3408.51 samples/sec Loss 3.9407 LearningRate 0.0133 Epoch: 12 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:37,368-Speed 3382.18 samples/sec Loss 4.0075 LearningRate 0.0133 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:40,365-Speed 3418.15 samples/sec Loss 3.9597 LearningRate 0.0133 Epoch: 12 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:43,370-Speed 3407.88 samples/sec Loss 3.9574 LearningRate 0.0133 Epoch: 12 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:46,368-Speed 3416.36 samples/sec Loss 3.9122 LearningRate 0.0133 Epoch: 12 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:49,373-Speed 3409.08 samples/sec Loss 3.9488 LearningRate 0.0132 Epoch: 12 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:52,400-Speed 3383.14 samples/sec Loss 3.8536 LearningRate 0.0132 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:55,425-Speed 3386.34 samples/sec Loss 4.1308 LearningRate 0.0132 Epoch: 12 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:52:58,451-Speed 3384.51 samples/sec Loss 3.8905 LearningRate 0.0132 Epoch: 12 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:01,471-Speed 3392.37 samples/sec Loss 3.9454 LearningRate 0.0132 Epoch: 12 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:04,469-Speed 3416.96 samples/sec Loss 3.9598 LearningRate 0.0132 Epoch: 12 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:07,468-Speed 3414.30 samples/sec Loss 4.0218 LearningRate 0.0132 Epoch: 12 Global Step: 64400 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:10,462-Speed 3420.99 samples/sec Loss 3.9244 LearningRate 0.0132 Epoch: 12 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:13,461-Speed 3415.65 samples/sec Loss 4.0348 LearningRate 0.0132 Epoch: 12 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:16,458-Speed 3417.80 samples/sec Loss 3.9597 LearningRate 0.0132 Epoch: 12 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:19,457-Speed 3415.74 samples/sec Loss 3.9221 LearningRate 0.0132 Epoch: 12 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:22,439-Speed 3433.67 samples/sec Loss 3.8072 LearningRate 0.0132 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:25,456-Speed 3395.78 samples/sec Loss 3.8772 LearningRate 0.0132 Epoch: 12 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:53:28,449-Speed 3422.40 samples/sec Loss 3.8984 LearningRate 0.0132 Epoch: 12 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:31,456-Speed 3406.13 samples/sec Loss 3.9588 LearningRate 0.0131 Epoch: 12 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:34,454-Speed 3416.90 samples/sec Loss 3.8253 LearningRate 0.0131 Epoch: 12 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:37,466-Speed 3400.04 samples/sec Loss 3.9747 LearningRate 0.0131 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:40,463-Speed 3418.53 samples/sec Loss 3.9744 LearningRate 0.0131 Epoch: 12 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:43,465-Speed 3411.58 samples/sec Loss 3.9692 LearningRate 0.0131 Epoch: 12 Global Step: 64520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-11 05:53:46,520-Speed 3352.72 samples/sec Loss 3.9086 LearningRate 0.0131 Epoch: 12 Global Step: 64530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:49,539-Speed 3392.93 samples/sec Loss 4.0050 LearningRate 0.0131 Epoch: 12 Global Step: 64540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:52,631-Speed 3311.88 samples/sec Loss 3.8266 LearningRate 0.0131 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:55,635-Speed 3409.38 samples/sec Loss 3.9423 LearningRate 0.0131 Epoch: 12 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:53:58,642-Speed 3406.46 samples/sec Loss 4.0024 LearningRate 0.0131 Epoch: 12 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:01,646-Speed 3409.90 samples/sec Loss 3.9424 LearningRate 0.0131 Epoch: 12 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:04,655-Speed 3404.62 samples/sec Loss 3.9463 LearningRate 0.0131 Epoch: 12 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:07,656-Speed 3413.18 samples/sec Loss 3.9240 LearningRate 0.0131 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:10,652-Speed 3418.23 samples/sec Loss 3.9768 LearningRate 0.0131 Epoch: 12 Global Step: 64610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:13,658-Speed 3407.07 samples/sec Loss 3.8963 LearningRate 0.0130 Epoch: 12 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:16,659-Speed 3412.88 samples/sec Loss 3.9267 LearningRate 0.0130 Epoch: 12 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:19,662-Speed 3413.36 samples/sec Loss 3.9774 LearningRate 0.0130 Epoch: 12 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:22,671-Speed 3403.87 samples/sec Loss 
3.8473 LearningRate 0.0130 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:25,663-Speed 3422.85 samples/sec Loss 3.9495 LearningRate 0.0130 Epoch: 12 Global Step: 64660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:28,738-Speed 3332.06 samples/sec Loss 3.9436 LearningRate 0.0130 Epoch: 12 Global Step: 64670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:31,741-Speed 3410.51 samples/sec Loss 3.9788 LearningRate 0.0130 Epoch: 12 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:34,743-Speed 3412.45 samples/sec Loss 3.8203 LearningRate 0.0130 Epoch: 12 Global Step: 64690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:37,767-Speed 3387.07 samples/sec Loss 3.8413 LearningRate 0.0130 Epoch: 12 Global Step: 64700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:40,770-Speed 3409.94 samples/sec Loss 3.8659 LearningRate 0.0130 Epoch: 12 Global Step: 64710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:43,774-Speed 3410.06 samples/sec Loss 3.9893 LearningRate 0.0130 Epoch: 12 Global Step: 64720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:46,774-Speed 3414.38 samples/sec Loss 4.0429 LearningRate 0.0130 Epoch: 12 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:49,917-Speed 3258.91 samples/sec Loss 3.9917 LearningRate 0.0130 Epoch: 12 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:52,947-Speed 3379.41 samples/sec Loss 3.9693 LearningRate 0.0130 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:54:55,948-Speed 3413.56 samples/sec Loss 3.9429 LearningRate 0.0129 Epoch: 12 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:54:58,935-Speed 3429.83 samples/sec Loss 3.8662 LearningRate 0.0129 Epoch: 12 Global Step: 64770 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:01,998-Speed 3343.22 samples/sec Loss 3.9226 LearningRate 0.0129 Epoch: 12 Global Step: 64780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:05,031-Speed 3377.13 samples/sec Loss 3.9818 LearningRate 0.0129 Epoch: 12 Global Step: 64790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:08,062-Speed 3379.04 samples/sec Loss 3.8745 LearningRate 0.0129 Epoch: 12 Global Step: 64800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:11,062-Speed 3415.10 samples/sec Loss 4.1030 LearningRate 0.0129 Epoch: 12 Global Step: 64810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:14,064-Speed 3411.56 samples/sec Loss 4.0259 LearningRate 0.0129 Epoch: 12 Global Step: 64820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:17,078-Speed 3398.05 samples/sec Loss 3.8765 LearningRate 0.0129 Epoch: 12 Global Step: 64830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:20,079-Speed 3413.12 samples/sec Loss 3.9308 LearningRate 0.0129 Epoch: 12 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:23,084-Speed 3408.69 samples/sec Loss 3.9281 LearningRate 0.0129 Epoch: 12 Global Step: 64850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:26,086-Speed 3411.75 samples/sec Loss 3.8125 LearningRate 0.0129 Epoch: 12 Global Step: 64860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:29,094-Speed 3405.45 samples/sec Loss 3.8414 LearningRate 0.0129 Epoch: 12 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:32,094-Speed 3413.88 samples/sec Loss 3.8855 LearningRate 0.0129 Epoch: 12 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:35,095-Speed 3413.44 samples/sec Loss 4.0092 LearningRate 0.0129 Epoch: 12 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 05:55:38,097-Speed 3411.25 samples/sec Loss 4.0407 LearningRate 0.0128 Epoch: 12 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:41,107-Speed 3402.67 samples/sec Loss 3.9402 LearningRate 0.0128 Epoch: 12 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:44,111-Speed 3410.12 samples/sec Loss 3.8049 LearningRate 0.0128 Epoch: 12 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:47,111-Speed 3413.92 samples/sec Loss 4.0509 LearningRate 0.0128 Epoch: 12 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:50,114-Speed 3411.16 samples/sec Loss 3.8097 LearningRate 0.0128 Epoch: 12 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:53,132-Speed 3393.64 samples/sec Loss 4.0106 LearningRate 0.0128 Epoch: 12 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:55:56,118-Speed 3430.77 samples/sec Loss 3.8918 LearningRate 0.0128 Epoch: 12 Global Step: 64960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:55:59,118-Speed 3413.86 samples/sec Loss 3.9805 LearningRate 0.0128 Epoch: 12 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:02,122-Speed 3409.25 samples/sec Loss 3.9607 LearningRate 0.0128 Epoch: 12 Global Step: 64980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:05,131-Speed 3404.20 samples/sec Loss 3.7986 LearningRate 0.0128 Epoch: 12 Global Step: 64990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:08,143-Speed 3401.39 samples/sec Loss 3.8849 LearningRate 0.0128 Epoch: 12 Global Step: 65000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:11,164-Speed 3390.34 samples/sec Loss 3.9974 LearningRate 0.0128 Epoch: 12 Global Step: 65010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:14,170-Speed 3407.31 samples/sec Loss 
3.9092 LearningRate 0.0128 Epoch: 12 Global Step: 65020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:17,188-Speed 3393.74 samples/sec Loss 4.0186 LearningRate 0.0128 Epoch: 12 Global Step: 65030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:20,191-Speed 3411.15 samples/sec Loss 3.9208 LearningRate 0.0127 Epoch: 12 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:23,192-Speed 3412.72 samples/sec Loss 3.9788 LearningRate 0.0127 Epoch: 12 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:26,187-Speed 3419.99 samples/sec Loss 3.9914 LearningRate 0.0127 Epoch: 12 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:29,341-Speed 3248.52 samples/sec Loss 3.8893 LearningRate 0.0127 Epoch: 12 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:32,412-Speed 3335.27 samples/sec Loss 3.9217 LearningRate 0.0127 Epoch: 12 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:35,421-Speed 3403.90 samples/sec Loss 3.8362 LearningRate 0.0127 Epoch: 12 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:38,420-Speed 3415.55 samples/sec Loss 3.8255 LearningRate 0.0127 Epoch: 12 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:41,432-Speed 3399.88 samples/sec Loss 3.7621 LearningRate 0.0127 Epoch: 12 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:44,446-Speed 3398.84 samples/sec Loss 3.8886 LearningRate 0.0127 Epoch: 12 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:47,457-Speed 3401.68 samples/sec Loss 3.9307 LearningRate 0.0127 Epoch: 12 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:50,483-Speed 3385.05 samples/sec Loss 3.8713 LearningRate 0.0127 Epoch: 12 Global Step: 65140 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:53,488-Speed 3408.26 samples/sec Loss 3.9582 LearningRate 0.0127 Epoch: 12 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:56:56,492-Speed 3409.62 samples/sec Loss 3.9877 LearningRate 0.0127 Epoch: 12 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:56:59,496-Speed 3410.30 samples/sec Loss 3.8082 LearningRate 0.0127 Epoch: 12 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:02,522-Speed 3384.08 samples/sec Loss 3.8598 LearningRate 0.0127 Epoch: 12 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:05,575-Speed 3355.62 samples/sec Loss 3.9169 LearningRate 0.0126 Epoch: 12 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:08,576-Speed 3412.27 samples/sec Loss 3.9725 LearningRate 0.0126 Epoch: 12 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:11,617-Speed 3367.98 samples/sec Loss 4.0219 LearningRate 0.0126 Epoch: 12 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:14,648-Speed 3380.56 samples/sec Loss 3.9973 LearningRate 0.0126 Epoch: 12 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:17,666-Speed 3393.73 samples/sec Loss 3.8419 LearningRate 0.0126 Epoch: 12 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:20,694-Speed 3382.01 samples/sec Loss 3.8154 LearningRate 0.0126 Epoch: 12 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:23,682-Speed 3428.03 samples/sec Loss 3.7737 LearningRate 0.0126 Epoch: 12 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:26,693-Speed 3401.50 samples/sec Loss 3.8448 LearningRate 0.0126 Epoch: 12 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-11 05:57:29,710-Speed 3394.88 samples/sec Loss 3.9300 LearningRate 0.0126 Epoch: 12 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:32,726-Speed 3396.78 samples/sec Loss 4.0403 LearningRate 0.0126 Epoch: 12 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:35,727-Speed 3412.97 samples/sec Loss 3.9559 LearningRate 0.0126 Epoch: 12 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:38,744-Speed 3395.03 samples/sec Loss 3.9358 LearningRate 0.0126 Epoch: 12 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:41,751-Speed 3405.97 samples/sec Loss 3.9728 LearningRate 0.0126 Epoch: 12 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:44,756-Speed 3409.19 samples/sec Loss 3.9302 LearningRate 0.0126 Epoch: 12 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:47,764-Speed 3404.59 samples/sec Loss 3.9985 LearningRate 0.0125 Epoch: 12 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:50,868-Speed 3299.70 samples/sec Loss 3.9106 LearningRate 0.0125 Epoch: 12 Global Step: 65340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:57:53,874-Speed 3407.12 samples/sec Loss 3.8860 LearningRate 0.0125 Epoch: 12 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:56,880-Speed 3408.09 samples/sec Loss 3.8417 LearningRate 0.0125 Epoch: 12 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:57:59,884-Speed 3409.35 samples/sec Loss 4.0387 LearningRate 0.0125 Epoch: 12 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:58:02,870-Speed 3430.07 samples/sec Loss 3.8011 LearningRate 0.0125 Epoch: 12 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:05,885-Speed 3397.26 samples/sec Loss 
3.8571 LearningRate 0.0125 Epoch: 12 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:08,885-Speed 3414.07 samples/sec Loss 3.8503 LearningRate 0.0125 Epoch: 12 Global Step: 65400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:11,902-Speed 3394.82 samples/sec Loss 3.7738 LearningRate 0.0125 Epoch: 12 Global Step: 65410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:15,067-Speed 3236.85 samples/sec Loss 3.9131 LearningRate 0.0125 Epoch: 12 Global Step: 65420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:18,072-Speed 3408.67 samples/sec Loss 3.8025 LearningRate 0.0125 Epoch: 12 Global Step: 65430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:21,075-Speed 3409.81 samples/sec Loss 3.7407 LearningRate 0.0125 Epoch: 12 Global Step: 65440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:24,093-Speed 3394.71 samples/sec Loss 3.8333 LearningRate 0.0125 Epoch: 12 Global Step: 65450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:27,145-Speed 3355.84 samples/sec Loss 3.7949 LearningRate 0.0125 Epoch: 12 Global Step: 65460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:30,156-Speed 3403.11 samples/sec Loss 3.9143 LearningRate 0.0124 Epoch: 12 Global Step: 65470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:33,190-Speed 3375.92 samples/sec Loss 3.9540 LearningRate 0.0124 Epoch: 12 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:58:36,173-Speed 3433.59 samples/sec Loss 4.0330 LearningRate 0.0124 Epoch: 12 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:39,184-Speed 3402.41 samples/sec Loss 3.7686 LearningRate 0.0124 Epoch: 12 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:42,188-Speed 3408.50 samples/sec Loss 3.9160 LearningRate 0.0124 Epoch: 12 Global Step: 65510 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:45,221-Speed 3377.24 samples/sec Loss 3.9038 LearningRate 0.0124 Epoch: 12 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:48,247-Speed 3384.94 samples/sec Loss 3.8229 LearningRate 0.0124 Epoch: 12 Global Step: 65530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:51,275-Speed 3382.34 samples/sec Loss 4.0176 LearningRate 0.0124 Epoch: 12 Global Step: 65540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:54,281-Speed 3407.59 samples/sec Loss 3.8087 LearningRate 0.0124 Epoch: 12 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:58:57,285-Speed 3409.35 samples/sec Loss 3.9038 LearningRate 0.0124 Epoch: 12 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:00,295-Speed 3404.13 samples/sec Loss 3.7572 LearningRate 0.0124 Epoch: 12 Global Step: 65570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:03,340-Speed 3363.19 samples/sec Loss 3.8908 LearningRate 0.0124 Epoch: 12 Global Step: 65580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:06,343-Speed 3410.52 samples/sec Loss 3.9039 LearningRate 0.0124 Epoch: 12 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:09,327-Speed 3433.56 samples/sec Loss 3.8758 LearningRate 0.0124 Epoch: 12 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:12,328-Speed 3412.01 samples/sec Loss 3.8193 LearningRate 0.0123 Epoch: 12 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:15,328-Speed 3414.15 samples/sec Loss 3.8549 LearningRate 0.0123 Epoch: 12 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:18,335-Speed 3406.88 samples/sec Loss 3.9392 LearningRate 0.0123 Epoch: 12 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-11 05:59:21,336-Speed 3412.05 samples/sec Loss 3.9269 LearningRate 0.0123 Epoch: 12 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:24,342-Speed 3407.88 samples/sec Loss 3.9068 LearningRate 0.0123 Epoch: 12 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:27,350-Speed 3404.65 samples/sec Loss 3.7860 LearningRate 0.0123 Epoch: 12 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:30,374-Speed 3387.74 samples/sec Loss 3.8953 LearningRate 0.0123 Epoch: 12 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:33,376-Speed 3411.88 samples/sec Loss 4.0013 LearningRate 0.0123 Epoch: 12 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:36,381-Speed 3408.46 samples/sec Loss 3.7870 LearningRate 0.0123 Epoch: 12 Global Step: 65690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 05:59:39,386-Speed 3408.81 samples/sec Loss 3.8762 LearningRate 0.0123 Epoch: 12 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:42,405-Speed 3393.00 samples/sec Loss 3.8091 LearningRate 0.0123 Epoch: 12 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:45,407-Speed 3411.29 samples/sec Loss 3.7928 LearningRate 0.0123 Epoch: 12 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:48,416-Speed 3404.15 samples/sec Loss 4.0441 LearningRate 0.0123 Epoch: 12 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:51,437-Speed 3390.70 samples/sec Loss 3.8847 LearningRate 0.0123 Epoch: 12 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 05:59:54,532-Speed 3309.25 samples/sec Loss 3.9555 LearningRate 0.0123 Epoch: 12 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:07,092-Speed 815.35 samples/sec Loss 
3.4283 LearningRate 0.0122 Epoch: 13 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:10,106-Speed 3398.93 samples/sec Loss 3.0649 LearningRate 0.0122 Epoch: 13 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:13,142-Speed 3372.89 samples/sec Loss 3.1461 LearningRate 0.0122 Epoch: 13 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:16,146-Speed 3410.25 samples/sec Loss 2.9722 LearningRate 0.0122 Epoch: 13 Global Step: 65790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:19,152-Speed 3406.97 samples/sec Loss 3.0527 LearningRate 0.0122 Epoch: 13 Global Step: 65800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:22,159-Speed 3406.67 samples/sec Loss 3.0370 LearningRate 0.0122 Epoch: 13 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:25,190-Speed 3379.40 samples/sec Loss 3.1143 LearningRate 0.0122 Epoch: 13 Global Step: 65820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:28,269-Speed 3325.88 samples/sec Loss 2.9776 LearningRate 0.0122 Epoch: 13 Global Step: 65830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:31,518-Speed 3153.10 samples/sec Loss 3.0850 LearningRate 0.0122 Epoch: 13 Global Step: 65840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:34,547-Speed 3381.06 samples/sec Loss 3.0776 LearningRate 0.0122 Epoch: 13 Global Step: 65850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:37,578-Speed 3379.60 samples/sec Loss 3.1088 LearningRate 0.0122 Epoch: 13 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:40,639-Speed 3345.82 samples/sec Loss 3.1222 LearningRate 0.0122 Epoch: 13 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:43,660-Speed 3390.56 samples/sec Loss 3.0332 LearningRate 0.0122 Epoch: 13 Global Step: 65880 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:00:46,667-Speed 3406.30 samples/sec Loss 3.0980 LearningRate 0.0122 Epoch: 13 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:49,683-Speed 3396.12 samples/sec Loss 3.1500 LearningRate 0.0121 Epoch: 13 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:52,777-Speed 3310.87 samples/sec Loss 3.1731 LearningRate 0.0121 Epoch: 13 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:55,800-Speed 3388.49 samples/sec Loss 3.0360 LearningRate 0.0121 Epoch: 13 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:00:58,819-Speed 3392.04 samples/sec Loss 3.2163 LearningRate 0.0121 Epoch: 13 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:01,977-Speed 3244.01 samples/sec Loss 3.0755 LearningRate 0.0121 Epoch: 13 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:05,063-Speed 3319.29 samples/sec Loss 3.1848 LearningRate 0.0121 Epoch: 13 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:08,076-Speed 3398.36 samples/sec Loss 3.1903 LearningRate 0.0121 Epoch: 13 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:11,090-Speed 3399.56 samples/sec Loss 3.0516 LearningRate 0.0121 Epoch: 13 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:14,103-Speed 3398.51 samples/sec Loss 3.2382 LearningRate 0.0121 Epoch: 13 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:17,105-Speed 3412.16 samples/sec Loss 3.1762 LearningRate 0.0121 Epoch: 13 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:01:20,122-Speed 3395.17 samples/sec Loss 3.1396 LearningRate 0.0121 Epoch: 13 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 06:02:04,101-[lfw][66000]XNorm: 22.737340 Training: 2022-04-11 06:02:04,101-[lfw][66000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-04-11 06:02:04,102-[lfw][66000]Accuracy-Highest: 0.99850 Training: 2022-04-11 06:02:55,646-[cfp_fp][66000]XNorm: 21.792470 Training: 2022-04-11 06:02:55,646-[cfp_fp][66000]Accuracy-Flip: 0.98386+-0.00503 Training: 2022-04-11 06:02:55,647-[cfp_fp][66000]Accuracy-Highest: 0.98386 Training: 2022-04-11 06:03:39,633-[agedb_30][66000]XNorm: 23.119056 Training: 2022-04-11 06:03:39,634-[agedb_30][66000]Accuracy-Flip: 0.98317+-0.00677 Training: 2022-04-11 06:03:39,634-[agedb_30][66000]Accuracy-Highest: 0.98317 Training: 2022-04-11 06:03:42,653-Speed 71.84 samples/sec Loss 3.1056 LearningRate 0.0121 Epoch: 13 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:03:45,665-Speed 3401.33 samples/sec Loss 3.1219 LearningRate 0.0121 Epoch: 13 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:03:48,672-Speed 3405.65 samples/sec Loss 3.1336 LearningRate 0.0121 Epoch: 13 Global Step: 66030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:03:51,714-Speed 3367.23 samples/sec Loss 3.1528 LearningRate 0.0121 Epoch: 13 Global Step: 66040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:03:54,697-Speed 3433.47 samples/sec Loss 3.2243 LearningRate 0.0120 Epoch: 13 Global Step: 66050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:03:57,689-Speed 3423.48 samples/sec Loss 3.1341 LearningRate 0.0120 Epoch: 13 Global Step: 66060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:00,678-Speed 3426.38 samples/sec Loss 2.9970 LearningRate 0.0120 Epoch: 13 Global Step: 66070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:03,681-Speed 3411.31 samples/sec Loss 3.2161 LearningRate 0.0120 Epoch: 13 Global Step: 66080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:06,679-Speed 3416.15 
samples/sec Loss 3.0741 LearningRate 0.0120 Epoch: 13 Global Step: 66090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:09,678-Speed 3415.94 samples/sec Loss 3.2473 LearningRate 0.0120 Epoch: 13 Global Step: 66100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:12,709-Speed 3378.90 samples/sec Loss 3.1557 LearningRate 0.0120 Epoch: 13 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:15,702-Speed 3422.35 samples/sec Loss 3.3548 LearningRate 0.0120 Epoch: 13 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:04:18,712-Speed 3403.30 samples/sec Loss 3.1934 LearningRate 0.0120 Epoch: 13 Global Step: 66130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:21,713-Speed 3413.23 samples/sec Loss 3.1966 LearningRate 0.0120 Epoch: 13 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:24,753-Speed 3369.01 samples/sec Loss 3.1430 LearningRate 0.0120 Epoch: 13 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:27,888-Speed 3266.97 samples/sec Loss 3.1748 LearningRate 0.0120 Epoch: 13 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:30,900-Speed 3400.28 samples/sec Loss 3.2452 LearningRate 0.0120 Epoch: 13 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:33,910-Speed 3403.25 samples/sec Loss 3.1407 LearningRate 0.0120 Epoch: 13 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:37,034-Speed 3278.90 samples/sec Loss 3.1868 LearningRate 0.0120 Epoch: 13 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:40,039-Speed 3408.04 samples/sec Loss 3.2852 LearningRate 0.0119 Epoch: 13 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:43,057-Speed 3393.98 samples/sec Loss 3.2327 LearningRate 0.0119 Epoch: 13 
Global Step: 66210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:46,079-Speed 3389.38 samples/sec Loss 3.1051 LearningRate 0.0119 Epoch: 13 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:49,092-Speed 3399.31 samples/sec Loss 3.2373 LearningRate 0.0119 Epoch: 13 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:52,094-Speed 3412.13 samples/sec Loss 3.1890 LearningRate 0.0119 Epoch: 13 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:55,149-Speed 3352.84 samples/sec Loss 3.1810 LearningRate 0.0119 Epoch: 13 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:04:58,160-Speed 3400.65 samples/sec Loss 3.1655 LearningRate 0.0119 Epoch: 13 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:01,188-Speed 3383.65 samples/sec Loss 3.3551 LearningRate 0.0119 Epoch: 13 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:04,192-Speed 3409.59 samples/sec Loss 3.2920 LearningRate 0.0119 Epoch: 13 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:07,207-Speed 3397.35 samples/sec Loss 3.1971 LearningRate 0.0119 Epoch: 13 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:10,216-Speed 3404.25 samples/sec Loss 3.2080 LearningRate 0.0119 Epoch: 13 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:13,201-Speed 3431.16 samples/sec Loss 3.3073 LearningRate 0.0119 Epoch: 13 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:16,216-Speed 3397.13 samples/sec Loss 3.2149 LearningRate 0.0119 Epoch: 13 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:19,252-Speed 3373.91 samples/sec Loss 3.3048 LearningRate 0.0119 Epoch: 13 Global Step: 66330 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 06:05:22,259-Speed 3406.15 samples/sec Loss 3.1872 LearningRate 0.0118 Epoch: 13 Global Step: 66340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:25,328-Speed 3336.97 samples/sec Loss 3.1872 LearningRate 0.0118 Epoch: 13 Global Step: 66350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:28,346-Speed 3394.45 samples/sec Loss 3.3036 LearningRate 0.0118 Epoch: 13 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:31,371-Speed 3387.19 samples/sec Loss 3.2995 LearningRate 0.0118 Epoch: 13 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:34,367-Speed 3418.45 samples/sec Loss 3.3356 LearningRate 0.0118 Epoch: 13 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:37,469-Speed 3302.08 samples/sec Loss 3.3528 LearningRate 0.0118 Epoch: 13 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:40,530-Speed 3346.54 samples/sec Loss 3.3228 LearningRate 0.0118 Epoch: 13 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:05:43,546-Speed 3395.03 samples/sec Loss 3.2435 LearningRate 0.0118 Epoch: 13 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:46,547-Speed 3413.89 samples/sec Loss 3.3007 LearningRate 0.0118 Epoch: 13 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:49,560-Speed 3398.35 samples/sec Loss 3.1444 LearningRate 0.0118 Epoch: 13 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:52,564-Speed 3410.10 samples/sec Loss 3.3043 LearningRate 0.0118 Epoch: 13 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:55,564-Speed 3413.69 samples/sec Loss 3.2533 LearningRate 0.0118 Epoch: 13 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:05:58,577-Speed 3400.34 
samples/sec Loss 3.3193 LearningRate 0.0118 Epoch: 13 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:01,628-Speed 3357.48 samples/sec Loss 3.2539 LearningRate 0.0118 Epoch: 13 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:04,691-Speed 3344.35 samples/sec Loss 3.3169 LearningRate 0.0118 Epoch: 13 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:07,703-Speed 3399.79 samples/sec Loss 3.2813 LearningRate 0.0117 Epoch: 13 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:10,727-Speed 3387.48 samples/sec Loss 3.2131 LearningRate 0.0117 Epoch: 13 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:13,717-Speed 3425.43 samples/sec Loss 3.3036 LearningRate 0.0117 Epoch: 13 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:16,723-Speed 3407.71 samples/sec Loss 3.3434 LearningRate 0.0117 Epoch: 13 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:19,721-Speed 3415.90 samples/sec Loss 3.2234 LearningRate 0.0117 Epoch: 13 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:22,720-Speed 3415.30 samples/sec Loss 3.2096 LearningRate 0.0117 Epoch: 13 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:25,767-Speed 3361.65 samples/sec Loss 3.2690 LearningRate 0.0117 Epoch: 13 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:28,763-Speed 3418.61 samples/sec Loss 3.2815 LearningRate 0.0117 Epoch: 13 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:31,763-Speed 3414.93 samples/sec Loss 3.3278 LearningRate 0.0117 Epoch: 13 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:34,761-Speed 3415.92 samples/sec Loss 3.2908 LearningRate 0.0117 Epoch: 13 
Global Step: 66580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:37,776-Speed 3397.23 samples/sec Loss 3.2470 LearningRate 0.0117 Epoch: 13 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:40,774-Speed 3416.08 samples/sec Loss 3.4171 LearningRate 0.0117 Epoch: 13 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:43,751-Speed 3440.79 samples/sec Loss 3.4514 LearningRate 0.0117 Epoch: 13 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:46,754-Speed 3411.21 samples/sec Loss 3.3741 LearningRate 0.0117 Epoch: 13 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:49,800-Speed 3362.44 samples/sec Loss 3.3453 LearningRate 0.0117 Epoch: 13 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:52,805-Speed 3408.44 samples/sec Loss 3.4533 LearningRate 0.0116 Epoch: 13 Global Step: 66640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:55,804-Speed 3415.50 samples/sec Loss 3.2885 LearningRate 0.0116 Epoch: 13 Global Step: 66650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:06:58,849-Speed 3363.58 samples/sec Loss 3.3049 LearningRate 0.0116 Epoch: 13 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:01,855-Speed 3408.07 samples/sec Loss 3.1475 LearningRate 0.0116 Epoch: 13 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:04,861-Speed 3406.93 samples/sec Loss 3.2636 LearningRate 0.0116 Epoch: 13 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:07,870-Speed 3404.49 samples/sec Loss 3.3108 LearningRate 0.0116 Epoch: 13 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:10,877-Speed 3405.87 samples/sec Loss 3.2780 LearningRate 0.0116 Epoch: 13 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 06:07:13,879-Speed 3411.33 samples/sec Loss 3.4563 LearningRate 0.0116 Epoch: 13 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 06:07:16,872-Speed 3422.85 samples/sec Loss 3.3771 LearningRate 0.0116 Epoch: 13 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:19,883-Speed 3401.68 samples/sec Loss 3.3260 LearningRate 0.0116 Epoch: 13 Global Step: 66730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:22,888-Speed 3408.67 samples/sec Loss 3.3029 LearningRate 0.0116 Epoch: 13 Global Step: 66740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:25,895-Speed 3405.92 samples/sec Loss 3.3103 LearningRate 0.0116 Epoch: 13 Global Step: 66750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:28,900-Speed 3408.42 samples/sec Loss 3.3359 LearningRate 0.0116 Epoch: 13 Global Step: 66760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:31,902-Speed 3412.57 samples/sec Loss 3.3019 LearningRate 0.0116 Epoch: 13 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:34,985-Speed 3322.07 samples/sec Loss 3.2611 LearningRate 0.0116 Epoch: 13 Global Step: 66780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:38,002-Speed 3395.20 samples/sec Loss 3.3780 LearningRate 0.0115 Epoch: 13 Global Step: 66790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:41,176-Speed 3227.02 samples/sec Loss 3.3457 LearningRate 0.0115 Epoch: 13 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:44,294-Speed 3284.04 samples/sec Loss 3.3920 LearningRate 0.0115 Epoch: 13 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:47,294-Speed 3414.99 samples/sec Loss 3.4095 LearningRate 0.0115 Epoch: 13 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:07:50,340-Speed 3362.15 
samples/sec Loss 3.3164 LearningRate 0.0115 Epoch: 13 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:53,360-Speed 3391.71 samples/sec Loss 3.3692 LearningRate 0.0115 Epoch: 13 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:56,368-Speed 3405.25 samples/sec Loss 3.4462 LearningRate 0.0115 Epoch: 13 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:07:59,370-Speed 3411.61 samples/sec Loss 3.3892 LearningRate 0.0115 Epoch: 13 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:02,410-Speed 3370.16 samples/sec Loss 3.4609 LearningRate 0.0115 Epoch: 13 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:05,414-Speed 3409.60 samples/sec Loss 3.3078 LearningRate 0.0115 Epoch: 13 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:08,412-Speed 3415.40 samples/sec Loss 3.3558 LearningRate 0.0115 Epoch: 13 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:11,431-Speed 3393.50 samples/sec Loss 3.3197 LearningRate 0.0115 Epoch: 13 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:14,433-Speed 3411.06 samples/sec Loss 3.2149 LearningRate 0.0115 Epoch: 13 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:17,450-Speed 3395.92 samples/sec Loss 3.4654 LearningRate 0.0115 Epoch: 13 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:20,449-Speed 3414.95 samples/sec Loss 3.3755 LearningRate 0.0114 Epoch: 13 Global Step: 66930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 06:08:23,431-Speed 3434.35 samples/sec Loss 3.4891 LearningRate 0.0114 Epoch: 13 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:26,447-Speed 3396.94 samples/sec Loss 3.3476 LearningRate 0.0114 Epoch: 13 
Global Step: 66950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:29,452-Speed 3408.14 samples/sec Loss 3.3487 LearningRate 0.0114 Epoch: 13 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:32,477-Speed 3385.74 samples/sec Loss 3.2795 LearningRate 0.0114 Epoch: 13 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:35,491-Speed 3398.82 samples/sec Loss 3.4362 LearningRate 0.0114 Epoch: 13 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:38,501-Speed 3402.68 samples/sec Loss 3.4542 LearningRate 0.0114 Epoch: 13 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:41,504-Speed 3410.05 samples/sec Loss 3.3175 LearningRate 0.0114 Epoch: 13 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:44,506-Speed 3412.68 samples/sec Loss 3.3207 LearningRate 0.0114 Epoch: 13 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:08:47,495-Speed 3426.46 samples/sec Loss 3.3979 LearningRate 0.0114 Epoch: 13 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:08:50,514-Speed 3393.01 samples/sec Loss 3.3367 LearningRate 0.0114 Epoch: 13 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:08:53,564-Speed 3357.26 samples/sec Loss 3.3253 LearningRate 0.0114 Epoch: 13 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:08:56,580-Speed 3396.83 samples/sec Loss 3.4028 LearningRate 0.0114 Epoch: 13 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:08:59,596-Speed 3395.70 samples/sec Loss 3.3582 LearningRate 0.0114 Epoch: 13 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:09:02,601-Speed 3408.89 samples/sec Loss 3.3966 LearningRate 0.0114 Epoch: 13 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 06:09:05,607-Speed 3407.91 samples/sec Loss 3.3909 LearningRate 0.0113 Epoch: 13 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:09:08,616-Speed 3403.77 samples/sec Loss 3.5222 LearningRate 0.0113 Epoch: 13 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:09:11,617-Speed 3413.31 samples/sec Loss 3.4249 LearningRate 0.0113 Epoch: 13 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:09:14,613-Speed 3417.81 samples/sec Loss 3.3972 LearningRate 0.0113 Epoch: 13 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 06:09:17,635-Speed 3389.80 samples/sec Loss 3.3231 LearningRate 0.0113 Epoch: 13 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:09:20,645-Speed 3403.17 samples/sec Loss 3.3539 LearningRate 0.0113 Epoch: 13 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:09:23,643-Speed 3415.48 samples/sec Loss 3.2996 LearningRate 0.0113 Epoch: 13 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:09:26,708-Speed 3342.04 samples/sec Loss 3.3668 LearningRate 0.0113 Epoch: 13 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 06:09:29,724-Speed 3396.60 samples/sec Loss 3.3769 LearningRate 0.0113 Epoch: 13 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:09:32,725-Speed 3412.83 samples/sec Loss 3.4521 LearningRate 0.0113 Epoch: 13 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:09:35,731-Speed 3407.48 samples/sec Loss 3.2845 LearningRate 0.0113 Epoch: 13 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:09:38,736-Speed 3408.32 samples/sec Loss 3.4847 LearningRate 0.0113 Epoch: 13 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:09:41,737-Speed 3412.97 
samples/sec Loss 3.4117 LearningRate 0.0113 Epoch: 13 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:09:44,721-Speed 3433.07 samples/sec Loss 3.4648 LearningRate 0.0113 Epoch: 13 Global Step: 67210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:09:47,756-Speed 3374.34 samples/sec Loss 3.3908 LearningRate 0.0113 Epoch: 13 Global Step: 67220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:09:50,828-Speed 3334.51 samples/sec Loss 3.3294 LearningRate 0.0112 Epoch: 13 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:09:53,828-Speed 3413.51 samples/sec Loss 3.4712 LearningRate 0.0112 Epoch: 13 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:09:56,832-Speed 3410.52 samples/sec Loss 3.3132 LearningRate 0.0112 Epoch: 13 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:09:59,859-Speed 3383.40 samples/sec Loss 3.3348 LearningRate 0.0112 Epoch: 13 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:02,879-Speed 3392.18 samples/sec Loss 3.4215 LearningRate 0.0112 Epoch: 13 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:05,991-Speed 3291.00 samples/sec Loss 3.3604 LearningRate 0.0112 Epoch: 13 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:09,025-Speed 3376.18 samples/sec Loss 3.2591 LearningRate 0.0112 Epoch: 13 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:12,042-Speed 3395.33 samples/sec Loss 3.3414 LearningRate 0.0112 Epoch: 13 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:15,062-Speed 3390.53 samples/sec Loss 3.5142 LearningRate 0.0112 Epoch: 13 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:10:18,055-Speed 3422.39 samples/sec Loss 3.4329 LearningRate 0.0112 Epoch: 13 
Global Step: 67320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:21,060-Speed 3408.88 samples/sec Loss 3.5412 LearningRate 0.0112 Epoch: 13 Global Step: 67330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:24,089-Speed 3381.90 samples/sec Loss 3.4405 LearningRate 0.0112 Epoch: 13 Global Step: 67340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:27,105-Speed 3395.43 samples/sec Loss 3.4319 LearningRate 0.0112 Epoch: 13 Global Step: 67350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:30,113-Speed 3405.97 samples/sec Loss 3.3538 LearningRate 0.0112 Epoch: 13 Global Step: 67360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:33,118-Speed 3407.77 samples/sec Loss 3.5216 LearningRate 0.0112 Epoch: 13 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:36,125-Speed 3406.52 samples/sec Loss 3.4123 LearningRate 0.0112 Epoch: 13 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:39,130-Speed 3408.02 samples/sec Loss 3.4326 LearningRate 0.0111 Epoch: 13 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:42,133-Speed 3411.54 samples/sec Loss 3.4658 LearningRate 0.0111 Epoch: 13 Global Step: 67400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:45,145-Speed 3399.87 samples/sec Loss 3.4205 LearningRate 0.0111 Epoch: 13 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:10:48,147-Speed 3411.70 samples/sec Loss 3.4354 LearningRate 0.0111 Epoch: 13 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:10:51,152-Speed 3408.93 samples/sec Loss 3.4131 LearningRate 0.0111 Epoch: 13 Global Step: 67430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:10:54,153-Speed 3412.70 samples/sec Loss 3.3534 LearningRate 0.0111 Epoch: 13 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:10:57,156-Speed 3411.37 samples/sec Loss 3.4686 LearningRate 0.0111 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:00,164-Speed 3405.54 samples/sec Loss 3.3796 LearningRate 0.0111 Epoch: 13 Global Step: 67460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:03,153-Speed 3426.68 samples/sec Loss 3.5727 LearningRate 0.0111 Epoch: 13 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:06,157-Speed 3409.45 samples/sec Loss 3.4242 LearningRate 0.0111 Epoch: 13 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:09,165-Speed 3404.47 samples/sec Loss 3.3825 LearningRate 0.0111 Epoch: 13 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:12,168-Speed 3410.86 samples/sec Loss 3.5662 LearningRate 0.0111 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:15,175-Speed 3406.86 samples/sec Loss 3.3459 LearningRate 0.0111 Epoch: 13 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:18,187-Speed 3400.68 samples/sec Loss 3.4980 LearningRate 0.0111 Epoch: 13 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:21,187-Speed 3413.61 samples/sec Loss 3.4032 LearningRate 0.0111 Epoch: 13 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:24,190-Speed 3410.68 samples/sec Loss 3.3705 LearningRate 0.0110 Epoch: 13 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:27,205-Speed 3398.12 samples/sec Loss 3.3549 LearningRate 0.0110 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:30,210-Speed 3407.81 samples/sec Loss 3.5523 LearningRate 0.0110 Epoch: 13 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:11:33,210-Speed 3414.48 
samples/sec Loss 3.4183 LearningRate 0.0110 Epoch: 13 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:36,210-Speed 3414.48 samples/sec Loss 3.4265 LearningRate 0.0110 Epoch: 13 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:39,229-Speed 3391.56 samples/sec Loss 3.3879 LearningRate 0.0110 Epoch: 13 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:42,243-Speed 3398.49 samples/sec Loss 3.5470 LearningRate 0.0110 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:45,246-Speed 3411.13 samples/sec Loss 3.4450 LearningRate 0.0110 Epoch: 13 Global Step: 67610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:48,250-Speed 3409.51 samples/sec Loss 3.3831 LearningRate 0.0110 Epoch: 13 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:51,256-Speed 3408.22 samples/sec Loss 3.3946 LearningRate 0.0110 Epoch: 13 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:54,257-Speed 3412.25 samples/sec Loss 3.5376 LearningRate 0.0110 Epoch: 13 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:11:57,256-Speed 3416.13 samples/sec Loss 3.5718 LearningRate 0.0110 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:12:00,278-Speed 3388.69 samples/sec Loss 3.4813 LearningRate 0.0110 Epoch: 13 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:03,327-Speed 3359.96 samples/sec Loss 3.3552 LearningRate 0.0110 Epoch: 13 Global Step: 67670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:06,371-Speed 3364.59 samples/sec Loss 3.4897 LearningRate 0.0110 Epoch: 13 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:09,378-Speed 3406.03 samples/sec Loss 3.4827 LearningRate 0.0109 Epoch: 13 
Global Step: 67690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:12,400-Speed 3389.48 samples/sec Loss 3.5704 LearningRate 0.0109 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:15,412-Speed 3400.38 samples/sec Loss 3.5089 LearningRate 0.0109 Epoch: 13 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:18,426-Speed 3398.82 samples/sec Loss 3.4249 LearningRate 0.0109 Epoch: 13 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:21,430-Speed 3409.60 samples/sec Loss 3.4303 LearningRate 0.0109 Epoch: 13 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:24,434-Speed 3409.82 samples/sec Loss 3.5960 LearningRate 0.0109 Epoch: 13 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:27,450-Speed 3396.26 samples/sec Loss 3.4765 LearningRate 0.0109 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:30,459-Speed 3403.39 samples/sec Loss 3.3333 LearningRate 0.0109 Epoch: 13 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:12:33,462-Speed 3410.56 samples/sec Loss 3.4668 LearningRate 0.0109 Epoch: 13 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:12:36,488-Speed 3384.88 samples/sec Loss 3.4244 LearningRate 0.0109 Epoch: 13 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:12:39,527-Speed 3370.24 samples/sec Loss 3.4118 LearningRate 0.0109 Epoch: 13 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:42,546-Speed 3393.16 samples/sec Loss 3.3178 LearningRate 0.0109 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:45,551-Speed 3407.53 samples/sec Loss 3.4808 LearningRate 0.0109 Epoch: 13 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 06:12:48,555-Speed 3409.73 samples/sec Loss 3.4180 LearningRate 0.0109 Epoch: 13 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:51,564-Speed 3404.49 samples/sec Loss 3.4996 LearningRate 0.0109 Epoch: 13 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:54,607-Speed 3365.99 samples/sec Loss 3.2613 LearningRate 0.0108 Epoch: 13 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:12:57,635-Speed 3383.33 samples/sec Loss 3.4559 LearningRate 0.0108 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:00,647-Speed 3400.28 samples/sec Loss 3.4190 LearningRate 0.0108 Epoch: 13 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:03,649-Speed 3411.19 samples/sec Loss 3.4737 LearningRate 0.0108 Epoch: 13 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:06,664-Speed 3397.53 samples/sec Loss 3.4267 LearningRate 0.0108 Epoch: 13 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:09,665-Speed 3412.88 samples/sec Loss 3.5035 LearningRate 0.0108 Epoch: 13 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:13:12,668-Speed 3411.28 samples/sec Loss 3.4506 LearningRate 0.0108 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:13:15,719-Speed 3357.42 samples/sec Loss 3.3756 LearningRate 0.0108 Epoch: 13 Global Step: 67910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:18,785-Speed 3340.49 samples/sec Loss 3.5643 LearningRate 0.0108 Epoch: 13 Global Step: 67920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:21,792-Speed 3406.36 samples/sec Loss 3.4311 LearningRate 0.0108 Epoch: 13 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:24,810-Speed 3394.38 
samples/sec Loss 3.6493 LearningRate 0.0108 Epoch: 13 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:27,833-Speed 3387.77 samples/sec Loss 3.5987 LearningRate 0.0108 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:30,871-Speed 3371.78 samples/sec Loss 3.4208 LearningRate 0.0108 Epoch: 13 Global Step: 67960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:33,879-Speed 3404.86 samples/sec Loss 3.3550 LearningRate 0.0108 Epoch: 13 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:36,920-Speed 3367.78 samples/sec Loss 3.5627 LearningRate 0.0108 Epoch: 13 Global Step: 67980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:39,939-Speed 3392.77 samples/sec Loss 3.4549 LearningRate 0.0108 Epoch: 13 Global Step: 67990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:13:42,953-Speed 3397.88 samples/sec Loss 3.4341 LearningRate 0.0107 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:14:27,204-[lfw][68000]XNorm: 23.713765 Training: 2022-04-11 06:14:27,205-[lfw][68000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 06:14:27,205-[lfw][68000]Accuracy-Highest: 0.99850 Training: 2022-04-11 06:15:18,707-[cfp_fp][68000]XNorm: 22.193234 Training: 2022-04-11 06:15:18,707-[cfp_fp][68000]Accuracy-Flip: 0.98300+-0.00555 Training: 2022-04-11 06:15:18,708-[cfp_fp][68000]Accuracy-Highest: 0.98386 Training: 2022-04-11 06:16:03,243-[agedb_30][68000]XNorm: 23.814715 Training: 2022-04-11 06:16:03,244-[agedb_30][68000]Accuracy-Flip: 0.98350+-0.00681 Training: 2022-04-11 06:16:03,245-[agedb_30][68000]Accuracy-Highest: 0.98350 Training: 2022-04-11 06:16:06,256-Speed 71.46 samples/sec Loss 3.4309 LearningRate 0.0107 Epoch: 13 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:09,242-Speed 3429.41 samples/sec Loss 3.5168 LearningRate 
0.0107 Epoch: 13 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:12,250-Speed 3404.97 samples/sec Loss 3.3346 LearningRate 0.0107 Epoch: 13 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:15,244-Speed 3421.78 samples/sec Loss 3.4914 LearningRate 0.0107 Epoch: 13 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:18,244-Speed 3413.98 samples/sec Loss 3.5780 LearningRate 0.0107 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:21,237-Speed 3422.44 samples/sec Loss 3.6364 LearningRate 0.0107 Epoch: 13 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:24,222-Speed 3431.55 samples/sec Loss 3.4713 LearningRate 0.0107 Epoch: 13 Global Step: 68070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:27,226-Speed 3408.95 samples/sec Loss 3.4850 LearningRate 0.0107 Epoch: 13 Global Step: 68080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:30,304-Speed 3327.36 samples/sec Loss 3.4962 LearningRate 0.0107 Epoch: 13 Global Step: 68090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:33,299-Speed 3420.31 samples/sec Loss 3.4609 LearningRate 0.0107 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:36,298-Speed 3416.03 samples/sec Loss 3.4126 LearningRate 0.0107 Epoch: 13 Global Step: 68110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:39,340-Speed 3367.66 samples/sec Loss 3.4961 LearningRate 0.0107 Epoch: 13 Global Step: 68120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:42,384-Speed 3364.47 samples/sec Loss 3.4185 LearningRate 0.0107 Epoch: 13 Global Step: 68130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:45,391-Speed 3406.62 samples/sec Loss 3.4429 LearningRate 0.0107 Epoch: 13 Global Step: 68140 Fp16 Grad Scale: 
32768 Required: 3 hours Training: 2022-04-11 06:16:48,384-Speed 3422.36 samples/sec Loss 3.4446 LearningRate 0.0106 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:51,385-Speed 3413.25 samples/sec Loss 3.4552 LearningRate 0.0106 Epoch: 13 Global Step: 68160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:16:54,430-Speed 3363.95 samples/sec Loss 3.4556 LearningRate 0.0106 Epoch: 13 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:16:57,433-Speed 3410.01 samples/sec Loss 3.5487 LearningRate 0.0106 Epoch: 13 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:00,436-Speed 3411.49 samples/sec Loss 3.4540 LearningRate 0.0106 Epoch: 13 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:03,511-Speed 3331.12 samples/sec Loss 3.4846 LearningRate 0.0106 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:06,521-Speed 3401.82 samples/sec Loss 3.3109 LearningRate 0.0106 Epoch: 13 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:09,518-Speed 3418.06 samples/sec Loss 3.4166 LearningRate 0.0106 Epoch: 13 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:12,520-Speed 3411.60 samples/sec Loss 3.4519 LearningRate 0.0106 Epoch: 13 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:15,524-Speed 3411.01 samples/sec Loss 3.3921 LearningRate 0.0106 Epoch: 13 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:18,520-Speed 3418.56 samples/sec Loss 3.3440 LearningRate 0.0106 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:21,520-Speed 3414.98 samples/sec Loss 3.4051 LearningRate 0.0106 Epoch: 13 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
06:17:24,524-Speed 3409.34 samples/sec Loss 3.5386 LearningRate 0.0106 Epoch: 13 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 06:17:27,535-Speed 3401.45 samples/sec Loss 3.3283 LearningRate 0.0106 Epoch: 13 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:30,545-Speed 3402.51 samples/sec Loss 3.4957 LearningRate 0.0106 Epoch: 13 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:33,542-Speed 3418.11 samples/sec Loss 3.5025 LearningRate 0.0106 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:36,538-Speed 3418.30 samples/sec Loss 3.4624 LearningRate 0.0105 Epoch: 13 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:17:39,524-Speed 3429.99 samples/sec Loss 3.5511 LearningRate 0.0105 Epoch: 13 Global Step: 68320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:42,543-Speed 3393.64 samples/sec Loss 3.3819 LearningRate 0.0105 Epoch: 13 Global Step: 68330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:45,546-Speed 3411.18 samples/sec Loss 3.4719 LearningRate 0.0105 Epoch: 13 Global Step: 68340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:48,573-Speed 3383.86 samples/sec Loss 3.4417 LearningRate 0.0105 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:51,578-Speed 3407.53 samples/sec Loss 3.4622 LearningRate 0.0105 Epoch: 13 Global Step: 68360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:54,576-Speed 3417.07 samples/sec Loss 3.4627 LearningRate 0.0105 Epoch: 13 Global Step: 68370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:17:57,587-Speed 3401.17 samples/sec Loss 3.4251 LearningRate 0.0105 Epoch: 13 Global Step: 68380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:18:00,637-Speed 3358.97 samples/sec Loss 3.4389 
LearningRate 0.0105 Epoch: 13 Global Step: 68390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:18:03,657-Speed 3390.67 samples/sec Loss 3.5272 LearningRate 0.0105 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:18:06,663-Speed 3407.21 samples/sec Loss 3.3235 LearningRate 0.0105 Epoch: 13 Global Step: 68410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:18:09,664-Speed 3413.70 samples/sec Loss 3.4557 LearningRate 0.0105 Epoch: 13 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:12,665-Speed 3413.79 samples/sec Loss 3.4266 LearningRate 0.0105 Epoch: 13 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:15,670-Speed 3407.94 samples/sec Loss 3.4928 LearningRate 0.0105 Epoch: 13 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:18,666-Speed 3419.34 samples/sec Loss 3.4549 LearningRate 0.0105 Epoch: 13 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:21,664-Speed 3416.48 samples/sec Loss 3.5328 LearningRate 0.0104 Epoch: 13 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:24,673-Speed 3403.06 samples/sec Loss 3.4340 LearningRate 0.0104 Epoch: 13 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:27,671-Speed 3417.18 samples/sec Loss 3.3031 LearningRate 0.0104 Epoch: 13 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:30,700-Speed 3381.51 samples/sec Loss 3.3972 LearningRate 0.0104 Epoch: 13 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:33,705-Speed 3408.53 samples/sec Loss 3.4793 LearningRate 0.0104 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:36,725-Speed 3390.61 samples/sec Loss 3.3901 LearningRate 0.0104 Epoch: 13 Global Step: 68510 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:39,713-Speed 3428.18 samples/sec Loss 3.4433 LearningRate 0.0104 Epoch: 13 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:42,714-Speed 3413.78 samples/sec Loss 3.3664 LearningRate 0.0104 Epoch: 13 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:45,717-Speed 3410.65 samples/sec Loss 3.2703 LearningRate 0.0104 Epoch: 13 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:48,724-Speed 3406.51 samples/sec Loss 3.4704 LearningRate 0.0104 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:51,770-Speed 3361.56 samples/sec Loss 3.3875 LearningRate 0.0104 Epoch: 13 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:54,779-Speed 3405.14 samples/sec Loss 3.4754 LearningRate 0.0104 Epoch: 13 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:18:57,786-Speed 3405.25 samples/sec Loss 3.4191 LearningRate 0.0104 Epoch: 13 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:00,837-Speed 3357.95 samples/sec Loss 3.4548 LearningRate 0.0104 Epoch: 13 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:03,850-Speed 3398.58 samples/sec Loss 3.4067 LearningRate 0.0104 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:06,866-Speed 3395.90 samples/sec Loss 3.4783 LearningRate 0.0104 Epoch: 13 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:09,848-Speed 3435.92 samples/sec Loss 3.4644 LearningRate 0.0103 Epoch: 13 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:12,851-Speed 3410.36 samples/sec Loss 3.4512 LearningRate 0.0103 Epoch: 13 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:19:15,856-Speed 3408.77 samples/sec Loss 3.4645 LearningRate 0.0103 Epoch: 13 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:18,856-Speed 3414.40 samples/sec Loss 3.5104 LearningRate 0.0103 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:21,863-Speed 3405.64 samples/sec Loss 3.4283 LearningRate 0.0103 Epoch: 13 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:24,884-Speed 3390.24 samples/sec Loss 3.6559 LearningRate 0.0103 Epoch: 13 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:19:27,882-Speed 3416.34 samples/sec Loss 3.5924 LearningRate 0.0103 Epoch: 13 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:30,886-Speed 3410.65 samples/sec Loss 3.4766 LearningRate 0.0103 Epoch: 13 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:33,890-Speed 3408.76 samples/sec Loss 3.5354 LearningRate 0.0103 Epoch: 13 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:36,895-Speed 3408.93 samples/sec Loss 3.5134 LearningRate 0.0103 Epoch: 13 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:39,924-Speed 3381.67 samples/sec Loss 3.5661 LearningRate 0.0103 Epoch: 13 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:42,924-Speed 3413.71 samples/sec Loss 3.3729 LearningRate 0.0103 Epoch: 13 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:45,936-Speed 3401.91 samples/sec Loss 3.4449 LearningRate 0.0103 Epoch: 13 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:48,982-Speed 3362.13 samples/sec Loss 3.4445 LearningRate 0.0103 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:51,984-Speed 3411.66 samples/sec Loss 
3.5933 LearningRate 0.0103 Epoch: 13 Global Step: 68760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:54,984-Speed 3414.27 samples/sec Loss 3.4544 LearningRate 0.0103 Epoch: 13 Global Step: 68770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:19:57,985-Speed 3412.85 samples/sec Loss 3.4143 LearningRate 0.0102 Epoch: 13 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:01,009-Speed 3386.70 samples/sec Loss 3.3734 LearningRate 0.0102 Epoch: 13 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:04,016-Speed 3406.42 samples/sec Loss 3.4071 LearningRate 0.0102 Epoch: 13 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:07,028-Speed 3399.94 samples/sec Loss 3.5214 LearningRate 0.0102 Epoch: 13 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:10,016-Speed 3428.37 samples/sec Loss 3.4332 LearningRate 0.0102 Epoch: 13 Global Step: 68820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:13,036-Speed 3392.38 samples/sec Loss 3.4746 LearningRate 0.0102 Epoch: 13 Global Step: 68830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:16,174-Speed 3263.78 samples/sec Loss 3.4487 LearningRate 0.0102 Epoch: 13 Global Step: 68840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:19,217-Speed 3366.21 samples/sec Loss 3.4640 LearningRate 0.0102 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:22,224-Speed 3405.73 samples/sec Loss 3.4624 LearningRate 0.0102 Epoch: 13 Global Step: 68860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:25,270-Speed 3363.07 samples/sec Loss 3.5126 LearningRate 0.0102 Epoch: 13 Global Step: 68870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:28,323-Speed 3354.81 samples/sec Loss 3.4325 LearningRate 0.0102 Epoch: 13 Global Step: 68880 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:31,321-Speed 3416.59 samples/sec Loss 3.4304 LearningRate 0.0102 Epoch: 13 Global Step: 68890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:34,322-Speed 3412.63 samples/sec Loss 3.4955 LearningRate 0.0102 Epoch: 13 Global Step: 68900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:37,333-Speed 3400.95 samples/sec Loss 3.4766 LearningRate 0.0102 Epoch: 13 Global Step: 68910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:20:40,355-Speed 3390.09 samples/sec Loss 3.4854 LearningRate 0.0102 Epoch: 13 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:43,358-Speed 3410.92 samples/sec Loss 3.4554 LearningRate 0.0102 Epoch: 13 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:46,366-Speed 3404.84 samples/sec Loss 3.4338 LearningRate 0.0101 Epoch: 13 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:49,386-Speed 3391.41 samples/sec Loss 3.4888 LearningRate 0.0101 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:52,392-Speed 3408.11 samples/sec Loss 3.6025 LearningRate 0.0101 Epoch: 13 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:55,394-Speed 3411.26 samples/sec Loss 3.3716 LearningRate 0.0101 Epoch: 13 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:20:58,397-Speed 3410.30 samples/sec Loss 3.4979 LearningRate 0.0101 Epoch: 13 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:21:01,411-Speed 3399.33 samples/sec Loss 3.5118 LearningRate 0.0101 Epoch: 13 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:21:04,468-Speed 3349.78 samples/sec Loss 3.4955 LearningRate 0.0101 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:21:07,469-Speed 3413.08 samples/sec Loss 3.4931 LearningRate 0.0101 Epoch: 13 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:10,471-Speed 3412.15 samples/sec Loss 3.4881 LearningRate 0.0101 Epoch: 13 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:13,474-Speed 3411.24 samples/sec Loss 3.5507 LearningRate 0.0101 Epoch: 13 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:16,491-Speed 3394.74 samples/sec Loss 3.5700 LearningRate 0.0101 Epoch: 13 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:19,495-Speed 3409.16 samples/sec Loss 3.4800 LearningRate 0.0101 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:22,497-Speed 3412.47 samples/sec Loss 3.5402 LearningRate 0.0101 Epoch: 13 Global Step: 69060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:25,509-Speed 3400.37 samples/sec Loss 3.3650 LearningRate 0.0101 Epoch: 13 Global Step: 69070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:28,546-Speed 3372.07 samples/sec Loss 3.5011 LearningRate 0.0101 Epoch: 13 Global Step: 69080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:31,555-Speed 3404.53 samples/sec Loss 3.3006 LearningRate 0.0101 Epoch: 13 Global Step: 69090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:34,562-Speed 3406.36 samples/sec Loss 3.3804 LearningRate 0.0100 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:37,578-Speed 3395.59 samples/sec Loss 3.5079 LearningRate 0.0100 Epoch: 13 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:21:40,625-Speed 3362.70 samples/sec Loss 3.4490 LearningRate 0.0100 Epoch: 13 Global Step: 69120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:43,676-Speed 3356.78 samples/sec Loss 
3.4696 LearningRate 0.0100 Epoch: 13 Global Step: 69130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:46,682-Speed 3407.21 samples/sec Loss 3.5002 LearningRate 0.0100 Epoch: 13 Global Step: 69140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:49,692-Speed 3402.84 samples/sec Loss 3.5070 LearningRate 0.0100 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:52,709-Speed 3395.35 samples/sec Loss 3.3686 LearningRate 0.0100 Epoch: 13 Global Step: 69160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:55,711-Speed 3411.41 samples/sec Loss 3.3676 LearningRate 0.0100 Epoch: 13 Global Step: 69170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:21:58,716-Speed 3408.58 samples/sec Loss 3.5355 LearningRate 0.0100 Epoch: 13 Global Step: 69180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:01,771-Speed 3353.61 samples/sec Loss 3.4736 LearningRate 0.0100 Epoch: 13 Global Step: 69190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:05,001-Speed 3170.81 samples/sec Loss 3.5262 LearningRate 0.0100 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:08,113-Speed 3291.33 samples/sec Loss 3.4034 LearningRate 0.0100 Epoch: 13 Global Step: 69210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:11,124-Speed 3401.99 samples/sec Loss 3.3300 LearningRate 0.0100 Epoch: 13 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:22:14,140-Speed 3395.97 samples/sec Loss 3.5418 LearningRate 0.0100 Epoch: 13 Global Step: 69230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:17,146-Speed 3407.51 samples/sec Loss 3.5063 LearningRate 0.0100 Epoch: 13 Global Step: 69240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:20,155-Speed 3403.26 samples/sec Loss 3.4462 LearningRate 0.0100 Epoch: 13 Global Step: 69250 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:23,158-Speed 3411.39 samples/sec Loss 3.4578 LearningRate 0.0099 Epoch: 13 Global Step: 69260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:26,169-Speed 3401.12 samples/sec Loss 3.4563 LearningRate 0.0099 Epoch: 13 Global Step: 69270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:29,188-Speed 3392.82 samples/sec Loss 3.5275 LearningRate 0.0099 Epoch: 13 Global Step: 69280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:32,192-Speed 3410.60 samples/sec Loss 3.5095 LearningRate 0.0099 Epoch: 13 Global Step: 69290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:35,197-Speed 3408.41 samples/sec Loss 3.4664 LearningRate 0.0099 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:38,209-Speed 3400.66 samples/sec Loss 3.4188 LearningRate 0.0099 Epoch: 13 Global Step: 69310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:41,214-Speed 3408.59 samples/sec Loss 3.5255 LearningRate 0.0099 Epoch: 13 Global Step: 69320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:22:44,230-Speed 3395.84 samples/sec Loss 3.4103 LearningRate 0.0099 Epoch: 13 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:22:47,236-Speed 3406.51 samples/sec Loss 3.4744 LearningRate 0.0099 Epoch: 13 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:22:50,256-Speed 3391.82 samples/sec Loss 3.4775 LearningRate 0.0099 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:22:53,326-Speed 3336.27 samples/sec Loss 3.4439 LearningRate 0.0099 Epoch: 13 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:22:56,334-Speed 3406.15 samples/sec Loss 3.5220 LearningRate 0.0099 Epoch: 13 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:22:59,333-Speed 3415.92 samples/sec Loss 3.5189 LearningRate 0.0099 Epoch: 13 Global Step: 69380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:02,340-Speed 3406.03 samples/sec Loss 3.4441 LearningRate 0.0099 Epoch: 13 Global Step: 69390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:05,347-Speed 3406.70 samples/sec Loss 3.5249 LearningRate 0.0099 Epoch: 13 Global Step: 69400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:08,351-Speed 3409.64 samples/sec Loss 3.5210 LearningRate 0.0099 Epoch: 13 Global Step: 69410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:11,370-Speed 3392.13 samples/sec Loss 3.4849 LearningRate 0.0098 Epoch: 13 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:14,372-Speed 3411.45 samples/sec Loss 3.4365 LearningRate 0.0098 Epoch: 13 Global Step: 69430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:17,393-Speed 3390.99 samples/sec Loss 3.4325 LearningRate 0.0098 Epoch: 13 Global Step: 69440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:20,398-Speed 3408.87 samples/sec Loss 3.3709 LearningRate 0.0098 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:23,421-Speed 3387.89 samples/sec Loss 3.5669 LearningRate 0.0098 Epoch: 13 Global Step: 69460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:26,449-Speed 3382.81 samples/sec Loss 3.3395 LearningRate 0.0098 Epoch: 13 Global Step: 69470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:23:29,465-Speed 3396.09 samples/sec Loss 3.4528 LearningRate 0.0098 Epoch: 13 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:32,472-Speed 3405.57 samples/sec Loss 3.5486 LearningRate 0.0098 Epoch: 13 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:35,476-Speed 3410.18 samples/sec Loss 
3.3818 LearningRate 0.0098 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:38,489-Speed 3399.06 samples/sec Loss 3.4391 LearningRate 0.0098 Epoch: 13 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:41,551-Speed 3345.58 samples/sec Loss 3.4655 LearningRate 0.0098 Epoch: 13 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:44,588-Speed 3372.49 samples/sec Loss 3.3286 LearningRate 0.0098 Epoch: 13 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:47,772-Speed 3216.59 samples/sec Loss 3.4416 LearningRate 0.0098 Epoch: 13 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:50,888-Speed 3288.07 samples/sec Loss 3.4007 LearningRate 0.0098 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:53,904-Speed 3396.45 samples/sec Loss 3.3555 LearningRate 0.0098 Epoch: 13 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:56,908-Speed 3409.49 samples/sec Loss 3.5287 LearningRate 0.0098 Epoch: 13 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:23:59,895-Speed 3428.78 samples/sec Loss 3.4804 LearningRate 0.0097 Epoch: 13 Global Step: 69580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:02,905-Speed 3402.89 samples/sec Loss 3.5278 LearningRate 0.0097 Epoch: 13 Global Step: 69590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:05,908-Speed 3410.60 samples/sec Loss 3.3987 LearningRate 0.0097 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:08,921-Speed 3398.88 samples/sec Loss 3.6077 LearningRate 0.0097 Epoch: 13 Global Step: 69610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:11,939-Speed 3394.79 samples/sec Loss 3.4617 LearningRate 0.0097 Epoch: 13 Global Step: 69620 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:14,957-Speed 3393.80 samples/sec Loss 3.4212 LearningRate 0.0097 Epoch: 13 Global Step: 69630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:17,997-Speed 3368.81 samples/sec Loss 3.3929 LearningRate 0.0097 Epoch: 13 Global Step: 69640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:21,006-Speed 3404.85 samples/sec Loss 3.3888 LearningRate 0.0097 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:24,013-Speed 3405.76 samples/sec Loss 3.4761 LearningRate 0.0097 Epoch: 13 Global Step: 69660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:27,018-Speed 3408.65 samples/sec Loss 3.5284 LearningRate 0.0097 Epoch: 13 Global Step: 69670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:30,034-Speed 3395.78 samples/sec Loss 3.3308 LearningRate 0.0097 Epoch: 13 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:24:33,042-Speed 3405.21 samples/sec Loss 3.4124 LearningRate 0.0097 Epoch: 13 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:24:36,052-Speed 3403.09 samples/sec Loss 3.3987 LearningRate 0.0097 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:24:39,061-Speed 3403.22 samples/sec Loss 3.3408 LearningRate 0.0097 Epoch: 13 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:24:42,052-Speed 3425.46 samples/sec Loss 3.4647 LearningRate 0.0097 Epoch: 13 Global Step: 69720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:45,055-Speed 3410.74 samples/sec Loss 3.2711 LearningRate 0.0097 Epoch: 13 Global Step: 69730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:48,068-Speed 3399.00 samples/sec Loss 3.4836 LearningRate 0.0096 Epoch: 13 Global Step: 69740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 06:24:51,074-Speed 3407.46 samples/sec Loss 3.4383 LearningRate 0.0096 Epoch: 13 Global Step: 69750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:54,082-Speed 3405.83 samples/sec Loss 3.5949 LearningRate 0.0096 Epoch: 13 Global Step: 69760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:24:57,087-Speed 3408.18 samples/sec Loss 3.4744 LearningRate 0.0096 Epoch: 13 Global Step: 69770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:00,111-Speed 3386.61 samples/sec Loss 3.3422 LearningRate 0.0096 Epoch: 13 Global Step: 69780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:03,134-Speed 3388.79 samples/sec Loss 3.2524 LearningRate 0.0096 Epoch: 13 Global Step: 69790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:06,157-Speed 3387.78 samples/sec Loss 3.4285 LearningRate 0.0096 Epoch: 13 Global Step: 69800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:09,165-Speed 3405.73 samples/sec Loss 3.4995 LearningRate 0.0096 Epoch: 13 Global Step: 69810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:12,176-Speed 3401.93 samples/sec Loss 3.2805 LearningRate 0.0096 Epoch: 13 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:25:15,278-Speed 3302.18 samples/sec Loss 3.4992 LearningRate 0.0096 Epoch: 13 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:25:18,299-Speed 3389.70 samples/sec Loss 3.3235 LearningRate 0.0096 Epoch: 13 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:25:21,309-Speed 3403.16 samples/sec Loss 3.5230 LearningRate 0.0096 Epoch: 13 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:25:24,297-Speed 3427.74 samples/sec Loss 3.4040 LearningRate 0.0096 Epoch: 13 Global Step: 69860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:27,316-Speed 3393.11 samples/sec Loss 
3.5046 LearningRate 0.0096 Epoch: 13 Global Step: 69870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:30,341-Speed 3386.78 samples/sec Loss 3.4744 LearningRate 0.0096 Epoch: 13 Global Step: 69880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:33,362-Speed 3389.68 samples/sec Loss 3.4202 LearningRate 0.0096 Epoch: 13 Global Step: 69890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:36,376-Speed 3398.96 samples/sec Loss 3.3703 LearningRate 0.0095 Epoch: 13 Global Step: 69900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:39,403-Speed 3384.63 samples/sec Loss 3.2691 LearningRate 0.0095 Epoch: 13 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:42,431-Speed 3382.43 samples/sec Loss 3.3287 LearningRate 0.0095 Epoch: 13 Global Step: 69920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:45,479-Speed 3359.81 samples/sec Loss 3.3647 LearningRate 0.0095 Epoch: 13 Global Step: 69930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:48,498-Speed 3392.82 samples/sec Loss 3.2957 LearningRate 0.0095 Epoch: 13 Global Step: 69940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:51,505-Speed 3406.32 samples/sec Loss 3.4386 LearningRate 0.0095 Epoch: 13 Global Step: 69950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:25:54,528-Speed 3388.15 samples/sec Loss 3.4355 LearningRate 0.0095 Epoch: 13 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:25:57,533-Speed 3407.95 samples/sec Loss 3.3984 LearningRate 0.0095 Epoch: 13 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:26:00,542-Speed 3404.66 samples/sec Loss 3.3940 LearningRate 0.0095 Epoch: 13 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:26:03,551-Speed 3403.90 samples/sec Loss 3.4502 LearningRate 0.0095 Epoch: 13 Global Step: 69990 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:26:06,574-Speed 3388.84 samples/sec Loss 3.4327 LearningRate 0.0095 Epoch: 13 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:26:50,827-[lfw][70000]XNorm: 21.686980 Training: 2022-04-11 06:26:50,828-[lfw][70000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 06:26:50,828-[lfw][70000]Accuracy-Highest: 0.99850 Training: 2022-04-11 06:27:42,214-[cfp_fp][70000]XNorm: 20.941235 Training: 2022-04-11 06:27:42,215-[cfp_fp][70000]Accuracy-Flip: 0.98343+-0.00586 Training: 2022-04-11 06:27:42,215-[cfp_fp][70000]Accuracy-Highest: 0.98386 Training: 2022-04-11 06:28:26,489-[agedb_30][70000]XNorm: 21.943153 Training: 2022-04-11 06:28:26,490-[agedb_30][70000]Accuracy-Flip: 0.98417+-0.00684 Training: 2022-04-11 06:28:26,491-[agedb_30][70000]Accuracy-Highest: 0.98417 Training: 2022-04-11 06:28:29,509-Speed 71.64 samples/sec Loss 3.3525 LearningRate 0.0095 Epoch: 13 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:28:32,494-Speed 3430.80 samples/sec Loss 3.4274 LearningRate 0.0095 Epoch: 13 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:28:35,533-Speed 3370.16 samples/sec Loss 3.4512 LearningRate 0.0095 Epoch: 13 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:28:38,511-Speed 3440.12 samples/sec Loss 3.3062 LearningRate 0.0095 Epoch: 13 Global Step: 70040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:28:42,424-Speed 2616.75 samples/sec Loss 3.3373 LearningRate 0.0095 Epoch: 13 Global Step: 70050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:28:45,995-Speed 2868.34 samples/sec Loss 3.3845 LearningRate 0.0095 Epoch: 13 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:28:48,987-Speed 3423.05 samples/sec Loss 3.4653 LearningRate 0.0094 Epoch: 13 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 06:28:51,986-Speed 3415.63 samples/sec Loss 3.2355 LearningRate 0.0094 Epoch: 13 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:28:54,986-Speed 3414.53 samples/sec Loss 3.5037 LearningRate 0.0094 Epoch: 13 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:28:58,004-Speed 3393.49 samples/sec Loss 3.5106 LearningRate 0.0094 Epoch: 13 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:01,008-Speed 3409.61 samples/sec Loss 3.4637 LearningRate 0.0094 Epoch: 13 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:04,009-Speed 3413.41 samples/sec Loss 3.3564 LearningRate 0.0094 Epoch: 13 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:07,074-Speed 3341.90 samples/sec Loss 3.3644 LearningRate 0.0094 Epoch: 13 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:10,075-Speed 3412.31 samples/sec Loss 3.2700 LearningRate 0.0094 Epoch: 13 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:13,072-Speed 3418.05 samples/sec Loss 3.4660 LearningRate 0.0094 Epoch: 13 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:16,121-Speed 3359.51 samples/sec Loss 3.3484 LearningRate 0.0094 Epoch: 13 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:19,106-Speed 3431.88 samples/sec Loss 3.5104 LearningRate 0.0094 Epoch: 13 Global Step: 70170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:22,117-Speed 3401.80 samples/sec Loss 3.3693 LearningRate 0.0094 Epoch: 13 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:25,139-Speed 3388.64 samples/sec Loss 3.4617 LearningRate 0.0094 Epoch: 13 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:28,238-Speed 3305.57 
samples/sec Loss 3.4122 LearningRate 0.0094 Epoch: 13 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:31,256-Speed 3394.33 samples/sec Loss 3.3514 LearningRate 0.0094 Epoch: 13 Global Step: 70210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:34,257-Speed 3412.75 samples/sec Loss 3.4202 LearningRate 0.0094 Epoch: 13 Global Step: 70220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:37,260-Speed 3410.99 samples/sec Loss 3.3879 LearningRate 0.0093 Epoch: 13 Global Step: 70230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:40,265-Speed 3408.45 samples/sec Loss 3.4821 LearningRate 0.0093 Epoch: 13 Global Step: 70240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:43,294-Speed 3381.22 samples/sec Loss 3.4545 LearningRate 0.0093 Epoch: 13 Global Step: 70250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:46,326-Speed 3378.50 samples/sec Loss 3.4588 LearningRate 0.0093 Epoch: 13 Global Step: 70260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:29:49,336-Speed 3402.65 samples/sec Loss 3.4546 LearningRate 0.0093 Epoch: 13 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:52,350-Speed 3398.63 samples/sec Loss 3.4021 LearningRate 0.0093 Epoch: 13 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:55,389-Speed 3370.44 samples/sec Loss 3.2585 LearningRate 0.0093 Epoch: 13 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:29:58,418-Speed 3381.65 samples/sec Loss 3.2698 LearningRate 0.0093 Epoch: 13 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:30:01,430-Speed 3401.00 samples/sec Loss 3.4051 LearningRate 0.0093 Epoch: 13 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:30:04,454-Speed 3386.42 samples/sec Loss 3.3824 LearningRate 0.0093 Epoch: 13 
Global Step: 70320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:07,464-Speed 3403.72 samples/sec Loss 3.3282 LearningRate 0.0093 Epoch: 13 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:10,473-Speed 3403.53 samples/sec Loss 3.2593 LearningRate 0.0093 Epoch: 13 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:13,511-Speed 3371.45 samples/sec Loss 3.4522 LearningRate 0.0093 Epoch: 13 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:16,536-Speed 3385.58 samples/sec Loss 3.4700 LearningRate 0.0093 Epoch: 13 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:19,547-Speed 3401.83 samples/sec Loss 3.3228 LearningRate 0.0093 Epoch: 13 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:22,557-Speed 3402.99 samples/sec Loss 3.4221 LearningRate 0.0093 Epoch: 13 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:25,564-Speed 3406.21 samples/sec Loss 3.4212 LearningRate 0.0093 Epoch: 13 Global Step: 70390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:28,590-Speed 3385.72 samples/sec Loss 3.3742 LearningRate 0.0092 Epoch: 13 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:31,592-Speed 3411.32 samples/sec Loss 3.4216 LearningRate 0.0092 Epoch: 13 Global Step: 70410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:34,625-Speed 3377.33 samples/sec Loss 3.4541 LearningRate 0.0092 Epoch: 13 Global Step: 70420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:38,310-Speed 2779.39 samples/sec Loss 3.4727 LearningRate 0.0092 Epoch: 13 Global Step: 70430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:41,373-Speed 3343.92 samples/sec Loss 3.2822 LearningRate 0.0092 Epoch: 13 Global Step: 70440 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 06:30:44,379-Speed 3406.96 samples/sec Loss 3.2900 LearningRate 0.0092 Epoch: 13 Global Step: 70450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:47,411-Speed 3378.52 samples/sec Loss 3.3883 LearningRate 0.0092 Epoch: 13 Global Step: 70460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:50,429-Speed 3393.03 samples/sec Loss 3.4947 LearningRate 0.0092 Epoch: 13 Global Step: 70470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:53,449-Speed 3391.98 samples/sec Loss 3.2689 LearningRate 0.0092 Epoch: 13 Global Step: 70480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:56,460-Speed 3401.90 samples/sec Loss 3.2025 LearningRate 0.0092 Epoch: 13 Global Step: 70490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:30:59,468-Speed 3405.24 samples/sec Loss 3.3504 LearningRate 0.0092 Epoch: 13 Global Step: 70500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:02,487-Speed 3393.08 samples/sec Loss 3.4325 LearningRate 0.0092 Epoch: 13 Global Step: 70510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:05,497-Speed 3402.73 samples/sec Loss 3.4555 LearningRate 0.0092 Epoch: 13 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:08,511-Speed 3398.68 samples/sec Loss 3.1862 LearningRate 0.0092 Epoch: 13 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:11,531-Speed 3390.51 samples/sec Loss 3.4441 LearningRate 0.0092 Epoch: 13 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:14,538-Speed 3407.05 samples/sec Loss 3.3864 LearningRate 0.0092 Epoch: 13 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:17,613-Speed 3331.21 samples/sec Loss 3.3682 LearningRate 0.0092 Epoch: 13 Global Step: 70560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:20,648-Speed 3375.03 
samples/sec Loss 3.4384 LearningRate 0.0091 Epoch: 13 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:23,659-Speed 3401.53 samples/sec Loss 3.3434 LearningRate 0.0091 Epoch: 13 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:26,665-Speed 3407.39 samples/sec Loss 3.2874 LearningRate 0.0091 Epoch: 13 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:29,676-Speed 3401.81 samples/sec Loss 3.4066 LearningRate 0.0091 Epoch: 13 Global Step: 70600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:32,679-Speed 3410.78 samples/sec Loss 3.4450 LearningRate 0.0091 Epoch: 13 Global Step: 70610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:35,683-Speed 3409.33 samples/sec Loss 3.3985 LearningRate 0.0091 Epoch: 13 Global Step: 70620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:38,703-Speed 3392.05 samples/sec Loss 3.2701 LearningRate 0.0091 Epoch: 13 Global Step: 70630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:41,711-Speed 3405.13 samples/sec Loss 3.3746 LearningRate 0.0091 Epoch: 13 Global Step: 70640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:44,740-Speed 3380.83 samples/sec Loss 3.3133 LearningRate 0.0091 Epoch: 13 Global Step: 70650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:31:47,746-Speed 3408.29 samples/sec Loss 3.3705 LearningRate 0.0091 Epoch: 13 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:50,752-Speed 3406.55 samples/sec Loss 3.4713 LearningRate 0.0091 Epoch: 13 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:53,762-Speed 3403.72 samples/sec Loss 3.2927 LearningRate 0.0091 Epoch: 13 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:56,768-Speed 3407.18 samples/sec Loss 3.3991 LearningRate 0.0091 Epoch: 13 
Global Step: 70690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:31:59,785-Speed 3394.75 samples/sec Loss 3.2460 LearningRate 0.0091 Epoch: 13 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:02,796-Speed 3402.25 samples/sec Loss 3.3919 LearningRate 0.0091 Epoch: 13 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:05,803-Speed 3405.09 samples/sec Loss 3.4439 LearningRate 0.0091 Epoch: 13 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:08,812-Speed 3405.25 samples/sec Loss 3.5068 LearningRate 0.0090 Epoch: 13 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:11,810-Speed 3415.61 samples/sec Loss 3.4229 LearningRate 0.0090 Epoch: 13 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:14,843-Speed 3377.03 samples/sec Loss 3.3325 LearningRate 0.0090 Epoch: 13 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:17,856-Speed 3400.55 samples/sec Loss 3.3361 LearningRate 0.0090 Epoch: 13 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:20,865-Speed 3404.43 samples/sec Loss 3.5019 LearningRate 0.0090 Epoch: 13 Global Step: 70770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:23,870-Speed 3408.42 samples/sec Loss 3.3333 LearningRate 0.0090 Epoch: 13 Global Step: 70780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:26,875-Speed 3408.47 samples/sec Loss 3.3726 LearningRate 0.0090 Epoch: 13 Global Step: 70790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:29,890-Speed 3396.88 samples/sec Loss 3.3918 LearningRate 0.0090 Epoch: 13 Global Step: 70800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:33,006-Speed 3287.45 samples/sec Loss 3.3950 LearningRate 0.0090 Epoch: 13 Global Step: 70810 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 06:32:45,080-Speed 848.21 samples/sec Loss 2.7603 LearningRate 0.0090 Epoch: 14 Global Step: 70820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:48,117-Speed 3373.22 samples/sec Loss 2.5704 LearningRate 0.0090 Epoch: 14 Global Step: 70830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:32:51,143-Speed 3384.14 samples/sec Loss 2.5088 LearningRate 0.0090 Epoch: 14 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:54,153-Speed 3402.91 samples/sec Loss 2.6288 LearningRate 0.0090 Epoch: 14 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:32:57,197-Speed 3365.66 samples/sec Loss 2.5496 LearningRate 0.0090 Epoch: 14 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:00,208-Speed 3401.48 samples/sec Loss 2.6205 LearningRate 0.0090 Epoch: 14 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:03,219-Speed 3401.74 samples/sec Loss 2.6085 LearningRate 0.0090 Epoch: 14 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:06,227-Speed 3404.54 samples/sec Loss 2.5965 LearningRate 0.0090 Epoch: 14 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:09,240-Speed 3400.22 samples/sec Loss 2.5215 LearningRate 0.0089 Epoch: 14 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:12,234-Speed 3420.22 samples/sec Loss 2.5292 LearningRate 0.0089 Epoch: 14 Global Step: 70910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:15,242-Speed 3405.20 samples/sec Loss 2.7348 LearningRate 0.0089 Epoch: 14 Global Step: 70920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:18,270-Speed 3382.99 samples/sec Loss 2.6194 LearningRate 0.0089 Epoch: 14 Global Step: 70930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:21,273-Speed 3410.36 
samples/sec Loss 2.6468 LearningRate 0.0089 Epoch: 14 Global Step: 70940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:24,298-Speed 3386.93 samples/sec Loss 2.6727 LearningRate 0.0089 Epoch: 14 Global Step: 70950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:27,306-Speed 3404.53 samples/sec Loss 2.5889 LearningRate 0.0089 Epoch: 14 Global Step: 70960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:30,312-Speed 3407.88 samples/sec Loss 2.5864 LearningRate 0.0089 Epoch: 14 Global Step: 70970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:33,322-Speed 3403.15 samples/sec Loss 2.5631 LearningRate 0.0089 Epoch: 14 Global Step: 70980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:36,335-Speed 3398.74 samples/sec Loss 2.6274 LearningRate 0.0089 Epoch: 14 Global Step: 70990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:39,343-Speed 3405.52 samples/sec Loss 2.5759 LearningRate 0.0089 Epoch: 14 Global Step: 71000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:33:42,350-Speed 3405.72 samples/sec Loss 2.6457 LearningRate 0.0089 Epoch: 14 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:45,359-Speed 3404.91 samples/sec Loss 2.7726 LearningRate 0.0089 Epoch: 14 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:48,369-Speed 3402.13 samples/sec Loss 2.6460 LearningRate 0.0089 Epoch: 14 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:51,378-Speed 3403.88 samples/sec Loss 2.6360 LearningRate 0.0089 Epoch: 14 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:54,395-Speed 3395.66 samples/sec Loss 2.5696 LearningRate 0.0089 Epoch: 14 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:33:57,414-Speed 3392.58 samples/sec Loss 2.6502 LearningRate 0.0089 Epoch: 14 
Global Step: 71060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:00,424-Speed 3402.93 samples/sec Loss 2.5750 LearningRate 0.0088 Epoch: 14 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:03,449-Speed 3386.26 samples/sec Loss 2.5849 LearningRate 0.0088 Epoch: 14 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:06,463-Speed 3398.19 samples/sec Loss 2.6675 LearningRate 0.0088 Epoch: 14 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:09,471-Speed 3405.14 samples/sec Loss 2.7117 LearningRate 0.0088 Epoch: 14 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:12,480-Speed 3404.41 samples/sec Loss 2.5391 LearningRate 0.0088 Epoch: 14 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:15,491-Speed 3401.81 samples/sec Loss 2.6262 LearningRate 0.0088 Epoch: 14 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:18,501-Speed 3402.24 samples/sec Loss 2.6352 LearningRate 0.0088 Epoch: 14 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:21,513-Speed 3401.18 samples/sec Loss 2.7168 LearningRate 0.0088 Epoch: 14 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:24,523-Speed 3403.05 samples/sec Loss 2.7600 LearningRate 0.0088 Epoch: 14 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:27,559-Speed 3373.69 samples/sec Loss 2.7600 LearningRate 0.0088 Epoch: 14 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:30,616-Speed 3350.49 samples/sec Loss 2.7235 LearningRate 0.0088 Epoch: 14 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:33,622-Speed 3407.21 samples/sec Loss 2.6662 LearningRate 0.0088 Epoch: 14 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:34:36,630-Speed 3404.69 samples/sec Loss 2.5145 LearningRate 0.0088 Epoch: 14 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:39,666-Speed 3373.50 samples/sec Loss 2.6377 LearningRate 0.0088 Epoch: 14 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:42,666-Speed 3415.07 samples/sec Loss 2.6860 LearningRate 0.0088 Epoch: 14 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:45,681-Speed 3397.07 samples/sec Loss 2.7306 LearningRate 0.0088 Epoch: 14 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:48,687-Speed 3407.49 samples/sec Loss 2.7035 LearningRate 0.0088 Epoch: 14 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:51,696-Speed 3403.82 samples/sec Loss 2.7346 LearningRate 0.0087 Epoch: 14 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:54,707-Speed 3401.90 samples/sec Loss 2.6800 LearningRate 0.0087 Epoch: 14 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:34:57,714-Speed 3406.06 samples/sec Loss 2.6644 LearningRate 0.0087 Epoch: 14 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:00,728-Speed 3399.23 samples/sec Loss 2.7357 LearningRate 0.0087 Epoch: 14 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:03,740-Speed 3400.27 samples/sec Loss 2.6674 LearningRate 0.0087 Epoch: 14 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:06,752-Speed 3400.25 samples/sec Loss 2.7218 LearningRate 0.0087 Epoch: 14 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:09,763-Speed 3402.32 samples/sec Loss 2.7658 LearningRate 0.0087 Epoch: 14 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:12,792-Speed 3381.22 
samples/sec Loss 2.6193 LearningRate 0.0087 Epoch: 14 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 06:35:15,787-Speed 3420.45 samples/sec Loss 2.6588 LearningRate 0.0087 Epoch: 14 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:18,813-Speed 3384.66 samples/sec Loss 2.7199 LearningRate 0.0087 Epoch: 14 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:21,821-Speed 3404.96 samples/sec Loss 2.6855 LearningRate 0.0087 Epoch: 14 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:24,847-Speed 3385.36 samples/sec Loss 2.6532 LearningRate 0.0087 Epoch: 14 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:27,861-Speed 3398.23 samples/sec Loss 2.7891 LearningRate 0.0087 Epoch: 14 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:30,877-Speed 3395.94 samples/sec Loss 2.7851 LearningRate 0.0087 Epoch: 14 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:33,883-Speed 3407.06 samples/sec Loss 2.8199 LearningRate 0.0087 Epoch: 14 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:36,900-Speed 3394.52 samples/sec Loss 2.6052 LearningRate 0.0087 Epoch: 14 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:39,910-Speed 3403.22 samples/sec Loss 2.7818 LearningRate 0.0087 Epoch: 14 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:42,922-Speed 3400.66 samples/sec Loss 2.7191 LearningRate 0.0086 Epoch: 14 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:45,915-Speed 3423.22 samples/sec Loss 2.7833 LearningRate 0.0086 Epoch: 14 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:48,925-Speed 3402.31 samples/sec Loss 2.7678 LearningRate 0.0086 Epoch: 14 
Global Step: 71430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:51,938-Speed 3399.71 samples/sec Loss 2.7247 LearningRate 0.0086 Epoch: 14 Global Step: 71440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:54,976-Speed 3371.04 samples/sec Loss 2.7324 LearningRate 0.0086 Epoch: 14 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:35:57,987-Speed 3401.42 samples/sec Loss 2.7773 LearningRate 0.0086 Epoch: 14 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:00,998-Speed 3402.69 samples/sec Loss 2.7051 LearningRate 0.0086 Epoch: 14 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:04,018-Speed 3391.36 samples/sec Loss 2.7224 LearningRate 0.0086 Epoch: 14 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:07,033-Speed 3396.92 samples/sec Loss 2.6998 LearningRate 0.0086 Epoch: 14 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:10,062-Speed 3380.86 samples/sec Loss 2.7052 LearningRate 0.0086 Epoch: 14 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:13,072-Speed 3403.65 samples/sec Loss 2.7679 LearningRate 0.0086 Epoch: 14 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:16,068-Speed 3418.51 samples/sec Loss 2.7076 LearningRate 0.0086 Epoch: 14 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:19,078-Speed 3403.42 samples/sec Loss 2.6315 LearningRate 0.0086 Epoch: 14 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:22,089-Speed 3401.38 samples/sec Loss 2.6511 LearningRate 0.0086 Epoch: 14 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:25,107-Speed 3394.37 samples/sec Loss 2.7836 LearningRate 0.0086 Epoch: 14 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:36:28,159-Speed 3355.31 samples/sec Loss 2.8121 LearningRate 0.0086 Epoch: 14 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:36:31,149-Speed 3426.31 samples/sec Loss 2.8380 LearningRate 0.0086 Epoch: 14 Global Step: 71570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:34,159-Speed 3402.09 samples/sec Loss 2.7658 LearningRate 0.0086 Epoch: 14 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:37,168-Speed 3404.25 samples/sec Loss 2.8937 LearningRate 0.0085 Epoch: 14 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:40,188-Speed 3391.56 samples/sec Loss 2.8159 LearningRate 0.0085 Epoch: 14 Global Step: 71600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:43,199-Speed 3401.82 samples/sec Loss 2.9052 LearningRate 0.0085 Epoch: 14 Global Step: 71610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:46,211-Speed 3400.56 samples/sec Loss 2.7871 LearningRate 0.0085 Epoch: 14 Global Step: 71620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:49,221-Speed 3403.12 samples/sec Loss 2.8111 LearningRate 0.0085 Epoch: 14 Global Step: 71630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:52,234-Speed 3399.29 samples/sec Loss 2.7491 LearningRate 0.0085 Epoch: 14 Global Step: 71640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:55,246-Speed 3400.45 samples/sec Loss 2.6691 LearningRate 0.0085 Epoch: 14 Global Step: 71650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:36:58,258-Speed 3400.78 samples/sec Loss 2.8340 LearningRate 0.0085 Epoch: 14 Global Step: 71660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:01,274-Speed 3396.63 samples/sec Loss 2.7804 LearningRate 0.0085 Epoch: 14 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:04,269-Speed 3419.97 
samples/sec Loss 2.8721 LearningRate 0.0085 Epoch: 14 Global Step: 71680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:07,297-Speed 3382.00 samples/sec Loss 2.8942 LearningRate 0.0085 Epoch: 14 Global Step: 71690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:10,311-Speed 3398.68 samples/sec Loss 2.7670 LearningRate 0.0085 Epoch: 14 Global Step: 71700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:13,329-Speed 3393.78 samples/sec Loss 2.7238 LearningRate 0.0085 Epoch: 14 Global Step: 71710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:16,346-Speed 3396.19 samples/sec Loss 2.8571 LearningRate 0.0085 Epoch: 14 Global Step: 71720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:19,359-Speed 3399.21 samples/sec Loss 2.9309 LearningRate 0.0085 Epoch: 14 Global Step: 71730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:22,375-Speed 3395.12 samples/sec Loss 2.8250 LearningRate 0.0085 Epoch: 14 Global Step: 71740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:25,383-Speed 3405.36 samples/sec Loss 2.7545 LearningRate 0.0085 Epoch: 14 Global Step: 71750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:28,396-Speed 3399.36 samples/sec Loss 2.9306 LearningRate 0.0084 Epoch: 14 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:31,416-Speed 3391.71 samples/sec Loss 2.7978 LearningRate 0.0084 Epoch: 14 Global Step: 71770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:37:34,444-Speed 3382.53 samples/sec Loss 2.7300 LearningRate 0.0084 Epoch: 14 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:37,458-Speed 3399.29 samples/sec Loss 2.8398 LearningRate 0.0084 Epoch: 14 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:40,475-Speed 3394.69 samples/sec Loss 2.8742 LearningRate 0.0084 Epoch: 14 
Global Step: 71800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:43,500-Speed 3385.31 samples/sec Loss 2.7591 LearningRate 0.0084 Epoch: 14 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:46,517-Speed 3395.84 samples/sec Loss 2.8369 LearningRate 0.0084 Epoch: 14 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:49,533-Speed 3395.51 samples/sec Loss 2.7794 LearningRate 0.0084 Epoch: 14 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:52,548-Speed 3397.12 samples/sec Loss 2.8461 LearningRate 0.0084 Epoch: 14 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:55,561-Speed 3400.40 samples/sec Loss 2.8838 LearningRate 0.0084 Epoch: 14 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:37:58,576-Speed 3396.58 samples/sec Loss 2.9715 LearningRate 0.0084 Epoch: 14 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:38:01,576-Speed 3413.88 samples/sec Loss 2.8153 LearningRate 0.0084 Epoch: 14 Global Step: 71870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:04,596-Speed 3392.45 samples/sec Loss 2.9428 LearningRate 0.0084 Epoch: 14 Global Step: 71880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:07,610-Speed 3398.14 samples/sec Loss 2.7397 LearningRate 0.0084 Epoch: 14 Global Step: 71890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:10,625-Speed 3397.28 samples/sec Loss 2.8397 LearningRate 0.0084 Epoch: 14 Global Step: 71900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:13,639-Speed 3398.65 samples/sec Loss 2.9608 LearningRate 0.0084 Epoch: 14 Global Step: 71910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:16,656-Speed 3394.88 samples/sec Loss 2.7701 LearningRate 0.0084 Epoch: 14 Global Step: 71920 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 06:38:19,672-Speed 3395.64 samples/sec Loss 2.7998 LearningRate 0.0083 Epoch: 14 Global Step: 71930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:22,688-Speed 3396.23 samples/sec Loss 2.8607 LearningRate 0.0083 Epoch: 14 Global Step: 71940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:25,706-Speed 3393.45 samples/sec Loss 2.7260 LearningRate 0.0083 Epoch: 14 Global Step: 71950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:28,739-Speed 3377.72 samples/sec Loss 2.9608 LearningRate 0.0083 Epoch: 14 Global Step: 71960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:38:31,757-Speed 3393.81 samples/sec Loss 2.8359 LearningRate 0.0083 Epoch: 14 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:38:34,780-Speed 3388.25 samples/sec Loss 2.7301 LearningRate 0.0083 Epoch: 14 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:38:37,828-Speed 3360.03 samples/sec Loss 2.8618 LearningRate 0.0083 Epoch: 14 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:38:40,858-Speed 3381.43 samples/sec Loss 2.8219 LearningRate 0.0083 Epoch: 14 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:39:24,919-[lfw][72000]XNorm: 22.353854 Training: 2022-04-11 06:39:24,920-[lfw][72000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 06:39:24,920-[lfw][72000]Accuracy-Highest: 0.99850 Training: 2022-04-11 06:40:16,000-[cfp_fp][72000]XNorm: 21.396975 Training: 2022-04-11 06:40:16,000-[cfp_fp][72000]Accuracy-Flip: 0.98414+-0.00608 Training: 2022-04-11 06:40:16,001-[cfp_fp][72000]Accuracy-Highest: 0.98414 Training: 2022-04-11 06:41:00,030-[agedb_30][72000]XNorm: 22.576923 Training: 2022-04-11 06:41:00,030-[agedb_30][72000]Accuracy-Flip: 0.98433+-0.00606 Training: 2022-04-11 06:41:00,031-[agedb_30][72000]Accuracy-Highest: 0.98433 Training: 2022-04-11 
06:41:03,025-Speed 72.03 samples/sec Loss 2.9244 LearningRate 0.0083 Epoch: 14 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:06,014-Speed 3426.27 samples/sec Loss 2.8531 LearningRate 0.0083 Epoch: 14 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:09,006-Speed 3423.21 samples/sec Loss 2.8063 LearningRate 0.0083 Epoch: 14 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:12,003-Speed 3417.86 samples/sec Loss 2.8122 LearningRate 0.0083 Epoch: 14 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:14,994-Speed 3424.27 samples/sec Loss 2.8469 LearningRate 0.0083 Epoch: 14 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:17,989-Speed 3419.80 samples/sec Loss 2.7974 LearningRate 0.0083 Epoch: 14 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:20,966-Speed 3440.31 samples/sec Loss 2.7933 LearningRate 0.0083 Epoch: 14 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:23,962-Speed 3418.97 samples/sec Loss 2.7879 LearningRate 0.0083 Epoch: 14 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:26,979-Speed 3395.17 samples/sec Loss 2.7830 LearningRate 0.0083 Epoch: 14 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:29,978-Speed 3416.52 samples/sec Loss 2.7945 LearningRate 0.0083 Epoch: 14 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:32,991-Speed 3399.26 samples/sec Loss 2.8608 LearningRate 0.0082 Epoch: 14 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:35,989-Speed 3416.61 samples/sec Loss 2.8532 LearningRate 0.0082 Epoch: 14 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:38,989-Speed 3414.43 samples/sec Loss 2.9175 
LearningRate 0.0082 Epoch: 14 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:42,000-Speed 3400.96 samples/sec Loss 2.7940 LearningRate 0.0082 Epoch: 14 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:45,003-Speed 3410.53 samples/sec Loss 2.7678 LearningRate 0.0082 Epoch: 14 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:48,013-Speed 3403.47 samples/sec Loss 2.8017 LearningRate 0.0082 Epoch: 14 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:51,018-Speed 3408.44 samples/sec Loss 2.8494 LearningRate 0.0082 Epoch: 14 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 06:41:54,002-Speed 3433.32 samples/sec Loss 2.9179 LearningRate 0.0082 Epoch: 14 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:41:57,008-Speed 3406.93 samples/sec Loss 2.8197 LearningRate 0.0082 Epoch: 14 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:00,003-Speed 3419.82 samples/sec Loss 2.8068 LearningRate 0.0082 Epoch: 14 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:03,015-Speed 3400.79 samples/sec Loss 2.8078 LearningRate 0.0082 Epoch: 14 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:06,035-Speed 3391.57 samples/sec Loss 2.8347 LearningRate 0.0082 Epoch: 14 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:09,045-Speed 3403.16 samples/sec Loss 2.8844 LearningRate 0.0082 Epoch: 14 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:12,050-Speed 3408.11 samples/sec Loss 2.8862 LearningRate 0.0082 Epoch: 14 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:15,066-Speed 3396.68 samples/sec Loss 2.7560 LearningRate 0.0082 Epoch: 14 Global Step: 72250 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:18,082-Speed 3395.46 samples/sec Loss 2.9106 LearningRate 0.0082 Epoch: 14 Global Step: 72260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:21,096-Speed 3398.81 samples/sec Loss 2.8782 LearningRate 0.0082 Epoch: 14 Global Step: 72270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:24,115-Speed 3392.27 samples/sec Loss 2.9168 LearningRate 0.0082 Epoch: 14 Global Step: 72280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:27,129-Speed 3398.96 samples/sec Loss 2.9630 LearningRate 0.0081 Epoch: 14 Global Step: 72290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:42:30,141-Speed 3400.74 samples/sec Loss 2.8305 LearningRate 0.0081 Epoch: 14 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:33,165-Speed 3387.62 samples/sec Loss 2.7549 LearningRate 0.0081 Epoch: 14 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:36,188-Speed 3388.10 samples/sec Loss 2.7834 LearningRate 0.0081 Epoch: 14 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:39,243-Speed 3351.59 samples/sec Loss 2.8455 LearningRate 0.0081 Epoch: 14 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:42,282-Speed 3370.91 samples/sec Loss 2.8008 LearningRate 0.0081 Epoch: 14 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:45,289-Speed 3406.20 samples/sec Loss 2.8710 LearningRate 0.0081 Epoch: 14 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:48,310-Speed 3390.69 samples/sec Loss 2.8587 LearningRate 0.0081 Epoch: 14 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:51,320-Speed 3403.61 samples/sec Loss 2.8208 LearningRate 0.0081 Epoch: 14 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:42:54,330-Speed 3401.97 samples/sec Loss 2.9367 LearningRate 0.0081 Epoch: 14 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:42:57,337-Speed 3406.56 samples/sec Loss 2.8852 LearningRate 0.0081 Epoch: 14 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:00,331-Speed 3421.74 samples/sec Loss 2.7627 LearningRate 0.0081 Epoch: 14 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:03,342-Speed 3401.24 samples/sec Loss 2.7745 LearningRate 0.0081 Epoch: 14 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:06,352-Speed 3403.10 samples/sec Loss 2.9264 LearningRate 0.0081 Epoch: 14 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:09,362-Speed 3401.79 samples/sec Loss 2.8712 LearningRate 0.0081 Epoch: 14 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:12,383-Speed 3390.91 samples/sec Loss 2.9704 LearningRate 0.0081 Epoch: 14 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:15,402-Speed 3392.81 samples/sec Loss 2.9426 LearningRate 0.0081 Epoch: 14 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:18,417-Speed 3397.65 samples/sec Loss 2.8874 LearningRate 0.0080 Epoch: 14 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:21,431-Speed 3398.46 samples/sec Loss 2.8405 LearningRate 0.0080 Epoch: 14 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:24,437-Speed 3406.57 samples/sec Loss 2.8920 LearningRate 0.0080 Epoch: 14 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:27,448-Speed 3401.98 samples/sec Loss 2.7864 LearningRate 0.0080 Epoch: 14 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:30,439-Speed 3424.81 samples/sec Loss 
2.7839 LearningRate 0.0080 Epoch: 14 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:33,450-Speed 3400.93 samples/sec Loss 2.8947 LearningRate 0.0080 Epoch: 14 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:36,465-Speed 3397.80 samples/sec Loss 2.8763 LearningRate 0.0080 Epoch: 14 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:39,473-Speed 3405.24 samples/sec Loss 2.8356 LearningRate 0.0080 Epoch: 14 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:42,480-Speed 3405.92 samples/sec Loss 2.8958 LearningRate 0.0080 Epoch: 14 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:43:45,470-Speed 3425.96 samples/sec Loss 2.9140 LearningRate 0.0080 Epoch: 14 Global Step: 72550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:43:48,483-Speed 3399.05 samples/sec Loss 2.8785 LearningRate 0.0080 Epoch: 14 Global Step: 72560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:43:51,515-Speed 3378.24 samples/sec Loss 3.0178 LearningRate 0.0080 Epoch: 14 Global Step: 72570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:43:54,522-Speed 3406.38 samples/sec Loss 2.7822 LearningRate 0.0080 Epoch: 14 Global Step: 72580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:43:57,529-Speed 3406.62 samples/sec Loss 2.8011 LearningRate 0.0080 Epoch: 14 Global Step: 72590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:00,540-Speed 3401.46 samples/sec Loss 3.0326 LearningRate 0.0080 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:03,545-Speed 3408.41 samples/sec Loss 2.7411 LearningRate 0.0080 Epoch: 14 Global Step: 72610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:06,555-Speed 3402.97 samples/sec Loss 2.9755 LearningRate 0.0080 Epoch: 14 Global Step: 72620 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:09,582-Speed 3383.29 samples/sec Loss 2.8251 LearningRate 0.0080 Epoch: 14 Global Step: 72630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:12,591-Speed 3404.48 samples/sec Loss 2.8583 LearningRate 0.0079 Epoch: 14 Global Step: 72640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:15,627-Speed 3373.96 samples/sec Loss 2.9547 LearningRate 0.0079 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:18,644-Speed 3395.14 samples/sec Loss 2.8787 LearningRate 0.0079 Epoch: 14 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:21,653-Speed 3404.08 samples/sec Loss 2.7908 LearningRate 0.0079 Epoch: 14 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:24,663-Speed 3402.62 samples/sec Loss 2.9469 LearningRate 0.0079 Epoch: 14 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:27,678-Speed 3397.39 samples/sec Loss 2.8864 LearningRate 0.0079 Epoch: 14 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:30,689-Speed 3401.15 samples/sec Loss 2.8898 LearningRate 0.0079 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:33,701-Speed 3401.48 samples/sec Loss 2.9290 LearningRate 0.0079 Epoch: 14 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:44:36,698-Speed 3417.54 samples/sec Loss 2.9110 LearningRate 0.0079 Epoch: 14 Global Step: 72720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:39,713-Speed 3396.87 samples/sec Loss 2.8880 LearningRate 0.0079 Epoch: 14 Global Step: 72730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:42,728-Speed 3397.79 samples/sec Loss 2.8957 LearningRate 0.0079 Epoch: 14 Global Step: 72740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 06:44:45,741-Speed 3398.67 samples/sec Loss 2.8715 LearningRate 0.0079 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:48,755-Speed 3398.85 samples/sec Loss 2.8817 LearningRate 0.0079 Epoch: 14 Global Step: 72760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:51,769-Speed 3398.28 samples/sec Loss 2.8674 LearningRate 0.0079 Epoch: 14 Global Step: 72770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:54,792-Speed 3388.78 samples/sec Loss 2.8947 LearningRate 0.0079 Epoch: 14 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:44:57,811-Speed 3392.18 samples/sec Loss 2.8589 LearningRate 0.0079 Epoch: 14 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:45:00,833-Speed 3389.70 samples/sec Loss 2.8702 LearningRate 0.0079 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:45:03,848-Speed 3396.73 samples/sec Loss 2.8667 LearningRate 0.0079 Epoch: 14 Global Step: 72810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:45:06,864-Speed 3395.95 samples/sec Loss 2.8850 LearningRate 0.0078 Epoch: 14 Global Step: 72820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:09,877-Speed 3400.20 samples/sec Loss 2.8735 LearningRate 0.0078 Epoch: 14 Global Step: 72830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:12,892-Speed 3397.29 samples/sec Loss 2.8551 LearningRate 0.0078 Epoch: 14 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:15,912-Speed 3391.96 samples/sec Loss 2.8474 LearningRate 0.0078 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:18,940-Speed 3381.89 samples/sec Loss 2.8688 LearningRate 0.0078 Epoch: 14 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:21,968-Speed 3382.75 samples/sec Loss 
2.9351 LearningRate 0.0078 Epoch: 14 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:24,988-Speed 3392.43 samples/sec Loss 2.9820 LearningRate 0.0078 Epoch: 14 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:28,000-Speed 3399.99 samples/sec Loss 2.9940 LearningRate 0.0078 Epoch: 14 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:31,020-Speed 3391.63 samples/sec Loss 2.8809 LearningRate 0.0078 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:34,039-Speed 3392.95 samples/sec Loss 2.9479 LearningRate 0.0078 Epoch: 14 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:37,036-Speed 3417.52 samples/sec Loss 2.9412 LearningRate 0.0078 Epoch: 14 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:40,053-Speed 3395.33 samples/sec Loss 2.9110 LearningRate 0.0078 Epoch: 14 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:43,071-Speed 3393.45 samples/sec Loss 2.9213 LearningRate 0.0078 Epoch: 14 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:46,095-Speed 3387.82 samples/sec Loss 2.9554 LearningRate 0.0078 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:49,180-Speed 3320.25 samples/sec Loss 2.8667 LearningRate 0.0078 Epoch: 14 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:52,202-Speed 3388.53 samples/sec Loss 2.9779 LearningRate 0.0078 Epoch: 14 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:55,214-Speed 3401.10 samples/sec Loss 2.9376 LearningRate 0.0078 Epoch: 14 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:45:58,230-Speed 3396.02 samples/sec Loss 2.9594 LearningRate 0.0078 Epoch: 14 Global Step: 72990 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:01,255-Speed 3386.39 samples/sec Loss 2.9763 LearningRate 0.0077 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:04,270-Speed 3396.64 samples/sec Loss 2.9815 LearningRate 0.0077 Epoch: 14 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:07,261-Speed 3424.93 samples/sec Loss 2.9077 LearningRate 0.0077 Epoch: 14 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:10,279-Speed 3392.82 samples/sec Loss 2.7500 LearningRate 0.0077 Epoch: 14 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:13,293-Speed 3399.12 samples/sec Loss 2.8931 LearningRate 0.0077 Epoch: 14 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:16,319-Speed 3384.56 samples/sec Loss 2.8959 LearningRate 0.0077 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:19,336-Speed 3395.20 samples/sec Loss 2.9172 LearningRate 0.0077 Epoch: 14 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:22,329-Speed 3422.29 samples/sec Loss 2.9129 LearningRate 0.0077 Epoch: 14 Global Step: 73070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:25,368-Speed 3370.78 samples/sec Loss 2.9622 LearningRate 0.0077 Epoch: 14 Global Step: 73080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:28,421-Speed 3354.90 samples/sec Loss 3.0007 LearningRate 0.0077 Epoch: 14 Global Step: 73090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:31,436-Speed 3397.19 samples/sec Loss 2.9719 LearningRate 0.0077 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:34,453-Speed 3394.85 samples/sec Loss 2.9555 LearningRate 0.0077 Epoch: 14 Global Step: 73110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 06:46:37,505-Speed 3355.24 samples/sec Loss 2.9142 LearningRate 0.0077 Epoch: 14 Global Step: 73120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:40,519-Speed 3398.97 samples/sec Loss 2.9630 LearningRate 0.0077 Epoch: 14 Global Step: 73130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:43,537-Speed 3393.42 samples/sec Loss 2.9073 LearningRate 0.0077 Epoch: 14 Global Step: 73140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:46,553-Speed 3397.05 samples/sec Loss 2.8642 LearningRate 0.0077 Epoch: 14 Global Step: 73150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:49,561-Speed 3404.09 samples/sec Loss 2.8637 LearningRate 0.0077 Epoch: 14 Global Step: 73160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:46:52,572-Speed 3402.13 samples/sec Loss 2.9695 LearningRate 0.0077 Epoch: 14 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:55,579-Speed 3405.82 samples/sec Loss 2.9757 LearningRate 0.0077 Epoch: 14 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:46:58,594-Speed 3398.12 samples/sec Loss 2.9082 LearningRate 0.0076 Epoch: 14 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:47:01,588-Speed 3420.44 samples/sec Loss 2.8390 LearningRate 0.0076 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:04,599-Speed 3401.60 samples/sec Loss 2.9668 LearningRate 0.0076 Epoch: 14 Global Step: 73210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:07,622-Speed 3388.22 samples/sec Loss 2.8557 LearningRate 0.0076 Epoch: 14 Global Step: 73220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:10,634-Speed 3400.72 samples/sec Loss 3.0347 LearningRate 0.0076 Epoch: 14 Global Step: 73230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:13,643-Speed 3403.82 samples/sec Loss 
2.9177 LearningRate 0.0076 Epoch: 14 Global Step: 73240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:16,679-Speed 3373.63 samples/sec Loss 2.8446 LearningRate 0.0076 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:19,692-Speed 3400.00 samples/sec Loss 2.8169 LearningRate 0.0076 Epoch: 14 Global Step: 73260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:22,707-Speed 3397.03 samples/sec Loss 2.8907 LearningRate 0.0076 Epoch: 14 Global Step: 73270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:25,722-Speed 3396.77 samples/sec Loss 2.9274 LearningRate 0.0076 Epoch: 14 Global Step: 73280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:28,746-Speed 3387.31 samples/sec Loss 2.8532 LearningRate 0.0076 Epoch: 14 Global Step: 73290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:31,759-Speed 3400.03 samples/sec Loss 2.9562 LearningRate 0.0076 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:47:34,781-Speed 3389.21 samples/sec Loss 2.8580 LearningRate 0.0076 Epoch: 14 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:47:37,785-Speed 3409.07 samples/sec Loss 2.9172 LearningRate 0.0076 Epoch: 14 Global Step: 73320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:40,799-Speed 3398.36 samples/sec Loss 2.8860 LearningRate 0.0076 Epoch: 14 Global Step: 73330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:43,816-Speed 3394.98 samples/sec Loss 2.8538 LearningRate 0.0076 Epoch: 14 Global Step: 73340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:46,830-Speed 3399.27 samples/sec Loss 2.8753 LearningRate 0.0076 Epoch: 14 Global Step: 73350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:49,844-Speed 3398.11 samples/sec Loss 2.8402 LearningRate 0.0076 Epoch: 14 Global Step: 73360 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:52,864-Speed 3391.40 samples/sec Loss 2.9042 LearningRate 0.0075 Epoch: 14 Global Step: 73370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:55,877-Speed 3399.39 samples/sec Loss 2.9677 LearningRate 0.0075 Epoch: 14 Global Step: 73380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:47:58,885-Speed 3404.65 samples/sec Loss 2.8588 LearningRate 0.0075 Epoch: 14 Global Step: 73390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:48:01,898-Speed 3400.26 samples/sec Loss 2.9168 LearningRate 0.0075 Epoch: 14 Global Step: 73400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:48:04,924-Speed 3384.40 samples/sec Loss 2.8963 LearningRate 0.0075 Epoch: 14 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:48:07,946-Speed 3389.08 samples/sec Loss 2.9556 LearningRate 0.0075 Epoch: 14 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:10,957-Speed 3401.66 samples/sec Loss 2.9405 LearningRate 0.0075 Epoch: 14 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:13,976-Speed 3393.27 samples/sec Loss 2.8982 LearningRate 0.0075 Epoch: 14 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:16,989-Speed 3399.18 samples/sec Loss 2.9033 LearningRate 0.0075 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:20,000-Speed 3401.43 samples/sec Loss 2.9164 LearningRate 0.0075 Epoch: 14 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:23,017-Speed 3395.59 samples/sec Loss 3.0018 LearningRate 0.0075 Epoch: 14 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:26,073-Speed 3351.17 samples/sec Loss 2.9803 LearningRate 0.0075 Epoch: 14 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:48:29,089-Speed 3396.17 samples/sec Loss 2.8193 LearningRate 0.0075 Epoch: 14 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:32,103-Speed 3398.97 samples/sec Loss 2.8346 LearningRate 0.0075 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:35,121-Speed 3393.48 samples/sec Loss 2.9596 LearningRate 0.0075 Epoch: 14 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:38,162-Speed 3368.69 samples/sec Loss 2.9199 LearningRate 0.0075 Epoch: 14 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:41,210-Speed 3359.53 samples/sec Loss 3.0005 LearningRate 0.0075 Epoch: 14 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:44,238-Speed 3382.42 samples/sec Loss 2.8602 LearningRate 0.0075 Epoch: 14 Global Step: 73540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:47,266-Speed 3383.43 samples/sec Loss 2.9996 LearningRate 0.0074 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:50,280-Speed 3398.68 samples/sec Loss 2.9527 LearningRate 0.0074 Epoch: 14 Global Step: 73560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:53,297-Speed 3394.71 samples/sec Loss 2.9052 LearningRate 0.0074 Epoch: 14 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:56,306-Speed 3404.01 samples/sec Loss 2.9780 LearningRate 0.0074 Epoch: 14 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:48:59,317-Speed 3401.68 samples/sec Loss 3.0063 LearningRate 0.0074 Epoch: 14 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:49:02,329-Speed 3400.86 samples/sec Loss 2.8376 LearningRate 0.0074 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:49:05,322-Speed 3421.10 samples/sec Loss 
2.9247 LearningRate 0.0074 Epoch: 14 Global Step: 73610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:08,333-Speed 3402.19 samples/sec Loss 2.7632 LearningRate 0.0074 Epoch: 14 Global Step: 73620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:11,345-Speed 3400.87 samples/sec Loss 3.1004 LearningRate 0.0074 Epoch: 14 Global Step: 73630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:14,354-Speed 3404.32 samples/sec Loss 2.9149 LearningRate 0.0074 Epoch: 14 Global Step: 73640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:17,374-Speed 3391.68 samples/sec Loss 2.9591 LearningRate 0.0074 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:20,391-Speed 3394.04 samples/sec Loss 2.9571 LearningRate 0.0074 Epoch: 14 Global Step: 73660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:23,411-Speed 3392.59 samples/sec Loss 2.8295 LearningRate 0.0074 Epoch: 14 Global Step: 73670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:26,425-Speed 3397.33 samples/sec Loss 2.9417 LearningRate 0.0074 Epoch: 14 Global Step: 73680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:29,437-Speed 3400.86 samples/sec Loss 2.9018 LearningRate 0.0074 Epoch: 14 Global Step: 73690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:32,449-Speed 3400.38 samples/sec Loss 2.9150 LearningRate 0.0074 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:35,469-Speed 3391.79 samples/sec Loss 2.9020 LearningRate 0.0074 Epoch: 14 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:49:38,491-Speed 3389.13 samples/sec Loss 2.9045 LearningRate 0.0074 Epoch: 14 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:49:41,504-Speed 3399.82 samples/sec Loss 2.9823 LearningRate 0.0074 Epoch: 14 Global Step: 73730 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:49:44,497-Speed 3422.70 samples/sec Loss 2.9659 LearningRate 0.0073 Epoch: 14 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:47,518-Speed 3389.86 samples/sec Loss 2.9963 LearningRate 0.0073 Epoch: 14 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:50,537-Speed 3392.90 samples/sec Loss 2.8619 LearningRate 0.0073 Epoch: 14 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:53,556-Speed 3393.08 samples/sec Loss 2.9852 LearningRate 0.0073 Epoch: 14 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:56,567-Speed 3401.91 samples/sec Loss 3.0306 LearningRate 0.0073 Epoch: 14 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:49:59,579-Speed 3400.13 samples/sec Loss 2.8977 LearningRate 0.0073 Epoch: 14 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:50:02,597-Speed 3394.31 samples/sec Loss 2.8861 LearningRate 0.0073 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:50:05,606-Speed 3403.13 samples/sec Loss 2.8214 LearningRate 0.0073 Epoch: 14 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:50:08,619-Speed 3399.61 samples/sec Loss 2.8269 LearningRate 0.0073 Epoch: 14 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:50:11,644-Speed 3386.06 samples/sec Loss 2.8292 LearningRate 0.0073 Epoch: 14 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:50:14,705-Speed 3346.04 samples/sec Loss 3.0183 LearningRate 0.0073 Epoch: 14 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:17,722-Speed 3395.52 samples/sec Loss 2.9888 LearningRate 0.0073 Epoch: 14 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 06:50:20,735-Speed 3399.01 samples/sec Loss 2.8698 LearningRate 0.0073 Epoch: 14 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:23,745-Speed 3403.55 samples/sec Loss 2.9168 LearningRate 0.0073 Epoch: 14 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:26,763-Speed 3393.55 samples/sec Loss 2.9686 LearningRate 0.0073 Epoch: 14 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:29,779-Speed 3395.71 samples/sec Loss 2.8405 LearningRate 0.0073 Epoch: 14 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:32,797-Speed 3394.72 samples/sec Loss 2.9510 LearningRate 0.0073 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:35,809-Speed 3399.47 samples/sec Loss 2.7893 LearningRate 0.0073 Epoch: 14 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:38,822-Speed 3400.24 samples/sec Loss 2.9040 LearningRate 0.0073 Epoch: 14 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:41,834-Speed 3401.07 samples/sec Loss 2.9824 LearningRate 0.0072 Epoch: 14 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:44,829-Speed 3418.85 samples/sec Loss 2.9196 LearningRate 0.0072 Epoch: 14 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:47,838-Speed 3404.40 samples/sec Loss 2.9419 LearningRate 0.0072 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:50,852-Speed 3398.77 samples/sec Loss 2.8949 LearningRate 0.0072 Epoch: 14 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:53,867-Speed 3397.44 samples/sec Loss 2.9218 LearningRate 0.0072 Epoch: 14 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:56,880-Speed 3399.01 samples/sec Loss 
2.8489 LearningRate 0.0072 Epoch: 14 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:50:59,922-Speed 3366.37 samples/sec Loss 3.0105 LearningRate 0.0072 Epoch: 14 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:51:02,937-Speed 3397.21 samples/sec Loss 2.8678 LearningRate 0.0072 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:51:47,251-[lfw][74000]XNorm: 21.648780 Training: 2022-04-11 06:51:47,252-[lfw][74000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 06:51:47,253-[lfw][74000]Accuracy-Highest: 0.99850 Training: 2022-04-11 06:52:38,720-[cfp_fp][74000]XNorm: 20.810559 Training: 2022-04-11 06:52:38,721-[cfp_fp][74000]Accuracy-Flip: 0.98186+-0.00582 Training: 2022-04-11 06:52:38,721-[cfp_fp][74000]Accuracy-Highest: 0.98414 Training: 2022-04-11 06:53:23,088-[agedb_30][74000]XNorm: 21.834063 Training: 2022-04-11 06:53:23,089-[agedb_30][74000]Accuracy-Flip: 0.98383+-0.00597 Training: 2022-04-11 06:53:23,090-[agedb_30][74000]Accuracy-Highest: 0.98433 Training: 2022-04-11 06:53:26,093-Speed 71.53 samples/sec Loss 2.9331 LearningRate 0.0072 Epoch: 14 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:53:29,087-Speed 3421.97 samples/sec Loss 2.9823 LearningRate 0.0072 Epoch: 14 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:53:32,075-Speed 3427.78 samples/sec Loss 2.9226 LearningRate 0.0072 Epoch: 14 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:53:35,069-Speed 3421.50 samples/sec Loss 2.9922 LearningRate 0.0072 Epoch: 14 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 06:53:38,041-Speed 3445.98 samples/sec Loss 2.9290 LearningRate 0.0072 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:53:41,033-Speed 3423.25 samples/sec Loss 2.9030 LearningRate 0.0072 Epoch: 14 
Global Step: 74060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:53:44,009-Speed 3441.89 samples/sec Loss 2.9249 LearningRate 0.0072 Epoch: 14 Global Step: 74070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:53:47,003-Speed 3420.91 samples/sec Loss 3.0250 LearningRate 0.0072 Epoch: 14 Global Step: 74080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:53:50,046-Speed 3365.73 samples/sec Loss 2.8446 LearningRate 0.0072 Epoch: 14 Global Step: 74090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:53:53,058-Speed 3400.80 samples/sec Loss 2.9727 LearningRate 0.0072 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:53:56,056-Speed 3416.50 samples/sec Loss 2.9037 LearningRate 0.0072 Epoch: 14 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:53:59,069-Speed 3398.91 samples/sec Loss 2.8753 LearningRate 0.0071 Epoch: 14 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:02,109-Speed 3369.32 samples/sec Loss 2.8705 LearningRate 0.0071 Epoch: 14 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:05,110-Speed 3413.21 samples/sec Loss 2.9130 LearningRate 0.0071 Epoch: 14 Global Step: 74140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:08,113-Speed 3411.18 samples/sec Loss 2.9099 LearningRate 0.0071 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:11,113-Speed 3414.07 samples/sec Loss 2.8892 LearningRate 0.0071 Epoch: 14 Global Step: 74160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:14,114-Speed 3412.65 samples/sec Loss 2.9778 LearningRate 0.0071 Epoch: 14 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:54:17,116-Speed 3412.06 samples/sec Loss 2.8044 LearningRate 0.0071 Epoch: 14 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:54:20,104-Speed 3427.89 samples/sec Loss 2.9155 LearningRate 0.0071 Epoch: 14 Global Step: 74190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:23,109-Speed 3408.36 samples/sec Loss 2.9493 LearningRate 0.0071 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:26,108-Speed 3416.18 samples/sec Loss 2.9475 LearningRate 0.0071 Epoch: 14 Global Step: 74210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:29,109-Speed 3413.13 samples/sec Loss 2.8975 LearningRate 0.0071 Epoch: 14 Global Step: 74220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:32,113-Speed 3409.49 samples/sec Loss 2.9692 LearningRate 0.0071 Epoch: 14 Global Step: 74230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:35,118-Speed 3408.69 samples/sec Loss 2.8594 LearningRate 0.0071 Epoch: 14 Global Step: 74240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:38,118-Speed 3414.26 samples/sec Loss 2.9143 LearningRate 0.0071 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:41,123-Speed 3408.11 samples/sec Loss 2.9483 LearningRate 0.0071 Epoch: 14 Global Step: 74260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:44,125-Speed 3412.68 samples/sec Loss 2.8865 LearningRate 0.0071 Epoch: 14 Global Step: 74270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:47,132-Speed 3405.87 samples/sec Loss 2.9679 LearningRate 0.0071 Epoch: 14 Global Step: 74280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:54:50,136-Speed 3410.00 samples/sec Loss 2.8530 LearningRate 0.0071 Epoch: 14 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:54:53,184-Speed 3360.03 samples/sec Loss 2.9678 LearningRate 0.0071 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:54:56,184-Speed 3413.56 
samples/sec Loss 3.0412 LearningRate 0.0070 Epoch: 14 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:54:59,188-Speed 3409.92 samples/sec Loss 2.8266 LearningRate 0.0070 Epoch: 14 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:02,195-Speed 3406.36 samples/sec Loss 2.9828 LearningRate 0.0070 Epoch: 14 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:05,199-Speed 3409.85 samples/sec Loss 2.8736 LearningRate 0.0070 Epoch: 14 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:08,201-Speed 3411.80 samples/sec Loss 2.9047 LearningRate 0.0070 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:11,201-Speed 3414.20 samples/sec Loss 2.9547 LearningRate 0.0070 Epoch: 14 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:14,210-Speed 3403.78 samples/sec Loss 2.9393 LearningRate 0.0070 Epoch: 14 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:17,229-Speed 3393.44 samples/sec Loss 2.8294 LearningRate 0.0070 Epoch: 14 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:20,207-Speed 3438.61 samples/sec Loss 2.9467 LearningRate 0.0070 Epoch: 14 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:23,212-Speed 3408.84 samples/sec Loss 2.8446 LearningRate 0.0070 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:26,223-Speed 3402.49 samples/sec Loss 3.0385 LearningRate 0.0070 Epoch: 14 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:29,235-Speed 3400.37 samples/sec Loss 2.8899 LearningRate 0.0070 Epoch: 14 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:32,235-Speed 3414.26 samples/sec Loss 2.8707 LearningRate 0.0070 Epoch: 14 
Global Step: 74430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:35,246-Speed 3401.04 samples/sec Loss 2.8504 LearningRate 0.0070 Epoch: 14 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:38,257-Speed 3402.22 samples/sec Loss 2.8410 LearningRate 0.0070 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:41,263-Speed 3406.87 samples/sec Loss 2.8588 LearningRate 0.0070 Epoch: 14 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:44,269-Speed 3407.82 samples/sec Loss 2.9460 LearningRate 0.0070 Epoch: 14 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:47,272-Speed 3410.21 samples/sec Loss 2.9575 LearningRate 0.0070 Epoch: 14 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:50,250-Speed 3439.37 samples/sec Loss 2.9365 LearningRate 0.0070 Epoch: 14 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:53,263-Speed 3400.32 samples/sec Loss 2.8238 LearningRate 0.0069 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:56,266-Speed 3410.45 samples/sec Loss 2.8252 LearningRate 0.0069 Epoch: 14 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:55:59,270-Speed 3409.94 samples/sec Loss 2.9275 LearningRate 0.0069 Epoch: 14 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:02,273-Speed 3410.14 samples/sec Loss 2.9011 LearningRate 0.0069 Epoch: 14 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:05,284-Speed 3402.81 samples/sec Loss 2.8705 LearningRate 0.0069 Epoch: 14 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:08,288-Speed 3408.52 samples/sec Loss 2.8002 LearningRate 0.0069 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:56:11,299-Speed 3401.84 samples/sec Loss 2.9771 LearningRate 0.0069 Epoch: 14 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:14,313-Speed 3398.97 samples/sec Loss 2.8863 LearningRate 0.0069 Epoch: 14 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:17,319-Speed 3407.29 samples/sec Loss 2.8677 LearningRate 0.0069 Epoch: 14 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:20,304-Speed 3430.82 samples/sec Loss 3.0011 LearningRate 0.0069 Epoch: 14 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:23,309-Speed 3408.65 samples/sec Loss 2.9017 LearningRate 0.0069 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:26,311-Speed 3412.53 samples/sec Loss 2.8405 LearningRate 0.0069 Epoch: 14 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:29,324-Speed 3399.58 samples/sec Loss 2.8360 LearningRate 0.0069 Epoch: 14 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:32,332-Speed 3404.99 samples/sec Loss 2.7976 LearningRate 0.0069 Epoch: 14 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:35,354-Speed 3389.69 samples/sec Loss 2.8155 LearningRate 0.0069 Epoch: 14 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:56:38,348-Speed 3420.41 samples/sec Loss 2.9649 LearningRate 0.0069 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:41,356-Speed 3405.13 samples/sec Loss 2.9475 LearningRate 0.0069 Epoch: 14 Global Step: 74660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:44,361-Speed 3408.71 samples/sec Loss 2.9425 LearningRate 0.0069 Epoch: 14 Global Step: 74670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:47,367-Speed 3407.07 
samples/sec Loss 2.8186 LearningRate 0.0069 Epoch: 14 Global Step: 74680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:50,373-Speed 3408.65 samples/sec Loss 2.8245 LearningRate 0.0068 Epoch: 14 Global Step: 74690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:53,384-Speed 3400.60 samples/sec Loss 2.8696 LearningRate 0.0068 Epoch: 14 Global Step: 74700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:56,390-Speed 3407.77 samples/sec Loss 2.9915 LearningRate 0.0068 Epoch: 14 Global Step: 74710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:56:59,392-Speed 3412.47 samples/sec Loss 2.9206 LearningRate 0.0068 Epoch: 14 Global Step: 74720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:57:02,403-Speed 3400.74 samples/sec Loss 2.9234 LearningRate 0.0068 Epoch: 14 Global Step: 74730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:57:05,416-Speed 3400.36 samples/sec Loss 2.8810 LearningRate 0.0068 Epoch: 14 Global Step: 74740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:57:08,423-Speed 3405.38 samples/sec Loss 2.7871 LearningRate 0.0068 Epoch: 14 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:11,440-Speed 3395.39 samples/sec Loss 2.7942 LearningRate 0.0068 Epoch: 14 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:14,444-Speed 3409.15 samples/sec Loss 2.9021 LearningRate 0.0068 Epoch: 14 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:17,459-Speed 3397.62 samples/sec Loss 2.9491 LearningRate 0.0068 Epoch: 14 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:20,470-Speed 3401.87 samples/sec Loss 2.7418 LearningRate 0.0068 Epoch: 14 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:23,488-Speed 3393.72 samples/sec Loss 2.9784 LearningRate 0.0068 Epoch: 14 
Global Step: 74800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:26,499-Speed 3402.16 samples/sec Loss 2.8002 LearningRate 0.0068 Epoch: 14 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:29,508-Speed 3403.54 samples/sec Loss 3.0106 LearningRate 0.0068 Epoch: 14 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:32,515-Speed 3405.94 samples/sec Loss 2.8174 LearningRate 0.0068 Epoch: 14 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:35,531-Speed 3396.91 samples/sec Loss 2.8442 LearningRate 0.0068 Epoch: 14 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:38,538-Speed 3405.92 samples/sec Loss 2.8295 LearningRate 0.0068 Epoch: 14 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 06:57:41,537-Speed 3414.84 samples/sec Loss 2.9278 LearningRate 0.0068 Epoch: 14 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:44,544-Speed 3407.64 samples/sec Loss 2.8721 LearningRate 0.0068 Epoch: 14 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:47,556-Speed 3399.88 samples/sec Loss 2.9042 LearningRate 0.0067 Epoch: 14 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:50,561-Speed 3409.13 samples/sec Loss 2.6974 LearningRate 0.0067 Epoch: 14 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:53,593-Speed 3377.49 samples/sec Loss 2.8649 LearningRate 0.0067 Epoch: 14 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:56,601-Speed 3404.90 samples/sec Loss 2.7739 LearningRate 0.0067 Epoch: 14 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:57:59,612-Speed 3402.32 samples/sec Loss 2.8291 LearningRate 0.0067 Epoch: 14 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 06:58:02,684-Speed 3333.81 samples/sec Loss 2.9279 LearningRate 0.0067 Epoch: 14 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:05,677-Speed 3422.09 samples/sec Loss 3.0540 LearningRate 0.0067 Epoch: 14 Global Step: 74940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:08,689-Speed 3401.07 samples/sec Loss 2.7891 LearningRate 0.0067 Epoch: 14 Global Step: 74950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:11,698-Speed 3403.81 samples/sec Loss 2.8527 LearningRate 0.0067 Epoch: 14 Global Step: 74960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:14,707-Speed 3405.08 samples/sec Loss 2.9161 LearningRate 0.0067 Epoch: 14 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:17,722-Speed 3397.66 samples/sec Loss 2.9920 LearningRate 0.0067 Epoch: 14 Global Step: 74980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:20,730-Speed 3405.21 samples/sec Loss 2.9278 LearningRate 0.0067 Epoch: 14 Global Step: 74990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:23,739-Speed 3403.32 samples/sec Loss 2.8903 LearningRate 0.0067 Epoch: 14 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:26,763-Speed 3387.01 samples/sec Loss 2.8610 LearningRate 0.0067 Epoch: 14 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:29,771-Speed 3405.52 samples/sec Loss 2.7613 LearningRate 0.0067 Epoch: 14 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:32,785-Speed 3398.46 samples/sec Loss 2.9182 LearningRate 0.0067 Epoch: 14 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:58:35,808-Speed 3387.70 samples/sec Loss 2.8775 LearningRate 0.0067 Epoch: 14 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:38,833-Speed 3386.97 
samples/sec Loss 2.7964 LearningRate 0.0067 Epoch: 14 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:41,840-Speed 3405.17 samples/sec Loss 2.8985 LearningRate 0.0067 Epoch: 14 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:44,865-Speed 3387.10 samples/sec Loss 2.8866 LearningRate 0.0067 Epoch: 14 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:47,882-Speed 3394.89 samples/sec Loss 2.8526 LearningRate 0.0066 Epoch: 14 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:50,893-Speed 3401.89 samples/sec Loss 2.8234 LearningRate 0.0066 Epoch: 14 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:53,906-Speed 3398.80 samples/sec Loss 2.9293 LearningRate 0.0066 Epoch: 14 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:56,923-Speed 3394.18 samples/sec Loss 2.9077 LearningRate 0.0066 Epoch: 14 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:58:59,938-Speed 3397.59 samples/sec Loss 2.8720 LearningRate 0.0066 Epoch: 14 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:02,982-Speed 3364.68 samples/sec Loss 2.9363 LearningRate 0.0066 Epoch: 14 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:05,987-Speed 3409.18 samples/sec Loss 2.9142 LearningRate 0.0066 Epoch: 14 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:09,013-Speed 3384.50 samples/sec Loss 2.8866 LearningRate 0.0066 Epoch: 14 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:12,024-Speed 3401.54 samples/sec Loss 2.8349 LearningRate 0.0066 Epoch: 14 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:15,038-Speed 3398.43 samples/sec Loss 3.0091 LearningRate 0.0066 Epoch: 14 
Global Step: 75170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:18,053-Speed 3397.35 samples/sec Loss 2.8644 LearningRate 0.0066 Epoch: 14 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:21,064-Speed 3401.71 samples/sec Loss 2.8872 LearningRate 0.0066 Epoch: 14 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:24,068-Speed 3410.01 samples/sec Loss 2.8228 LearningRate 0.0066 Epoch: 14 Global Step: 75200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:27,076-Speed 3404.28 samples/sec Loss 2.7496 LearningRate 0.0066 Epoch: 14 Global Step: 75210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:30,094-Speed 3394.25 samples/sec Loss 2.7860 LearningRate 0.0066 Epoch: 14 Global Step: 75220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:33,110-Speed 3395.74 samples/sec Loss 2.9371 LearningRate 0.0066 Epoch: 14 Global Step: 75230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:36,139-Speed 3382.34 samples/sec Loss 2.8352 LearningRate 0.0066 Epoch: 14 Global Step: 75240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:39,167-Speed 3382.84 samples/sec Loss 2.7633 LearningRate 0.0066 Epoch: 14 Global Step: 75250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:42,184-Speed 3394.58 samples/sec Loss 2.9637 LearningRate 0.0066 Epoch: 14 Global Step: 75260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:45,211-Speed 3383.42 samples/sec Loss 2.9351 LearningRate 0.0066 Epoch: 14 Global Step: 75270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:48,224-Speed 3399.74 samples/sec Loss 2.8995 LearningRate 0.0065 Epoch: 14 Global Step: 75280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 06:59:51,237-Speed 3399.63 samples/sec Loss 2.9237 LearningRate 0.0065 Epoch: 14 Global Step: 75290 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 06:59:54,252-Speed 3397.11 samples/sec Loss 2.9082 LearningRate 0.0065 Epoch: 14 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 06:59:57,264-Speed 3400.08 samples/sec Loss 2.9735 LearningRate 0.0065 Epoch: 14 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:00,275-Speed 3402.32 samples/sec Loss 2.8548 LearningRate 0.0065 Epoch: 14 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:03,289-Speed 3398.59 samples/sec Loss 2.8586 LearningRate 0.0065 Epoch: 14 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:06,303-Speed 3398.34 samples/sec Loss 2.9811 LearningRate 0.0065 Epoch: 14 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:09,316-Speed 3399.10 samples/sec Loss 2.8751 LearningRate 0.0065 Epoch: 14 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:12,341-Speed 3386.40 samples/sec Loss 2.9166 LearningRate 0.0065 Epoch: 14 Global Step: 75360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:15,355-Speed 3398.77 samples/sec Loss 2.9808 LearningRate 0.0065 Epoch: 14 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:18,396-Speed 3367.29 samples/sec Loss 2.7980 LearningRate 0.0065 Epoch: 14 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:21,406-Speed 3403.10 samples/sec Loss 2.7875 LearningRate 0.0065 Epoch: 14 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:24,404-Speed 3416.30 samples/sec Loss 2.9856 LearningRate 0.0065 Epoch: 14 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:27,423-Speed 3392.60 samples/sec Loss 2.8523 LearningRate 0.0065 Epoch: 14 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:30,436-Speed 3400.10 
samples/sec Loss 2.8055 LearningRate 0.0065 Epoch: 14 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:33,446-Speed 3403.25 samples/sec Loss 2.9352 LearningRate 0.0065 Epoch: 14 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:36,457-Speed 3401.83 samples/sec Loss 2.8658 LearningRate 0.0065 Epoch: 14 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:39,543-Speed 3318.16 samples/sec Loss 2.8422 LearningRate 0.0065 Epoch: 14 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:42,639-Speed 3307.89 samples/sec Loss 2.9183 LearningRate 0.0065 Epoch: 14 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:45,673-Speed 3376.58 samples/sec Loss 2.7708 LearningRate 0.0064 Epoch: 14 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:48,709-Speed 3373.66 samples/sec Loss 2.8030 LearningRate 0.0064 Epoch: 14 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:51,728-Speed 3392.91 samples/sec Loss 2.8962 LearningRate 0.0064 Epoch: 14 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:54,735-Speed 3406.37 samples/sec Loss 2.9365 LearningRate 0.0064 Epoch: 14 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:00:57,751-Speed 3395.84 samples/sec Loss 2.7947 LearningRate 0.0064 Epoch: 14 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:00,768-Speed 3395.73 samples/sec Loss 2.8954 LearningRate 0.0064 Epoch: 14 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:03,785-Speed 3393.83 samples/sec Loss 2.7685 LearningRate 0.0064 Epoch: 14 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:06,795-Speed 3403.86 samples/sec Loss 2.8373 LearningRate 0.0064 Epoch: 14 
Global Step: 75540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:09,806-Speed 3401.05 samples/sec Loss 2.7867 LearningRate 0.0064 Epoch: 14 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:12,817-Speed 3401.82 samples/sec Loss 2.8577 LearningRate 0.0064 Epoch: 14 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:15,851-Speed 3375.35 samples/sec Loss 2.8552 LearningRate 0.0064 Epoch: 14 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:18,865-Speed 3398.93 samples/sec Loss 2.9295 LearningRate 0.0064 Epoch: 14 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:21,880-Speed 3396.66 samples/sec Loss 2.7502 LearningRate 0.0064 Epoch: 14 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:24,877-Speed 3418.30 samples/sec Loss 2.8956 LearningRate 0.0064 Epoch: 14 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:27,894-Speed 3394.30 samples/sec Loss 2.8682 LearningRate 0.0064 Epoch: 14 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:30,907-Speed 3399.72 samples/sec Loss 2.8283 LearningRate 0.0064 Epoch: 14 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:33,919-Speed 3400.68 samples/sec Loss 2.8846 LearningRate 0.0064 Epoch: 14 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:36,938-Speed 3392.88 samples/sec Loss 2.8218 LearningRate 0.0064 Epoch: 14 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:39,965-Speed 3383.38 samples/sec Loss 2.7970 LearningRate 0.0064 Epoch: 14 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:42,981-Speed 3396.29 samples/sec Loss 2.7596 LearningRate 0.0064 Epoch: 14 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 07:01:45,993-Speed 3401.13 samples/sec Loss 2.9084 LearningRate 0.0063 Epoch: 14 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:49,008-Speed 3396.90 samples/sec Loss 2.9413 LearningRate 0.0063 Epoch: 14 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:52,027-Speed 3392.72 samples/sec Loss 2.8878 LearningRate 0.0063 Epoch: 14 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:55,046-Speed 3392.60 samples/sec Loss 2.8692 LearningRate 0.0063 Epoch: 14 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:01:58,062-Speed 3396.26 samples/sec Loss 2.8906 LearningRate 0.0063 Epoch: 14 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:01,079-Speed 3395.49 samples/sec Loss 2.8951 LearningRate 0.0063 Epoch: 14 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:04,098-Speed 3391.49 samples/sec Loss 2.8284 LearningRate 0.0063 Epoch: 14 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:07,117-Speed 3392.44 samples/sec Loss 2.9389 LearningRate 0.0063 Epoch: 14 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:10,160-Speed 3366.51 samples/sec Loss 2.8537 LearningRate 0.0063 Epoch: 14 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:13,175-Speed 3396.86 samples/sec Loss 2.8743 LearningRate 0.0063 Epoch: 14 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:16,206-Speed 3380.08 samples/sec Loss 3.0263 LearningRate 0.0063 Epoch: 14 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:19,224-Speed 3393.09 samples/sec Loss 2.8311 LearningRate 0.0063 Epoch: 14 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:22,233-Speed 3404.14 
samples/sec Loss 2.8246 LearningRate 0.0063 Epoch: 14 Global Step: 75790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:25,230-Speed 3418.55 samples/sec Loss 2.9242 LearningRate 0.0063 Epoch: 14 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:28,244-Speed 3397.75 samples/sec Loss 2.8649 LearningRate 0.0063 Epoch: 14 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:02:31,241-Speed 3418.14 samples/sec Loss 2.8988 LearningRate 0.0063 Epoch: 14 Global Step: 75820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:02:34,251-Speed 3402.05 samples/sec Loss 2.8399 LearningRate 0.0063 Epoch: 14 Global Step: 75830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:02:37,266-Speed 3397.04 samples/sec Loss 2.7793 LearningRate 0.0063 Epoch: 14 Global Step: 75840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:02:40,288-Speed 3389.79 samples/sec Loss 2.8571 LearningRate 0.0063 Epoch: 14 Global Step: 75850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:02:43,374-Speed 3319.09 samples/sec Loss 2.7944 LearningRate 0.0063 Epoch: 14 Global Step: 75860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:02:46,386-Speed 3400.89 samples/sec Loss 2.8666 LearningRate 0.0063 Epoch: 14 Global Step: 75870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:03:00,385-Speed 731.54 samples/sec Loss 2.0833 LearningRate 0.0062 Epoch: 15 Global Step: 75880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:03:03,476-Speed 3314.40 samples/sec Loss 2.1095 LearningRate 0.0062 Epoch: 15 Global Step: 75890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:03:06,659-Speed 3217.75 samples/sec Loss 2.1342 LearningRate 0.0062 Epoch: 15 Global Step: 75900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:03:09,671-Speed 3400.54 samples/sec Loss 2.1609 LearningRate 0.0062 Epoch: 15 
Global Step: 75910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:03:12,696-Speed 3387.09 samples/sec Loss 2.0920 LearningRate 0.0062 Epoch: 15 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:15,746-Speed 3357.65 samples/sec Loss 2.0930 LearningRate 0.0062 Epoch: 15 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:18,793-Speed 3361.77 samples/sec Loss 2.0624 LearningRate 0.0062 Epoch: 15 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:21,829-Speed 3373.82 samples/sec Loss 1.9988 LearningRate 0.0062 Epoch: 15 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:24,859-Speed 3381.22 samples/sec Loss 2.1097 LearningRate 0.0062 Epoch: 15 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:27,876-Speed 3394.65 samples/sec Loss 2.2103 LearningRate 0.0062 Epoch: 15 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:30,896-Speed 3391.31 samples/sec Loss 1.9270 LearningRate 0.0062 Epoch: 15 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:33,905-Speed 3404.02 samples/sec Loss 2.1127 LearningRate 0.0062 Epoch: 15 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:03:36,929-Speed 3388.32 samples/sec Loss 2.0209 LearningRate 0.0062 Epoch: 15 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:04:21,363-[lfw][76000]XNorm: 21.568427 Training: 2022-04-11 07:04:21,364-[lfw][76000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 07:04:21,364-[lfw][76000]Accuracy-Highest: 0.99850 Training: 2022-04-11 07:05:12,824-[cfp_fp][76000]XNorm: 20.738903 Training: 2022-04-11 07:05:12,825-[cfp_fp][76000]Accuracy-Flip: 0.98400+-0.00510 Training: 2022-04-11 07:05:12,826-[cfp_fp][76000]Accuracy-Highest: 0.98414 Training: 2022-04-11 
07:05:56,846-[agedb_30][76000]XNorm: 21.965160 Training: 2022-04-11 07:05:56,847-[agedb_30][76000]Accuracy-Flip: 0.98467+-0.00686 Training: 2022-04-11 07:05:56,847-[agedb_30][76000]Accuracy-Highest: 0.98467 Training: 2022-04-11 07:05:59,856-Speed 71.65 samples/sec Loss 2.0340 LearningRate 0.0062 Epoch: 15 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:02,851-Speed 3420.36 samples/sec Loss 2.1386 LearningRate 0.0062 Epoch: 15 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:05,868-Speed 3394.30 samples/sec Loss 2.1095 LearningRate 0.0062 Epoch: 15 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:08,912-Speed 3365.59 samples/sec Loss 2.0769 LearningRate 0.0062 Epoch: 15 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:11,918-Speed 3407.03 samples/sec Loss 2.1753 LearningRate 0.0062 Epoch: 15 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:14,947-Speed 3381.67 samples/sec Loss 2.1641 LearningRate 0.0062 Epoch: 15 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:17,944-Speed 3418.02 samples/sec Loss 2.0547 LearningRate 0.0062 Epoch: 15 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:20,946-Speed 3411.89 samples/sec Loss 2.2085 LearningRate 0.0061 Epoch: 15 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:23,957-Speed 3402.19 samples/sec Loss 2.1282 LearningRate 0.0061 Epoch: 15 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:26,992-Speed 3375.21 samples/sec Loss 2.2945 LearningRate 0.0061 Epoch: 15 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:29,993-Speed 3412.61 samples/sec Loss 2.2733 LearningRate 0.0061 Epoch: 15 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 07:06:32,984-Speed 3424.87 samples/sec Loss 2.1702 LearningRate 0.0061 Epoch: 15 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:35,992-Speed 3404.65 samples/sec Loss 2.0808 LearningRate 0.0061 Epoch: 15 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:06:39,001-Speed 3405.40 samples/sec Loss 2.0937 LearningRate 0.0061 Epoch: 15 Global Step: 76140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:42,029-Speed 3382.16 samples/sec Loss 2.1284 LearningRate 0.0061 Epoch: 15 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:45,036-Speed 3406.37 samples/sec Loss 2.0839 LearningRate 0.0061 Epoch: 15 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:48,051-Speed 3397.46 samples/sec Loss 2.1165 LearningRate 0.0061 Epoch: 15 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:51,057-Speed 3407.54 samples/sec Loss 2.1820 LearningRate 0.0061 Epoch: 15 Global Step: 76180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:54,086-Speed 3381.49 samples/sec Loss 2.2147 LearningRate 0.0061 Epoch: 15 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:06:57,096-Speed 3402.57 samples/sec Loss 2.1303 LearningRate 0.0061 Epoch: 15 Global Step: 76200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:07:00,102-Speed 3406.90 samples/sec Loss 2.2449 LearningRate 0.0061 Epoch: 15 Global Step: 76210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:07:03,114-Speed 3401.62 samples/sec Loss 2.1456 LearningRate 0.0061 Epoch: 15 Global Step: 76220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:07:06,121-Speed 3406.15 samples/sec Loss 2.1285 LearningRate 0.0061 Epoch: 15 Global Step: 76230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:07:09,135-Speed 3398.93 samples/sec Loss 
2.1925 LearningRate 0.0061 Epoch: 15 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:12,158-Speed 3388.33 samples/sec Loss 2.1780 LearningRate 0.0061 Epoch: 15 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:15,167-Speed 3403.62 samples/sec Loss 2.1291 LearningRate 0.0061 Epoch: 15 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:18,175-Speed 3405.33 samples/sec Loss 2.0570 LearningRate 0.0061 Epoch: 15 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:21,185-Speed 3403.25 samples/sec Loss 2.1734 LearningRate 0.0060 Epoch: 15 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:24,192-Speed 3407.14 samples/sec Loss 2.1773 LearningRate 0.0060 Epoch: 15 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:27,204-Speed 3400.50 samples/sec Loss 2.2078 LearningRate 0.0060 Epoch: 15 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:30,211-Speed 3405.53 samples/sec Loss 2.1205 LearningRate 0.0060 Epoch: 15 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:33,215-Speed 3410.96 samples/sec Loss 2.2474 LearningRate 0.0060 Epoch: 15 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:36,221-Speed 3406.88 samples/sec Loss 2.2300 LearningRate 0.0060 Epoch: 15 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:39,229-Speed 3405.45 samples/sec Loss 2.1565 LearningRate 0.0060 Epoch: 15 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:42,357-Speed 3274.75 samples/sec Loss 2.1384 LearningRate 0.0060 Epoch: 15 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:45,363-Speed 3406.80 samples/sec Loss 2.2640 LearningRate 0.0060 Epoch: 15 Global Step: 76360 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:48,384-Speed 3390.72 samples/sec Loss 2.1167 LearningRate 0.0060 Epoch: 15 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:51,413-Speed 3381.45 samples/sec Loss 2.2159 LearningRate 0.0060 Epoch: 15 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:54,420-Speed 3406.44 samples/sec Loss 2.1124 LearningRate 0.0060 Epoch: 15 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:07:57,427-Speed 3406.58 samples/sec Loss 2.2328 LearningRate 0.0060 Epoch: 15 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:00,455-Speed 3382.20 samples/sec Loss 2.2954 LearningRate 0.0060 Epoch: 15 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:03,497-Speed 3367.66 samples/sec Loss 2.1584 LearningRate 0.0060 Epoch: 15 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:06,515-Speed 3393.19 samples/sec Loss 2.2790 LearningRate 0.0060 Epoch: 15 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:09,500-Speed 3432.50 samples/sec Loss 2.2063 LearningRate 0.0060 Epoch: 15 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:12,506-Speed 3407.74 samples/sec Loss 2.2682 LearningRate 0.0060 Epoch: 15 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:15,535-Speed 3381.15 samples/sec Loss 2.1822 LearningRate 0.0060 Epoch: 15 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:18,545-Speed 3402.81 samples/sec Loss 2.2337 LearningRate 0.0060 Epoch: 15 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:21,559-Speed 3398.69 samples/sec Loss 2.2166 LearningRate 0.0060 Epoch: 15 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 07:08:24,559-Speed 3414.81 samples/sec Loss 2.2149 LearningRate 0.0059 Epoch: 15 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:27,567-Speed 3404.49 samples/sec Loss 2.2466 LearningRate 0.0059 Epoch: 15 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:30,716-Speed 3252.90 samples/sec Loss 2.2285 LearningRate 0.0059 Epoch: 15 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:33,717-Speed 3414.42 samples/sec Loss 2.2537 LearningRate 0.0059 Epoch: 15 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:36,727-Speed 3402.65 samples/sec Loss 2.2889 LearningRate 0.0059 Epoch: 15 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:39,728-Speed 3414.25 samples/sec Loss 2.2384 LearningRate 0.0059 Epoch: 15 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 07:08:42,729-Speed 3412.33 samples/sec Loss 2.2584 LearningRate 0.0059 Epoch: 15 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:45,733-Speed 3410.11 samples/sec Loss 2.1739 LearningRate 0.0059 Epoch: 15 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:48,739-Speed 3406.93 samples/sec Loss 2.1692 LearningRate 0.0059 Epoch: 15 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:51,748-Speed 3404.56 samples/sec Loss 2.2389 LearningRate 0.0059 Epoch: 15 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:54,759-Speed 3401.96 samples/sec Loss 2.2116 LearningRate 0.0059 Epoch: 15 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:08:57,782-Speed 3388.58 samples/sec Loss 2.2567 LearningRate 0.0059 Epoch: 15 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:00,792-Speed 3402.56 samples/sec Loss 
2.2108 LearningRate 0.0059 Epoch: 15 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:03,805-Speed 3399.90 samples/sec Loss 2.3144 LearningRate 0.0059 Epoch: 15 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:06,858-Speed 3354.51 samples/sec Loss 2.2366 LearningRate 0.0059 Epoch: 15 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:09,884-Speed 3385.39 samples/sec Loss 2.1827 LearningRate 0.0059 Epoch: 15 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:12,904-Speed 3392.06 samples/sec Loss 2.2326 LearningRate 0.0059 Epoch: 15 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:15,927-Speed 3387.82 samples/sec Loss 2.3014 LearningRate 0.0059 Epoch: 15 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:18,926-Speed 3416.61 samples/sec Loss 2.1651 LearningRate 0.0059 Epoch: 15 Global Step: 76670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:21,930-Speed 3409.25 samples/sec Loss 2.2054 LearningRate 0.0059 Epoch: 15 Global Step: 76680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:24,938-Speed 3404.97 samples/sec Loss 2.2745 LearningRate 0.0059 Epoch: 15 Global Step: 76690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:27,950-Speed 3400.70 samples/sec Loss 2.2693 LearningRate 0.0058 Epoch: 15 Global Step: 76700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:30,954-Speed 3409.64 samples/sec Loss 2.2262 LearningRate 0.0058 Epoch: 15 Global Step: 76710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:33,964-Speed 3402.99 samples/sec Loss 2.2862 LearningRate 0.0058 Epoch: 15 Global Step: 76720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:37,010-Speed 3362.61 samples/sec Loss 2.1810 LearningRate 0.0058 Epoch: 15 Global Step: 76730 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:40,021-Speed 3402.59 samples/sec Loss 2.2868 LearningRate 0.0058 Epoch: 15 Global Step: 76740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:43,026-Speed 3408.35 samples/sec Loss 2.2347 LearningRate 0.0058 Epoch: 15 Global Step: 76750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:46,033-Speed 3406.63 samples/sec Loss 2.3627 LearningRate 0.0058 Epoch: 15 Global Step: 76760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 07:09:49,042-Speed 3404.12 samples/sec Loss 2.2756 LearningRate 0.0058 Epoch: 15 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:52,048-Speed 3408.02 samples/sec Loss 2.3267 LearningRate 0.0058 Epoch: 15 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:55,057-Speed 3404.54 samples/sec Loss 2.2930 LearningRate 0.0058 Epoch: 15 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:09:58,062-Speed 3408.12 samples/sec Loss 2.2742 LearningRate 0.0058 Epoch: 15 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:01,145-Speed 3322.35 samples/sec Loss 2.2906 LearningRate 0.0058 Epoch: 15 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:04,156-Speed 3401.95 samples/sec Loss 2.3226 LearningRate 0.0058 Epoch: 15 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:07,163-Speed 3406.17 samples/sec Loss 2.1935 LearningRate 0.0058 Epoch: 15 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:10,169-Speed 3408.08 samples/sec Loss 2.2873 LearningRate 0.0058 Epoch: 15 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:13,174-Speed 3407.92 samples/sec Loss 2.2900 LearningRate 0.0058 Epoch: 15 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 07:10:16,229-Speed 3353.09 samples/sec Loss 2.3007 LearningRate 0.0058 Epoch: 15 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:19,227-Speed 3416.41 samples/sec Loss 2.1862 LearningRate 0.0058 Epoch: 15 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:22,234-Speed 3406.67 samples/sec Loss 2.2880 LearningRate 0.0058 Epoch: 15 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:25,241-Speed 3406.85 samples/sec Loss 2.2769 LearningRate 0.0058 Epoch: 15 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:28,252-Speed 3401.25 samples/sec Loss 2.3336 LearningRate 0.0058 Epoch: 15 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:31,281-Speed 3381.89 samples/sec Loss 2.2408 LearningRate 0.0057 Epoch: 15 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 07:10:34,308-Speed 3384.28 samples/sec Loss 2.3494 LearningRate 0.0057 Epoch: 15 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:37,319-Speed 3401.83 samples/sec Loss 2.2182 LearningRate 0.0057 Epoch: 15 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:40,327-Speed 3404.85 samples/sec Loss 2.3595 LearningRate 0.0057 Epoch: 15 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:43,367-Speed 3369.73 samples/sec Loss 2.2548 LearningRate 0.0057 Epoch: 15 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:46,373-Speed 3407.74 samples/sec Loss 2.1869 LearningRate 0.0057 Epoch: 15 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:49,366-Speed 3421.89 samples/sec Loss 2.2701 LearningRate 0.0057 Epoch: 15 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:52,383-Speed 3395.15 samples/sec Loss 
2.2621 LearningRate 0.0057 Epoch: 15 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:55,406-Speed 3388.18 samples/sec Loss 2.3154 LearningRate 0.0057 Epoch: 15 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:10:58,410-Speed 3409.38 samples/sec Loss 2.3535 LearningRate 0.0057 Epoch: 15 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:01,424-Speed 3398.54 samples/sec Loss 2.3280 LearningRate 0.0057 Epoch: 15 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:04,456-Speed 3378.42 samples/sec Loss 2.2923 LearningRate 0.0057 Epoch: 15 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:07,471-Speed 3397.83 samples/sec Loss 2.2836 LearningRate 0.0057 Epoch: 15 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:10,478-Speed 3405.73 samples/sec Loss 2.3525 LearningRate 0.0057 Epoch: 15 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:13,493-Speed 3397.42 samples/sec Loss 2.4476 LearningRate 0.0057 Epoch: 15 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:16,528-Speed 3375.47 samples/sec Loss 2.2550 LearningRate 0.0057 Epoch: 15 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:19,513-Speed 3431.03 samples/sec Loss 2.3316 LearningRate 0.0057 Epoch: 15 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:22,520-Speed 3406.41 samples/sec Loss 2.3185 LearningRate 0.0057 Epoch: 15 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:25,533-Speed 3399.81 samples/sec Loss 2.2533 LearningRate 0.0057 Epoch: 15 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:28,544-Speed 3401.44 samples/sec Loss 2.2326 LearningRate 0.0057 Epoch: 15 Global Step: 77100 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:31,555-Speed 3402.49 samples/sec Loss 2.3154 LearningRate 0.0057 Epoch: 15 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:34,564-Speed 3404.36 samples/sec Loss 2.3499 LearningRate 0.0056 Epoch: 15 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:37,581-Speed 3394.47 samples/sec Loss 2.3028 LearningRate 0.0056 Epoch: 15 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:40,590-Speed 3404.10 samples/sec Loss 2.1977 LearningRate 0.0056 Epoch: 15 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:43,603-Speed 3399.39 samples/sec Loss 2.3164 LearningRate 0.0056 Epoch: 15 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:46,610-Speed 3406.79 samples/sec Loss 2.3504 LearningRate 0.0056 Epoch: 15 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:49,617-Speed 3406.19 samples/sec Loss 2.3262 LearningRate 0.0056 Epoch: 15 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:52,628-Speed 3401.94 samples/sec Loss 2.3169 LearningRate 0.0056 Epoch: 15 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:55,637-Speed 3403.40 samples/sec Loss 2.2854 LearningRate 0.0056 Epoch: 15 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:11:58,630-Speed 3422.51 samples/sec Loss 2.3057 LearningRate 0.0056 Epoch: 15 Global Step: 77200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:01,641-Speed 3401.27 samples/sec Loss 2.3719 LearningRate 0.0056 Epoch: 15 Global Step: 77210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:04,650-Speed 3404.27 samples/sec Loss 2.3397 LearningRate 0.0056 Epoch: 15 Global Step: 77220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 07:12:07,656-Speed 3408.07 samples/sec Loss 2.3481 LearningRate 0.0056 Epoch: 15 Global Step: 77230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:10,675-Speed 3392.43 samples/sec Loss 2.2948 LearningRate 0.0056 Epoch: 15 Global Step: 77240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:13,698-Speed 3388.07 samples/sec Loss 2.3208 LearningRate 0.0056 Epoch: 15 Global Step: 77250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:16,714-Speed 3396.90 samples/sec Loss 2.3662 LearningRate 0.0056 Epoch: 15 Global Step: 77260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:19,727-Speed 3399.88 samples/sec Loss 2.3024 LearningRate 0.0056 Epoch: 15 Global Step: 77270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:22,739-Speed 3400.63 samples/sec Loss 2.2655 LearningRate 0.0056 Epoch: 15 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:25,749-Speed 3402.38 samples/sec Loss 2.2438 LearningRate 0.0056 Epoch: 15 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:12:28,769-Speed 3391.70 samples/sec Loss 2.4208 LearningRate 0.0056 Epoch: 15 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:31,776-Speed 3407.05 samples/sec Loss 2.2915 LearningRate 0.0056 Epoch: 15 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:34,789-Speed 3399.61 samples/sec Loss 2.2442 LearningRate 0.0056 Epoch: 15 Global Step: 77320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:37,804-Speed 3396.63 samples/sec Loss 2.4188 LearningRate 0.0055 Epoch: 15 Global Step: 77330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:40,839-Speed 3375.18 samples/sec Loss 2.3442 LearningRate 0.0055 Epoch: 15 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:43,859-Speed 3391.27 samples/sec Loss 
2.4186 LearningRate 0.0055 Epoch: 15 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:46,866-Speed 3406.62 samples/sec Loss 2.3365 LearningRate 0.0055 Epoch: 15 Global Step: 77360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:49,882-Speed 3396.18 samples/sec Loss 2.2135 LearningRate 0.0055 Epoch: 15 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:52,891-Speed 3404.62 samples/sec Loss 2.3371 LearningRate 0.0055 Epoch: 15 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:55,905-Speed 3397.88 samples/sec Loss 2.3040 LearningRate 0.0055 Epoch: 15 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:12:58,898-Speed 3422.09 samples/sec Loss 2.2043 LearningRate 0.0055 Epoch: 15 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:01,924-Speed 3385.29 samples/sec Loss 2.2928 LearningRate 0.0055 Epoch: 15 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:04,931-Speed 3406.00 samples/sec Loss 2.2837 LearningRate 0.0055 Epoch: 15 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:07,953-Speed 3389.90 samples/sec Loss 2.3181 LearningRate 0.0055 Epoch: 15 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:10,958-Speed 3408.15 samples/sec Loss 2.3692 LearningRate 0.0055 Epoch: 15 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:13,971-Speed 3400.44 samples/sec Loss 2.3211 LearningRate 0.0055 Epoch: 15 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:16,981-Speed 3402.25 samples/sec Loss 2.3139 LearningRate 0.0055 Epoch: 15 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:19,998-Speed 3395.65 samples/sec Loss 2.1941 LearningRate 0.0055 Epoch: 15 Global Step: 77470 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:23,012-Speed 3398.11 samples/sec Loss 2.4409 LearningRate 0.0055 Epoch: 15 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:26,029-Speed 3394.99 samples/sec Loss 2.3452 LearningRate 0.0055 Epoch: 15 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:29,029-Speed 3414.12 samples/sec Loss 2.3243 LearningRate 0.0055 Epoch: 15 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:32,044-Speed 3397.76 samples/sec Loss 2.3778 LearningRate 0.0055 Epoch: 15 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:35,065-Speed 3390.44 samples/sec Loss 2.3411 LearningRate 0.0055 Epoch: 15 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:38,078-Speed 3400.18 samples/sec Loss 2.3420 LearningRate 0.0055 Epoch: 15 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:41,090-Speed 3400.60 samples/sec Loss 2.2320 LearningRate 0.0055 Epoch: 15 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:44,107-Speed 3394.63 samples/sec Loss 2.2113 LearningRate 0.0054 Epoch: 15 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:47,154-Speed 3361.29 samples/sec Loss 2.2931 LearningRate 0.0054 Epoch: 15 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:50,168-Speed 3398.72 samples/sec Loss 2.3057 LearningRate 0.0054 Epoch: 15 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:53,187-Speed 3393.45 samples/sec Loss 2.3154 LearningRate 0.0054 Epoch: 15 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:13:56,206-Speed 3392.26 samples/sec Loss 2.2846 LearningRate 0.0054 Epoch: 15 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 07:13:59,225-Speed 3393.62 samples/sec Loss 2.2721 LearningRate 0.0054 Epoch: 15 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:02,274-Speed 3359.29 samples/sec Loss 2.2669 LearningRate 0.0054 Epoch: 15 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:05,288-Speed 3398.26 samples/sec Loss 2.3996 LearningRate 0.0054 Epoch: 15 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:08,300-Speed 3400.13 samples/sec Loss 2.3518 LearningRate 0.0054 Epoch: 15 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:11,315-Speed 3397.45 samples/sec Loss 2.3398 LearningRate 0.0054 Epoch: 15 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:14,335-Speed 3392.11 samples/sec Loss 2.3753 LearningRate 0.0054 Epoch: 15 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:17,351-Speed 3396.39 samples/sec Loss 2.3266 LearningRate 0.0054 Epoch: 15 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:20,365-Speed 3398.20 samples/sec Loss 2.2809 LearningRate 0.0054 Epoch: 15 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:23,376-Speed 3401.70 samples/sec Loss 2.2976 LearningRate 0.0054 Epoch: 15 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:26,397-Speed 3390.25 samples/sec Loss 2.2554 LearningRate 0.0054 Epoch: 15 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:29,395-Speed 3417.30 samples/sec Loss 2.2410 LearningRate 0.0054 Epoch: 15 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:32,408-Speed 3399.06 samples/sec Loss 2.2954 LearningRate 0.0054 Epoch: 15 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:35,419-Speed 3401.83 samples/sec Loss 
2.3513 LearningRate 0.0054 Epoch: 15 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:38,434-Speed 3397.54 samples/sec Loss 2.3157 LearningRate 0.0054 Epoch: 15 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:41,452-Speed 3393.39 samples/sec Loss 2.3595 LearningRate 0.0054 Epoch: 15 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:44,466-Speed 3398.58 samples/sec Loss 2.3569 LearningRate 0.0054 Epoch: 15 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:47,480-Speed 3398.34 samples/sec Loss 2.4002 LearningRate 0.0054 Epoch: 15 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:50,497-Speed 3396.13 samples/sec Loss 2.3765 LearningRate 0.0053 Epoch: 15 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:53,510-Speed 3399.47 samples/sec Loss 2.4157 LearningRate 0.0053 Epoch: 15 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:56,519-Speed 3403.28 samples/sec Loss 2.2622 LearningRate 0.0053 Epoch: 15 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:14:59,518-Speed 3416.23 samples/sec Loss 2.2984 LearningRate 0.0053 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:02,534-Speed 3396.04 samples/sec Loss 2.2234 LearningRate 0.0053 Epoch: 15 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:05,550-Speed 3396.33 samples/sec Loss 2.2612 LearningRate 0.0053 Epoch: 15 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:08,558-Speed 3404.21 samples/sec Loss 2.4050 LearningRate 0.0053 Epoch: 15 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:11,572-Speed 3398.82 samples/sec Loss 2.3616 LearningRate 0.0053 Epoch: 15 Global Step: 77840 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:14,597-Speed 3385.52 samples/sec Loss 2.3815 LearningRate 0.0053 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:17,621-Speed 3388.17 samples/sec Loss 2.3450 LearningRate 0.0053 Epoch: 15 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:20,617-Speed 3418.29 samples/sec Loss 2.2792 LearningRate 0.0053 Epoch: 15 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:23,638-Speed 3391.14 samples/sec Loss 2.2639 LearningRate 0.0053 Epoch: 15 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:26,661-Speed 3386.98 samples/sec Loss 2.2706 LearningRate 0.0053 Epoch: 15 Global Step: 77890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:29,682-Speed 3390.80 samples/sec Loss 2.3376 LearningRate 0.0053 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:32,703-Speed 3390.69 samples/sec Loss 2.3022 LearningRate 0.0053 Epoch: 15 Global Step: 77910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:35,727-Speed 3388.00 samples/sec Loss 2.3078 LearningRate 0.0053 Epoch: 15 Global Step: 77920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:38,762-Speed 3374.55 samples/sec Loss 2.3053 LearningRate 0.0053 Epoch: 15 Global Step: 77930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:41,853-Speed 3313.77 samples/sec Loss 2.3511 LearningRate 0.0053 Epoch: 15 Global Step: 77940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:44,864-Speed 3402.79 samples/sec Loss 2.4319 LearningRate 0.0053 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:47,905-Speed 3367.23 samples/sec Loss 2.4210 LearningRate 0.0053 Epoch: 15 Global Step: 77960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 07:15:50,913-Speed 3405.96 samples/sec Loss 2.3667 LearningRate 0.0053 Epoch: 15 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:53,928-Speed 3397.02 samples/sec Loss 2.3259 LearningRate 0.0053 Epoch: 15 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:15:56,938-Speed 3403.00 samples/sec Loss 2.3098 LearningRate 0.0052 Epoch: 15 Global Step: 77990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:15:59,949-Speed 3401.36 samples/sec Loss 2.4057 LearningRate 0.0052 Epoch: 15 Global Step: 78000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:16:44,204-[lfw][78000]XNorm: 22.490540 Training: 2022-04-11 07:16:44,204-[lfw][78000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 07:16:44,205-[lfw][78000]Accuracy-Highest: 0.99850 Training: 2022-04-11 07:17:35,525-[cfp_fp][78000]XNorm: 21.857906 Training: 2022-04-11 07:17:35,526-[cfp_fp][78000]Accuracy-Flip: 0.98614+-0.00434 Training: 2022-04-11 07:17:35,526-[cfp_fp][78000]Accuracy-Highest: 0.98614 Training: 2022-04-11 07:18:19,862-[agedb_30][78000]XNorm: 22.593340 Training: 2022-04-11 07:18:19,863-[agedb_30][78000]Accuracy-Flip: 0.98367+-0.00726 Training: 2022-04-11 07:18:19,863-[agedb_30][78000]Accuracy-Highest: 0.98467 Training: 2022-04-11 07:18:22,868-Speed 71.65 samples/sec Loss 2.3617 LearningRate 0.0052 Epoch: 15 Global Step: 78010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:25,870-Speed 3410.88 samples/sec Loss 2.3265 LearningRate 0.0052 Epoch: 15 Global Step: 78020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:28,873-Speed 3411.23 samples/sec Loss 2.3039 LearningRate 0.0052 Epoch: 15 Global Step: 78030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:31,864-Speed 3424.21 samples/sec Loss 2.3074 LearningRate 0.0052 Epoch: 15 Global Step: 78040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:34,859-Speed 3420.90 
samples/sec Loss 2.4195 LearningRate 0.0052 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:37,856-Speed 3417.55 samples/sec Loss 2.4355 LearningRate 0.0052 Epoch: 15 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:40,858-Speed 3411.79 samples/sec Loss 2.3921 LearningRate 0.0052 Epoch: 15 Global Step: 78070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:43,875-Speed 3394.41 samples/sec Loss 2.3244 LearningRate 0.0052 Epoch: 15 Global Step: 78080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:18:46,873-Speed 3417.09 samples/sec Loss 2.3311 LearningRate 0.0052 Epoch: 15 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:18:49,872-Speed 3415.11 samples/sec Loss 2.3781 LearningRate 0.0052 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:18:52,876-Speed 3409.93 samples/sec Loss 2.2937 LearningRate 0.0052 Epoch: 15 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:18:55,882-Speed 3407.64 samples/sec Loss 2.1519 LearningRate 0.0052 Epoch: 15 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:18:58,884-Speed 3411.69 samples/sec Loss 2.4603 LearningRate 0.0052 Epoch: 15 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:01,886-Speed 3412.56 samples/sec Loss 2.3543 LearningRate 0.0052 Epoch: 15 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:04,889-Speed 3410.84 samples/sec Loss 2.4144 LearningRate 0.0052 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:07,893-Speed 3409.55 samples/sec Loss 2.3419 LearningRate 0.0052 Epoch: 15 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:10,895-Speed 3412.05 samples/sec Loss 2.3337 LearningRate 0.0052 Epoch: 15 
Global Step: 78170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:13,929-Speed 3375.51 samples/sec Loss 2.4799 LearningRate 0.0052 Epoch: 15 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:16,924-Speed 3420.62 samples/sec Loss 2.3099 LearningRate 0.0052 Epoch: 15 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:19,936-Speed 3401.15 samples/sec Loss 2.3419 LearningRate 0.0052 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:22,942-Speed 3407.64 samples/sec Loss 2.3702 LearningRate 0.0051 Epoch: 15 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:25,978-Speed 3372.81 samples/sec Loss 2.3470 LearningRate 0.0051 Epoch: 15 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:28,988-Speed 3403.20 samples/sec Loss 2.4960 LearningRate 0.0051 Epoch: 15 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:31,996-Speed 3405.73 samples/sec Loss 2.3791 LearningRate 0.0051 Epoch: 15 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:19:34,983-Speed 3428.72 samples/sec Loss 2.3373 LearningRate 0.0051 Epoch: 15 Global Step: 78250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:37,998-Speed 3397.42 samples/sec Loss 2.2751 LearningRate 0.0051 Epoch: 15 Global Step: 78260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:41,013-Speed 3397.83 samples/sec Loss 2.4652 LearningRate 0.0051 Epoch: 15 Global Step: 78270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:44,059-Speed 3362.19 samples/sec Loss 2.3801 LearningRate 0.0051 Epoch: 15 Global Step: 78280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:47,069-Speed 3403.57 samples/sec Loss 2.2908 LearningRate 0.0051 Epoch: 15 Global Step: 78290 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:19:50,082-Speed 3399.16 samples/sec Loss 2.4145 LearningRate 0.0051 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:53,086-Speed 3410.11 samples/sec Loss 2.4124 LearningRate 0.0051 Epoch: 15 Global Step: 78310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:56,094-Speed 3405.91 samples/sec Loss 2.2981 LearningRate 0.0051 Epoch: 15 Global Step: 78320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:19:59,097-Speed 3410.06 samples/sec Loss 2.4040 LearningRate 0.0051 Epoch: 15 Global Step: 78330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:02,109-Speed 3400.96 samples/sec Loss 2.4144 LearningRate 0.0051 Epoch: 15 Global Step: 78340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:05,122-Speed 3399.88 samples/sec Loss 2.3662 LearningRate 0.0051 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:08,125-Speed 3410.62 samples/sec Loss 2.3432 LearningRate 0.0051 Epoch: 15 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:11,188-Speed 3344.45 samples/sec Loss 2.3832 LearningRate 0.0051 Epoch: 15 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:14,200-Speed 3401.12 samples/sec Loss 2.2267 LearningRate 0.0051 Epoch: 15 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:17,208-Speed 3404.39 samples/sec Loss 2.5378 LearningRate 0.0051 Epoch: 15 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:20,221-Speed 3400.09 samples/sec Loss 2.4372 LearningRate 0.0051 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:23,238-Speed 3394.53 samples/sec Loss 2.3645 LearningRate 0.0051 Epoch: 15 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:26,255-Speed 3396.05 
samples/sec Loss 2.3365 LearningRate 0.0051 Epoch: 15 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:29,265-Speed 3402.03 samples/sec Loss 2.2588 LearningRate 0.0050 Epoch: 15 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:32,276-Speed 3402.08 samples/sec Loss 2.3928 LearningRate 0.0050 Epoch: 15 Global Step: 78440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:35,279-Speed 3411.24 samples/sec Loss 2.3781 LearningRate 0.0050 Epoch: 15 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:38,309-Speed 3379.85 samples/sec Loss 2.3331 LearningRate 0.0050 Epoch: 15 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:41,316-Speed 3406.37 samples/sec Loss 2.3169 LearningRate 0.0050 Epoch: 15 Global Step: 78470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:20:44,306-Speed 3426.49 samples/sec Loss 2.4280 LearningRate 0.0050 Epoch: 15 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:47,308-Speed 3411.54 samples/sec Loss 2.3085 LearningRate 0.0050 Epoch: 15 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:50,312-Speed 3409.40 samples/sec Loss 2.3778 LearningRate 0.0050 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:53,319-Speed 3406.18 samples/sec Loss 2.2485 LearningRate 0.0050 Epoch: 15 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:56,326-Speed 3406.98 samples/sec Loss 2.3092 LearningRate 0.0050 Epoch: 15 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:20:59,350-Speed 3387.08 samples/sec Loss 2.3353 LearningRate 0.0050 Epoch: 15 Global Step: 78530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:21:02,379-Speed 3381.61 samples/sec Loss 2.4053 LearningRate 0.0050 Epoch: 15 
Global Step: 78540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:21:05,417-Speed 3372.34 samples/sec Loss 2.2003 LearningRate 0.0050 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:21:08,425-Speed 3404.27 samples/sec Loss 2.3757 LearningRate 0.0050 Epoch: 15 Global Step: 78560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:21:11,444-Speed 3393.93 samples/sec Loss 2.4079 LearningRate 0.0050 Epoch: 15 Global Step: 78570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:21:14,457-Speed 3398.74 samples/sec Loss 2.2141 LearningRate 0.0050 Epoch: 15 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:17,490-Speed 3377.66 samples/sec Loss 2.2823 LearningRate 0.0050 Epoch: 15 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:20,502-Speed 3401.41 samples/sec Loss 2.2706 LearningRate 0.0050 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:23,531-Speed 3381.36 samples/sec Loss 2.2760 LearningRate 0.0050 Epoch: 15 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:26,592-Speed 3345.82 samples/sec Loss 2.3584 LearningRate 0.0050 Epoch: 15 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:29,601-Speed 3404.18 samples/sec Loss 2.3398 LearningRate 0.0050 Epoch: 15 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:32,614-Speed 3399.95 samples/sec Loss 2.2616 LearningRate 0.0050 Epoch: 15 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:35,627-Speed 3399.46 samples/sec Loss 2.3639 LearningRate 0.0050 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:38,637-Speed 3402.74 samples/sec Loss 2.4195 LearningRate 0.0049 Epoch: 15 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 07:21:41,653-Speed 3396.35 samples/sec Loss 2.3289 LearningRate 0.0049 Epoch: 15 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:44,666-Speed 3399.01 samples/sec Loss 2.3754 LearningRate 0.0049 Epoch: 15 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 07:21:47,662-Speed 3419.69 samples/sec Loss 2.3007 LearningRate 0.0049 Epoch: 15 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:50,683-Speed 3389.35 samples/sec Loss 2.3837 LearningRate 0.0049 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:53,755-Speed 3334.60 samples/sec Loss 2.4027 LearningRate 0.0049 Epoch: 15 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:56,776-Speed 3390.94 samples/sec Loss 2.3666 LearningRate 0.0049 Epoch: 15 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:21:59,796-Speed 3391.37 samples/sec Loss 2.3656 LearningRate 0.0049 Epoch: 15 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:02,811-Speed 3397.76 samples/sec Loss 2.3343 LearningRate 0.0049 Epoch: 15 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:05,828-Speed 3394.89 samples/sec Loss 2.2598 LearningRate 0.0049 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:08,843-Speed 3397.00 samples/sec Loss 2.3033 LearningRate 0.0049 Epoch: 15 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:11,857-Speed 3397.89 samples/sec Loss 2.2645 LearningRate 0.0049 Epoch: 15 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:14,879-Speed 3389.88 samples/sec Loss 2.3440 LearningRate 0.0049 Epoch: 15 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:17,881-Speed 3412.29 
samples/sec Loss 2.3525 LearningRate 0.0049 Epoch: 15 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:20,898-Speed 3395.04 samples/sec Loss 2.4096 LearningRate 0.0049 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:23,939-Speed 3368.01 samples/sec Loss 2.3127 LearningRate 0.0049 Epoch: 15 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:26,962-Speed 3388.09 samples/sec Loss 2.3165 LearningRate 0.0049 Epoch: 15 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:29,979-Speed 3394.66 samples/sec Loss 2.3596 LearningRate 0.0049 Epoch: 15 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:32,999-Speed 3392.77 samples/sec Loss 2.3426 LearningRate 0.0049 Epoch: 15 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:36,015-Speed 3395.86 samples/sec Loss 2.2032 LearningRate 0.0049 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:39,031-Speed 3395.75 samples/sec Loss 2.3318 LearningRate 0.0049 Epoch: 15 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:42,057-Speed 3385.40 samples/sec Loss 2.4277 LearningRate 0.0049 Epoch: 15 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:45,076-Speed 3392.98 samples/sec Loss 2.4021 LearningRate 0.0049 Epoch: 15 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:48,076-Speed 3414.18 samples/sec Loss 2.3647 LearningRate 0.0048 Epoch: 15 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:51,093-Speed 3394.56 samples/sec Loss 2.3197 LearningRate 0.0048 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:54,111-Speed 3393.52 samples/sec Loss 2.2841 LearningRate 0.0048 Epoch: 15 
Global Step: 78910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:22:57,127-Speed 3397.29 samples/sec Loss 2.4123 LearningRate 0.0048 Epoch: 15 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:00,138-Speed 3401.74 samples/sec Loss 2.4281 LearningRate 0.0048 Epoch: 15 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:03,167-Speed 3381.36 samples/sec Loss 2.3851 LearningRate 0.0048 Epoch: 15 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:06,214-Speed 3361.18 samples/sec Loss 2.3548 LearningRate 0.0048 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:09,215-Speed 3414.06 samples/sec Loss 2.3873 LearningRate 0.0048 Epoch: 15 Global Step: 78960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:12,237-Speed 3389.42 samples/sec Loss 2.4457 LearningRate 0.0048 Epoch: 15 Global Step: 78970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:15,278-Speed 3367.77 samples/sec Loss 2.4270 LearningRate 0.0048 Epoch: 15 Global Step: 78980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:18,292-Speed 3399.05 samples/sec Loss 2.3978 LearningRate 0.0048 Epoch: 15 Global Step: 78990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:21,306-Speed 3398.17 samples/sec Loss 2.2488 LearningRate 0.0048 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:24,325-Speed 3392.95 samples/sec Loss 2.4383 LearningRate 0.0048 Epoch: 15 Global Step: 79010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:27,339-Speed 3398.70 samples/sec Loss 2.3153 LearningRate 0.0048 Epoch: 15 Global Step: 79020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:30,350-Speed 3401.41 samples/sec Loss 2.3598 LearningRate 0.0048 Epoch: 15 Global Step: 79030 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:23:33,361-Speed 3402.27 samples/sec Loss 2.3348 LearningRate 0.0048 Epoch: 15 Global Step: 79040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:36,378-Speed 3395.14 samples/sec Loss 2.3272 LearningRate 0.0048 Epoch: 15 Global Step: 79050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:23:39,394-Speed 3396.72 samples/sec Loss 2.2101 LearningRate 0.0048 Epoch: 15 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:42,406-Speed 3400.90 samples/sec Loss 2.3659 LearningRate 0.0048 Epoch: 15 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:45,419-Speed 3399.36 samples/sec Loss 2.3735 LearningRate 0.0048 Epoch: 15 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:48,443-Speed 3386.99 samples/sec Loss 2.3952 LearningRate 0.0048 Epoch: 15 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:51,469-Speed 3385.42 samples/sec Loss 2.3191 LearningRate 0.0048 Epoch: 15 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:54,487-Speed 3393.70 samples/sec Loss 2.4250 LearningRate 0.0048 Epoch: 15 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:23:57,502-Speed 3396.82 samples/sec Loss 2.4527 LearningRate 0.0047 Epoch: 15 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:00,516-Speed 3399.50 samples/sec Loss 2.5446 LearningRate 0.0047 Epoch: 15 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:03,529-Speed 3398.81 samples/sec Loss 2.4598 LearningRate 0.0047 Epoch: 15 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:06,544-Speed 3397.18 samples/sec Loss 2.3078 LearningRate 0.0047 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:09,542-Speed 3416.70 
samples/sec Loss 2.2904 LearningRate 0.0047 Epoch: 15 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:12,561-Speed 3393.58 samples/sec Loss 2.4256 LearningRate 0.0047 Epoch: 15 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:15,620-Speed 3347.78 samples/sec Loss 2.4384 LearningRate 0.0047 Epoch: 15 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:18,674-Speed 3353.87 samples/sec Loss 2.3424 LearningRate 0.0047 Epoch: 15 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:21,695-Speed 3391.47 samples/sec Loss 2.3432 LearningRate 0.0047 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:24,712-Speed 3394.10 samples/sec Loss 2.5418 LearningRate 0.0047 Epoch: 15 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:27,745-Speed 3377.24 samples/sec Loss 2.3372 LearningRate 0.0047 Epoch: 15 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:30,773-Speed 3383.18 samples/sec Loss 2.3642 LearningRate 0.0047 Epoch: 15 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:33,793-Speed 3391.83 samples/sec Loss 2.3581 LearningRate 0.0047 Epoch: 15 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:36,812-Speed 3392.24 samples/sec Loss 2.3581 LearningRate 0.0047 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:39,830-Speed 3394.87 samples/sec Loss 2.2282 LearningRate 0.0047 Epoch: 15 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 07:24:42,829-Speed 3415.28 samples/sec Loss 2.3652 LearningRate 0.0047 Epoch: 15 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:45,848-Speed 3393.02 samples/sec Loss 2.3337 LearningRate 0.0047 Epoch: 15 
Global Step: 79280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:48,865-Speed 3394.33 samples/sec Loss 2.3347 LearningRate 0.0047 Epoch: 15 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:24:51,869-Speed 3409.51 samples/sec Loss 2.3460 LearningRate 0.0047 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:24:54,900-Speed 3380.02 samples/sec Loss 2.3919 LearningRate 0.0047 Epoch: 15 Global Step: 79310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:24:57,914-Speed 3398.20 samples/sec Loss 2.5114 LearningRate 0.0047 Epoch: 15 Global Step: 79320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:00,933-Speed 3392.43 samples/sec Loss 2.3541 LearningRate 0.0047 Epoch: 15 Global Step: 79330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:04,024-Speed 3313.96 samples/sec Loss 2.4443 LearningRate 0.0047 Epoch: 15 Global Step: 79340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:07,374-Speed 3058.05 samples/sec Loss 2.4080 LearningRate 0.0046 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:10,389-Speed 3396.67 samples/sec Loss 2.1968 LearningRate 0.0046 Epoch: 15 Global Step: 79360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:13,403-Speed 3398.50 samples/sec Loss 2.4866 LearningRate 0.0046 Epoch: 15 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:16,444-Speed 3369.04 samples/sec Loss 2.3945 LearningRate 0.0046 Epoch: 15 Global Step: 79380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:19,458-Speed 3398.14 samples/sec Loss 2.2531 LearningRate 0.0046 Epoch: 15 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:25:22,471-Speed 3399.52 samples/sec Loss 2.3343 LearningRate 0.0046 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 07:25:25,492-Speed 3389.76 samples/sec Loss 2.3322 LearningRate 0.0046 Epoch: 15 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:28,508-Speed 3397.25 samples/sec Loss 2.2873 LearningRate 0.0046 Epoch: 15 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:31,523-Speed 3396.67 samples/sec Loss 2.3747 LearningRate 0.0046 Epoch: 15 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:34,546-Speed 3388.52 samples/sec Loss 2.4241 LearningRate 0.0046 Epoch: 15 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:37,607-Speed 3346.53 samples/sec Loss 2.3784 LearningRate 0.0046 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:40,669-Speed 3344.49 samples/sec Loss 2.3016 LearningRate 0.0046 Epoch: 15 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:43,688-Speed 3392.62 samples/sec Loss 2.3892 LearningRate 0.0046 Epoch: 15 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:46,699-Speed 3401.58 samples/sec Loss 2.3379 LearningRate 0.0046 Epoch: 15 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:49,718-Speed 3393.17 samples/sec Loss 2.3488 LearningRate 0.0046 Epoch: 15 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:52,735-Speed 3395.30 samples/sec Loss 2.3295 LearningRate 0.0046 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:55,753-Speed 3393.05 samples/sec Loss 2.3740 LearningRate 0.0046 Epoch: 15 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:25:58,768-Speed 3397.54 samples/sec Loss 2.3085 LearningRate 0.0046 Epoch: 15 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:01,789-Speed 3390.79 
samples/sec Loss 2.4175 LearningRate 0.0046 Epoch: 15 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:04,819-Speed 3380.12 samples/sec Loss 2.3642 LearningRate 0.0046 Epoch: 15 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:07,828-Speed 3404.65 samples/sec Loss 2.4312 LearningRate 0.0046 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:10,848-Speed 3392.58 samples/sec Loss 2.4109 LearningRate 0.0046 Epoch: 15 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:13,862-Speed 3397.66 samples/sec Loss 2.3809 LearningRate 0.0046 Epoch: 15 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:16,908-Speed 3362.84 samples/sec Loss 2.3035 LearningRate 0.0046 Epoch: 15 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:19,935-Speed 3383.97 samples/sec Loss 2.2693 LearningRate 0.0045 Epoch: 15 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:22,952-Speed 3395.51 samples/sec Loss 2.3002 LearningRate 0.0045 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:25,965-Speed 3399.13 samples/sec Loss 2.4050 LearningRate 0.0045 Epoch: 15 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:28,976-Speed 3401.40 samples/sec Loss 2.3774 LearningRate 0.0045 Epoch: 15 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:31,995-Speed 3393.13 samples/sec Loss 2.2827 LearningRate 0.0045 Epoch: 15 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:35,009-Speed 3398.94 samples/sec Loss 2.3521 LearningRate 0.0045 Epoch: 15 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:38,061-Speed 3356.54 samples/sec Loss 2.3523 LearningRate 0.0045 Epoch: 15 
Global Step: 79650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:41,077-Speed 3395.97 samples/sec Loss 2.3780 LearningRate 0.0045 Epoch: 15 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:44,101-Speed 3387.23 samples/sec Loss 2.4048 LearningRate 0.0045 Epoch: 15 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:47,124-Speed 3388.25 samples/sec Loss 2.4890 LearningRate 0.0045 Epoch: 15 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:50,136-Speed 3400.32 samples/sec Loss 2.4122 LearningRate 0.0045 Epoch: 15 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:53,164-Speed 3383.26 samples/sec Loss 2.4114 LearningRate 0.0045 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:26:56,168-Speed 3408.92 samples/sec Loss 2.4138 LearningRate 0.0045 Epoch: 15 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:26:59,183-Speed 3397.70 samples/sec Loss 2.2760 LearningRate 0.0045 Epoch: 15 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:02,202-Speed 3392.39 samples/sec Loss 2.3790 LearningRate 0.0045 Epoch: 15 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:05,218-Speed 3396.49 samples/sec Loss 2.4293 LearningRate 0.0045 Epoch: 15 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:08,231-Speed 3399.13 samples/sec Loss 2.3654 LearningRate 0.0045 Epoch: 15 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:11,244-Speed 3399.97 samples/sec Loss 2.4503 LearningRate 0.0045 Epoch: 15 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:14,258-Speed 3398.36 samples/sec Loss 2.2931 LearningRate 0.0045 Epoch: 15 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:27:17,291-Speed 3377.49 samples/sec Loss 2.3231 LearningRate 0.0045 Epoch: 15 Global Step: 79780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:20,309-Speed 3393.15 samples/sec Loss 2.3652 LearningRate 0.0045 Epoch: 15 Global Step: 79790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:23,329-Speed 3392.12 samples/sec Loss 2.3536 LearningRate 0.0045 Epoch: 15 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:26,350-Speed 3389.86 samples/sec Loss 2.4462 LearningRate 0.0045 Epoch: 15 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:27:29,369-Speed 3393.30 samples/sec Loss 2.3780 LearningRate 0.0045 Epoch: 15 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:27:32,369-Speed 3414.36 samples/sec Loss 2.3124 LearningRate 0.0044 Epoch: 15 Global Step: 79830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:35,383-Speed 3398.30 samples/sec Loss 2.2544 LearningRate 0.0044 Epoch: 15 Global Step: 79840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:38,394-Speed 3402.20 samples/sec Loss 2.4144 LearningRate 0.0044 Epoch: 15 Global Step: 79850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:41,407-Speed 3398.90 samples/sec Loss 2.3746 LearningRate 0.0044 Epoch: 15 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:44,419-Speed 3400.81 samples/sec Loss 2.2095 LearningRate 0.0044 Epoch: 15 Global Step: 79870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:47,430-Speed 3401.72 samples/sec Loss 2.3694 LearningRate 0.0044 Epoch: 15 Global Step: 79880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:50,444-Speed 3398.36 samples/sec Loss 2.3677 LearningRate 0.0044 Epoch: 15 Global Step: 79890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:53,468-Speed 3386.82 
samples/sec Loss 2.2829 LearningRate 0.0044 Epoch: 15 Global Step: 79900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:56,492-Speed 3387.54 samples/sec Loss 2.2542 LearningRate 0.0044 Epoch: 15 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:27:59,506-Speed 3398.53 samples/sec Loss 2.4698 LearningRate 0.0044 Epoch: 15 Global Step: 79920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:28:02,522-Speed 3396.23 samples/sec Loss 2.3644 LearningRate 0.0044 Epoch: 15 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:05,541-Speed 3393.32 samples/sec Loss 2.3864 LearningRate 0.0044 Epoch: 15 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:08,556-Speed 3397.04 samples/sec Loss 2.4451 LearningRate 0.0044 Epoch: 15 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:11,592-Speed 3373.31 samples/sec Loss 2.3564 LearningRate 0.0044 Epoch: 15 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:14,611-Speed 3393.09 samples/sec Loss 2.2801 LearningRate 0.0044 Epoch: 15 Global Step: 79970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:17,628-Speed 3395.28 samples/sec Loss 2.4942 LearningRate 0.0044 Epoch: 15 Global Step: 79980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:20,641-Speed 3399.41 samples/sec Loss 2.4069 LearningRate 0.0044 Epoch: 15 Global Step: 79990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:28:23,650-Speed 3404.10 samples/sec Loss 2.3614 LearningRate 0.0044 Epoch: 15 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:29:07,978-[lfw][80000]XNorm: 22.612949 Training: 2022-04-11 07:29:07,978-[lfw][80000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-04-11 07:29:07,979-[lfw][80000]Accuracy-Highest: 0.99850 Training: 2022-04-11 
07:29:59,260-[cfp_fp][80000]XNorm: 22.071024 Training: 2022-04-11 07:29:59,261-[cfp_fp][80000]Accuracy-Flip: 0.98414+-0.00594 Training: 2022-04-11 07:29:59,261-[cfp_fp][80000]Accuracy-Highest: 0.98614 Training: 2022-04-11 07:30:43,337-[agedb_30][80000]XNorm: 22.798119 Training: 2022-04-11 07:30:43,337-[agedb_30][80000]Accuracy-Flip: 0.98550+-0.00582 Training: 2022-04-11 07:30:43,338-[agedb_30][80000]Accuracy-Highest: 0.98550 Training: 2022-04-11 07:30:46,350-Speed 71.76 samples/sec Loss 2.3717 LearningRate 0.0044 Epoch: 15 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:30:49,339-Speed 3427.01 samples/sec Loss 2.4076 LearningRate 0.0044 Epoch: 15 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:30:52,326-Speed 3428.93 samples/sec Loss 2.4681 LearningRate 0.0044 Epoch: 15 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 07:30:55,303-Speed 3441.52 samples/sec Loss 2.3086 LearningRate 0.0044 Epoch: 15 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:30:58,295-Speed 3422.80 samples/sec Loss 2.3055 LearningRate 0.0044 Epoch: 15 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:31:01,295-Speed 3414.29 samples/sec Loss 2.3724 LearningRate 0.0044 Epoch: 15 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:31:04,288-Speed 3422.60 samples/sec Loss 2.2990 LearningRate 0.0043 Epoch: 15 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:31:07,329-Speed 3367.95 samples/sec Loss 2.2324 LearningRate 0.0043 Epoch: 15 Global Step: 80080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:10,327-Speed 3416.25 samples/sec Loss 2.3661 LearningRate 0.0043 Epoch: 15 Global Step: 80090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:13,326-Speed 3415.06 samples/sec Loss 2.3955 LearningRate 0.0043 Epoch: 15 Global 
Step: 80100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:16,343-Speed 3395.52 samples/sec Loss 2.3705 LearningRate 0.0043 Epoch: 15 Global Step: 80110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:19,343-Speed 3415.02 samples/sec Loss 2.4157 LearningRate 0.0043 Epoch: 15 Global Step: 80120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:22,346-Speed 3410.03 samples/sec Loss 2.4449 LearningRate 0.0043 Epoch: 15 Global Step: 80130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:25,350-Speed 3409.61 samples/sec Loss 2.3576 LearningRate 0.0043 Epoch: 15 Global Step: 80140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:28,357-Speed 3406.94 samples/sec Loss 2.2548 LearningRate 0.0043 Epoch: 15 Global Step: 80150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:31,363-Speed 3406.56 samples/sec Loss 2.2605 LearningRate 0.0043 Epoch: 15 Global Step: 80160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:34,365-Speed 3411.81 samples/sec Loss 2.3838 LearningRate 0.0043 Epoch: 15 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:37,369-Speed 3410.70 samples/sec Loss 2.3973 LearningRate 0.0043 Epoch: 15 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:31:40,374-Speed 3408.22 samples/sec Loss 2.3001 LearningRate 0.0043 Epoch: 15 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:31:43,360-Speed 3430.11 samples/sec Loss 2.4000 LearningRate 0.0043 Epoch: 15 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:46,366-Speed 3407.46 samples/sec Loss 2.3743 LearningRate 0.0043 Epoch: 15 Global Step: 80210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:49,399-Speed 3377.03 samples/sec Loss 2.3596 LearningRate 0.0043 Epoch: 15 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 07:31:52,433-Speed 3376.23 samples/sec Loss 2.4137 LearningRate 0.0043 Epoch: 15 Global Step: 80230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:55,455-Speed 3389.73 samples/sec Loss 2.3921 LearningRate 0.0043 Epoch: 15 Global Step: 80240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:31:58,460-Speed 3408.45 samples/sec Loss 2.3257 LearningRate 0.0043 Epoch: 15 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:01,472-Speed 3399.99 samples/sec Loss 2.3531 LearningRate 0.0043 Epoch: 15 Global Step: 80260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:04,480-Speed 3406.85 samples/sec Loss 2.3081 LearningRate 0.0043 Epoch: 15 Global Step: 80270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:07,491-Speed 3401.67 samples/sec Loss 2.3325 LearningRate 0.0043 Epoch: 15 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:10,497-Speed 3407.65 samples/sec Loss 2.3832 LearningRate 0.0043 Epoch: 15 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:13,501-Speed 3409.66 samples/sec Loss 2.3104 LearningRate 0.0043 Epoch: 15 Global Step: 80300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:16,514-Speed 3398.93 samples/sec Loss 2.4144 LearningRate 0.0042 Epoch: 15 Global Step: 80310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:19,525-Speed 3402.29 samples/sec Loss 2.4106 LearningRate 0.0042 Epoch: 15 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:22,554-Speed 3381.77 samples/sec Loss 2.3792 LearningRate 0.0042 Epoch: 15 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:25,554-Speed 3414.33 samples/sec Loss 2.3032 LearningRate 0.0042 Epoch: 15 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:28,575-Speed 3390.51 
samples/sec Loss 2.3676 LearningRate 0.0042 Epoch: 15 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:31,588-Speed 3400.04 samples/sec Loss 2.3752 LearningRate 0.0042 Epoch: 15 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:34,591-Speed 3410.85 samples/sec Loss 2.2940 LearningRate 0.0042 Epoch: 15 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:32:37,595-Speed 3409.21 samples/sec Loss 2.2950 LearningRate 0.0042 Epoch: 15 Global Step: 80380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:40,617-Speed 3389.68 samples/sec Loss 2.2658 LearningRate 0.0042 Epoch: 15 Global Step: 80390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:43,658-Speed 3368.51 samples/sec Loss 2.3768 LearningRate 0.0042 Epoch: 15 Global Step: 80400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:46,657-Speed 3415.32 samples/sec Loss 2.3815 LearningRate 0.0042 Epoch: 15 Global Step: 80410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:49,660-Speed 3410.01 samples/sec Loss 2.3968 LearningRate 0.0042 Epoch: 15 Global Step: 80420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:52,673-Speed 3399.96 samples/sec Loss 2.3478 LearningRate 0.0042 Epoch: 15 Global Step: 80430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:55,677-Speed 3411.77 samples/sec Loss 2.2595 LearningRate 0.0042 Epoch: 15 Global Step: 80440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:32:58,685-Speed 3404.64 samples/sec Loss 2.2209 LearningRate 0.0042 Epoch: 15 Global Step: 80450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:01,712-Speed 3383.76 samples/sec Loss 2.3925 LearningRate 0.0042 Epoch: 15 Global Step: 80460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:04,713-Speed 3412.78 samples/sec Loss 2.3747 LearningRate 0.0042 Epoch: 15 
Global Step: 80470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:07,694-Speed 3436.45 samples/sec Loss 2.3288 LearningRate 0.0042 Epoch: 15 Global Step: 80480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:10,730-Speed 3374.05 samples/sec Loss 2.3716 LearningRate 0.0042 Epoch: 15 Global Step: 80490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:13,767-Speed 3372.26 samples/sec Loss 2.3214 LearningRate 0.0042 Epoch: 15 Global Step: 80500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:16,772-Speed 3408.60 samples/sec Loss 2.3770 LearningRate 0.0042 Epoch: 15 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:19,776-Speed 3409.26 samples/sec Loss 2.5175 LearningRate 0.0042 Epoch: 15 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:22,791-Speed 3397.66 samples/sec Loss 2.3617 LearningRate 0.0042 Epoch: 15 Global Step: 80530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:25,828-Speed 3372.55 samples/sec Loss 2.3969 LearningRate 0.0042 Epoch: 15 Global Step: 80540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:28,843-Speed 3397.09 samples/sec Loss 2.3864 LearningRate 0.0042 Epoch: 15 Global Step: 80550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:31,862-Speed 3392.90 samples/sec Loss 2.3925 LearningRate 0.0041 Epoch: 15 Global Step: 80560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:34,879-Speed 3394.94 samples/sec Loss 2.3972 LearningRate 0.0041 Epoch: 15 Global Step: 80570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:37,887-Speed 3405.66 samples/sec Loss 2.2605 LearningRate 0.0041 Epoch: 15 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:33:40,887-Speed 3413.73 samples/sec Loss 2.3352 LearningRate 0.0041 Epoch: 15 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 07:33:43,894-Speed 3406.39 samples/sec Loss 2.3687 LearningRate 0.0041 Epoch: 15 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:33:46,897-Speed 3410.75 samples/sec Loss 2.3407 LearningRate 0.0041 Epoch: 15 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:33:49,911-Speed 3398.37 samples/sec Loss 2.3487 LearningRate 0.0041 Epoch: 15 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:33:52,898-Speed 3429.43 samples/sec Loss 2.2629 LearningRate 0.0041 Epoch: 15 Global Step: 80630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:55,911-Speed 3399.65 samples/sec Loss 2.3278 LearningRate 0.0041 Epoch: 15 Global Step: 80640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:33:58,928-Speed 3394.45 samples/sec Loss 2.3345 LearningRate 0.0041 Epoch: 15 Global Step: 80650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:01,934-Speed 3408.14 samples/sec Loss 2.3650 LearningRate 0.0041 Epoch: 15 Global Step: 80660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:04,941-Speed 3406.41 samples/sec Loss 2.3356 LearningRate 0.0041 Epoch: 15 Global Step: 80670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:07,972-Speed 3378.51 samples/sec Loss 2.3527 LearningRate 0.0041 Epoch: 15 Global Step: 80680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:10,974-Speed 3412.63 samples/sec Loss 2.2742 LearningRate 0.0041 Epoch: 15 Global Step: 80690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:14,004-Speed 3380.47 samples/sec Loss 2.2487 LearningRate 0.0041 Epoch: 15 Global Step: 80700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:17,016-Speed 3400.25 samples/sec Loss 2.2711 LearningRate 0.0041 Epoch: 15 Global Step: 80710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:20,026-Speed 3404.13 
samples/sec Loss 2.3857 LearningRate 0.0041 Epoch: 15 Global Step: 80720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:23,031-Speed 3407.85 samples/sec Loss 2.3791 LearningRate 0.0041 Epoch: 15 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:34:26,037-Speed 3407.82 samples/sec Loss 2.3897 LearningRate 0.0041 Epoch: 15 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:34:29,026-Speed 3426.84 samples/sec Loss 2.3662 LearningRate 0.0041 Epoch: 15 Global Step: 80750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:32,041-Speed 3396.87 samples/sec Loss 2.3807 LearningRate 0.0041 Epoch: 15 Global Step: 80760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:35,062-Speed 3390.96 samples/sec Loss 2.3639 LearningRate 0.0041 Epoch: 15 Global Step: 80770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:38,132-Speed 3336.25 samples/sec Loss 2.3297 LearningRate 0.0041 Epoch: 15 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:41,149-Speed 3395.10 samples/sec Loss 2.2747 LearningRate 0.0041 Epoch: 15 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:44,158-Speed 3404.25 samples/sec Loss 2.3059 LearningRate 0.0041 Epoch: 15 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:47,168-Speed 3402.46 samples/sec Loss 2.4015 LearningRate 0.0040 Epoch: 15 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:50,171-Speed 3411.07 samples/sec Loss 2.2942 LearningRate 0.0040 Epoch: 15 Global Step: 80820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:53,199-Speed 3382.48 samples/sec Loss 2.2543 LearningRate 0.0040 Epoch: 15 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:56,235-Speed 3374.07 samples/sec Loss 2.3495 LearningRate 0.0040 Epoch: 15 
Global Step: 80840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:34:59,255-Speed 3391.38 samples/sec Loss 2.3359 LearningRate 0.0040 Epoch: 15 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:35:02,243-Speed 3427.39 samples/sec Loss 2.3156 LearningRate 0.0040 Epoch: 15 Global Step: 80860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:05,254-Speed 3402.32 samples/sec Loss 2.3179 LearningRate 0.0040 Epoch: 15 Global Step: 80870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:08,257-Speed 3410.01 samples/sec Loss 2.4104 LearningRate 0.0040 Epoch: 15 Global Step: 80880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:11,276-Speed 3393.54 samples/sec Loss 2.2435 LearningRate 0.0040 Epoch: 15 Global Step: 80890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:14,286-Speed 3402.76 samples/sec Loss 2.3227 LearningRate 0.0040 Epoch: 15 Global Step: 80900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:17,312-Speed 3384.57 samples/sec Loss 2.2904 LearningRate 0.0040 Epoch: 15 Global Step: 80910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:20,411-Speed 3305.41 samples/sec Loss 2.3343 LearningRate 0.0040 Epoch: 15 Global Step: 80920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:33,466-Speed 784.46 samples/sec Loss 2.1770 LearningRate 0.0040 Epoch: 16 Global Step: 80930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:36,491-Speed 3386.93 samples/sec Loss 1.6201 LearningRate 0.0040 Epoch: 16 Global Step: 80940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:39,523-Speed 3377.47 samples/sec Loss 1.7448 LearningRate 0.0040 Epoch: 16 Global Step: 80950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:35:42,539-Speed 3396.53 samples/sec Loss 1.6103 LearningRate 0.0040 Epoch: 16 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 07:35:45,560-Speed 3390.92 samples/sec Loss 1.6768 LearningRate 0.0040 Epoch: 16 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:35:48,585-Speed 3386.41 samples/sec Loss 1.6491 LearningRate 0.0040 Epoch: 16 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:35:51,591-Speed 3406.80 samples/sec Loss 1.7085 LearningRate 0.0040 Epoch: 16 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:35:54,600-Speed 3403.86 samples/sec Loss 1.6586 LearningRate 0.0040 Epoch: 16 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:35:57,620-Speed 3391.48 samples/sec Loss 1.7052 LearningRate 0.0040 Epoch: 16 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:00,631-Speed 3402.34 samples/sec Loss 1.6895 LearningRate 0.0040 Epoch: 16 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:03,643-Speed 3400.46 samples/sec Loss 1.6915 LearningRate 0.0040 Epoch: 16 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:06,637-Speed 3421.26 samples/sec Loss 1.6344 LearningRate 0.0040 Epoch: 16 Global Step: 81040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:09,653-Speed 3396.05 samples/sec Loss 1.6491 LearningRate 0.0040 Epoch: 16 Global Step: 81050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:12,674-Speed 3390.28 samples/sec Loss 1.6585 LearningRate 0.0039 Epoch: 16 Global Step: 81060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:15,762-Speed 3317.54 samples/sec Loss 1.6738 LearningRate 0.0039 Epoch: 16 Global Step: 81070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:18,814-Speed 3356.02 samples/sec Loss 1.6946 LearningRate 0.0039 Epoch: 16 Global Step: 81080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:21,826-Speed 3401.28 
samples/sec Loss 1.6237 LearningRate 0.0039 Epoch: 16 Global Step: 81090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:24,830-Speed 3409.03 samples/sec Loss 1.6493 LearningRate 0.0039 Epoch: 16 Global Step: 81100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:27,861-Speed 3379.03 samples/sec Loss 1.7701 LearningRate 0.0039 Epoch: 16 Global Step: 81110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:30,911-Speed 3358.84 samples/sec Loss 1.6793 LearningRate 0.0039 Epoch: 16 Global Step: 81120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:33,946-Speed 3375.08 samples/sec Loss 1.6237 LearningRate 0.0039 Epoch: 16 Global Step: 81130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:36:36,952-Speed 3407.05 samples/sec Loss 1.6982 LearningRate 0.0039 Epoch: 16 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:40,003-Speed 3356.77 samples/sec Loss 1.6096 LearningRate 0.0039 Epoch: 16 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:43,018-Speed 3397.38 samples/sec Loss 1.5395 LearningRate 0.0039 Epoch: 16 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:46,040-Speed 3390.34 samples/sec Loss 1.6944 LearningRate 0.0039 Epoch: 16 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:49,068-Speed 3382.42 samples/sec Loss 1.6874 LearningRate 0.0039 Epoch: 16 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:52,082-Speed 3397.52 samples/sec Loss 1.6802 LearningRate 0.0039 Epoch: 16 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:55,139-Speed 3351.59 samples/sec Loss 1.6430 LearningRate 0.0039 Epoch: 16 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:36:58,148-Speed 3403.74 samples/sec Loss 1.7025 LearningRate 0.0039 Epoch: 16 
Global Step: 81210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:01,160-Speed 3400.28 samples/sec Loss 1.6272 LearningRate 0.0039 Epoch: 16 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:04,166-Speed 3406.96 samples/sec Loss 1.6475 LearningRate 0.0039 Epoch: 16 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:07,157-Speed 3425.08 samples/sec Loss 1.6376 LearningRate 0.0039 Epoch: 16 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:10,166-Speed 3404.06 samples/sec Loss 1.7275 LearningRate 0.0039 Epoch: 16 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:13,189-Speed 3388.51 samples/sec Loss 1.6570 LearningRate 0.0039 Epoch: 16 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:16,201-Speed 3400.32 samples/sec Loss 1.7174 LearningRate 0.0039 Epoch: 16 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:19,209-Speed 3405.21 samples/sec Loss 1.6847 LearningRate 0.0039 Epoch: 16 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:22,215-Speed 3407.11 samples/sec Loss 1.7620 LearningRate 0.0039 Epoch: 16 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:25,231-Speed 3396.46 samples/sec Loss 1.7026 LearningRate 0.0039 Epoch: 16 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:28,270-Speed 3370.55 samples/sec Loss 1.6240 LearningRate 0.0039 Epoch: 16 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:37:31,266-Speed 3418.06 samples/sec Loss 1.7849 LearningRate 0.0038 Epoch: 16 Global Step: 81320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:34,272-Speed 3407.74 samples/sec Loss 1.7197 LearningRate 0.0038 Epoch: 16 Global Step: 81330 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:37:37,299-Speed 3383.37 samples/sec Loss 1.6556 LearningRate 0.0038 Epoch: 16 Global Step: 81340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:40,316-Speed 3395.63 samples/sec Loss 1.7220 LearningRate 0.0038 Epoch: 16 Global Step: 81350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:43,352-Speed 3373.35 samples/sec Loss 1.8375 LearningRate 0.0038 Epoch: 16 Global Step: 81360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:46,420-Speed 3339.83 samples/sec Loss 1.7401 LearningRate 0.0038 Epoch: 16 Global Step: 81370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:49,459-Speed 3369.76 samples/sec Loss 1.7394 LearningRate 0.0038 Epoch: 16 Global Step: 81380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:52,493-Speed 3375.38 samples/sec Loss 1.6667 LearningRate 0.0038 Epoch: 16 Global Step: 81390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:55,520-Speed 3385.02 samples/sec Loss 1.6769 LearningRate 0.0038 Epoch: 16 Global Step: 81400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:37:58,544-Speed 3387.34 samples/sec Loss 1.6185 LearningRate 0.0038 Epoch: 16 Global Step: 81410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:38:01,557-Speed 3399.97 samples/sec Loss 1.7172 LearningRate 0.0038 Epoch: 16 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:04,568-Speed 3401.36 samples/sec Loss 1.6817 LearningRate 0.0038 Epoch: 16 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:07,579-Speed 3401.00 samples/sec Loss 1.7199 LearningRate 0.0038 Epoch: 16 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:10,587-Speed 3405.75 samples/sec Loss 1.7348 LearningRate 0.0038 Epoch: 16 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:13,595-Speed 3405.33 
samples/sec Loss 1.6691 LearningRate 0.0038 Epoch: 16 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:16,616-Speed 3390.16 samples/sec Loss 1.7767 LearningRate 0.0038 Epoch: 16 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:19,635-Speed 3392.66 samples/sec Loss 1.6486 LearningRate 0.0038 Epoch: 16 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:22,645-Speed 3402.64 samples/sec Loss 1.7795 LearningRate 0.0038 Epoch: 16 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:25,669-Speed 3387.32 samples/sec Loss 1.7690 LearningRate 0.0038 Epoch: 16 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:28,686-Speed 3396.74 samples/sec Loss 1.6790 LearningRate 0.0038 Epoch: 16 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:31,681-Speed 3420.20 samples/sec Loss 1.6958 LearningRate 0.0038 Epoch: 16 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:34,691-Speed 3402.48 samples/sec Loss 1.6528 LearningRate 0.0038 Epoch: 16 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:37,756-Speed 3341.64 samples/sec Loss 1.6936 LearningRate 0.0038 Epoch: 16 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:40,769-Speed 3399.84 samples/sec Loss 1.7996 LearningRate 0.0038 Epoch: 16 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:43,780-Speed 3400.85 samples/sec Loss 1.7488 LearningRate 0.0038 Epoch: 16 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:46,800-Speed 3391.88 samples/sec Loss 1.7688 LearningRate 0.0038 Epoch: 16 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:49,818-Speed 3393.86 samples/sec Loss 1.6950 LearningRate 0.0037 Epoch: 16 
Global Step: 81580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:38:52,811-Speed 3422.04 samples/sec Loss 1.7294 LearningRate 0.0037 Epoch: 16 Global Step: 81590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:38:55,824-Speed 3399.48 samples/sec Loss 1.7688 LearningRate 0.0037 Epoch: 16 Global Step: 81600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:38:58,836-Speed 3401.01 samples/sec Loss 1.7614 LearningRate 0.0037 Epoch: 16 Global Step: 81610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:01,851-Speed 3397.86 samples/sec Loss 1.7379 LearningRate 0.0037 Epoch: 16 Global Step: 81620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:04,866-Speed 3396.49 samples/sec Loss 1.6556 LearningRate 0.0037 Epoch: 16 Global Step: 81630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:07,911-Speed 3363.94 samples/sec Loss 1.7744 LearningRate 0.0037 Epoch: 16 Global Step: 81640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:10,919-Speed 3404.52 samples/sec Loss 1.7833 LearningRate 0.0037 Epoch: 16 Global Step: 81650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:13,931-Speed 3401.53 samples/sec Loss 1.7596 LearningRate 0.0037 Epoch: 16 Global Step: 81660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:16,951-Speed 3392.52 samples/sec Loss 1.7271 LearningRate 0.0037 Epoch: 16 Global Step: 81670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:19,960-Speed 3403.27 samples/sec Loss 1.7956 LearningRate 0.0037 Epoch: 16 Global Step: 81680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:22,954-Speed 3421.66 samples/sec Loss 1.8069 LearningRate 0.0037 Epoch: 16 Global Step: 81690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:25,986-Speed 3377.71 samples/sec Loss 1.7464 LearningRate 0.0037 Epoch: 16 Global Step: 81700 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:39:29,035-Speed 3359.97 samples/sec Loss 1.6564 LearningRate 0.0037 Epoch: 16 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:32,049-Speed 3398.53 samples/sec Loss 1.8433 LearningRate 0.0037 Epoch: 16 Global Step: 81720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:35,063-Speed 3397.79 samples/sec Loss 1.7720 LearningRate 0.0037 Epoch: 16 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:38,077-Speed 3398.05 samples/sec Loss 1.8142 LearningRate 0.0037 Epoch: 16 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:41,088-Speed 3401.42 samples/sec Loss 1.7626 LearningRate 0.0037 Epoch: 16 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:44,104-Speed 3396.67 samples/sec Loss 1.8372 LearningRate 0.0037 Epoch: 16 Global Step: 81760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:47,129-Speed 3385.90 samples/sec Loss 1.8104 LearningRate 0.0037 Epoch: 16 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:50,142-Speed 3399.05 samples/sec Loss 1.7463 LearningRate 0.0037 Epoch: 16 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:39:53,155-Speed 3400.09 samples/sec Loss 1.7286 LearningRate 0.0037 Epoch: 16 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:39:56,171-Speed 3396.12 samples/sec Loss 1.7636 LearningRate 0.0037 Epoch: 16 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:39:59,195-Speed 3387.47 samples/sec Loss 1.6516 LearningRate 0.0037 Epoch: 16 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:02,223-Speed 3382.34 samples/sec Loss 1.8102 LearningRate 0.0037 Epoch: 16 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:05,236-Speed 3399.34 
samples/sec Loss 1.7605 LearningRate 0.0037 Epoch: 16 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:08,252-Speed 3395.96 samples/sec Loss 1.8115 LearningRate 0.0036 Epoch: 16 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:11,299-Speed 3361.92 samples/sec Loss 1.7806 LearningRate 0.0036 Epoch: 16 Global Step: 81850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:14,325-Speed 3386.29 samples/sec Loss 1.7200 LearningRate 0.0036 Epoch: 16 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:17,343-Speed 3393.55 samples/sec Loss 1.7823 LearningRate 0.0036 Epoch: 16 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:20,356-Speed 3399.29 samples/sec Loss 1.7494 LearningRate 0.0036 Epoch: 16 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:23,370-Speed 3398.44 samples/sec Loss 1.7747 LearningRate 0.0036 Epoch: 16 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 07:40:26,361-Speed 3424.81 samples/sec Loss 1.7654 LearningRate 0.0036 Epoch: 16 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:29,377-Speed 3395.73 samples/sec Loss 1.7598 LearningRate 0.0036 Epoch: 16 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:32,396-Speed 3392.95 samples/sec Loss 1.8004 LearningRate 0.0036 Epoch: 16 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:35,413-Speed 3395.47 samples/sec Loss 1.7880 LearningRate 0.0036 Epoch: 16 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:38,440-Speed 3383.24 samples/sec Loss 1.7369 LearningRate 0.0036 Epoch: 16 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:41,456-Speed 3396.59 samples/sec Loss 1.7408 LearningRate 0.0036 Epoch: 16 
Global Step: 81950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:44,469-Speed 3399.97 samples/sec Loss 1.8241 LearningRate 0.0036 Epoch: 16 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:47,483-Speed 3397.44 samples/sec Loss 1.8706 LearningRate 0.0036 Epoch: 16 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:50,495-Speed 3400.86 samples/sec Loss 1.8013 LearningRate 0.0036 Epoch: 16 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:53,507-Speed 3401.37 samples/sec Loss 1.8026 LearningRate 0.0036 Epoch: 16 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:40:56,524-Speed 3394.89 samples/sec Loss 1.8007 LearningRate 0.0036 Epoch: 16 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:41:40,886-[lfw][82000]XNorm: 22.328598 Training: 2022-04-11 07:41:40,887-[lfw][82000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 07:41:40,887-[lfw][82000]Accuracy-Highest: 0.99850 Training: 2022-04-11 07:42:32,182-[cfp_fp][82000]XNorm: 21.870537 Training: 2022-04-11 07:42:32,183-[cfp_fp][82000]Accuracy-Flip: 0.98686+-0.00578 Training: 2022-04-11 07:42:32,184-[cfp_fp][82000]Accuracy-Highest: 0.98686 Training: 2022-04-11 07:43:16,173-[agedb_30][82000]XNorm: 22.679104 Training: 2022-04-11 07:43:16,173-[agedb_30][82000]Accuracy-Flip: 0.98467+-0.00722 Training: 2022-04-11 07:43:16,174-[agedb_30][82000]Accuracy-Highest: 0.98550 Training: 2022-04-11 07:43:19,177-Speed 71.78 samples/sec Loss 1.7585 LearningRate 0.0036 Epoch: 16 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:22,172-Speed 3419.74 samples/sec Loss 1.7360 LearningRate 0.0036 Epoch: 16 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:25,167-Speed 3420.18 samples/sec Loss 1.8161 LearningRate 0.0036 Epoch: 16 Global Step: 82030 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 07:43:28,164-Speed 3417.31 samples/sec Loss 1.8582 LearningRate 0.0036 Epoch: 16 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:31,171-Speed 3406.12 samples/sec Loss 1.6539 LearningRate 0.0036 Epoch: 16 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:34,159-Speed 3428.81 samples/sec Loss 1.7576 LearningRate 0.0036 Epoch: 16 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:37,154-Speed 3419.98 samples/sec Loss 1.8039 LearningRate 0.0036 Epoch: 16 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:40,152-Speed 3416.50 samples/sec Loss 1.9447 LearningRate 0.0036 Epoch: 16 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:43,146-Speed 3420.78 samples/sec Loss 1.7864 LearningRate 0.0036 Epoch: 16 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:46,122-Speed 3442.13 samples/sec Loss 1.7777 LearningRate 0.0035 Epoch: 16 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:49,120-Speed 3416.95 samples/sec Loss 1.7604 LearningRate 0.0035 Epoch: 16 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:52,117-Speed 3417.39 samples/sec Loss 1.6786 LearningRate 0.0035 Epoch: 16 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:55,127-Speed 3402.20 samples/sec Loss 1.8197 LearningRate 0.0035 Epoch: 16 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:43:58,121-Speed 3420.74 samples/sec Loss 1.7635 LearningRate 0.0035 Epoch: 16 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:01,133-Speed 3401.18 samples/sec Loss 1.7998 LearningRate 0.0035 Epoch: 16 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
07:44:04,133-Speed 3414.43 samples/sec Loss 1.8390 LearningRate 0.0035 Epoch: 16 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:07,132-Speed 3415.17 samples/sec Loss 1.7890 LearningRate 0.0035 Epoch: 16 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:10,133-Speed 3413.17 samples/sec Loss 1.8282 LearningRate 0.0035 Epoch: 16 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:13,144-Speed 3401.36 samples/sec Loss 1.8477 LearningRate 0.0035 Epoch: 16 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:16,139-Speed 3419.94 samples/sec Loss 1.8890 LearningRate 0.0035 Epoch: 16 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:19,139-Speed 3414.91 samples/sec Loss 1.7553 LearningRate 0.0035 Epoch: 16 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:22,140-Speed 3412.07 samples/sec Loss 1.8087 LearningRate 0.0035 Epoch: 16 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:25,146-Speed 3407.37 samples/sec Loss 1.6612 LearningRate 0.0035 Epoch: 16 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:28,164-Speed 3394.13 samples/sec Loss 1.8042 LearningRate 0.0035 Epoch: 16 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:31,165-Speed 3412.82 samples/sec Loss 1.8162 LearningRate 0.0035 Epoch: 16 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:34,177-Speed 3401.77 samples/sec Loss 1.8858 LearningRate 0.0035 Epoch: 16 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:37,183-Speed 3406.76 samples/sec Loss 1.8627 LearningRate 0.0035 Epoch: 16 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:40,197-Speed 3398.34 samples/sec Loss 1.8039 
LearningRate 0.0035 Epoch: 16 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:43,204-Speed 3405.71 samples/sec Loss 1.7635 LearningRate 0.0035 Epoch: 16 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:46,193-Speed 3427.26 samples/sec Loss 1.8557 LearningRate 0.0035 Epoch: 16 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:49,205-Speed 3401.00 samples/sec Loss 1.8199 LearningRate 0.0035 Epoch: 16 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:52,227-Speed 3388.33 samples/sec Loss 1.9413 LearningRate 0.0035 Epoch: 16 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:55,226-Speed 3415.45 samples/sec Loss 1.9084 LearningRate 0.0035 Epoch: 16 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:44:58,232-Speed 3407.82 samples/sec Loss 1.7817 LearningRate 0.0035 Epoch: 16 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:01,244-Speed 3400.58 samples/sec Loss 1.7572 LearningRate 0.0035 Epoch: 16 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:04,252-Speed 3405.73 samples/sec Loss 1.7822 LearningRate 0.0035 Epoch: 16 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:07,260-Speed 3404.95 samples/sec Loss 1.7609 LearningRate 0.0035 Epoch: 16 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:10,261-Speed 3412.25 samples/sec Loss 1.8191 LearningRate 0.0034 Epoch: 16 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:13,275-Speed 3399.76 samples/sec Loss 1.8091 LearningRate 0.0034 Epoch: 16 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:16,264-Speed 3426.29 samples/sec Loss 1.7860 LearningRate 0.0034 Epoch: 16 Global Step: 82400 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:19,274-Speed 3402.63 samples/sec Loss 1.7446 LearningRate 0.0034 Epoch: 16 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:22,273-Speed 3415.04 samples/sec Loss 1.7747 LearningRate 0.0034 Epoch: 16 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:25,286-Speed 3400.05 samples/sec Loss 1.6929 LearningRate 0.0034 Epoch: 16 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:28,302-Speed 3396.44 samples/sec Loss 1.8072 LearningRate 0.0034 Epoch: 16 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:31,311-Speed 3404.35 samples/sec Loss 1.7533 LearningRate 0.0034 Epoch: 16 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:34,315-Speed 3409.11 samples/sec Loss 1.7294 LearningRate 0.0034 Epoch: 16 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:37,321-Speed 3407.14 samples/sec Loss 1.7684 LearningRate 0.0034 Epoch: 16 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:40,333-Speed 3400.73 samples/sec Loss 1.8226 LearningRate 0.0034 Epoch: 16 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:43,338-Speed 3408.57 samples/sec Loss 1.7911 LearningRate 0.0034 Epoch: 16 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:46,331-Speed 3422.50 samples/sec Loss 1.8782 LearningRate 0.0034 Epoch: 16 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:49,356-Speed 3385.98 samples/sec Loss 1.7903 LearningRate 0.0034 Epoch: 16 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:45:52,366-Speed 3402.90 samples/sec Loss 1.8631 LearningRate 0.0034 Epoch: 16 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 07:45:55,363-Speed 3417.94 samples/sec Loss 1.8092 LearningRate 0.0034 Epoch: 16 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:45:58,366-Speed 3411.19 samples/sec Loss 1.8587 LearningRate 0.0034 Epoch: 16 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:01,371-Speed 3407.84 samples/sec Loss 1.7897 LearningRate 0.0034 Epoch: 16 Global Step: 82550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:04,394-Speed 3387.95 samples/sec Loss 1.8537 LearningRate 0.0034 Epoch: 16 Global Step: 82560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:07,421-Speed 3383.81 samples/sec Loss 1.7736 LearningRate 0.0034 Epoch: 16 Global Step: 82570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:10,431-Speed 3402.55 samples/sec Loss 1.8353 LearningRate 0.0034 Epoch: 16 Global Step: 82580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:13,451-Speed 3392.60 samples/sec Loss 1.8624 LearningRate 0.0034 Epoch: 16 Global Step: 82590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:16,458-Speed 3405.95 samples/sec Loss 1.8037 LearningRate 0.0034 Epoch: 16 Global Step: 82600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:19,465-Speed 3406.76 samples/sec Loss 1.8759 LearningRate 0.0034 Epoch: 16 Global Step: 82610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:22,469-Speed 3409.29 samples/sec Loss 1.7606 LearningRate 0.0034 Epoch: 16 Global Step: 82620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:25,475-Speed 3407.59 samples/sec Loss 1.8530 LearningRate 0.0034 Epoch: 16 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:46:28,497-Speed 3390.08 samples/sec Loss 1.7709 LearningRate 0.0034 Epoch: 16 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:46:31,490-Speed 3421.74 samples/sec Loss 
1.8769 LearningRate 0.0033 Epoch: 16 Global Step: 82650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:34,494-Speed 3409.83 samples/sec Loss 1.8508 LearningRate 0.0033 Epoch: 16 Global Step: 82660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:37,512-Speed 3393.68 samples/sec Loss 1.8392 LearningRate 0.0033 Epoch: 16 Global Step: 82670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:40,519-Speed 3405.87 samples/sec Loss 1.6784 LearningRate 0.0033 Epoch: 16 Global Step: 82680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:43,532-Speed 3399.81 samples/sec Loss 1.8262 LearningRate 0.0033 Epoch: 16 Global Step: 82690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:46,584-Speed 3356.14 samples/sec Loss 1.8355 LearningRate 0.0033 Epoch: 16 Global Step: 82700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:49,592-Speed 3404.94 samples/sec Loss 1.8094 LearningRate 0.0033 Epoch: 16 Global Step: 82710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:52,618-Speed 3384.93 samples/sec Loss 1.8496 LearningRate 0.0033 Epoch: 16 Global Step: 82720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:55,645-Speed 3383.26 samples/sec Loss 1.8344 LearningRate 0.0033 Epoch: 16 Global Step: 82730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:46:58,656-Speed 3402.34 samples/sec Loss 1.8446 LearningRate 0.0033 Epoch: 16 Global Step: 82740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:01,661-Speed 3408.03 samples/sec Loss 1.7235 LearningRate 0.0033 Epoch: 16 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:04,670-Speed 3403.97 samples/sec Loss 1.7450 LearningRate 0.0033 Epoch: 16 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:07,694-Speed 3387.67 samples/sec Loss 1.8266 LearningRate 0.0033 Epoch: 16 Global Step: 82770 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:10,713-Speed 3392.41 samples/sec Loss 1.7767 LearningRate 0.0033 Epoch: 16 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:13,708-Speed 3420.75 samples/sec Loss 1.8384 LearningRate 0.0033 Epoch: 16 Global Step: 82790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:16,717-Speed 3403.91 samples/sec Loss 1.7926 LearningRate 0.0033 Epoch: 16 Global Step: 82800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:19,723-Speed 3406.61 samples/sec Loss 1.7828 LearningRate 0.0033 Epoch: 16 Global Step: 82810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:22,738-Speed 3398.18 samples/sec Loss 1.8123 LearningRate 0.0033 Epoch: 16 Global Step: 82820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:25,757-Speed 3391.76 samples/sec Loss 1.8253 LearningRate 0.0033 Epoch: 16 Global Step: 82830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:28,773-Speed 3396.41 samples/sec Loss 1.8416 LearningRate 0.0033 Epoch: 16 Global Step: 82840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:31,792-Speed 3392.24 samples/sec Loss 1.7914 LearningRate 0.0033 Epoch: 16 Global Step: 82850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:34,809-Speed 3395.62 samples/sec Loss 1.8519 LearningRate 0.0033 Epoch: 16 Global Step: 82860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:37,824-Speed 3396.94 samples/sec Loss 1.8917 LearningRate 0.0033 Epoch: 16 Global Step: 82870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:40,834-Speed 3402.70 samples/sec Loss 1.8566 LearningRate 0.0033 Epoch: 16 Global Step: 82880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:47:43,841-Speed 3406.42 samples/sec Loss 1.8218 LearningRate 0.0033 Epoch: 16 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 07:47:46,864-Speed 3388.68 samples/sec Loss 1.7862 LearningRate 0.0033 Epoch: 16 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:49,872-Speed 3404.46 samples/sec Loss 1.8065 LearningRate 0.0033 Epoch: 16 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:52,880-Speed 3405.47 samples/sec Loss 1.8516 LearningRate 0.0033 Epoch: 16 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:55,892-Speed 3400.25 samples/sec Loss 1.7356 LearningRate 0.0032 Epoch: 16 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:47:58,905-Speed 3399.88 samples/sec Loss 1.8642 LearningRate 0.0032 Epoch: 16 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:01,925-Speed 3390.91 samples/sec Loss 1.8023 LearningRate 0.0032 Epoch: 16 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:04,968-Speed 3366.71 samples/sec Loss 1.7732 LearningRate 0.0032 Epoch: 16 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:07,978-Speed 3403.12 samples/sec Loss 1.8509 LearningRate 0.0032 Epoch: 16 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:11,015-Speed 3373.65 samples/sec Loss 1.9480 LearningRate 0.0032 Epoch: 16 Global Step: 82980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:14,039-Speed 3387.28 samples/sec Loss 1.8860 LearningRate 0.0032 Epoch: 16 Global Step: 82990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:17,050-Speed 3401.86 samples/sec Loss 1.7968 LearningRate 0.0032 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:20,056-Speed 3406.54 samples/sec Loss 1.7667 LearningRate 0.0032 Epoch: 16 Global Step: 83010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:23,064-Speed 3405.23 samples/sec Loss 
1.9236 LearningRate 0.0032 Epoch: 16 Global Step: 83020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:26,074-Speed 3402.67 samples/sec Loss 1.8357 LearningRate 0.0032 Epoch: 16 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:29,130-Speed 3351.41 samples/sec Loss 1.7920 LearningRate 0.0032 Epoch: 16 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:32,144-Speed 3398.78 samples/sec Loss 1.8252 LearningRate 0.0032 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:35,157-Speed 3399.53 samples/sec Loss 1.8443 LearningRate 0.0032 Epoch: 16 Global Step: 83060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:38,181-Speed 3387.74 samples/sec Loss 1.7747 LearningRate 0.0032 Epoch: 16 Global Step: 83070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:41,242-Speed 3345.47 samples/sec Loss 1.8081 LearningRate 0.0032 Epoch: 16 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:44,271-Speed 3382.03 samples/sec Loss 1.8399 LearningRate 0.0032 Epoch: 16 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:47,286-Speed 3397.35 samples/sec Loss 1.8741 LearningRate 0.0032 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:50,300-Speed 3398.15 samples/sec Loss 1.8174 LearningRate 0.0032 Epoch: 16 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:48:53,294-Speed 3421.63 samples/sec Loss 1.8485 LearningRate 0.0032 Epoch: 16 Global Step: 83120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:56,307-Speed 3399.04 samples/sec Loss 1.7908 LearningRate 0.0032 Epoch: 16 Global Step: 83130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:48:59,315-Speed 3405.44 samples/sec Loss 1.7907 LearningRate 0.0032 Epoch: 16 Global Step: 83140 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:02,339-Speed 3386.39 samples/sec Loss 1.7875 LearningRate 0.0032 Epoch: 16 Global Step: 83150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:05,426-Speed 3318.16 samples/sec Loss 1.8032 LearningRate 0.0032 Epoch: 16 Global Step: 83160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:08,443-Speed 3394.80 samples/sec Loss 1.8322 LearningRate 0.0032 Epoch: 16 Global Step: 83170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:11,450-Speed 3406.81 samples/sec Loss 1.8332 LearningRate 0.0032 Epoch: 16 Global Step: 83180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:14,491-Speed 3368.73 samples/sec Loss 1.8661 LearningRate 0.0032 Epoch: 16 Global Step: 83190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:17,504-Speed 3399.14 samples/sec Loss 1.7960 LearningRate 0.0032 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:20,513-Speed 3403.54 samples/sec Loss 1.8648 LearningRate 0.0031 Epoch: 16 Global Step: 83210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:23,532-Speed 3392.35 samples/sec Loss 1.9032 LearningRate 0.0031 Epoch: 16 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:49:26,540-Speed 3405.31 samples/sec Loss 1.8311 LearningRate 0.0031 Epoch: 16 Global Step: 83230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:29,563-Speed 3388.14 samples/sec Loss 1.7905 LearningRate 0.0031 Epoch: 16 Global Step: 83240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:32,572-Speed 3404.52 samples/sec Loss 1.7233 LearningRate 0.0031 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:35,599-Speed 3383.11 samples/sec Loss 1.8608 LearningRate 0.0031 Epoch: 16 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 07:49:38,618-Speed 3393.44 samples/sec Loss 1.8108 LearningRate 0.0031 Epoch: 16 Global Step: 83270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:41,629-Speed 3402.32 samples/sec Loss 1.8218 LearningRate 0.0031 Epoch: 16 Global Step: 83280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:44,653-Speed 3387.63 samples/sec Loss 1.7984 LearningRate 0.0031 Epoch: 16 Global Step: 83290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:47,666-Speed 3399.54 samples/sec Loss 1.8287 LearningRate 0.0031 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:50,798-Speed 3269.80 samples/sec Loss 1.8882 LearningRate 0.0031 Epoch: 16 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:53,847-Speed 3359.47 samples/sec Loss 1.8200 LearningRate 0.0031 Epoch: 16 Global Step: 83320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:49:56,863-Speed 3395.78 samples/sec Loss 1.8237 LearningRate 0.0031 Epoch: 16 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:49:59,860-Speed 3417.71 samples/sec Loss 1.8277 LearningRate 0.0031 Epoch: 16 Global Step: 83340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:02,882-Speed 3390.18 samples/sec Loss 1.8861 LearningRate 0.0031 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:05,903-Speed 3389.73 samples/sec Loss 1.7787 LearningRate 0.0031 Epoch: 16 Global Step: 83360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:08,915-Speed 3401.89 samples/sec Loss 1.8589 LearningRate 0.0031 Epoch: 16 Global Step: 83370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:11,949-Speed 3375.74 samples/sec Loss 1.7561 LearningRate 0.0031 Epoch: 16 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:15,044-Speed 3308.74 samples/sec Loss 
1.7359 LearningRate 0.0031 Epoch: 16 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:18,062-Speed 3393.90 samples/sec Loss 1.8378 LearningRate 0.0031 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:21,071-Speed 3403.93 samples/sec Loss 1.8672 LearningRate 0.0031 Epoch: 16 Global Step: 83410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:24,078-Speed 3406.50 samples/sec Loss 1.7899 LearningRate 0.0031 Epoch: 16 Global Step: 83420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:27,090-Speed 3400.17 samples/sec Loss 1.8926 LearningRate 0.0031 Epoch: 16 Global Step: 83430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:30,111-Speed 3390.52 samples/sec Loss 1.9153 LearningRate 0.0031 Epoch: 16 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:50:33,108-Speed 3417.40 samples/sec Loss 1.7245 LearningRate 0.0031 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:36,125-Speed 3395.20 samples/sec Loss 1.8441 LearningRate 0.0031 Epoch: 16 Global Step: 83460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:39,145-Speed 3391.97 samples/sec Loss 1.8335 LearningRate 0.0031 Epoch: 16 Global Step: 83470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:42,180-Speed 3375.36 samples/sec Loss 1.8735 LearningRate 0.0031 Epoch: 16 Global Step: 83480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:45,188-Speed 3404.92 samples/sec Loss 1.7881 LearningRate 0.0031 Epoch: 16 Global Step: 83490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:48,194-Speed 3407.31 samples/sec Loss 1.8236 LearningRate 0.0030 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:51,204-Speed 3402.45 samples/sec Loss 1.8562 LearningRate 0.0030 Epoch: 16 Global Step: 83510 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:54,221-Speed 3396.18 samples/sec Loss 1.7709 LearningRate 0.0030 Epoch: 16 Global Step: 83520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:50:57,253-Speed 3378.64 samples/sec Loss 1.7723 LearningRate 0.0030 Epoch: 16 Global Step: 83530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:51:00,266-Speed 3398.65 samples/sec Loss 1.8157 LearningRate 0.0030 Epoch: 16 Global Step: 83540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:51:03,279-Speed 3400.27 samples/sec Loss 1.7362 LearningRate 0.0030 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:06,292-Speed 3399.89 samples/sec Loss 1.8805 LearningRate 0.0030 Epoch: 16 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:09,302-Speed 3402.51 samples/sec Loss 1.8455 LearningRate 0.0030 Epoch: 16 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:12,321-Speed 3392.37 samples/sec Loss 1.9228 LearningRate 0.0030 Epoch: 16 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:15,337-Speed 3396.34 samples/sec Loss 1.8656 LearningRate 0.0030 Epoch: 16 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:18,357-Speed 3392.79 samples/sec Loss 1.9530 LearningRate 0.0030 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:21,387-Speed 3380.18 samples/sec Loss 1.7501 LearningRate 0.0030 Epoch: 16 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:24,417-Speed 3380.58 samples/sec Loss 1.7739 LearningRate 0.0030 Epoch: 16 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:27,436-Speed 3393.21 samples/sec Loss 1.8189 LearningRate 0.0030 Epoch: 16 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 07:51:30,563-Speed 3275.15 samples/sec Loss 1.8578 LearningRate 0.0030 Epoch: 16 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:33,557-Speed 3421.43 samples/sec Loss 1.8495 LearningRate 0.0030 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:36,568-Speed 3401.50 samples/sec Loss 1.8855 LearningRate 0.0030 Epoch: 16 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:39,591-Speed 3389.28 samples/sec Loss 1.8487 LearningRate 0.0030 Epoch: 16 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:42,602-Speed 3401.29 samples/sec Loss 1.8005 LearningRate 0.0030 Epoch: 16 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:45,611-Speed 3403.58 samples/sec Loss 1.7680 LearningRate 0.0030 Epoch: 16 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:48,630-Speed 3393.53 samples/sec Loss 1.6806 LearningRate 0.0030 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:51,646-Speed 3395.24 samples/sec Loss 1.8730 LearningRate 0.0030 Epoch: 16 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:54,665-Speed 3393.61 samples/sec Loss 1.8398 LearningRate 0.0030 Epoch: 16 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:51:57,676-Speed 3400.92 samples/sec Loss 1.7808 LearningRate 0.0030 Epoch: 16 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:00,688-Speed 3400.19 samples/sec Loss 1.8100 LearningRate 0.0030 Epoch: 16 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:03,683-Speed 3420.71 samples/sec Loss 1.8586 LearningRate 0.0030 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:06,697-Speed 3399.51 samples/sec Loss 
1.7905 LearningRate 0.0030 Epoch: 16 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:09,711-Speed 3398.58 samples/sec Loss 1.8553 LearningRate 0.0030 Epoch: 16 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:12,736-Speed 3385.96 samples/sec Loss 1.8889 LearningRate 0.0030 Epoch: 16 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:15,754-Speed 3393.37 samples/sec Loss 1.9040 LearningRate 0.0029 Epoch: 16 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:18,771-Speed 3395.59 samples/sec Loss 1.8695 LearningRate 0.0029 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:21,780-Speed 3403.97 samples/sec Loss 1.7880 LearningRate 0.0029 Epoch: 16 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:24,796-Speed 3396.86 samples/sec Loss 1.8086 LearningRate 0.0029 Epoch: 16 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:52:27,799-Speed 3410.61 samples/sec Loss 1.9282 LearningRate 0.0029 Epoch: 16 Global Step: 83830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:30,813-Speed 3398.88 samples/sec Loss 1.7861 LearningRate 0.0029 Epoch: 16 Global Step: 83840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:33,832-Speed 3392.94 samples/sec Loss 1.7643 LearningRate 0.0029 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:36,854-Speed 3389.59 samples/sec Loss 1.8074 LearningRate 0.0029 Epoch: 16 Global Step: 83860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:39,864-Speed 3402.38 samples/sec Loss 1.8058 LearningRate 0.0029 Epoch: 16 Global Step: 83870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:42,875-Speed 3402.53 samples/sec Loss 1.8769 LearningRate 0.0029 Epoch: 16 Global Step: 83880 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:45,888-Speed 3399.09 samples/sec Loss 1.9446 LearningRate 0.0029 Epoch: 16 Global Step: 83890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:48,901-Speed 3399.69 samples/sec Loss 1.8490 LearningRate 0.0029 Epoch: 16 Global Step: 83900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:51,910-Speed 3403.53 samples/sec Loss 1.7801 LearningRate 0.0029 Epoch: 16 Global Step: 83910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:54,933-Speed 3388.87 samples/sec Loss 1.8804 LearningRate 0.0029 Epoch: 16 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:52:57,941-Speed 3404.22 samples/sec Loss 1.8770 LearningRate 0.0029 Epoch: 16 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:00,974-Speed 3377.44 samples/sec Loss 1.8208 LearningRate 0.0029 Epoch: 16 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:04,021-Speed 3361.31 samples/sec Loss 1.8107 LearningRate 0.0029 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:07,047-Speed 3385.33 samples/sec Loss 1.8254 LearningRate 0.0029 Epoch: 16 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:10,057-Speed 3403.67 samples/sec Loss 1.8543 LearningRate 0.0029 Epoch: 16 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:13,072-Speed 3396.25 samples/sec Loss 1.9197 LearningRate 0.0029 Epoch: 16 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:16,085-Speed 3399.40 samples/sec Loss 1.8200 LearningRate 0.0029 Epoch: 16 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:53:19,282-Speed 3204.35 samples/sec Loss 1.8953 LearningRate 0.0029 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 07:54:03,723-[lfw][84000]XNorm: 22.611818 Training: 2022-04-11 07:54:03,724-[lfw][84000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 07:54:03,724-[lfw][84000]Accuracy-Highest: 0.99850 Training: 2022-04-11 07:54:55,388-[cfp_fp][84000]XNorm: 21.961186 Training: 2022-04-11 07:54:55,389-[cfp_fp][84000]Accuracy-Flip: 0.98700+-0.00467 Training: 2022-04-11 07:54:55,390-[cfp_fp][84000]Accuracy-Highest: 0.98700 Training: 2022-04-11 07:55:39,864-[agedb_30][84000]XNorm: 22.608432 Training: 2022-04-11 07:55:39,865-[agedb_30][84000]Accuracy-Flip: 0.98317+-0.00701 Training: 2022-04-11 07:55:39,865-[agedb_30][84000]Accuracy-Highest: 0.98550 Training: 2022-04-11 07:55:42,874-Speed 71.31 samples/sec Loss 1.9562 LearningRate 0.0029 Epoch: 16 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:55:45,847-Speed 3445.79 samples/sec Loss 1.8300 LearningRate 0.0029 Epoch: 16 Global Step: 84020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:55:48,837-Speed 3424.86 samples/sec Loss 1.8387 LearningRate 0.0029 Epoch: 16 Global Step: 84030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:55:51,865-Speed 3382.43 samples/sec Loss 1.8359 LearningRate 0.0029 Epoch: 16 Global Step: 84040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:55:54,858-Speed 3422.91 samples/sec Loss 1.9368 LearningRate 0.0029 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:55:57,852-Speed 3420.78 samples/sec Loss 1.8980 LearningRate 0.0029 Epoch: 16 Global Step: 84060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:00,848-Speed 3419.59 samples/sec Loss 1.8513 LearningRate 0.0029 Epoch: 16 Global Step: 84070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:03,846-Speed 3416.15 samples/sec Loss 1.7621 LearningRate 0.0029 Epoch: 16 Global Step: 84080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:06,847-Speed 3412.65 
samples/sec Loss 1.8263 LearningRate 0.0028 Epoch: 16 Global Step: 84090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:09,846-Speed 3415.79 samples/sec Loss 1.9068 LearningRate 0.0028 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:12,877-Speed 3379.82 samples/sec Loss 1.9127 LearningRate 0.0028 Epoch: 16 Global Step: 84110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:15,922-Speed 3363.45 samples/sec Loss 1.8204 LearningRate 0.0028 Epoch: 16 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:18,929-Speed 3405.68 samples/sec Loss 1.7560 LearningRate 0.0028 Epoch: 16 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:21,925-Speed 3419.83 samples/sec Loss 1.8469 LearningRate 0.0028 Epoch: 16 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:24,935-Speed 3402.85 samples/sec Loss 1.8663 LearningRate 0.0028 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:27,931-Speed 3418.47 samples/sec Loss 1.8293 LearningRate 0.0028 Epoch: 16 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:30,931-Speed 3413.97 samples/sec Loss 1.8527 LearningRate 0.0028 Epoch: 16 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:33,945-Speed 3397.84 samples/sec Loss 1.7530 LearningRate 0.0028 Epoch: 16 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:36,975-Speed 3381.05 samples/sec Loss 1.8990 LearningRate 0.0028 Epoch: 16 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:39,985-Speed 3402.72 samples/sec Loss 1.8801 LearningRate 0.0028 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:42,992-Speed 3405.85 samples/sec Loss 1.8532 LearningRate 0.0028 Epoch: 16 
Global Step: 84210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:45,985-Speed 3422.21 samples/sec Loss 1.7954 LearningRate 0.0028 Epoch: 16 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:48,985-Speed 3414.78 samples/sec Loss 1.8956 LearningRate 0.0028 Epoch: 16 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:56:51,968-Speed 3434.99 samples/sec Loss 1.8938 LearningRate 0.0028 Epoch: 16 Global Step: 84240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:54,978-Speed 3402.89 samples/sec Loss 1.7867 LearningRate 0.0028 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:56:58,022-Speed 3364.37 samples/sec Loss 1.8663 LearningRate 0.0028 Epoch: 16 Global Step: 84260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:01,042-Speed 3392.00 samples/sec Loss 1.8337 LearningRate 0.0028 Epoch: 16 Global Step: 84270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:04,067-Speed 3385.89 samples/sec Loss 1.7923 LearningRate 0.0028 Epoch: 16 Global Step: 84280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:07,065-Speed 3417.03 samples/sec Loss 1.8043 LearningRate 0.0028 Epoch: 16 Global Step: 84290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:10,069-Speed 3408.83 samples/sec Loss 1.8107 LearningRate 0.0028 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:13,085-Speed 3396.37 samples/sec Loss 1.8195 LearningRate 0.0028 Epoch: 16 Global Step: 84310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:16,085-Speed 3413.82 samples/sec Loss 1.8392 LearningRate 0.0028 Epoch: 16 Global Step: 84320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:19,147-Speed 3345.56 samples/sec Loss 1.9164 LearningRate 0.0028 Epoch: 16 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 07:57:22,150-Speed 3410.84 samples/sec Loss 1.9111 LearningRate 0.0028 Epoch: 16 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:25,153-Speed 3411.55 samples/sec Loss 1.8107 LearningRate 0.0028 Epoch: 16 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:28,177-Speed 3386.55 samples/sec Loss 1.7837 LearningRate 0.0028 Epoch: 16 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:31,177-Speed 3414.22 samples/sec Loss 1.8463 LearningRate 0.0028 Epoch: 16 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:34,177-Speed 3414.36 samples/sec Loss 1.8686 LearningRate 0.0028 Epoch: 16 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:37,184-Speed 3406.64 samples/sec Loss 1.8089 LearningRate 0.0027 Epoch: 16 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:40,190-Speed 3407.15 samples/sec Loss 1.7618 LearningRate 0.0027 Epoch: 16 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:43,191-Speed 3412.60 samples/sec Loss 1.8337 LearningRate 0.0027 Epoch: 16 Global Step: 84410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:46,191-Speed 3414.55 samples/sec Loss 1.8730 LearningRate 0.0027 Epoch: 16 Global Step: 84420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:57:49,174-Speed 3433.52 samples/sec Loss 1.8896 LearningRate 0.0027 Epoch: 16 Global Step: 84430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:52,240-Speed 3340.67 samples/sec Loss 1.8221 LearningRate 0.0027 Epoch: 16 Global Step: 84440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:55,244-Speed 3410.10 samples/sec Loss 1.8993 LearningRate 0.0027 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:57:58,246-Speed 3411.43 
samples/sec Loss 1.8734 LearningRate 0.0027 Epoch: 16 Global Step: 84460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:01,258-Speed 3400.43 samples/sec Loss 1.8239 LearningRate 0.0027 Epoch: 16 Global Step: 84470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:04,259-Speed 3413.56 samples/sec Loss 1.8692 LearningRate 0.0027 Epoch: 16 Global Step: 84480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:07,264-Speed 3408.76 samples/sec Loss 1.7531 LearningRate 0.0027 Epoch: 16 Global Step: 84490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:10,295-Speed 3378.36 samples/sec Loss 1.8293 LearningRate 0.0027 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:13,315-Speed 3392.35 samples/sec Loss 1.9230 LearningRate 0.0027 Epoch: 16 Global Step: 84510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:16,331-Speed 3396.14 samples/sec Loss 1.7646 LearningRate 0.0027 Epoch: 16 Global Step: 84520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 07:58:19,333-Speed 3411.88 samples/sec Loss 1.8452 LearningRate 0.0027 Epoch: 16 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:22,335-Speed 3412.14 samples/sec Loss 1.9242 LearningRate 0.0027 Epoch: 16 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:25,341-Speed 3407.13 samples/sec Loss 1.7783 LearningRate 0.0027 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:28,347-Speed 3406.92 samples/sec Loss 1.8913 LearningRate 0.0027 Epoch: 16 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:31,350-Speed 3411.24 samples/sec Loss 1.8228 LearningRate 0.0027 Epoch: 16 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:34,355-Speed 3408.38 samples/sec Loss 1.8089 LearningRate 0.0027 Epoch: 16 
Global Step: 84580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:37,359-Speed 3409.62 samples/sec Loss 1.8070 LearningRate 0.0027 Epoch: 16 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:40,375-Speed 3396.54 samples/sec Loss 1.9137 LearningRate 0.0027 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:43,382-Speed 3405.41 samples/sec Loss 1.7663 LearningRate 0.0027 Epoch: 16 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:46,390-Speed 3406.40 samples/sec Loss 1.9247 LearningRate 0.0027 Epoch: 16 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:49,384-Speed 3420.44 samples/sec Loss 1.8444 LearningRate 0.0027 Epoch: 16 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:52,390-Speed 3407.90 samples/sec Loss 1.9531 LearningRate 0.0027 Epoch: 16 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:55,401-Speed 3402.08 samples/sec Loss 1.8201 LearningRate 0.0027 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:58:58,409-Speed 3404.26 samples/sec Loss 1.8310 LearningRate 0.0027 Epoch: 16 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:01,416-Speed 3406.31 samples/sec Loss 1.8178 LearningRate 0.0027 Epoch: 16 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:04,526-Speed 3293.57 samples/sec Loss 1.8038 LearningRate 0.0027 Epoch: 16 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:07,542-Speed 3396.26 samples/sec Loss 1.8189 LearningRate 0.0027 Epoch: 16 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:10,559-Speed 3394.82 samples/sec Loss 1.7945 LearningRate 0.0026 Epoch: 16 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 07:59:13,565-Speed 3407.58 samples/sec Loss 1.7822 LearningRate 0.0026 Epoch: 16 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:16,571-Speed 3407.59 samples/sec Loss 1.8096 LearningRate 0.0026 Epoch: 16 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:19,579-Speed 3404.59 samples/sec Loss 1.8141 LearningRate 0.0026 Epoch: 16 Global Step: 84730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 07:59:22,568-Speed 3427.48 samples/sec Loss 1.7775 LearningRate 0.0026 Epoch: 16 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:25,585-Speed 3394.31 samples/sec Loss 1.7637 LearningRate 0.0026 Epoch: 16 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:28,590-Speed 3409.21 samples/sec Loss 1.7850 LearningRate 0.0026 Epoch: 16 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:31,597-Speed 3405.93 samples/sec Loss 1.8710 LearningRate 0.0026 Epoch: 16 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:34,605-Speed 3405.49 samples/sec Loss 1.8651 LearningRate 0.0026 Epoch: 16 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:37,609-Speed 3409.07 samples/sec Loss 1.8176 LearningRate 0.0026 Epoch: 16 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:40,621-Speed 3401.11 samples/sec Loss 1.7747 LearningRate 0.0026 Epoch: 16 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:43,638-Speed 3394.77 samples/sec Loss 1.9011 LearningRate 0.0026 Epoch: 16 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:46,646-Speed 3405.23 samples/sec Loss 1.8098 LearningRate 0.0026 Epoch: 16 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:49,655-Speed 3404.58 
samples/sec Loss 1.8826 LearningRate 0.0026 Epoch: 16 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:52,652-Speed 3417.41 samples/sec Loss 1.7935 LearningRate 0.0026 Epoch: 16 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:55,666-Speed 3398.18 samples/sec Loss 1.8022 LearningRate 0.0026 Epoch: 16 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 07:59:58,678-Speed 3399.97 samples/sec Loss 1.8988 LearningRate 0.0026 Epoch: 16 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:01,697-Speed 3393.24 samples/sec Loss 1.9091 LearningRate 0.0026 Epoch: 16 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:04,703-Speed 3406.81 samples/sec Loss 1.8506 LearningRate 0.0026 Epoch: 16 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:07,719-Speed 3396.96 samples/sec Loss 1.7569 LearningRate 0.0026 Epoch: 16 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:10,746-Speed 3383.47 samples/sec Loss 1.8090 LearningRate 0.0026 Epoch: 16 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:13,765-Speed 3392.96 samples/sec Loss 1.8228 LearningRate 0.0026 Epoch: 16 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:16,778-Speed 3400.25 samples/sec Loss 1.9219 LearningRate 0.0026 Epoch: 16 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:19,807-Speed 3381.27 samples/sec Loss 1.9767 LearningRate 0.0026 Epoch: 16 Global Step: 84930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:22,820-Speed 3399.67 samples/sec Loss 1.7592 LearningRate 0.0026 Epoch: 16 Global Step: 84940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:25,826-Speed 3407.21 samples/sec Loss 1.7936 LearningRate 0.0026 Epoch: 16 
Global Step: 84950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:28,832-Speed 3407.59 samples/sec Loss 1.8628 LearningRate 0.0026 Epoch: 16 Global Step: 84960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:31,843-Speed 3401.99 samples/sec Loss 1.8073 LearningRate 0.0026 Epoch: 16 Global Step: 84970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:34,852-Speed 3403.40 samples/sec Loss 1.9060 LearningRate 0.0026 Epoch: 16 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:37,881-Speed 3382.08 samples/sec Loss 1.8312 LearningRate 0.0026 Epoch: 16 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:40,904-Speed 3387.65 samples/sec Loss 1.8380 LearningRate 0.0026 Epoch: 16 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:43,916-Speed 3401.06 samples/sec Loss 1.8099 LearningRate 0.0025 Epoch: 16 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:46,922-Speed 3407.50 samples/sec Loss 1.8915 LearningRate 0.0025 Epoch: 16 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:49,937-Speed 3397.47 samples/sec Loss 1.8006 LearningRate 0.0025 Epoch: 16 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:52,955-Speed 3393.91 samples/sec Loss 1.7782 LearningRate 0.0025 Epoch: 16 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:00:55,947-Speed 3422.51 samples/sec Loss 1.7561 LearningRate 0.0025 Epoch: 16 Global Step: 85050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:00:58,992-Speed 3364.48 samples/sec Loss 1.7097 LearningRate 0.0025 Epoch: 16 Global Step: 85060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:02,039-Speed 3361.04 samples/sec Loss 1.7620 LearningRate 0.0025 Epoch: 16 Global Step: 85070 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 08:01:05,055-Speed 3396.55 samples/sec Loss 1.8325 LearningRate 0.0025 Epoch: 16 Global Step: 85080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:08,075-Speed 3390.92 samples/sec Loss 1.8693 LearningRate 0.0025 Epoch: 16 Global Step: 85090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:11,084-Speed 3404.13 samples/sec Loss 1.8309 LearningRate 0.0025 Epoch: 16 Global Step: 85100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:14,102-Speed 3393.92 samples/sec Loss 1.9143 LearningRate 0.0025 Epoch: 16 Global Step: 85110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:17,114-Speed 3401.18 samples/sec Loss 1.7583 LearningRate 0.0025 Epoch: 16 Global Step: 85120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:20,124-Speed 3402.67 samples/sec Loss 1.9557 LearningRate 0.0025 Epoch: 16 Global Step: 85130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:23,134-Speed 3403.00 samples/sec Loss 1.7685 LearningRate 0.0025 Epoch: 16 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:01:26,153-Speed 3392.97 samples/sec Loss 1.8198 LearningRate 0.0025 Epoch: 16 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:29,171-Speed 3393.19 samples/sec Loss 1.9153 LearningRate 0.0025 Epoch: 16 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:32,262-Speed 3314.17 samples/sec Loss 1.8861 LearningRate 0.0025 Epoch: 16 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:35,272-Speed 3403.00 samples/sec Loss 1.8891 LearningRate 0.0025 Epoch: 16 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:38,306-Speed 3375.94 samples/sec Loss 1.7173 LearningRate 0.0025 Epoch: 16 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:41,332-Speed 3384.90 
samples/sec Loss 1.8140 LearningRate 0.0025 Epoch: 16 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:44,348-Speed 3396.11 samples/sec Loss 1.6706 LearningRate 0.0025 Epoch: 16 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:47,361-Speed 3399.82 samples/sec Loss 1.8195 LearningRate 0.0025 Epoch: 16 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:50,377-Speed 3395.88 samples/sec Loss 1.8898 LearningRate 0.0025 Epoch: 16 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:53,389-Speed 3400.35 samples/sec Loss 1.7853 LearningRate 0.0025 Epoch: 16 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:56,378-Speed 3426.83 samples/sec Loss 1.8237 LearningRate 0.0025 Epoch: 16 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:01:59,387-Speed 3403.78 samples/sec Loss 1.8837 LearningRate 0.0025 Epoch: 16 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:02:02,428-Speed 3368.47 samples/sec Loss 1.8539 LearningRate 0.0025 Epoch: 16 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:02:05,436-Speed 3405.13 samples/sec Loss 1.7916 LearningRate 0.0025 Epoch: 16 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:02:08,444-Speed 3406.69 samples/sec Loss 1.7877 LearningRate 0.0025 Epoch: 16 Global Step: 85290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:11,458-Speed 3397.76 samples/sec Loss 1.8174 LearningRate 0.0025 Epoch: 16 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:14,483-Speed 3386.83 samples/sec Loss 1.8287 LearningRate 0.0025 Epoch: 16 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:17,496-Speed 3399.73 samples/sec Loss 1.7634 LearningRate 0.0025 Epoch: 16 
Global Step: 85320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:20,527-Speed 3379.78 samples/sec Loss 1.7581 LearningRate 0.0024 Epoch: 16 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:23,549-Speed 3389.27 samples/sec Loss 1.7792 LearningRate 0.0024 Epoch: 16 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:26,568-Speed 3393.37 samples/sec Loss 1.8527 LearningRate 0.0024 Epoch: 16 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:29,585-Speed 3394.76 samples/sec Loss 1.7556 LearningRate 0.0024 Epoch: 16 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:32,604-Speed 3393.81 samples/sec Loss 1.8478 LearningRate 0.0024 Epoch: 16 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:35,621-Speed 3394.24 samples/sec Loss 1.8087 LearningRate 0.0024 Epoch: 16 Global Step: 85380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:38,635-Speed 3398.60 samples/sec Loss 1.8817 LearningRate 0.0024 Epoch: 16 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:02:41,668-Speed 3377.63 samples/sec Loss 1.8493 LearningRate 0.0024 Epoch: 16 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:02:44,660-Speed 3423.17 samples/sec Loss 1.7755 LearningRate 0.0024 Epoch: 16 Global Step: 85410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:47,717-Speed 3350.45 samples/sec Loss 1.8410 LearningRate 0.0024 Epoch: 16 Global Step: 85420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:50,741-Speed 3387.56 samples/sec Loss 1.8436 LearningRate 0.0024 Epoch: 16 Global Step: 85430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:53,765-Speed 3386.29 samples/sec Loss 1.9278 LearningRate 0.0024 Epoch: 16 Global Step: 85440 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 08:02:56,797-Speed 3378.18 samples/sec Loss 1.8696 LearningRate 0.0024 Epoch: 16 Global Step: 85450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:02:59,807-Speed 3402.81 samples/sec Loss 1.8607 LearningRate 0.0024 Epoch: 16 Global Step: 85460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:02,819-Speed 3400.87 samples/sec Loss 1.8462 LearningRate 0.0024 Epoch: 16 Global Step: 85470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:05,836-Speed 3396.14 samples/sec Loss 1.9611 LearningRate 0.0024 Epoch: 16 Global Step: 85480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:08,846-Speed 3402.54 samples/sec Loss 1.9216 LearningRate 0.0024 Epoch: 16 Global Step: 85490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:11,856-Speed 3403.68 samples/sec Loss 1.8765 LearningRate 0.0024 Epoch: 16 Global Step: 85500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:14,879-Speed 3388.09 samples/sec Loss 1.8258 LearningRate 0.0024 Epoch: 16 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:17,889-Speed 3402.44 samples/sec Loss 1.8765 LearningRate 0.0024 Epoch: 16 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:20,914-Speed 3386.33 samples/sec Loss 1.8197 LearningRate 0.0024 Epoch: 16 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:23,931-Speed 3394.90 samples/sec Loss 1.8476 LearningRate 0.0024 Epoch: 16 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:26,945-Speed 3398.26 samples/sec Loss 1.8278 LearningRate 0.0024 Epoch: 16 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:29,987-Speed 3367.65 samples/sec Loss 1.8911 LearningRate 0.0024 Epoch: 16 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:33,015-Speed 3383.06 
samples/sec Loss 1.7732 LearningRate 0.0024 Epoch: 16 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:36,030-Speed 3396.27 samples/sec Loss 1.8487 LearningRate 0.0024 Epoch: 16 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:39,046-Speed 3396.78 samples/sec Loss 1.7677 LearningRate 0.0024 Epoch: 16 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:42,098-Speed 3355.78 samples/sec Loss 1.7714 LearningRate 0.0024 Epoch: 16 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:03:45,080-Speed 3435.06 samples/sec Loss 1.7634 LearningRate 0.0024 Epoch: 16 Global Step: 85610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:48,097-Speed 3394.73 samples/sec Loss 1.8270 LearningRate 0.0024 Epoch: 16 Global Step: 85620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:51,133-Speed 3373.43 samples/sec Loss 1.9146 LearningRate 0.0024 Epoch: 16 Global Step: 85630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:54,151-Speed 3394.01 samples/sec Loss 1.7655 LearningRate 0.0024 Epoch: 16 Global Step: 85640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:03:57,162-Speed 3401.80 samples/sec Loss 1.8461 LearningRate 0.0024 Epoch: 16 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:00,176-Speed 3398.26 samples/sec Loss 1.8652 LearningRate 0.0023 Epoch: 16 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:03,187-Speed 3401.83 samples/sec Loss 1.8099 LearningRate 0.0023 Epoch: 16 Global Step: 85670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:06,207-Speed 3391.84 samples/sec Loss 1.6763 LearningRate 0.0023 Epoch: 16 Global Step: 85680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:09,227-Speed 3391.23 samples/sec Loss 1.7919 LearningRate 0.0023 Epoch: 16 
Global Step: 85690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:12,286-Speed 3349.35 samples/sec Loss 1.6663 LearningRate 0.0023 Epoch: 16 Global Step: 85700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:15,310-Speed 3386.87 samples/sec Loss 1.8255 LearningRate 0.0023 Epoch: 16 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:04:18,328-Speed 3393.94 samples/sec Loss 1.8746 LearningRate 0.0023 Epoch: 16 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:04:21,349-Speed 3390.38 samples/sec Loss 1.8211 LearningRate 0.0023 Epoch: 16 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:04:24,342-Speed 3422.26 samples/sec Loss 1.7989 LearningRate 0.0023 Epoch: 16 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:27,352-Speed 3403.27 samples/sec Loss 1.7329 LearningRate 0.0023 Epoch: 16 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:30,358-Speed 3406.74 samples/sec Loss 1.8250 LearningRate 0.0023 Epoch: 16 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:33,371-Speed 3399.61 samples/sec Loss 1.8176 LearningRate 0.0023 Epoch: 16 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:36,409-Speed 3372.00 samples/sec Loss 1.7029 LearningRate 0.0023 Epoch: 16 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:39,454-Speed 3362.87 samples/sec Loss 1.7959 LearningRate 0.0023 Epoch: 16 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:42,470-Speed 3396.66 samples/sec Loss 1.8635 LearningRate 0.0023 Epoch: 16 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:45,489-Speed 3392.39 samples/sec Loss 1.7841 LearningRate 0.0023 Epoch: 16 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 08:04:48,498-Speed 3404.43 samples/sec Loss 1.8446 LearningRate 0.0023 Epoch: 16 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:51,508-Speed 3402.76 samples/sec Loss 1.8712 LearningRate 0.0023 Epoch: 16 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:04:54,529-Speed 3391.25 samples/sec Loss 1.9147 LearningRate 0.0023 Epoch: 16 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:04:57,545-Speed 3395.51 samples/sec Loss 1.8327 LearningRate 0.0023 Epoch: 16 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:05:00,542-Speed 3417.63 samples/sec Loss 1.8997 LearningRate 0.0023 Epoch: 16 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:03,560-Speed 3393.70 samples/sec Loss 1.7949 LearningRate 0.0023 Epoch: 16 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:06,582-Speed 3389.97 samples/sec Loss 1.7999 LearningRate 0.0023 Epoch: 16 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:09,624-Speed 3366.84 samples/sec Loss 1.8072 LearningRate 0.0023 Epoch: 16 Global Step: 85890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:12,638-Speed 3398.87 samples/sec Loss 1.8078 LearningRate 0.0023 Epoch: 16 Global Step: 85900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:15,669-Speed 3378.31 samples/sec Loss 1.9557 LearningRate 0.0023 Epoch: 16 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:18,688-Speed 3393.93 samples/sec Loss 1.8154 LearningRate 0.0023 Epoch: 16 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:21,704-Speed 3396.15 samples/sec Loss 1.8146 LearningRate 0.0023 Epoch: 16 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:24,739-Speed 3374.82 
samples/sec Loss 1.7767 LearningRate 0.0023 Epoch: 16 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:27,760-Speed 3390.44 samples/sec Loss 1.8342 LearningRate 0.0023 Epoch: 16 Global Step: 85950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:05:30,777-Speed 3393.98 samples/sec Loss 1.8229 LearningRate 0.0023 Epoch: 16 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:05:33,791-Speed 3398.71 samples/sec Loss 1.7763 LearningRate 0.0023 Epoch: 16 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:05:36,883-Speed 3312.55 samples/sec Loss 1.9842 LearningRate 0.0023 Epoch: 16 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:05:50,056-Speed 777.46 samples/sec Loss 1.6229 LearningRate 0.0022 Epoch: 17 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:05:53,147-Speed 3313.65 samples/sec Loss 1.3550 LearningRate 0.0022 Epoch: 17 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:06:37,323-[lfw][86000]XNorm: 22.082740 Training: 2022-04-11 08:06:37,324-[lfw][86000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 08:06:37,324-[lfw][86000]Accuracy-Highest: 0.99850 Training: 2022-04-11 08:07:28,876-[cfp_fp][86000]XNorm: 21.868317 Training: 2022-04-11 08:07:28,877-[cfp_fp][86000]Accuracy-Flip: 0.98786+-0.00496 Training: 2022-04-11 08:07:28,878-[cfp_fp][86000]Accuracy-Highest: 0.98786 Training: 2022-04-11 08:08:13,287-[agedb_30][86000]XNorm: 22.396077 Training: 2022-04-11 08:08:13,288-[agedb_30][86000]Accuracy-Flip: 0.98450+-0.00753 Training: 2022-04-11 08:08:13,289-[agedb_30][86000]Accuracy-Highest: 0.98550 Training: 2022-04-11 08:08:16,371-Speed 71.50 samples/sec Loss 1.3132 LearningRate 0.0022 Epoch: 17 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:19,376-Speed 3408.50 samples/sec Loss 1.2478 LearningRate 0.0022 
Epoch: 17 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:22,366-Speed 3426.75 samples/sec Loss 1.2685 LearningRate 0.0022 Epoch: 17 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:25,389-Speed 3387.53 samples/sec Loss 1.1868 LearningRate 0.0022 Epoch: 17 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:29,001-Speed 2835.86 samples/sec Loss 1.3474 LearningRate 0.0022 Epoch: 17 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:32,011-Speed 3403.46 samples/sec Loss 1.2943 LearningRate 0.0022 Epoch: 17 Global Step: 86060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 08:08:35,526-Speed 2914.07 samples/sec Loss 1.2886 LearningRate 0.0022 Epoch: 17 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:38,570-Speed 3364.11 samples/sec Loss 1.3543 LearningRate 0.0022 Epoch: 17 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:41,580-Speed 3404.19 samples/sec Loss 1.3051 LearningRate 0.0022 Epoch: 17 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:08:44,573-Speed 3422.83 samples/sec Loss 1.2681 LearningRate 0.0022 Epoch: 17 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:08:47,572-Speed 3415.72 samples/sec Loss 1.2277 LearningRate 0.0022 Epoch: 17 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:08:50,588-Speed 3397.04 samples/sec Loss 1.2770 LearningRate 0.0022 Epoch: 17 Global Step: 86120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:08:53,593-Speed 3408.29 samples/sec Loss 1.3078 LearningRate 0.0022 Epoch: 17 Global Step: 86130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:08:56,595-Speed 3411.92 samples/sec Loss 1.3292 LearningRate 0.0022 Epoch: 17 Global Step: 86140 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 08:08:59,614-Speed 3393.30 samples/sec Loss 1.2963 LearningRate 0.0022 Epoch: 17 Global Step: 86150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:09:02,661-Speed 3362.30 samples/sec Loss 1.2971 LearningRate 0.0022 Epoch: 17 Global Step: 86160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:09:05,673-Speed 3401.93 samples/sec Loss 1.2171 LearningRate 0.0022 Epoch: 17 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:09:08,723-Speed 3358.48 samples/sec Loss 1.2775 LearningRate 0.0022 Epoch: 17 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:09:11,729-Speed 3407.95 samples/sec Loss 1.3205 LearningRate 0.0022 Epoch: 17 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:09:14,736-Speed 3406.38 samples/sec Loss 1.3024 LearningRate 0.0022 Epoch: 17 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:17,745-Speed 3404.09 samples/sec Loss 1.2389 LearningRate 0.0022 Epoch: 17 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:20,757-Speed 3401.70 samples/sec Loss 1.2635 LearningRate 0.0022 Epoch: 17 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:23,771-Speed 3398.31 samples/sec Loss 1.3030 LearningRate 0.0022 Epoch: 17 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:26,808-Speed 3372.51 samples/sec Loss 1.3868 LearningRate 0.0022 Epoch: 17 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:29,834-Speed 3384.73 samples/sec Loss 1.4039 LearningRate 0.0022 Epoch: 17 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:32,845-Speed 3402.47 samples/sec Loss 1.3311 LearningRate 0.0022 Epoch: 17 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
08:09:35,848-Speed 3410.97 samples/sec Loss 1.2779 LearningRate 0.0022 Epoch: 17 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:38,894-Speed 3361.67 samples/sec Loss 1.2922 LearningRate 0.0022 Epoch: 17 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:41,964-Speed 3336.71 samples/sec Loss 1.3333 LearningRate 0.0022 Epoch: 17 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:44,967-Speed 3411.70 samples/sec Loss 1.3580 LearningRate 0.0022 Epoch: 17 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:47,985-Speed 3393.99 samples/sec Loss 1.3322 LearningRate 0.0022 Epoch: 17 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:51,037-Speed 3356.64 samples/sec Loss 1.3342 LearningRate 0.0022 Epoch: 17 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:54,042-Speed 3408.71 samples/sec Loss 1.3074 LearningRate 0.0021 Epoch: 17 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:09:57,053-Speed 3401.51 samples/sec Loss 1.2837 LearningRate 0.0021 Epoch: 17 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:00,056-Speed 3411.53 samples/sec Loss 1.2803 LearningRate 0.0021 Epoch: 17 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:03,058-Speed 3411.66 samples/sec Loss 1.3291 LearningRate 0.0021 Epoch: 17 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:06,065-Speed 3406.74 samples/sec Loss 1.2802 LearningRate 0.0021 Epoch: 17 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:09,093-Speed 3382.76 samples/sec Loss 1.2803 LearningRate 0.0021 Epoch: 17 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:12,077-Speed 3432.08 samples/sec Loss 1.3213 
LearningRate 0.0021 Epoch: 17 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:15,097-Speed 3391.66 samples/sec Loss 1.3111 LearningRate 0.0021 Epoch: 17 Global Step: 86400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:18,116-Speed 3393.72 samples/sec Loss 1.2683 LearningRate 0.0021 Epoch: 17 Global Step: 86410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:21,127-Speed 3402.34 samples/sec Loss 1.3173 LearningRate 0.0021 Epoch: 17 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:24,137-Speed 3402.65 samples/sec Loss 1.3268 LearningRate 0.0021 Epoch: 17 Global Step: 86430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:27,169-Speed 3377.81 samples/sec Loss 1.2506 LearningRate 0.0021 Epoch: 17 Global Step: 86440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:30,550-Speed 3030.17 samples/sec Loss 1.3263 LearningRate 0.0021 Epoch: 17 Global Step: 86450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:33,553-Speed 3410.51 samples/sec Loss 1.4059 LearningRate 0.0021 Epoch: 17 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:36,559-Speed 3407.85 samples/sec Loss 1.2933 LearningRate 0.0021 Epoch: 17 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:39,561-Speed 3411.98 samples/sec Loss 1.3506 LearningRate 0.0021 Epoch: 17 Global Step: 86480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:10:42,568-Speed 3405.60 samples/sec Loss 1.3429 LearningRate 0.0021 Epoch: 17 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:45,588-Speed 3392.95 samples/sec Loss 1.3898 LearningRate 0.0021 Epoch: 17 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:48,589-Speed 3412.74 samples/sec Loss 1.3099 LearningRate 0.0021 Epoch: 17 Global Step: 86510 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:51,600-Speed 3400.81 samples/sec Loss 1.4323 LearningRate 0.0021 Epoch: 17 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:54,691-Speed 3314.42 samples/sec Loss 1.3981 LearningRate 0.0021 Epoch: 17 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 08:10:57,679-Speed 3428.19 samples/sec Loss 1.3257 LearningRate 0.0021 Epoch: 17 Global Step: 86540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:00,755-Speed 3330.40 samples/sec Loss 1.3525 LearningRate 0.0021 Epoch: 17 Global Step: 86550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:03,784-Speed 3381.57 samples/sec Loss 1.3742 LearningRate 0.0021 Epoch: 17 Global Step: 86560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:06,846-Speed 3344.76 samples/sec Loss 1.3812 LearningRate 0.0021 Epoch: 17 Global Step: 86570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:09,857-Speed 3402.52 samples/sec Loss 1.4278 LearningRate 0.0021 Epoch: 17 Global Step: 86580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:12,864-Speed 3406.40 samples/sec Loss 1.2716 LearningRate 0.0021 Epoch: 17 Global Step: 86590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:15,882-Speed 3393.26 samples/sec Loss 1.3500 LearningRate 0.0021 Epoch: 17 Global Step: 86600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:18,911-Speed 3381.95 samples/sec Loss 1.3516 LearningRate 0.0021 Epoch: 17 Global Step: 86610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:21,921-Speed 3402.16 samples/sec Loss 1.3884 LearningRate 0.0021 Epoch: 17 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 08:11:24,947-Speed 3385.63 samples/sec Loss 1.3564 LearningRate 0.0021 Epoch: 17 Global Step: 86630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 08:11:27,956-Speed 3403.58 samples/sec Loss 1.2876 LearningRate 0.0021 Epoch: 17 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:11:30,984-Speed 3383.00 samples/sec Loss 1.2810 LearningRate 0.0021 Epoch: 17 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:11:33,997-Speed 3400.15 samples/sec Loss 1.3444 LearningRate 0.0021 Epoch: 17 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:11:37,014-Speed 3395.15 samples/sec Loss 1.3981 LearningRate 0.0021 Epoch: 17 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:11:40,027-Speed 3398.81 samples/sec Loss 1.4401 LearningRate 0.0020 Epoch: 17 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:11:43,012-Speed 3432.14 samples/sec Loss 1.3868 LearningRate 0.0020 Epoch: 17 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:11:46,051-Speed 3370.26 samples/sec Loss 1.3595 LearningRate 0.0020 Epoch: 17 Global Step: 86700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:11:49,057-Speed 3407.29 samples/sec Loss 1.3439 LearningRate 0.0020 Epoch: 17 Global Step: 86710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:11:52,072-Speed 3397.48 samples/sec Loss 1.4099 LearningRate 0.0020 Epoch: 17 Global Step: 86720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:11:55,078-Speed 3407.86 samples/sec Loss 1.4140 LearningRate 0.0020 Epoch: 17 Global Step: 86730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:11:58,094-Speed 3396.01 samples/sec Loss 1.3309 LearningRate 0.0020 Epoch: 17 Global Step: 86740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:01,099-Speed 3408.50 samples/sec Loss 1.3015 LearningRate 0.0020 Epoch: 17 Global Step: 86750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:04,113-Speed 3397.59 samples/sec Loss 
1.4280 LearningRate 0.0020 Epoch: 17 Global Step: 86760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:07,119-Speed 3408.14 samples/sec Loss 1.3904 LearningRate 0.0020 Epoch: 17 Global Step: 86770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:10,124-Speed 3409.13 samples/sec Loss 1.3533 LearningRate 0.0020 Epoch: 17 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:13,140-Speed 3396.30 samples/sec Loss 1.3416 LearningRate 0.0020 Epoch: 17 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:16,187-Speed 3361.00 samples/sec Loss 1.3466 LearningRate 0.0020 Epoch: 17 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:19,241-Speed 3353.70 samples/sec Loss 1.3433 LearningRate 0.0020 Epoch: 17 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:22,251-Speed 3403.05 samples/sec Loss 1.3354 LearningRate 0.0020 Epoch: 17 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:25,258-Speed 3406.73 samples/sec Loss 1.4095 LearningRate 0.0020 Epoch: 17 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:28,274-Speed 3395.40 samples/sec Loss 1.3564 LearningRate 0.0020 Epoch: 17 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:31,292-Speed 3394.71 samples/sec Loss 1.4004 LearningRate 0.0020 Epoch: 17 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:12:34,279-Speed 3429.39 samples/sec Loss 1.3173 LearningRate 0.0020 Epoch: 17 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:37,294-Speed 3396.14 samples/sec Loss 1.3391 LearningRate 0.0020 Epoch: 17 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:40,306-Speed 3400.93 samples/sec Loss 1.4048 LearningRate 0.0020 Epoch: 17 Global Step: 86880 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:43,312-Speed 3407.75 samples/sec Loss 1.3410 LearningRate 0.0020 Epoch: 17 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:46,319-Speed 3407.04 samples/sec Loss 1.2538 LearningRate 0.0020 Epoch: 17 Global Step: 86900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:49,327-Speed 3404.24 samples/sec Loss 1.2821 LearningRate 0.0020 Epoch: 17 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:52,344-Speed 3395.87 samples/sec Loss 1.3086 LearningRate 0.0020 Epoch: 17 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:55,353-Speed 3403.80 samples/sec Loss 1.3632 LearningRate 0.0020 Epoch: 17 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:12:58,373-Speed 3391.52 samples/sec Loss 1.2974 LearningRate 0.0020 Epoch: 17 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:13:01,435-Speed 3344.94 samples/sec Loss 1.3711 LearningRate 0.0020 Epoch: 17 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:13:04,480-Speed 3363.83 samples/sec Loss 1.3655 LearningRate 0.0020 Epoch: 17 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:07,495-Speed 3397.20 samples/sec Loss 1.3561 LearningRate 0.0020 Epoch: 17 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:10,497-Speed 3412.24 samples/sec Loss 1.4256 LearningRate 0.0020 Epoch: 17 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:13,510-Speed 3398.76 samples/sec Loss 1.3559 LearningRate 0.0020 Epoch: 17 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:16,522-Speed 3401.63 samples/sec Loss 1.2922 LearningRate 0.0020 Epoch: 17 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:13:19,563-Speed 3368.65 samples/sec Loss 1.3925 LearningRate 0.0020 Epoch: 17 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:22,570-Speed 3405.92 samples/sec Loss 1.3844 LearningRate 0.0020 Epoch: 17 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:25,623-Speed 3355.13 samples/sec Loss 1.3194 LearningRate 0.0020 Epoch: 17 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:28,695-Speed 3334.07 samples/sec Loss 1.3376 LearningRate 0.0019 Epoch: 17 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:31,701-Speed 3407.96 samples/sec Loss 1.4134 LearningRate 0.0019 Epoch: 17 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:34,686-Speed 3430.90 samples/sec Loss 1.3840 LearningRate 0.0019 Epoch: 17 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:37,707-Speed 3391.25 samples/sec Loss 1.3851 LearningRate 0.0019 Epoch: 17 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:40,714-Speed 3406.37 samples/sec Loss 1.3382 LearningRate 0.0019 Epoch: 17 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:43,721-Speed 3405.84 samples/sec Loss 1.3619 LearningRate 0.0019 Epoch: 17 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:46,735-Speed 3398.56 samples/sec Loss 1.3028 LearningRate 0.0019 Epoch: 17 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:49,747-Speed 3401.17 samples/sec Loss 1.2934 LearningRate 0.0019 Epoch: 17 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:52,816-Speed 3336.83 samples/sec Loss 1.3699 LearningRate 0.0019 Epoch: 17 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:55,825-Speed 3404.29 samples/sec Loss 
1.4354 LearningRate 0.0019 Epoch: 17 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:13:58,834-Speed 3403.31 samples/sec Loss 1.3166 LearningRate 0.0019 Epoch: 17 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:01,842-Speed 3405.51 samples/sec Loss 1.3472 LearningRate 0.0019 Epoch: 17 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:04,846-Speed 3409.97 samples/sec Loss 1.3641 LearningRate 0.0019 Epoch: 17 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:07,855-Speed 3404.52 samples/sec Loss 1.3667 LearningRate 0.0019 Epoch: 17 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:10,868-Speed 3399.71 samples/sec Loss 1.3997 LearningRate 0.0019 Epoch: 17 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:13,878-Speed 3402.66 samples/sec Loss 1.2850 LearningRate 0.0019 Epoch: 17 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:16,887-Speed 3403.46 samples/sec Loss 1.3666 LearningRate 0.0019 Epoch: 17 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:19,958-Speed 3336.85 samples/sec Loss 1.3467 LearningRate 0.0019 Epoch: 17 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:22,951-Speed 3421.76 samples/sec Loss 1.3833 LearningRate 0.0019 Epoch: 17 Global Step: 87220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:25,959-Speed 3405.51 samples/sec Loss 1.3650 LearningRate 0.0019 Epoch: 17 Global Step: 87230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:28,972-Speed 3399.66 samples/sec Loss 1.3454 LearningRate 0.0019 Epoch: 17 Global Step: 87240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:31,983-Speed 3401.38 samples/sec Loss 1.2597 LearningRate 0.0019 Epoch: 17 Global Step: 87250 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:34,990-Speed 3407.12 samples/sec Loss 1.4138 LearningRate 0.0019 Epoch: 17 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:38,002-Speed 3400.62 samples/sec Loss 1.3497 LearningRate 0.0019 Epoch: 17 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:41,038-Speed 3374.35 samples/sec Loss 1.3620 LearningRate 0.0019 Epoch: 17 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:44,065-Speed 3383.71 samples/sec Loss 1.3031 LearningRate 0.0019 Epoch: 17 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:47,088-Speed 3387.77 samples/sec Loss 1.5053 LearningRate 0.0019 Epoch: 17 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:50,107-Speed 3393.87 samples/sec Loss 1.3084 LearningRate 0.0019 Epoch: 17 Global Step: 87310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:53,123-Speed 3396.14 samples/sec Loss 1.3582 LearningRate 0.0019 Epoch: 17 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:14:56,117-Speed 3421.21 samples/sec Loss 1.3717 LearningRate 0.0019 Epoch: 17 Global Step: 87330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:14:59,128-Speed 3402.14 samples/sec Loss 1.3920 LearningRate 0.0019 Epoch: 17 Global Step: 87340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:02,184-Speed 3351.69 samples/sec Loss 1.3688 LearningRate 0.0019 Epoch: 17 Global Step: 87350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:05,249-Speed 3341.65 samples/sec Loss 1.2883 LearningRate 0.0019 Epoch: 17 Global Step: 87360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:08,329-Speed 3325.87 samples/sec Loss 1.3514 LearningRate 0.0019 Epoch: 17 Global Step: 87370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 08:15:11,340-Speed 3402.39 samples/sec Loss 1.3654 LearningRate 0.0019 Epoch: 17 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:14,355-Speed 3396.64 samples/sec Loss 1.3986 LearningRate 0.0019 Epoch: 17 Global Step: 87390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:17,384-Speed 3381.42 samples/sec Loss 1.2709 LearningRate 0.0019 Epoch: 17 Global Step: 87400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:20,391-Speed 3406.27 samples/sec Loss 1.3616 LearningRate 0.0018 Epoch: 17 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:23,410-Speed 3392.65 samples/sec Loss 1.4479 LearningRate 0.0018 Epoch: 17 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:15:26,454-Speed 3365.67 samples/sec Loss 1.4042 LearningRate 0.0018 Epoch: 17 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:29,520-Speed 3340.78 samples/sec Loss 1.3478 LearningRate 0.0018 Epoch: 17 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:32,530-Speed 3402.75 samples/sec Loss 1.3027 LearningRate 0.0018 Epoch: 17 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:35,538-Speed 3405.29 samples/sec Loss 1.4647 LearningRate 0.0018 Epoch: 17 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:38,559-Speed 3390.47 samples/sec Loss 1.3950 LearningRate 0.0018 Epoch: 17 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:41,569-Speed 3403.04 samples/sec Loss 1.4899 LearningRate 0.0018 Epoch: 17 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:44,578-Speed 3404.37 samples/sec Loss 1.4140 LearningRate 0.0018 Epoch: 17 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:47,604-Speed 3384.93 samples/sec Loss 
1.3026 LearningRate 0.0018 Epoch: 17 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:50,640-Speed 3374.30 samples/sec Loss 1.3016 LearningRate 0.0018 Epoch: 17 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:53,646-Speed 3406.68 samples/sec Loss 1.3404 LearningRate 0.0018 Epoch: 17 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:56,647-Speed 3413.30 samples/sec Loss 1.4132 LearningRate 0.0018 Epoch: 17 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:15:59,656-Speed 3404.77 samples/sec Loss 1.4400 LearningRate 0.0018 Epoch: 17 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:02,691-Speed 3373.65 samples/sec Loss 1.4374 LearningRate 0.0018 Epoch: 17 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:05,709-Speed 3394.55 samples/sec Loss 1.3762 LearningRate 0.0018 Epoch: 17 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:08,760-Speed 3357.47 samples/sec Loss 1.3656 LearningRate 0.0018 Epoch: 17 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:11,774-Speed 3398.80 samples/sec Loss 1.3729 LearningRate 0.0018 Epoch: 17 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:14,782-Speed 3405.61 samples/sec Loss 1.5050 LearningRate 0.0018 Epoch: 17 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:17,827-Speed 3362.74 samples/sec Loss 1.3578 LearningRate 0.0018 Epoch: 17 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:20,854-Speed 3384.80 samples/sec Loss 1.3784 LearningRate 0.0018 Epoch: 17 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:23,866-Speed 3399.36 samples/sec Loss 1.4052 LearningRate 0.0018 Epoch: 17 Global Step: 87620 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:16:26,890-Speed 3387.38 samples/sec Loss 1.3835 LearningRate 0.0018 Epoch: 17 Global Step: 87630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 08:16:29,869-Speed 3439.47 samples/sec Loss 1.4702 LearningRate 0.0018 Epoch: 17 Global Step: 87640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:32,883-Speed 3398.43 samples/sec Loss 1.4091 LearningRate 0.0018 Epoch: 17 Global Step: 87650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:35,901-Speed 3393.25 samples/sec Loss 1.3753 LearningRate 0.0018 Epoch: 17 Global Step: 87660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:38,916-Speed 3398.33 samples/sec Loss 1.4080 LearningRate 0.0018 Epoch: 17 Global Step: 87670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:41,933-Speed 3395.90 samples/sec Loss 1.3830 LearningRate 0.0018 Epoch: 17 Global Step: 87680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:44,948-Speed 3397.07 samples/sec Loss 1.4769 LearningRate 0.0018 Epoch: 17 Global Step: 87690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:47,956-Speed 3404.18 samples/sec Loss 1.3453 LearningRate 0.0018 Epoch: 17 Global Step: 87700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:50,967-Speed 3402.54 samples/sec Loss 1.4178 LearningRate 0.0018 Epoch: 17 Global Step: 87710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:53,983-Speed 3395.53 samples/sec Loss 1.4203 LearningRate 0.0018 Epoch: 17 Global Step: 87720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:16:56,999-Speed 3397.17 samples/sec Loss 1.5259 LearningRate 0.0018 Epoch: 17 Global Step: 87730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:00,025-Speed 3383.81 samples/sec Loss 1.3396 LearningRate 0.0018 Epoch: 17 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:17:03,077-Speed 3356.56 samples/sec Loss 1.4785 LearningRate 0.0018 Epoch: 17 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:06,104-Speed 3384.43 samples/sec Loss 1.3809 LearningRate 0.0018 Epoch: 17 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:09,122-Speed 3392.63 samples/sec Loss 1.3002 LearningRate 0.0018 Epoch: 17 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:12,137-Speed 3398.39 samples/sec Loss 1.3537 LearningRate 0.0017 Epoch: 17 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:15,174-Speed 3372.68 samples/sec Loss 1.3814 LearningRate 0.0017 Epoch: 17 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:18,198-Speed 3386.51 samples/sec Loss 1.3454 LearningRate 0.0017 Epoch: 17 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:21,209-Speed 3402.30 samples/sec Loss 1.3597 LearningRate 0.0017 Epoch: 17 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:24,234-Speed 3386.27 samples/sec Loss 1.3721 LearningRate 0.0017 Epoch: 17 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:27,245-Speed 3400.93 samples/sec Loss 1.3416 LearningRate 0.0017 Epoch: 17 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:30,245-Speed 3414.56 samples/sec Loss 1.4577 LearningRate 0.0017 Epoch: 17 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:33,255-Speed 3402.69 samples/sec Loss 1.3967 LearningRate 0.0017 Epoch: 17 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:36,270-Speed 3397.53 samples/sec Loss 1.4059 LearningRate 0.0017 Epoch: 17 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:39,294-Speed 3386.70 samples/sec Loss 
1.3956 LearningRate 0.0017 Epoch: 17 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:17:42,307-Speed 3400.43 samples/sec Loss 1.4306 LearningRate 0.0017 Epoch: 17 Global Step: 87880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:45,313-Speed 3408.33 samples/sec Loss 1.4391 LearningRate 0.0017 Epoch: 17 Global Step: 87890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:48,326-Speed 3399.00 samples/sec Loss 1.3990 LearningRate 0.0017 Epoch: 17 Global Step: 87900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:51,416-Speed 3316.03 samples/sec Loss 1.3748 LearningRate 0.0017 Epoch: 17 Global Step: 87910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:54,484-Speed 3338.01 samples/sec Loss 1.3190 LearningRate 0.0017 Epoch: 17 Global Step: 87920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:17:57,503-Speed 3392.41 samples/sec Loss 1.4038 LearningRate 0.0017 Epoch: 17 Global Step: 87930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:18:00,515-Speed 3401.45 samples/sec Loss 1.3829 LearningRate 0.0017 Epoch: 17 Global Step: 87940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:18:03,550-Speed 3373.87 samples/sec Loss 1.3525 LearningRate 0.0017 Epoch: 17 Global Step: 87950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:18:06,563-Speed 3400.60 samples/sec Loss 1.3466 LearningRate 0.0017 Epoch: 17 Global Step: 87960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:18:09,585-Speed 3388.56 samples/sec Loss 1.3550 LearningRate 0.0017 Epoch: 17 Global Step: 87970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:18:12,616-Speed 3379.00 samples/sec Loss 1.4046 LearningRate 0.0017 Epoch: 17 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:18:15,632-Speed 3396.36 samples/sec Loss 1.3661 LearningRate 0.0017 Epoch: 17 Global Step: 87990 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:18:18,686-Speed 3353.85 samples/sec Loss 1.3331 LearningRate 0.0017 Epoch: 17 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:19:03,003-[lfw][88000]XNorm: 22.146688 Training: 2022-04-11 08:19:03,003-[lfw][88000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 08:19:03,004-[lfw][88000]Accuracy-Highest: 0.99850 Training: 2022-04-11 08:19:54,257-[cfp_fp][88000]XNorm: 22.118556 Training: 2022-04-11 08:19:54,258-[cfp_fp][88000]Accuracy-Flip: 0.98857+-0.00461 Training: 2022-04-11 08:19:54,258-[cfp_fp][88000]Accuracy-Highest: 0.98857 Training: 2022-04-11 08:20:38,400-[agedb_30][88000]XNorm: 22.611050 Training: 2022-04-11 08:20:38,401-[agedb_30][88000]Accuracy-Flip: 0.98233+-0.00793 Training: 2022-04-11 08:20:38,401-[agedb_30][88000]Accuracy-Highest: 0.98550 Training: 2022-04-11 08:20:41,398-Speed 71.75 samples/sec Loss 1.3531 LearningRate 0.0017 Epoch: 17 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:44,391-Speed 3421.56 samples/sec Loss 1.3390 LearningRate 0.0017 Epoch: 17 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:47,395-Speed 3409.91 samples/sec Loss 1.3188 LearningRate 0.0017 Epoch: 17 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:50,391-Speed 3418.85 samples/sec Loss 1.4532 LearningRate 0.0017 Epoch: 17 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:53,401-Speed 3403.55 samples/sec Loss 1.3794 LearningRate 0.0017 Epoch: 17 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:56,402-Speed 3413.99 samples/sec Loss 1.3794 LearningRate 0.0017 Epoch: 17 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:20:59,401-Speed 3414.23 samples/sec Loss 1.3948 LearningRate 0.0017 Epoch: 17 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 08:21:02,384-Speed 3433.63 samples/sec Loss 1.4084 LearningRate 0.0017 Epoch: 17 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:05,392-Speed 3405.55 samples/sec Loss 1.3908 LearningRate 0.0017 Epoch: 17 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:08,377-Speed 3431.05 samples/sec Loss 1.3855 LearningRate 0.0017 Epoch: 17 Global Step: 88100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:11,372-Speed 3421.12 samples/sec Loss 1.3058 LearningRate 0.0017 Epoch: 17 Global Step: 88110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:14,411-Speed 3369.75 samples/sec Loss 1.3174 LearningRate 0.0017 Epoch: 17 Global Step: 88120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:17,413-Speed 3412.01 samples/sec Loss 1.3915 LearningRate 0.0017 Epoch: 17 Global Step: 88130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:20,412-Speed 3415.35 samples/sec Loss 1.3892 LearningRate 0.0017 Epoch: 17 Global Step: 88140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:23,441-Speed 3380.99 samples/sec Loss 1.3359 LearningRate 0.0017 Epoch: 17 Global Step: 88150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:26,452-Speed 3402.74 samples/sec Loss 1.2779 LearningRate 0.0017 Epoch: 17 Global Step: 88160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:29,468-Speed 3395.40 samples/sec Loss 1.3653 LearningRate 0.0016 Epoch: 17 Global Step: 88170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:32,508-Speed 3369.95 samples/sec Loss 1.3987 LearningRate 0.0016 Epoch: 17 Global Step: 88180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:35,525-Speed 3407.06 samples/sec Loss 1.4186 LearningRate 0.0016 Epoch: 17 Global Step: 88190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:21:38,528-Speed 3410.89 
samples/sec Loss 1.3236 LearningRate 0.0016 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:41,635-Speed 3296.42 samples/sec Loss 1.3053 LearningRate 0.0016 Epoch: 17 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:44,640-Speed 3408.07 samples/sec Loss 1.4308 LearningRate 0.0016 Epoch: 17 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:47,668-Speed 3383.07 samples/sec Loss 1.2678 LearningRate 0.0016 Epoch: 17 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:50,686-Speed 3393.61 samples/sec Loss 1.4392 LearningRate 0.0016 Epoch: 17 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:53,702-Speed 3395.80 samples/sec Loss 1.3513 LearningRate 0.0016 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:56,713-Speed 3402.18 samples/sec Loss 1.5255 LearningRate 0.0016 Epoch: 17 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:21:59,715-Speed 3412.50 samples/sec Loss 1.3735 LearningRate 0.0016 Epoch: 17 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:02,729-Speed 3398.44 samples/sec Loss 1.3859 LearningRate 0.0016 Epoch: 17 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:05,744-Speed 3410.90 samples/sec Loss 1.4042 LearningRate 0.0016 Epoch: 17 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:08,732-Speed 3427.13 samples/sec Loss 1.3902 LearningRate 0.0016 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:11,787-Speed 3352.95 samples/sec Loss 1.3823 LearningRate 0.0016 Epoch: 17 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:14,821-Speed 3380.35 samples/sec Loss 1.3703 LearningRate 0.0016 Epoch: 17 
Global Step: 88320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:17,831-Speed 3402.43 samples/sec Loss 1.3916 LearningRate 0.0016 Epoch: 17 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:20,851-Speed 3402.39 samples/sec Loss 1.3337 LearningRate 0.0016 Epoch: 17 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:23,855-Speed 3408.71 samples/sec Loss 1.3878 LearningRate 0.0016 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:26,876-Speed 3390.61 samples/sec Loss 1.4113 LearningRate 0.0016 Epoch: 17 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:29,900-Speed 3406.91 samples/sec Loss 1.3421 LearningRate 0.0016 Epoch: 17 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:32,910-Speed 3402.77 samples/sec Loss 1.3881 LearningRate 0.0016 Epoch: 17 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:22:35,918-Speed 3410.63 samples/sec Loss 1.4359 LearningRate 0.0016 Epoch: 17 Global Step: 88390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:38,950-Speed 3378.38 samples/sec Loss 1.4487 LearningRate 0.0016 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:41,965-Speed 3397.76 samples/sec Loss 1.4072 LearningRate 0.0016 Epoch: 17 Global Step: 88410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:45,176-Speed 3411.63 samples/sec Loss 1.3890 LearningRate 0.0016 Epoch: 17 Global Step: 88420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:48,182-Speed 3407.59 samples/sec Loss 1.3643 LearningRate 0.0016 Epoch: 17 Global Step: 88430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:51,200-Speed 3399.01 samples/sec Loss 1.3970 LearningRate 0.0016 Epoch: 17 Global Step: 88440 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:22:54,205-Speed 3408.87 samples/sec Loss 1.3489 LearningRate 0.0016 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:22:57,216-Speed 3402.10 samples/sec Loss 1.4077 LearningRate 0.0016 Epoch: 17 Global Step: 88460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:00,242-Speed 3408.66 samples/sec Loss 1.3915 LearningRate 0.0016 Epoch: 17 Global Step: 88470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:03,249-Speed 3405.81 samples/sec Loss 1.4482 LearningRate 0.0016 Epoch: 17 Global Step: 88480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:06,362-Speed 3414.21 samples/sec Loss 1.3400 LearningRate 0.0016 Epoch: 17 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:09,378-Speed 3396.84 samples/sec Loss 1.4403 LearningRate 0.0016 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:12,388-Speed 3402.82 samples/sec Loss 1.3773 LearningRate 0.0016 Epoch: 17 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:15,391-Speed 3413.27 samples/sec Loss 1.3520 LearningRate 0.0016 Epoch: 17 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:18,404-Speed 3399.52 samples/sec Loss 1.4097 LearningRate 0.0016 Epoch: 17 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:21,424-Speed 3398.30 samples/sec Loss 1.3185 LearningRate 0.0016 Epoch: 17 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:24,430-Speed 3407.38 samples/sec Loss 1.4543 LearningRate 0.0016 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:27,459-Speed 3381.39 samples/sec Loss 1.3557 LearningRate 0.0016 Epoch: 17 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:30,486-Speed 3390.04 
samples/sec Loss 1.3437 LearningRate 0.0015 Epoch: 17 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:33,495-Speed 3403.86 samples/sec Loss 1.3695 LearningRate 0.0015 Epoch: 17 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:36,510-Speed 3425.92 samples/sec Loss 1.3637 LearningRate 0.0015 Epoch: 17 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:39,520-Speed 3402.25 samples/sec Loss 1.3254 LearningRate 0.0015 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:23:42,509-Speed 3426.43 samples/sec Loss 1.3526 LearningRate 0.0015 Epoch: 17 Global Step: 88610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:45,533-Speed 3387.75 samples/sec Loss 1.4315 LearningRate 0.0015 Epoch: 17 Global Step: 88620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:48,547-Speed 3398.47 samples/sec Loss 1.4626 LearningRate 0.0015 Epoch: 17 Global Step: 88630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:51,627-Speed 3325.03 samples/sec Loss 1.3160 LearningRate 0.0015 Epoch: 17 Global Step: 88640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:54,640-Speed 3399.83 samples/sec Loss 1.4334 LearningRate 0.0015 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:23:57,649-Speed 3403.30 samples/sec Loss 1.3181 LearningRate 0.0015 Epoch: 17 Global Step: 88660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:00,662-Speed 3400.28 samples/sec Loss 1.3347 LearningRate 0.0015 Epoch: 17 Global Step: 88670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:03,675-Speed 3400.15 samples/sec Loss 1.3410 LearningRate 0.0015 Epoch: 17 Global Step: 88680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:06,694-Speed 3392.99 samples/sec Loss 1.3802 LearningRate 0.0015 Epoch: 17 
Global Step: 88690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:09,704-Speed 3402.51 samples/sec Loss 1.3089 LearningRate 0.0015 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:12,727-Speed 3388.63 samples/sec Loss 1.4100 LearningRate 0.0015 Epoch: 17 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:24:15,787-Speed 3346.79 samples/sec Loss 1.4183 LearningRate 0.0015 Epoch: 17 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:24:18,821-Speed 3376.10 samples/sec Loss 1.4276 LearningRate 0.0015 Epoch: 17 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:24:21,831-Speed 3406.26 samples/sec Loss 1.4727 LearningRate 0.0015 Epoch: 17 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:24:24,826-Speed 3419.50 samples/sec Loss 1.3794 LearningRate 0.0015 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:27,837-Speed 3401.41 samples/sec Loss 1.3600 LearningRate 0.0015 Epoch: 17 Global Step: 88760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:30,853-Speed 3397.10 samples/sec Loss 1.4284 LearningRate 0.0015 Epoch: 17 Global Step: 88770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:33,861-Speed 3405.13 samples/sec Loss 1.2897 LearningRate 0.0015 Epoch: 17 Global Step: 88780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:36,940-Speed 3326.57 samples/sec Loss 1.3623 LearningRate 0.0015 Epoch: 17 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:39,953-Speed 3398.81 samples/sec Loss 1.4036 LearningRate 0.0015 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:43,002-Speed 3359.50 samples/sec Loss 1.3823 LearningRate 0.0015 Epoch: 17 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:24:46,017-Speed 3397.83 samples/sec Loss 1.4101 LearningRate 0.0015 Epoch: 17 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:49,064-Speed 3361.24 samples/sec Loss 1.3755 LearningRate 0.0015 Epoch: 17 Global Step: 88830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:52,082-Speed 3393.32 samples/sec Loss 1.4640 LearningRate 0.0015 Epoch: 17 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:24:55,112-Speed 3380.39 samples/sec Loss 1.3259 LearningRate 0.0015 Epoch: 17 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:24:58,134-Speed 3389.56 samples/sec Loss 1.3523 LearningRate 0.0015 Epoch: 17 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:01,165-Speed 3380.24 samples/sec Loss 1.4420 LearningRate 0.0015 Epoch: 17 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:04,210-Speed 3362.88 samples/sec Loss 1.3838 LearningRate 0.0015 Epoch: 17 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:07,204-Speed 3421.38 samples/sec Loss 1.3355 LearningRate 0.0015 Epoch: 17 Global Step: 88890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:10,216-Speed 3401.49 samples/sec Loss 1.3297 LearningRate 0.0015 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:13,227-Speed 3401.33 samples/sec Loss 1.3608 LearningRate 0.0015 Epoch: 17 Global Step: 88910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:16,270-Speed 3366.27 samples/sec Loss 1.4569 LearningRate 0.0015 Epoch: 17 Global Step: 88920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:19,283-Speed 3398.88 samples/sec Loss 1.3981 LearningRate 0.0015 Epoch: 17 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:22,294-Speed 3401.59 
samples/sec Loss 1.3759 LearningRate 0.0015 Epoch: 17 Global Step: 88940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:25,315-Speed 3391.14 samples/sec Loss 1.4786 LearningRate 0.0015 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:28,327-Speed 3399.94 samples/sec Loss 1.4175 LearningRate 0.0015 Epoch: 17 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:31,340-Speed 3399.76 samples/sec Loss 1.4053 LearningRate 0.0015 Epoch: 17 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:34,354-Speed 3398.73 samples/sec Loss 1.4606 LearningRate 0.0014 Epoch: 17 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:25:37,372-Speed 3393.85 samples/sec Loss 1.4143 LearningRate 0.0014 Epoch: 17 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:40,383-Speed 3401.77 samples/sec Loss 1.2947 LearningRate 0.0014 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:43,399-Speed 3395.76 samples/sec Loss 1.4162 LearningRate 0.0014 Epoch: 17 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:46,416-Speed 3394.87 samples/sec Loss 1.4179 LearningRate 0.0014 Epoch: 17 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:49,521-Speed 3299.59 samples/sec Loss 1.3416 LearningRate 0.0014 Epoch: 17 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:52,540-Speed 3391.77 samples/sec Loss 1.4311 LearningRate 0.0014 Epoch: 17 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:55,578-Speed 3371.62 samples/sec Loss 1.4142 LearningRate 0.0014 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:25:58,589-Speed 3402.32 samples/sec Loss 1.3631 LearningRate 0.0014 Epoch: 17 
Global Step: 89060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:01,608-Speed 3393.55 samples/sec Loss 1.3278 LearningRate 0.0014 Epoch: 17 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:04,633-Speed 3385.99 samples/sec Loss 1.4604 LearningRate 0.0014 Epoch: 17 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:07,632-Speed 3415.50 samples/sec Loss 1.4872 LearningRate 0.0014 Epoch: 17 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:10,645-Speed 3399.66 samples/sec Loss 1.4158 LearningRate 0.0014 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:13,670-Speed 3385.90 samples/sec Loss 1.4127 LearningRate 0.0014 Epoch: 17 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:16,688-Speed 3394.35 samples/sec Loss 1.3534 LearningRate 0.0014 Epoch: 17 Global Step: 89120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:19,701-Speed 3399.27 samples/sec Loss 1.3737 LearningRate 0.0014 Epoch: 17 Global Step: 89130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:22,714-Speed 3399.59 samples/sec Loss 1.4317 LearningRate 0.0014 Epoch: 17 Global Step: 89140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:25,728-Speed 3398.47 samples/sec Loss 1.2359 LearningRate 0.0014 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:28,744-Speed 3396.76 samples/sec Loss 1.3293 LearningRate 0.0014 Epoch: 17 Global Step: 89160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:26:31,740-Speed 3419.05 samples/sec Loss 1.4270 LearningRate 0.0014 Epoch: 17 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:34,767-Speed 3383.56 samples/sec Loss 1.4553 LearningRate 0.0014 Epoch: 17 Global Step: 89180 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:26:37,788-Speed 3389.90 samples/sec Loss 1.3982 LearningRate 0.0014 Epoch: 17 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:40,807-Speed 3393.23 samples/sec Loss 1.3793 LearningRate 0.0014 Epoch: 17 Global Step: 89200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:43,820-Speed 3399.12 samples/sec Loss 1.3996 LearningRate 0.0014 Epoch: 17 Global Step: 89210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:46,840-Speed 3392.46 samples/sec Loss 1.3856 LearningRate 0.0014 Epoch: 17 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:49,870-Speed 3379.54 samples/sec Loss 1.3290 LearningRate 0.0014 Epoch: 17 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:52,886-Speed 3396.38 samples/sec Loss 1.2532 LearningRate 0.0014 Epoch: 17 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:55,906-Speed 3391.36 samples/sec Loss 1.4547 LearningRate 0.0014 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:26:58,946-Speed 3369.79 samples/sec Loss 1.3569 LearningRate 0.0014 Epoch: 17 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:27:01,999-Speed 3354.76 samples/sec Loss 1.4087 LearningRate 0.0014 Epoch: 17 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:05,025-Speed 3384.89 samples/sec Loss 1.4653 LearningRate 0.0014 Epoch: 17 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:08,055-Speed 3381.20 samples/sec Loss 1.2970 LearningRate 0.0014 Epoch: 17 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:11,067-Speed 3400.84 samples/sec Loss 1.3489 LearningRate 0.0014 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:14,090-Speed 3387.65 
samples/sec Loss 1.3598 LearningRate 0.0014 Epoch: 17 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:17,149-Speed 3348.88 samples/sec Loss 1.4840 LearningRate 0.0014 Epoch: 17 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:20,186-Speed 3373.69 samples/sec Loss 1.4105 LearningRate 0.0014 Epoch: 17 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:23,214-Speed 3381.56 samples/sec Loss 1.3612 LearningRate 0.0014 Epoch: 17 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:26,224-Speed 3403.69 samples/sec Loss 1.4181 LearningRate 0.0014 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:29,248-Speed 3387.36 samples/sec Loss 1.3739 LearningRate 0.0014 Epoch: 17 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:32,258-Speed 3402.92 samples/sec Loss 1.3540 LearningRate 0.0014 Epoch: 17 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:35,268-Speed 3402.45 samples/sec Loss 1.4308 LearningRate 0.0014 Epoch: 17 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:38,282-Speed 3398.46 samples/sec Loss 1.3542 LearningRate 0.0014 Epoch: 17 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:41,300-Speed 3393.66 samples/sec Loss 1.3675 LearningRate 0.0014 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:44,310-Speed 3402.92 samples/sec Loss 1.3921 LearningRate 0.0013 Epoch: 17 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:47,339-Speed 3381.73 samples/sec Loss 1.2937 LearningRate 0.0013 Epoch: 17 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:50,376-Speed 3373.20 samples/sec Loss 1.3541 LearningRate 0.0013 Epoch: 17 
Global Step: 89430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:53,398-Speed 3389.16 samples/sec Loss 1.2885 LearningRate 0.0013 Epoch: 17 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:56,418-Speed 3391.79 samples/sec Loss 1.3399 LearningRate 0.0013 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:27:59,433-Speed 3396.65 samples/sec Loss 1.4533 LearningRate 0.0013 Epoch: 17 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:02,437-Speed 3409.19 samples/sec Loss 1.3983 LearningRate 0.0013 Epoch: 17 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:05,448-Speed 3402.08 samples/sec Loss 1.4007 LearningRate 0.0013 Epoch: 17 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:08,469-Speed 3390.21 samples/sec Loss 1.4584 LearningRate 0.0013 Epoch: 17 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:11,495-Speed 3385.26 samples/sec Loss 1.3226 LearningRate 0.0013 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:14,538-Speed 3366.25 samples/sec Loss 1.3837 LearningRate 0.0013 Epoch: 17 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:17,560-Speed 3388.84 samples/sec Loss 1.3482 LearningRate 0.0013 Epoch: 17 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:20,582-Speed 3390.39 samples/sec Loss 1.3723 LearningRate 0.0013 Epoch: 17 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:23,592-Speed 3402.73 samples/sec Loss 1.3793 LearningRate 0.0013 Epoch: 17 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:28:26,611-Speed 3392.31 samples/sec Loss 1.2868 LearningRate 0.0013 Epoch: 17 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 08:28:29,608-Speed 3417.72 samples/sec Loss 1.3836 LearningRate 0.0013 Epoch: 17 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:32,621-Speed 3399.54 samples/sec Loss 1.3915 LearningRate 0.0013 Epoch: 17 Global Step: 89570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:35,636-Speed 3397.68 samples/sec Loss 1.3820 LearningRate 0.0013 Epoch: 17 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:38,654-Speed 3394.17 samples/sec Loss 1.3796 LearningRate 0.0013 Epoch: 17 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:41,667-Speed 3399.63 samples/sec Loss 1.3268 LearningRate 0.0013 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:44,705-Speed 3370.47 samples/sec Loss 1.4160 LearningRate 0.0013 Epoch: 17 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:47,729-Speed 3387.32 samples/sec Loss 1.3690 LearningRate 0.0013 Epoch: 17 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:50,751-Speed 3390.11 samples/sec Loss 1.3650 LearningRate 0.0013 Epoch: 17 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:53,769-Speed 3393.69 samples/sec Loss 1.3764 LearningRate 0.0013 Epoch: 17 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:56,794-Speed 3386.44 samples/sec Loss 1.3678 LearningRate 0.0013 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:28:59,820-Speed 3384.12 samples/sec Loss 1.2955 LearningRate 0.0013 Epoch: 17 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:02,883-Speed 3343.55 samples/sec Loss 1.4281 LearningRate 0.0013 Epoch: 17 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:05,903-Speed 3391.75 
samples/sec Loss 1.4150 LearningRate 0.0013 Epoch: 17 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:08,947-Speed 3365.24 samples/sec Loss 1.4184 LearningRate 0.0013 Epoch: 17 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:11,987-Speed 3368.93 samples/sec Loss 1.4519 LearningRate 0.0013 Epoch: 17 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:15,005-Speed 3394.15 samples/sec Loss 1.4370 LearningRate 0.0013 Epoch: 17 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:18,030-Speed 3385.81 samples/sec Loss 1.3717 LearningRate 0.0013 Epoch: 17 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:21,032-Speed 3412.35 samples/sec Loss 1.4792 LearningRate 0.0013 Epoch: 17 Global Step: 89730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:24,058-Speed 3385.20 samples/sec Loss 1.3761 LearningRate 0.0013 Epoch: 17 Global Step: 89740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:27,079-Speed 3389.84 samples/sec Loss 1.3842 LearningRate 0.0013 Epoch: 17 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:30,093-Speed 3398.72 samples/sec Loss 1.4607 LearningRate 0.0013 Epoch: 17 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:33,114-Speed 3389.81 samples/sec Loss 1.3456 LearningRate 0.0013 Epoch: 17 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:36,121-Speed 3406.37 samples/sec Loss 1.4132 LearningRate 0.0013 Epoch: 17 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:39,163-Speed 3367.49 samples/sec Loss 1.3965 LearningRate 0.0013 Epoch: 17 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:42,176-Speed 3399.46 samples/sec Loss 1.3659 LearningRate 0.0013 Epoch: 17 
Global Step: 89800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:45,192-Speed 3396.14 samples/sec Loss 1.3455 LearningRate 0.0013 Epoch: 17 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:48,203-Speed 3402.35 samples/sec Loss 1.3495 LearningRate 0.0013 Epoch: 17 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:29:51,220-Speed 3394.25 samples/sec Loss 1.4227 LearningRate 0.0013 Epoch: 17 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:54,233-Speed 3400.07 samples/sec Loss 1.4103 LearningRate 0.0013 Epoch: 17 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:29:57,248-Speed 3397.02 samples/sec Loss 1.3454 LearningRate 0.0012 Epoch: 17 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:00,260-Speed 3400.66 samples/sec Loss 1.3811 LearningRate 0.0012 Epoch: 17 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:03,324-Speed 3342.62 samples/sec Loss 1.4039 LearningRate 0.0012 Epoch: 17 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:06,337-Speed 3399.87 samples/sec Loss 1.3637 LearningRate 0.0012 Epoch: 17 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:09,353-Speed 3395.58 samples/sec Loss 1.3696 LearningRate 0.0012 Epoch: 17 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:12,368-Speed 3398.15 samples/sec Loss 1.4211 LearningRate 0.0012 Epoch: 17 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:30:15,420-Speed 3355.53 samples/sec Loss 1.4505 LearningRate 0.0012 Epoch: 17 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:18,434-Speed 3397.43 samples/sec Loss 1.4442 LearningRate 0.0012 Epoch: 17 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:30:21,446-Speed 3401.09 samples/sec Loss 1.3834 LearningRate 0.0012 Epoch: 17 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:24,469-Speed 3388.10 samples/sec Loss 1.3818 LearningRate 0.0012 Epoch: 17 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:27,488-Speed 3393.72 samples/sec Loss 1.3708 LearningRate 0.0012 Epoch: 17 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:30,507-Speed 3392.24 samples/sec Loss 1.4708 LearningRate 0.0012 Epoch: 17 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:33,525-Speed 3393.81 samples/sec Loss 1.3246 LearningRate 0.0012 Epoch: 17 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:36,545-Speed 3392.26 samples/sec Loss 1.3542 LearningRate 0.0012 Epoch: 17 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:39,672-Speed 3276.33 samples/sec Loss 1.4309 LearningRate 0.0012 Epoch: 17 Global Step: 89990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:30:42,747-Speed 3330.61 samples/sec Loss 1.4388 LearningRate 0.0012 Epoch: 17 Global Step: 90000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:31:27,017-[lfw][90000]XNorm: 21.895830 Training: 2022-04-11 08:31:27,017-[lfw][90000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 08:31:27,018-[lfw][90000]Accuracy-Highest: 0.99850 Training: 2022-04-11 08:32:18,246-[cfp_fp][90000]XNorm: 21.862363 Training: 2022-04-11 08:32:18,247-[cfp_fp][90000]Accuracy-Flip: 0.98800+-0.00483 Training: 2022-04-11 08:32:18,247-[cfp_fp][90000]Accuracy-Highest: 0.98857 Training: 2022-04-11 08:33:02,231-[agedb_30][90000]XNorm: 22.314070 Training: 2022-04-11 08:33:02,232-[agedb_30][90000]Accuracy-Flip: 0.98317+-0.00845 Training: 2022-04-11 08:33:02,233-[agedb_30][90000]Accuracy-Highest: 0.98550 Training: 2022-04-11 
08:33:05,241-Speed 71.86 samples/sec Loss 1.4257 LearningRate 0.0012 Epoch: 17 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:08,240-Speed 3414.98 samples/sec Loss 1.4122 LearningRate 0.0012 Epoch: 17 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:11,231-Speed 3424.26 samples/sec Loss 1.2799 LearningRate 0.0012 Epoch: 17 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:14,222-Speed 3425.35 samples/sec Loss 1.4051 LearningRate 0.0012 Epoch: 17 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:17,271-Speed 3358.70 samples/sec Loss 1.3326 LearningRate 0.0012 Epoch: 17 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:20,265-Speed 3421.85 samples/sec Loss 1.4011 LearningRate 0.0012 Epoch: 17 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:23,262-Speed 3417.72 samples/sec Loss 1.3138 LearningRate 0.0012 Epoch: 17 Global Step: 90070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:26,259-Speed 3417.96 samples/sec Loss 1.4080 LearningRate 0.0012 Epoch: 17 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:29,309-Speed 3358.20 samples/sec Loss 1.5181 LearningRate 0.0012 Epoch: 17 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:32,336-Speed 3383.95 samples/sec Loss 1.3743 LearningRate 0.0012 Epoch: 17 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:35,335-Speed 3414.94 samples/sec Loss 1.4532 LearningRate 0.0012 Epoch: 17 Global Step: 90110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:38,356-Speed 3390.79 samples/sec Loss 1.3527 LearningRate 0.0012 Epoch: 17 Global Step: 90120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:41,370-Speed 3397.78 samples/sec Loss 1.3962 
LearningRate 0.0012 Epoch: 17 Global Step: 90130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:44,372-Speed 3412.62 samples/sec Loss 1.3624 LearningRate 0.0012 Epoch: 17 Global Step: 90140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:47,373-Speed 3412.51 samples/sec Loss 1.4065 LearningRate 0.0012 Epoch: 17 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:33:50,393-Speed 3392.12 samples/sec Loss 1.3207 LearningRate 0.0012 Epoch: 17 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:53,401-Speed 3404.82 samples/sec Loss 1.3334 LearningRate 0.0012 Epoch: 17 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:56,419-Speed 3393.98 samples/sec Loss 1.3433 LearningRate 0.0012 Epoch: 17 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:33:59,441-Speed 3389.41 samples/sec Loss 1.4739 LearningRate 0.0012 Epoch: 17 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:02,444-Speed 3411.13 samples/sec Loss 1.3214 LearningRate 0.0012 Epoch: 17 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:05,495-Speed 3356.95 samples/sec Loss 1.3471 LearningRate 0.0012 Epoch: 17 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:08,500-Speed 3408.94 samples/sec Loss 1.4221 LearningRate 0.0012 Epoch: 17 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:11,502-Speed 3411.99 samples/sec Loss 1.3915 LearningRate 0.0012 Epoch: 17 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:14,490-Speed 3428.17 samples/sec Loss 1.3968 LearningRate 0.0012 Epoch: 17 Global Step: 90240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:17,496-Speed 3407.77 samples/sec Loss 1.3854 LearningRate 0.0012 Epoch: 17 Global Step: 90250 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:20,494-Speed 3415.64 samples/sec Loss 1.4166 LearningRate 0.0012 Epoch: 17 Global Step: 90260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:23,496-Speed 3412.98 samples/sec Loss 1.3675 LearningRate 0.0012 Epoch: 17 Global Step: 90270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:26,502-Speed 3406.36 samples/sec Loss 1.4084 LearningRate 0.0012 Epoch: 17 Global Step: 90280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:29,521-Speed 3393.22 samples/sec Loss 1.3558 LearningRate 0.0012 Epoch: 17 Global Step: 90290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:32,520-Speed 3415.82 samples/sec Loss 1.3855 LearningRate 0.0012 Epoch: 17 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:35,526-Speed 3406.89 samples/sec Loss 1.4227 LearningRate 0.0012 Epoch: 17 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:38,598-Speed 3334.81 samples/sec Loss 1.4081 LearningRate 0.0011 Epoch: 17 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:41,603-Speed 3408.25 samples/sec Loss 1.4079 LearningRate 0.0011 Epoch: 17 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:34:44,606-Speed 3410.54 samples/sec Loss 1.4093 LearningRate 0.0011 Epoch: 17 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:47,610-Speed 3411.04 samples/sec Loss 1.4071 LearningRate 0.0011 Epoch: 17 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:50,619-Speed 3404.19 samples/sec Loss 1.3871 LearningRate 0.0011 Epoch: 17 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:53,665-Speed 3362.31 samples/sec Loss 1.4118 LearningRate 0.0011 Epoch: 17 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:34:56,669-Speed 3409.45 samples/sec Loss 1.4699 LearningRate 0.0011 Epoch: 17 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:34:59,705-Speed 3374.53 samples/sec Loss 1.3549 LearningRate 0.0011 Epoch: 17 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:02,732-Speed 3383.79 samples/sec Loss 1.3175 LearningRate 0.0011 Epoch: 17 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:05,766-Speed 3375.04 samples/sec Loss 1.4097 LearningRate 0.0011 Epoch: 17 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:08,772-Speed 3407.60 samples/sec Loss 1.4357 LearningRate 0.0011 Epoch: 17 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:11,757-Speed 3431.69 samples/sec Loss 1.3890 LearningRate 0.0011 Epoch: 17 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:14,757-Speed 3414.82 samples/sec Loss 1.2800 LearningRate 0.0011 Epoch: 17 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:17,765-Speed 3403.97 samples/sec Loss 1.4075 LearningRate 0.0011 Epoch: 17 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:20,766-Speed 3413.33 samples/sec Loss 1.3852 LearningRate 0.0011 Epoch: 17 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:23,799-Speed 3377.76 samples/sec Loss 1.5125 LearningRate 0.0011 Epoch: 17 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:26,813-Speed 3398.76 samples/sec Loss 1.4211 LearningRate 0.0011 Epoch: 17 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:29,825-Speed 3400.62 samples/sec Loss 1.4674 LearningRate 0.0011 Epoch: 17 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:32,825-Speed 3413.33 samples/sec Loss 
1.4754 LearningRate 0.0011 Epoch: 17 Global Step: 90500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:35,828-Speed 3410.87 samples/sec Loss 1.3604 LearningRate 0.0011 Epoch: 17 Global Step: 90510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:38,834-Speed 3407.94 samples/sec Loss 1.4980 LearningRate 0.0011 Epoch: 17 Global Step: 90520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:35:41,845-Speed 3401.28 samples/sec Loss 1.3171 LearningRate 0.0011 Epoch: 17 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:44,847-Speed 3411.90 samples/sec Loss 1.4060 LearningRate 0.0011 Epoch: 17 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:47,864-Speed 3395.72 samples/sec Loss 1.4405 LearningRate 0.0011 Epoch: 17 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:50,865-Speed 3412.95 samples/sec Loss 1.4268 LearningRate 0.0011 Epoch: 17 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:53,873-Speed 3406.82 samples/sec Loss 1.3695 LearningRate 0.0011 Epoch: 17 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:56,879-Speed 3407.13 samples/sec Loss 1.3692 LearningRate 0.0011 Epoch: 17 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:35:59,886-Speed 3406.40 samples/sec Loss 1.4414 LearningRate 0.0011 Epoch: 17 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:02,908-Speed 3389.62 samples/sec Loss 1.4647 LearningRate 0.0011 Epoch: 17 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:05,914-Speed 3406.98 samples/sec Loss 1.3967 LearningRate 0.0011 Epoch: 17 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:08,920-Speed 3407.86 samples/sec Loss 1.5051 LearningRate 0.0011 Epoch: 17 Global Step: 90620 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:11,929-Speed 3404.36 samples/sec Loss 1.4021 LearningRate 0.0011 Epoch: 17 Global Step: 90630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 08:36:14,928-Speed 3415.13 samples/sec Loss 1.3872 LearningRate 0.0011 Epoch: 17 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:17,937-Speed 3404.83 samples/sec Loss 1.3779 LearningRate 0.0011 Epoch: 17 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:20,941-Speed 3408.73 samples/sec Loss 1.3093 LearningRate 0.0011 Epoch: 17 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:23,965-Speed 3387.82 samples/sec Loss 1.3689 LearningRate 0.0011 Epoch: 17 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:26,968-Speed 3411.14 samples/sec Loss 1.4241 LearningRate 0.0011 Epoch: 17 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:29,979-Speed 3402.13 samples/sec Loss 1.4954 LearningRate 0.0011 Epoch: 17 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:32,990-Speed 3401.49 samples/sec Loss 1.4268 LearningRate 0.0011 Epoch: 17 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:36,004-Speed 3397.57 samples/sec Loss 1.3354 LearningRate 0.0011 Epoch: 17 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:36:39,014-Speed 3403.64 samples/sec Loss 1.4129 LearningRate 0.0011 Epoch: 17 Global Step: 90720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:36:42,021-Speed 3405.66 samples/sec Loss 1.4140 LearningRate 0.0011 Epoch: 17 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:36:45,035-Speed 3398.68 samples/sec Loss 1.3548 LearningRate 0.0011 Epoch: 17 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 08:36:48,041-Speed 3407.79 samples/sec Loss 1.3688 LearningRate 0.0011 Epoch: 17 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:36:51,051-Speed 3402.72 samples/sec Loss 1.3722 LearningRate 0.0011 Epoch: 17 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:36:54,058-Speed 3407.34 samples/sec Loss 1.3412 LearningRate 0.0011 Epoch: 17 Global Step: 90770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:36:57,070-Speed 3400.23 samples/sec Loss 1.3077 LearningRate 0.0011 Epoch: 17 Global Step: 90780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:00,099-Speed 3381.00 samples/sec Loss 1.4634 LearningRate 0.0011 Epoch: 17 Global Step: 90790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:03,107-Speed 3406.05 samples/sec Loss 1.4237 LearningRate 0.0010 Epoch: 17 Global Step: 90800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:06,117-Speed 3402.69 samples/sec Loss 1.4021 LearningRate 0.0010 Epoch: 17 Global Step: 90810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:09,119-Speed 3411.67 samples/sec Loss 1.3794 LearningRate 0.0010 Epoch: 17 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:12,145-Speed 3385.30 samples/sec Loss 1.3743 LearningRate 0.0010 Epoch: 17 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:15,157-Speed 3401.55 samples/sec Loss 1.3615 LearningRate 0.0010 Epoch: 17 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:18,238-Speed 3323.61 samples/sec Loss 1.4248 LearningRate 0.0010 Epoch: 17 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:21,249-Speed 3401.42 samples/sec Loss 1.4185 LearningRate 0.0010 Epoch: 17 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:24,294-Speed 3364.84 samples/sec Loss 
1.4595 LearningRate 0.0010 Epoch: 17 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:27,315-Speed 3389.81 samples/sec Loss 1.3541 LearningRate 0.0010 Epoch: 17 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:30,353-Speed 3372.23 samples/sec Loss 1.3894 LearningRate 0.0010 Epoch: 17 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:37:33,347-Speed 3422.16 samples/sec Loss 1.3121 LearningRate 0.0010 Epoch: 17 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:36,417-Speed 3335.98 samples/sec Loss 1.3764 LearningRate 0.0010 Epoch: 17 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:39,461-Speed 3365.63 samples/sec Loss 1.3534 LearningRate 0.0010 Epoch: 17 Global Step: 90920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:42,482-Speed 3390.28 samples/sec Loss 1.4021 LearningRate 0.0010 Epoch: 17 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:45,491-Speed 3404.31 samples/sec Loss 1.4274 LearningRate 0.0010 Epoch: 17 Global Step: 90940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:48,516-Speed 3386.09 samples/sec Loss 1.3361 LearningRate 0.0010 Epoch: 17 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:51,549-Speed 3376.77 samples/sec Loss 1.3534 LearningRate 0.0010 Epoch: 17 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:54,556-Speed 3406.83 samples/sec Loss 1.3055 LearningRate 0.0010 Epoch: 17 Global Step: 90970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:37:57,564-Speed 3404.57 samples/sec Loss 1.3674 LearningRate 0.0010 Epoch: 17 Global Step: 90980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:38:00,574-Speed 3402.94 samples/sec Loss 1.3546 LearningRate 0.0010 Epoch: 17 Global Step: 90990 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:38:03,586-Speed 3401.36 samples/sec Loss 1.5091 LearningRate 0.0010 Epoch: 17 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:06,608-Speed 3388.57 samples/sec Loss 1.5032 LearningRate 0.0010 Epoch: 17 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:09,618-Speed 3403.50 samples/sec Loss 1.3948 LearningRate 0.0010 Epoch: 17 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:12,629-Speed 3401.47 samples/sec Loss 1.3898 LearningRate 0.0010 Epoch: 17 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:15,802-Speed 3227.97 samples/sec Loss 1.3089 LearningRate 0.0010 Epoch: 17 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:29,207-Speed 764.02 samples/sec Loss 1.1833 LearningRate 0.0010 Epoch: 18 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:32,231-Speed 3390.83 samples/sec Loss 0.9903 LearningRate 0.0010 Epoch: 18 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:35,251-Speed 3391.35 samples/sec Loss 1.0522 LearningRate 0.0010 Epoch: 18 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:38,272-Speed 3392.58 samples/sec Loss 1.0567 LearningRate 0.0010 Epoch: 18 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:41,331-Speed 3348.27 samples/sec Loss 1.1006 LearningRate 0.0010 Epoch: 18 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:44,341-Speed 3402.65 samples/sec Loss 1.0067 LearningRate 0.0010 Epoch: 18 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:47,354-Speed 3399.38 samples/sec Loss 1.1515 LearningRate 0.0010 Epoch: 18 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:38:50,376-Speed 3390.34 samples/sec Loss 0.9838 LearningRate 0.0010 Epoch: 18 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:53,391-Speed 3396.90 samples/sec Loss 1.0968 LearningRate 0.0010 Epoch: 18 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:56,401-Speed 3403.27 samples/sec Loss 1.0110 LearningRate 0.0010 Epoch: 18 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:38:59,421-Speed 3390.79 samples/sec Loss 1.0812 LearningRate 0.0010 Epoch: 18 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:02,445-Speed 3387.07 samples/sec Loss 1.0058 LearningRate 0.0010 Epoch: 18 Global Step: 91160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:05,484-Speed 3371.02 samples/sec Loss 1.0513 LearningRate 0.0010 Epoch: 18 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:08,510-Speed 3385.13 samples/sec Loss 1.1358 LearningRate 0.0010 Epoch: 18 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:11,579-Speed 3336.83 samples/sec Loss 1.0471 LearningRate 0.0010 Epoch: 18 Global Step: 91190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:14,597-Speed 3394.80 samples/sec Loss 1.0690 LearningRate 0.0010 Epoch: 18 Global Step: 91200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:17,620-Speed 3388.08 samples/sec Loss 1.0274 LearningRate 0.0010 Epoch: 18 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:20,641-Speed 3390.84 samples/sec Loss 1.0004 LearningRate 0.0010 Epoch: 18 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:23,666-Speed 3385.74 samples/sec Loss 1.0417 LearningRate 0.0010 Epoch: 18 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:26,703-Speed 3371.98 samples/sec Loss 
1.0841 LearningRate 0.0010 Epoch: 18 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:29,747-Speed 3365.55 samples/sec Loss 1.0275 LearningRate 0.0010 Epoch: 18 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:39:32,773-Speed 3385.22 samples/sec Loss 1.0354 LearningRate 0.0010 Epoch: 18 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:35,798-Speed 3386.00 samples/sec Loss 1.1228 LearningRate 0.0010 Epoch: 18 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:38,836-Speed 3371.86 samples/sec Loss 1.0553 LearningRate 0.0010 Epoch: 18 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:41,870-Speed 3375.11 samples/sec Loss 1.0123 LearningRate 0.0010 Epoch: 18 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:44,895-Speed 3386.33 samples/sec Loss 1.0823 LearningRate 0.0010 Epoch: 18 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:47,920-Speed 3386.22 samples/sec Loss 1.0779 LearningRate 0.0009 Epoch: 18 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:51,032-Speed 3290.52 samples/sec Loss 1.0932 LearningRate 0.0009 Epoch: 18 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:54,063-Speed 3380.66 samples/sec Loss 1.1537 LearningRate 0.0009 Epoch: 18 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:39:57,083-Speed 3391.59 samples/sec Loss 1.0946 LearningRate 0.0009 Epoch: 18 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:00,107-Speed 3387.60 samples/sec Loss 1.1268 LearningRate 0.0009 Epoch: 18 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:03,109-Speed 3411.59 samples/sec Loss 1.1080 LearningRate 0.0009 Epoch: 18 Global Step: 91360 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:06,131-Speed 3388.41 samples/sec Loss 1.0051 LearningRate 0.0009 Epoch: 18 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:09,149-Speed 3393.98 samples/sec Loss 1.0486 LearningRate 0.0009 Epoch: 18 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:12,166-Speed 3395.03 samples/sec Loss 1.0398 LearningRate 0.0009 Epoch: 18 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:15,194-Speed 3383.41 samples/sec Loss 1.0745 LearningRate 0.0009 Epoch: 18 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:18,224-Speed 3379.78 samples/sec Loss 1.0752 LearningRate 0.0009 Epoch: 18 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:21,243-Speed 3392.08 samples/sec Loss 1.1290 LearningRate 0.0009 Epoch: 18 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:24,256-Speed 3399.74 samples/sec Loss 0.9679 LearningRate 0.0009 Epoch: 18 Global Step: 91430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:27,302-Speed 3363.23 samples/sec Loss 1.0701 LearningRate 0.0009 Epoch: 18 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:30,315-Speed 3400.20 samples/sec Loss 1.0569 LearningRate 0.0009 Epoch: 18 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:33,306-Speed 3424.24 samples/sec Loss 1.0379 LearningRate 0.0009 Epoch: 18 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:36,315-Speed 3403.76 samples/sec Loss 1.1457 LearningRate 0.0009 Epoch: 18 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:39,323-Speed 3405.26 samples/sec Loss 0.9907 LearningRate 0.0009 Epoch: 18 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:40:42,331-Speed 3405.02 samples/sec Loss 1.0226 LearningRate 0.0009 Epoch: 18 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:45,344-Speed 3400.13 samples/sec Loss 1.0609 LearningRate 0.0009 Epoch: 18 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:48,352-Speed 3404.32 samples/sec Loss 1.0062 LearningRate 0.0009 Epoch: 18 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:51,364-Speed 3401.06 samples/sec Loss 0.9895 LearningRate 0.0009 Epoch: 18 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:54,374-Speed 3403.34 samples/sec Loss 1.0796 LearningRate 0.0009 Epoch: 18 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:40:57,382-Speed 3405.34 samples/sec Loss 1.1312 LearningRate 0.0009 Epoch: 18 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:00,391-Speed 3403.65 samples/sec Loss 1.0614 LearningRate 0.0009 Epoch: 18 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:03,408-Speed 3395.41 samples/sec Loss 1.0195 LearningRate 0.0009 Epoch: 18 Global Step: 91560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 08:41:06,405-Speed 3417.56 samples/sec Loss 1.1025 LearningRate 0.0009 Epoch: 18 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:09,417-Speed 3399.90 samples/sec Loss 1.0582 LearningRate 0.0009 Epoch: 18 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:12,426-Speed 3404.02 samples/sec Loss 1.0220 LearningRate 0.0009 Epoch: 18 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:15,434-Speed 3405.83 samples/sec Loss 1.0398 LearningRate 0.0009 Epoch: 18 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:18,469-Speed 3373.84 samples/sec Loss 
1.0808 LearningRate 0.0009 Epoch: 18 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:21,481-Speed 3400.77 samples/sec Loss 1.0789 LearningRate 0.0009 Epoch: 18 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:24,494-Speed 3399.36 samples/sec Loss 1.0669 LearningRate 0.0009 Epoch: 18 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:27,530-Speed 3374.11 samples/sec Loss 1.0251 LearningRate 0.0009 Epoch: 18 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:30,546-Speed 3396.89 samples/sec Loss 1.1664 LearningRate 0.0009 Epoch: 18 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:33,561-Speed 3396.84 samples/sec Loss 1.0680 LearningRate 0.0009 Epoch: 18 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:36,568-Speed 3406.51 samples/sec Loss 1.0228 LearningRate 0.0009 Epoch: 18 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:39,603-Speed 3374.63 samples/sec Loss 1.0556 LearningRate 0.0009 Epoch: 18 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:42,618-Speed 3396.65 samples/sec Loss 1.1236 LearningRate 0.0009 Epoch: 18 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:45,629-Speed 3403.58 samples/sec Loss 1.0894 LearningRate 0.0009 Epoch: 18 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:48,643-Speed 3398.16 samples/sec Loss 1.1527 LearningRate 0.0009 Epoch: 18 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:51,661-Speed 3393.02 samples/sec Loss 1.0358 LearningRate 0.0009 Epoch: 18 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:54,680-Speed 3393.42 samples/sec Loss 1.0733 LearningRate 0.0009 Epoch: 18 Global Step: 91730 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:41:57,692-Speed 3400.56 samples/sec Loss 1.0901 LearningRate 0.0009 Epoch: 18 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:00,704-Speed 3400.89 samples/sec Loss 1.0137 LearningRate 0.0009 Epoch: 18 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:03,726-Speed 3388.52 samples/sec Loss 1.0187 LearningRate 0.0009 Epoch: 18 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:06,736-Speed 3402.90 samples/sec Loss 1.1006 LearningRate 0.0009 Epoch: 18 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:09,751-Speed 3397.66 samples/sec Loss 1.1026 LearningRate 0.0009 Epoch: 18 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:12,754-Speed 3410.77 samples/sec Loss 1.0130 LearningRate 0.0009 Epoch: 18 Global Step: 91790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:15,773-Speed 3393.18 samples/sec Loss 1.0269 LearningRate 0.0009 Epoch: 18 Global Step: 91800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:18,789-Speed 3395.92 samples/sec Loss 1.0526 LearningRate 0.0009 Epoch: 18 Global Step: 91810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:21,803-Speed 3398.01 samples/sec Loss 1.1464 LearningRate 0.0009 Epoch: 18 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:24,823-Speed 3392.32 samples/sec Loss 1.0923 LearningRate 0.0009 Epoch: 18 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:27,836-Speed 3399.37 samples/sec Loss 1.0505 LearningRate 0.0008 Epoch: 18 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:30,857-Speed 3390.13 samples/sec Loss 1.1434 LearningRate 0.0008 Epoch: 18 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 08:42:33,873-Speed 3396.61 samples/sec Loss 0.9943 LearningRate 0.0008 Epoch: 18 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:36,912-Speed 3370.14 samples/sec Loss 1.0313 LearningRate 0.0008 Epoch: 18 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:39,924-Speed 3400.46 samples/sec Loss 1.0956 LearningRate 0.0008 Epoch: 18 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:42:42,943-Speed 3392.80 samples/sec Loss 1.0525 LearningRate 0.0008 Epoch: 18 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:45,966-Speed 3388.68 samples/sec Loss 1.0257 LearningRate 0.0008 Epoch: 18 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:49,007-Speed 3367.73 samples/sec Loss 1.0717 LearningRate 0.0008 Epoch: 18 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:52,047-Speed 3369.38 samples/sec Loss 1.0583 LearningRate 0.0008 Epoch: 18 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:55,064-Speed 3395.16 samples/sec Loss 1.0084 LearningRate 0.0008 Epoch: 18 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:42:58,075-Speed 3401.51 samples/sec Loss 1.1328 LearningRate 0.0008 Epoch: 18 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:43:01,074-Speed 3415.67 samples/sec Loss 1.0268 LearningRate 0.0008 Epoch: 18 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:43:04,091-Speed 3394.58 samples/sec Loss 1.1136 LearningRate 0.0008 Epoch: 18 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:43:07,114-Speed 3388.11 samples/sec Loss 1.1363 LearningRate 0.0008 Epoch: 18 Global Step: 91970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:43:10,133-Speed 3392.42 samples/sec Loss 
1.0882 LearningRate 0.0008 Epoch: 18 Global Step: 91980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:43:13,155-Speed 3389.79 samples/sec Loss 1.1452 LearningRate 0.0008 Epoch: 18 Global Step: 91990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:43:16,187-Speed 3378.09 samples/sec Loss 1.0310 LearningRate 0.0008 Epoch: 18 Global Step: 92000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:44:00,228-[lfw][92000]XNorm: 22.418110
Training: 2022-04-11 08:44:00,229-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:44:00,229-[lfw][92000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:44:51,478-[cfp_fp][92000]XNorm: 22.511127
Training: 2022-04-11 08:44:51,478-[cfp_fp][92000]Accuracy-Flip: 0.98843+-0.00411
Training: 2022-04-11 08:44:51,479-[cfp_fp][92000]Accuracy-Highest: 0.98857
Training: 2022-04-11 08:45:35,388-[agedb_30][92000]XNorm: 22.807042
Training: 2022-04-11 08:45:35,389-[agedb_30][92000]Accuracy-Flip: 0.98433+-0.00746
Training: 2022-04-11 08:45:35,389-[agedb_30][92000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:45:38,383-Speed 72.01 samples/sec Loss 1.1208 LearningRate 0.0008 Epoch: 18 Global Step: 92010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:45:41,379-Speed 3418.78 samples/sec Loss 1.1054 LearningRate 0.0008 Epoch: 18 Global Step: 92020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:45:44,372-Speed 3422.27 samples/sec Loss 1.0292 LearningRate 0.0008 Epoch: 18 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:45:47,360-Speed 3427.72 samples/sec Loss 1.0697 LearningRate 0.0008 Epoch: 18 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 08:45:50,365-Speed 3409.16 samples/sec Loss 1.0968 LearningRate 0.0008 Epoch: 18 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 08:45:53,362-Speed 3418.20 samples/sec Loss 1.1078 LearningRate 0.0008 Epoch: 18
Global Step: 92060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:45:56,356-Speed 3420.53 samples/sec Loss 1.1083 LearningRate 0.0008 Epoch: 18 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:45:59,348-Speed 3423.48 samples/sec Loss 1.0593 LearningRate 0.0008 Epoch: 18 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:02,349-Speed 3412.25 samples/sec Loss 1.0158 LearningRate 0.0008 Epoch: 18 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:05,346-Speed 3418.09 samples/sec Loss 1.0475 LearningRate 0.0008 Epoch: 18 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:08,345-Speed 3415.42 samples/sec Loss 1.1007 LearningRate 0.0008 Epoch: 18 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:11,344-Speed 3414.68 samples/sec Loss 1.0266 LearningRate 0.0008 Epoch: 18 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:14,352-Speed 3405.35 samples/sec Loss 1.0423 LearningRate 0.0008 Epoch: 18 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:17,351-Speed 3416.61 samples/sec Loss 1.1103 LearningRate 0.0008 Epoch: 18 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:20,336-Speed 3430.91 samples/sec Loss 1.0137 LearningRate 0.0008 Epoch: 18 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:23,337-Speed 3413.47 samples/sec Loss 1.0261 LearningRate 0.0008 Epoch: 18 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:26,340-Speed 3410.39 samples/sec Loss 1.0407 LearningRate 0.0008 Epoch: 18 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:29,361-Speed 3389.57 samples/sec Loss 1.1007 LearningRate 0.0008 Epoch: 18 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 08:46:32,474-Speed 3290.57 samples/sec Loss 1.0975 LearningRate 0.0008 Epoch: 18 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:35,486-Speed 3400.62 samples/sec Loss 1.0782 LearningRate 0.0008 Epoch: 18 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:38,491-Speed 3408.78 samples/sec Loss 1.1171 LearningRate 0.0008 Epoch: 18 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:41,504-Speed 3399.28 samples/sec Loss 1.1254 LearningRate 0.0008 Epoch: 18 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:44,525-Speed 3391.31 samples/sec Loss 1.0952 LearningRate 0.0008 Epoch: 18 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:47,533-Speed 3405.15 samples/sec Loss 1.0783 LearningRate 0.0008 Epoch: 18 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:50,544-Speed 3400.93 samples/sec Loss 1.0669 LearningRate 0.0008 Epoch: 18 Global Step: 92250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 08:46:53,541-Speed 3418.15 samples/sec Loss 1.0970 LearningRate 0.0008 Epoch: 18 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:56,548-Speed 3405.81 samples/sec Loss 1.1022 LearningRate 0.0008 Epoch: 18 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:46:59,555-Speed 3407.04 samples/sec Loss 1.0709 LearningRate 0.0008 Epoch: 18 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:02,567-Speed 3401.48 samples/sec Loss 1.0356 LearningRate 0.0008 Epoch: 18 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:05,624-Speed 3349.49 samples/sec Loss 1.0834 LearningRate 0.0008 Epoch: 18 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:08,639-Speed 3397.90 
samples/sec Loss 1.1275 LearningRate 0.0008 Epoch: 18 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:11,670-Speed 3379.60 samples/sec Loss 0.9912 LearningRate 0.0008 Epoch: 18 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:14,719-Speed 3358.62 samples/sec Loss 1.1413 LearningRate 0.0008 Epoch: 18 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:17,726-Speed 3406.82 samples/sec Loss 1.0996 LearningRate 0.0008 Epoch: 18 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:20,732-Speed 3407.78 samples/sec Loss 1.0553 LearningRate 0.0008 Epoch: 18 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:23,714-Speed 3434.04 samples/sec Loss 1.0898 LearningRate 0.0008 Epoch: 18 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:26,719-Speed 3408.51 samples/sec Loss 1.1094 LearningRate 0.0008 Epoch: 18 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:29,724-Speed 3409.20 samples/sec Loss 1.0688 LearningRate 0.0008 Epoch: 18 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:32,724-Speed 3414.38 samples/sec Loss 1.0121 LearningRate 0.0008 Epoch: 18 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:35,726-Speed 3410.84 samples/sec Loss 1.1595 LearningRate 0.0007 Epoch: 18 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:38,745-Speed 3393.46 samples/sec Loss 1.1237 LearningRate 0.0007 Epoch: 18 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:41,756-Speed 3401.48 samples/sec Loss 1.1628 LearningRate 0.0007 Epoch: 18 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:44,789-Speed 3377.68 samples/sec Loss 1.0900 LearningRate 0.0007 Epoch: 18 
Global Step: 92430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:47,789-Speed 3413.91 samples/sec Loss 1.0153 LearningRate 0.0007 Epoch: 18 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:50,838-Speed 3359.18 samples/sec Loss 1.1235 LearningRate 0.0007 Epoch: 18 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:53,837-Speed 3415.98 samples/sec Loss 1.1016 LearningRate 0.0007 Epoch: 18 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:56,839-Speed 3412.08 samples/sec Loss 1.1104 LearningRate 0.0007 Epoch: 18 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:47:59,845-Speed 3406.87 samples/sec Loss 1.0812 LearningRate 0.0007 Epoch: 18 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:02,845-Speed 3414.34 samples/sec Loss 1.1353 LearningRate 0.0007 Epoch: 18 Global Step: 92490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:05,872-Speed 3383.97 samples/sec Loss 1.0978 LearningRate 0.0007 Epoch: 18 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:08,871-Speed 3415.10 samples/sec Loss 1.1017 LearningRate 0.0007 Epoch: 18 Global Step: 92510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:11,885-Speed 3398.62 samples/sec Loss 1.0565 LearningRate 0.0007 Epoch: 18 Global Step: 92520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:14,888-Speed 3411.23 samples/sec Loss 1.1263 LearningRate 0.0007 Epoch: 18 Global Step: 92530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:17,909-Speed 3390.45 samples/sec Loss 1.0936 LearningRate 0.0007 Epoch: 18 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:20,912-Speed 3411.02 samples/sec Loss 1.0052 LearningRate 0.0007 Epoch: 18 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 08:48:23,919-Speed 3405.72 samples/sec Loss 1.0323 LearningRate 0.0007 Epoch: 18 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:26,924-Speed 3408.90 samples/sec Loss 1.0708 LearningRate 0.0007 Epoch: 18 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:29,992-Speed 3338.70 samples/sec Loss 1.0501 LearningRate 0.0007 Epoch: 18 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:33,042-Speed 3357.61 samples/sec Loss 1.1311 LearningRate 0.0007 Epoch: 18 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:36,071-Speed 3381.31 samples/sec Loss 1.1357 LearningRate 0.0007 Epoch: 18 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:39,139-Speed 3339.53 samples/sec Loss 1.1133 LearningRate 0.0007 Epoch: 18 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:42,144-Speed 3408.74 samples/sec Loss 1.0110 LearningRate 0.0007 Epoch: 18 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:45,146-Speed 3411.72 samples/sec Loss 1.0730 LearningRate 0.0007 Epoch: 18 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:48,155-Speed 3404.22 samples/sec Loss 1.1162 LearningRate 0.0007 Epoch: 18 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:51,165-Speed 3402.86 samples/sec Loss 1.0262 LearningRate 0.0007 Epoch: 18 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:54,159-Speed 3420.94 samples/sec Loss 1.0274 LearningRate 0.0007 Epoch: 18 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:48:57,162-Speed 3410.03 samples/sec Loss 1.1651 LearningRate 0.0007 Epoch: 18 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:00,168-Speed 3407.34 
samples/sec Loss 1.1015 LearningRate 0.0007 Epoch: 18 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:03,182-Speed 3399.36 samples/sec Loss 1.1103 LearningRate 0.0007 Epoch: 18 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:06,189-Speed 3406.01 samples/sec Loss 1.0455 LearningRate 0.0007 Epoch: 18 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:09,193-Speed 3409.57 samples/sec Loss 1.0930 LearningRate 0.0007 Epoch: 18 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:12,192-Speed 3415.16 samples/sec Loss 1.0891 LearningRate 0.0007 Epoch: 18 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:15,203-Speed 3402.58 samples/sec Loss 1.1863 LearningRate 0.0007 Epoch: 18 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:18,208-Speed 3408.23 samples/sec Loss 1.1439 LearningRate 0.0007 Epoch: 18 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:21,214-Speed 3407.42 samples/sec Loss 1.0417 LearningRate 0.0007 Epoch: 18 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:24,203-Speed 3426.22 samples/sec Loss 1.0678 LearningRate 0.0007 Epoch: 18 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:27,215-Speed 3400.78 samples/sec Loss 1.0942 LearningRate 0.0007 Epoch: 18 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:30,234-Speed 3392.36 samples/sec Loss 1.1281 LearningRate 0.0007 Epoch: 18 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:33,242-Speed 3405.09 samples/sec Loss 1.1663 LearningRate 0.0007 Epoch: 18 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:36,253-Speed 3402.78 samples/sec Loss 1.0605 LearningRate 0.0007 Epoch: 18 
Global Step: 92800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:39,287-Speed 3375.90 samples/sec Loss 1.1155 LearningRate 0.0007 Epoch: 18 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:42,295-Speed 3404.82 samples/sec Loss 1.1226 LearningRate 0.0007 Epoch: 18 Global Step: 92820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:45,312-Speed 3394.58 samples/sec Loss 1.1116 LearningRate 0.0007 Epoch: 18 Global Step: 92830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:48,319-Speed 3406.54 samples/sec Loss 1.2150 LearningRate 0.0007 Epoch: 18 Global Step: 92840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:51,348-Speed 3381.43 samples/sec Loss 1.1361 LearningRate 0.0007 Epoch: 18 Global Step: 92850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:54,334-Speed 3430.30 samples/sec Loss 1.0220 LearningRate 0.0007 Epoch: 18 Global Step: 92860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:49:57,346-Speed 3400.25 samples/sec Loss 1.1104 LearningRate 0.0007 Epoch: 18 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:50:00,360-Speed 3399.12 samples/sec Loss 1.1226 LearningRate 0.0007 Epoch: 18 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:50:03,354-Speed 3421.19 samples/sec Loss 1.0486 LearningRate 0.0007 Epoch: 18 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:06,382-Speed 3382.48 samples/sec Loss 1.1574 LearningRate 0.0007 Epoch: 18 Global Step: 92900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:09,387-Speed 3409.16 samples/sec Loss 1.1596 LearningRate 0.0007 Epoch: 18 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:12,391-Speed 3409.63 samples/sec Loss 1.0304 LearningRate 0.0007 Epoch: 18 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:50:15,418-Speed 3383.04 samples/sec Loss 1.0609 LearningRate 0.0007 Epoch: 18 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:18,455-Speed 3373.09 samples/sec Loss 1.1039 LearningRate 0.0007 Epoch: 18 Global Step: 92940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:21,468-Speed 3399.00 samples/sec Loss 1.0724 LearningRate 0.0007 Epoch: 18 Global Step: 92950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:24,476-Speed 3404.78 samples/sec Loss 1.0200 LearningRate 0.0007 Epoch: 18 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:27,490-Speed 3398.66 samples/sec Loss 1.0454 LearningRate 0.0007 Epoch: 18 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:30,505-Speed 3396.91 samples/sec Loss 1.0981 LearningRate 0.0007 Epoch: 18 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:33,524-Speed 3394.05 samples/sec Loss 1.0929 LearningRate 0.0007 Epoch: 18 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:50:36,542-Speed 3392.87 samples/sec Loss 1.0245 LearningRate 0.0007 Epoch: 18 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:39,552-Speed 3403.02 samples/sec Loss 1.1445 LearningRate 0.0006 Epoch: 18 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:42,567-Speed 3397.18 samples/sec Loss 1.0291 LearningRate 0.0006 Epoch: 18 Global Step: 93020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:45,572-Speed 3408.23 samples/sec Loss 1.1055 LearningRate 0.0006 Epoch: 18 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:48,605-Speed 3377.71 samples/sec Loss 1.1606 LearningRate 0.0006 Epoch: 18 Global Step: 93040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:51,611-Speed 3406.96 
samples/sec Loss 1.0666 LearningRate 0.0006 Epoch: 18 Global Step: 93050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:54,630-Speed 3393.15 samples/sec Loss 1.0951 LearningRate 0.0006 Epoch: 18 Global Step: 93060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:50:57,634-Speed 3409.87 samples/sec Loss 1.0670 LearningRate 0.0006 Epoch: 18 Global Step: 93070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:00,657-Speed 3388.10 samples/sec Loss 1.1526 LearningRate 0.0006 Epoch: 18 Global Step: 93080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:03,671-Speed 3398.12 samples/sec Loss 1.0836 LearningRate 0.0006 Epoch: 18 Global Step: 93090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:06,684-Speed 3399.23 samples/sec Loss 1.0538 LearningRate 0.0006 Epoch: 18 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:51:09,694-Speed 3403.44 samples/sec Loss 1.0553 LearningRate 0.0006 Epoch: 18 Global Step: 93110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:51:12,699-Speed 3408.30 samples/sec Loss 1.0296 LearningRate 0.0006 Epoch: 18 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:51:15,691-Speed 3423.71 samples/sec Loss 1.0784 LearningRate 0.0006 Epoch: 18 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:18,702-Speed 3401.73 samples/sec Loss 1.0885 LearningRate 0.0006 Epoch: 18 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:21,710-Speed 3404.64 samples/sec Loss 1.1433 LearningRate 0.0006 Epoch: 18 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:24,730-Speed 3391.66 samples/sec Loss 1.0329 LearningRate 0.0006 Epoch: 18 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:27,800-Speed 3336.17 samples/sec Loss 1.1257 LearningRate 0.0006 Epoch: 18 
Global Step: 93170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:30,850-Speed 3357.79 samples/sec Loss 1.0253 LearningRate 0.0006 Epoch: 18 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:33,863-Speed 3400.45 samples/sec Loss 1.0354 LearningRate 0.0006 Epoch: 18 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:36,875-Speed 3400.17 samples/sec Loss 1.0358 LearningRate 0.0006 Epoch: 18 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:39,892-Speed 3395.78 samples/sec Loss 1.0752 LearningRate 0.0006 Epoch: 18 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:42,898-Speed 3407.51 samples/sec Loss 1.1800 LearningRate 0.0006 Epoch: 18 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:45,911-Speed 3398.98 samples/sec Loss 1.0752 LearningRate 0.0006 Epoch: 18 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:51:48,907-Speed 3418.32 samples/sec Loss 1.0519 LearningRate 0.0006 Epoch: 18 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:51,920-Speed 3399.43 samples/sec Loss 1.0169 LearningRate 0.0006 Epoch: 18 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:54,944-Speed 3387.68 samples/sec Loss 1.0587 LearningRate 0.0006 Epoch: 18 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:51:58,018-Speed 3331.55 samples/sec Loss 1.1472 LearningRate 0.0006 Epoch: 18 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:01,037-Speed 3392.94 samples/sec Loss 1.1452 LearningRate 0.0006 Epoch: 18 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:04,073-Speed 3374.21 samples/sec Loss 1.1250 LearningRate 0.0006 Epoch: 18 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:52:07,085-Speed 3400.12 samples/sec Loss 1.0450 LearningRate 0.0006 Epoch: 18 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:10,094-Speed 3404.30 samples/sec Loss 1.0845 LearningRate 0.0006 Epoch: 18 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:13,115-Speed 3390.75 samples/sec Loss 1.0849 LearningRate 0.0006 Epoch: 18 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:16,129-Speed 3397.63 samples/sec Loss 1.1025 LearningRate 0.0006 Epoch: 18 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:19,153-Speed 3387.71 samples/sec Loss 1.0763 LearningRate 0.0006 Epoch: 18 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:52:22,165-Speed 3400.67 samples/sec Loss 1.0712 LearningRate 0.0006 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:52:25,200-Speed 3374.76 samples/sec Loss 1.1174 LearningRate 0.0006 Epoch: 18 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:52:28,293-Speed 3311.12 samples/sec Loss 1.1442 LearningRate 0.0006 Epoch: 18 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:31,323-Speed 3380.38 samples/sec Loss 1.0697 LearningRate 0.0006 Epoch: 18 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:34,347-Speed 3387.81 samples/sec Loss 0.9622 LearningRate 0.0006 Epoch: 18 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:37,365-Speed 3393.87 samples/sec Loss 1.0818 LearningRate 0.0006 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:40,396-Speed 3379.97 samples/sec Loss 1.0984 LearningRate 0.0006 Epoch: 18 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:43,406-Speed 3402.33 
samples/sec Loss 1.1497 LearningRate 0.0006 Epoch: 18 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:46,420-Speed 3398.43 samples/sec Loss 1.1048 LearningRate 0.0006 Epoch: 18 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:49,436-Speed 3395.31 samples/sec Loss 1.1039 LearningRate 0.0006 Epoch: 18 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:52,467-Speed 3379.89 samples/sec Loss 1.0861 LearningRate 0.0006 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:55,474-Speed 3406.01 samples/sec Loss 1.1012 LearningRate 0.0006 Epoch: 18 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:52:58,485-Speed 3402.20 samples/sec Loss 1.0056 LearningRate 0.0006 Epoch: 18 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:01,499-Speed 3397.79 samples/sec Loss 1.0504 LearningRate 0.0006 Epoch: 18 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:04,512-Speed 3399.72 samples/sec Loss 1.1256 LearningRate 0.0006 Epoch: 18 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:07,525-Speed 3399.91 samples/sec Loss 1.0048 LearningRate 0.0006 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:10,534-Speed 3404.19 samples/sec Loss 1.1009 LearningRate 0.0006 Epoch: 18 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:13,561-Speed 3383.88 samples/sec Loss 1.1852 LearningRate 0.0006 Epoch: 18 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:16,576-Speed 3396.23 samples/sec Loss 1.1244 LearningRate 0.0006 Epoch: 18 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:19,591-Speed 3397.25 samples/sec Loss 1.1234 LearningRate 0.0006 Epoch: 18 
Global Step: 93540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:22,606-Speed 3398.02 samples/sec Loss 1.0713 LearningRate 0.0006 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:25,618-Speed 3399.93 samples/sec Loss 1.1481 LearningRate 0.0006 Epoch: 18 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:28,614-Speed 3418.86 samples/sec Loss 1.1240 LearningRate 0.0006 Epoch: 18 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:31,635-Speed 3390.50 samples/sec Loss 1.0822 LearningRate 0.0006 Epoch: 18 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:34,651-Speed 3396.42 samples/sec Loss 1.0139 LearningRate 0.0006 Epoch: 18 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:37,678-Speed 3384.06 samples/sec Loss 1.1199 LearningRate 0.0006 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:40,690-Speed 3399.77 samples/sec Loss 1.0980 LearningRate 0.0006 Epoch: 18 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:43,737-Speed 3361.49 samples/sec Loss 1.0713 LearningRate 0.0006 Epoch: 18 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:53:46,731-Speed 3421.01 samples/sec Loss 1.0913 LearningRate 0.0006 Epoch: 18 Global Step: 93630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:53:49,754-Speed 3388.60 samples/sec Loss 1.0853 LearningRate 0.0006 Epoch: 18 Global Step: 93640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:53:52,814-Speed 3346.62 samples/sec Loss 1.1123 LearningRate 0.0006 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:53:55,863-Speed 3359.31 samples/sec Loss 1.0972 LearningRate 0.0005 Epoch: 18 Global Step: 93660 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 08:53:58,879-Speed 3396.37 samples/sec Loss 1.0730 LearningRate 0.0005 Epoch: 18 Global Step: 93670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:01,906-Speed 3384.29 samples/sec Loss 1.0576 LearningRate 0.0005 Epoch: 18 Global Step: 93680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:04,923-Speed 3394.34 samples/sec Loss 1.0689 LearningRate 0.0005 Epoch: 18 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:07,939-Speed 3396.18 samples/sec Loss 1.0319 LearningRate 0.0005 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:10,962-Speed 3388.85 samples/sec Loss 1.1535 LearningRate 0.0005 Epoch: 18 Global Step: 93710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:13,975-Speed 3399.75 samples/sec Loss 1.1542 LearningRate 0.0005 Epoch: 18 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:54:16,988-Speed 3399.06 samples/sec Loss 1.1187 LearningRate 0.0005 Epoch: 18 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:20,016-Speed 3382.49 samples/sec Loss 1.1310 LearningRate 0.0005 Epoch: 18 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:23,036-Speed 3391.93 samples/sec Loss 1.0347 LearningRate 0.0005 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:26,047-Speed 3401.05 samples/sec Loss 1.0592 LearningRate 0.0005 Epoch: 18 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:29,068-Speed 3391.52 samples/sec Loss 1.1830 LearningRate 0.0005 Epoch: 18 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:32,079-Speed 3401.17 samples/sec Loss 1.0585 LearningRate 0.0005 Epoch: 18 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:35,104-Speed 3386.28 
samples/sec Loss 1.0703 LearningRate 0.0005 Epoch: 18 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:38,125-Speed 3390.28 samples/sec Loss 1.0487 LearningRate 0.0005 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:41,135-Speed 3402.13 samples/sec Loss 1.0986 LearningRate 0.0005 Epoch: 18 Global Step: 93810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:44,149-Speed 3399.50 samples/sec Loss 1.1521 LearningRate 0.0005 Epoch: 18 Global Step: 93820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:47,165-Speed 3395.48 samples/sec Loss 1.1350 LearningRate 0.0005 Epoch: 18 Global Step: 93830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 08:54:50,164-Speed 3414.58 samples/sec Loss 1.0854 LearningRate 0.0005 Epoch: 18 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:53,178-Speed 3399.23 samples/sec Loss 1.0855 LearningRate 0.0005 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:56,191-Speed 3398.89 samples/sec Loss 1.0246 LearningRate 0.0005 Epoch: 18 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:54:59,213-Speed 3389.67 samples/sec Loss 1.1095 LearningRate 0.0005 Epoch: 18 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:55:02,229-Speed 3396.38 samples/sec Loss 1.0726 LearningRate 0.0005 Epoch: 18 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:55:05,232-Speed 3410.69 samples/sec Loss 1.1101 LearningRate 0.0005 Epoch: 18 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:08,250-Speed 3394.11 samples/sec Loss 1.0552 LearningRate 0.0005 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:11,275-Speed 3386.01 samples/sec Loss 1.1025 LearningRate 0.0005 Epoch: 18 
Global Step: 93910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:14,296-Speed 3390.25 samples/sec Loss 1.1374 LearningRate 0.0005 Epoch: 18 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:17,310-Speed 3397.91 samples/sec Loss 1.0834 LearningRate 0.0005 Epoch: 18 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:20,322-Speed 3400.42 samples/sec Loss 1.0702 LearningRate 0.0005 Epoch: 18 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:23,337-Speed 3397.75 samples/sec Loss 1.0185 LearningRate 0.0005 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:26,401-Speed 3342.48 samples/sec Loss 1.1813 LearningRate 0.0005 Epoch: 18 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:29,553-Speed 3249.41 samples/sec Loss 1.0647 LearningRate 0.0005 Epoch: 18 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:32,595-Speed 3367.65 samples/sec Loss 1.1892 LearningRate 0.0005 Epoch: 18 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:55:35,653-Speed 3349.11 samples/sec Loss 1.1244 LearningRate 0.0005 Epoch: 18 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:55:38,675-Speed 3389.61 samples/sec Loss 1.0864 LearningRate 0.0005 Epoch: 18 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:56:22,926-[lfw][94000]XNorm: 21.991098 Training: 2022-04-11 08:56:22,927-[lfw][94000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 08:56:22,927-[lfw][94000]Accuracy-Highest: 0.99850 Training: 2022-04-11 08:57:14,213-[cfp_fp][94000]XNorm: 22.270706 Training: 2022-04-11 08:57:14,214-[cfp_fp][94000]Accuracy-Flip: 0.98857+-0.00447 Training: 2022-04-11 08:57:14,214-[cfp_fp][94000]Accuracy-Highest: 0.98857 Training: 2022-04-11 
08:57:58,341-[agedb_30][94000]XNorm: 22.440194 Training: 2022-04-11 08:57:58,342-[agedb_30][94000]Accuracy-Flip: 0.98417+-0.00712 Training: 2022-04-11 08:57:58,343-[agedb_30][94000]Accuracy-Highest: 0.98550 Training: 2022-04-11 08:58:01,358-Speed 71.77 samples/sec Loss 1.0852 LearningRate 0.0005 Epoch: 18 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:04,354-Speed 3418.23 samples/sec Loss 1.0973 LearningRate 0.0005 Epoch: 18 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:07,343-Speed 3426.87 samples/sec Loss 1.1198 LearningRate 0.0005 Epoch: 18 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:10,337-Speed 3421.33 samples/sec Loss 1.1165 LearningRate 0.0005 Epoch: 18 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:13,348-Speed 3400.73 samples/sec Loss 1.1465 LearningRate 0.0005 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:16,360-Speed 3401.67 samples/sec Loss 1.1132 LearningRate 0.0005 Epoch: 18 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:19,354-Speed 3421.33 samples/sec Loss 1.0779 LearningRate 0.0005 Epoch: 18 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:22,353-Speed 3414.79 samples/sec Loss 1.0929 LearningRate 0.0005 Epoch: 18 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:25,336-Speed 3433.69 samples/sec Loss 1.1390 LearningRate 0.0005 Epoch: 18 Global Step: 94090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:28,337-Speed 3412.92 samples/sec Loss 1.0897 LearningRate 0.0005 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:31,356-Speed 3392.94 samples/sec Loss 1.2061 LearningRate 0.0005 Epoch: 18 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 08:58:34,350-Speed 3420.56 samples/sec Loss 1.1090 LearningRate 0.0005 Epoch: 18 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:37,351-Speed 3413.11 samples/sec Loss 1.1085 LearningRate 0.0005 Epoch: 18 Global Step: 94130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:40,351-Speed 3414.29 samples/sec Loss 1.0667 LearningRate 0.0005 Epoch: 18 Global Step: 94140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:43,363-Speed 3400.60 samples/sec Loss 1.1059 LearningRate 0.0005 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:58:46,349-Speed 3430.68 samples/sec Loss 1.0419 LearningRate 0.0005 Epoch: 18 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:58:49,363-Speed 3398.77 samples/sec Loss 1.1155 LearningRate 0.0005 Epoch: 18 Global Step: 94170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:58:52,368-Speed 3408.07 samples/sec Loss 1.0858 LearningRate 0.0005 Epoch: 18 Global Step: 94180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:58:55,371-Speed 3410.58 samples/sec Loss 1.0501 LearningRate 0.0005 Epoch: 18 Global Step: 94190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:58:58,388-Speed 3394.50 samples/sec Loss 1.0876 LearningRate 0.0005 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:01,393-Speed 3408.67 samples/sec Loss 1.1037 LearningRate 0.0005 Epoch: 18 Global Step: 94210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:04,419-Speed 3385.43 samples/sec Loss 1.1708 LearningRate 0.0005 Epoch: 18 Global Step: 94220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:07,456-Speed 3372.34 samples/sec Loss 1.0718 LearningRate 0.0005 Epoch: 18 Global Step: 94230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:10,460-Speed 3409.17 samples/sec Loss 
1.0224 LearningRate 0.0005 Epoch: 18 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:13,465-Speed 3409.15 samples/sec Loss 1.0494 LearningRate 0.0005 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 08:59:16,478-Speed 3399.17 samples/sec Loss 1.1161 LearningRate 0.0005 Epoch: 18 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:19,481-Speed 3411.66 samples/sec Loss 1.1872 LearningRate 0.0005 Epoch: 18 Global Step: 94270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:22,490-Speed 3404.29 samples/sec Loss 1.0669 LearningRate 0.0005 Epoch: 18 Global Step: 94280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:25,493-Speed 3410.76 samples/sec Loss 1.1542 LearningRate 0.0005 Epoch: 18 Global Step: 94290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:28,556-Speed 3343.15 samples/sec Loss 1.0222 LearningRate 0.0005 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:31,570-Speed 3398.45 samples/sec Loss 1.0257 LearningRate 0.0005 Epoch: 18 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:34,576-Speed 3407.24 samples/sec Loss 1.0857 LearningRate 0.0005 Epoch: 18 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:37,580-Speed 3409.37 samples/sec Loss 1.0495 LearningRate 0.0005 Epoch: 18 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:40,596-Speed 3396.56 samples/sec Loss 1.0841 LearningRate 0.0005 Epoch: 18 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:43,598-Speed 3412.82 samples/sec Loss 1.1411 LearningRate 0.0005 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:46,583-Speed 3430.82 samples/sec Loss 1.1359 LearningRate 0.0005 Epoch: 18 Global Step: 94360 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:49,641-Speed 3350.20 samples/sec Loss 1.1040 LearningRate 0.0005 Epoch: 18 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:52,669-Speed 3381.82 samples/sec Loss 1.0655 LearningRate 0.0004 Epoch: 18 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:55,676-Speed 3407.33 samples/sec Loss 1.1302 LearningRate 0.0004 Epoch: 18 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 08:59:58,684-Speed 3404.81 samples/sec Loss 1.0481 LearningRate 0.0004 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:01,710-Speed 3383.81 samples/sec Loss 1.0145 LearningRate 0.0004 Epoch: 18 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:04,738-Speed 3383.85 samples/sec Loss 1.1004 LearningRate 0.0004 Epoch: 18 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:07,751-Speed 3399.14 samples/sec Loss 1.1067 LearningRate 0.0004 Epoch: 18 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:10,771-Speed 3392.27 samples/sec Loss 1.0695 LearningRate 0.0004 Epoch: 18 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:13,782-Speed 3401.50 samples/sec Loss 1.0045 LearningRate 0.0004 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:17,025-Speed 3157.35 samples/sec Loss 1.1115 LearningRate 0.0004 Epoch: 18 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:20,040-Speed 3398.01 samples/sec Loss 1.0456 LearningRate 0.0004 Epoch: 18 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:23,044-Speed 3409.33 samples/sec Loss 1.0858 LearningRate 0.0004 Epoch: 18 Global Step: 94480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 09:00:26,053-Speed 3403.83 samples/sec Loss 1.1401 LearningRate 0.0004 Epoch: 18 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:29,064-Speed 3401.91 samples/sec Loss 0.9909 LearningRate 0.0004 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:32,076-Speed 3400.30 samples/sec Loss 1.0631 LearningRate 0.0004 Epoch: 18 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:35,096-Speed 3392.25 samples/sec Loss 1.1585 LearningRate 0.0004 Epoch: 18 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:38,137-Speed 3368.20 samples/sec Loss 1.0857 LearningRate 0.0004 Epoch: 18 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:41,150-Speed 3399.13 samples/sec Loss 1.0143 LearningRate 0.0004 Epoch: 18 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:44,163-Speed 3399.46 samples/sec Loss 1.1321 LearningRate 0.0004 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:47,153-Speed 3426.43 samples/sec Loss 1.1034 LearningRate 0.0004 Epoch: 18 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:50,160-Speed 3406.68 samples/sec Loss 1.0542 LearningRate 0.0004 Epoch: 18 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:53,170-Speed 3402.97 samples/sec Loss 1.0887 LearningRate 0.0004 Epoch: 18 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:56,181-Speed 3402.29 samples/sec Loss 1.0372 LearningRate 0.0004 Epoch: 18 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:00:59,171-Speed 3424.81 samples/sec Loss 1.1624 LearningRate 0.0004 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:02,196-Speed 3385.76 samples/sec Loss 
1.1158 LearningRate 0.0004 Epoch: 18 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:05,271-Speed 3331.87 samples/sec Loss 1.0864 LearningRate 0.0004 Epoch: 18 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:08,301-Speed 3380.15 samples/sec Loss 1.0384 LearningRate 0.0004 Epoch: 18 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:11,320-Speed 3393.52 samples/sec Loss 1.0874 LearningRate 0.0004 Epoch: 18 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:14,333-Speed 3399.06 samples/sec Loss 1.0803 LearningRate 0.0004 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:17,348-Speed 3397.47 samples/sec Loss 1.0294 LearningRate 0.0004 Epoch: 18 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:20,355-Speed 3405.58 samples/sec Loss 1.0677 LearningRate 0.0004 Epoch: 18 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:23,367-Speed 3400.38 samples/sec Loss 1.0963 LearningRate 0.0004 Epoch: 18 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:26,429-Speed 3345.75 samples/sec Loss 1.0727 LearningRate 0.0004 Epoch: 18 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:01:29,459-Speed 3379.60 samples/sec Loss 1.0169 LearningRate 0.0004 Epoch: 18 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:32,473-Speed 3398.87 samples/sec Loss 1.0862 LearningRate 0.0004 Epoch: 18 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:35,492-Speed 3393.24 samples/sec Loss 1.0989 LearningRate 0.0004 Epoch: 18 Global Step: 94720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:38,516-Speed 3386.84 samples/sec Loss 1.0876 LearningRate 0.0004 Epoch: 18 Global Step: 94730 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:41,542-Speed 3385.84 samples/sec Loss 1.1147 LearningRate 0.0004 Epoch: 18 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:44,554-Speed 3400.63 samples/sec Loss 1.0732 LearningRate 0.0004 Epoch: 18 Global Step: 94750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:47,568-Speed 3398.10 samples/sec Loss 1.0144 LearningRate 0.0004 Epoch: 18 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:50,589-Speed 3390.66 samples/sec Loss 1.0253 LearningRate 0.0004 Epoch: 18 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:53,609-Speed 3391.38 samples/sec Loss 1.1077 LearningRate 0.0004 Epoch: 18 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:56,620-Speed 3401.71 samples/sec Loss 1.0091 LearningRate 0.0004 Epoch: 18 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:01:59,608-Speed 3427.36 samples/sec Loss 1.0608 LearningRate 0.0004 Epoch: 18 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:02,619-Speed 3401.22 samples/sec Loss 1.1146 LearningRate 0.0004 Epoch: 18 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:05,635-Speed 3397.62 samples/sec Loss 1.1705 LearningRate 0.0004 Epoch: 18 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:08,644-Speed 3404.40 samples/sec Loss 1.0856 LearningRate 0.0004 Epoch: 18 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:11,657-Speed 3399.90 samples/sec Loss 1.0748 LearningRate 0.0004 Epoch: 18 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:14,666-Speed 3403.14 samples/sec Loss 1.0086 LearningRate 0.0004 Epoch: 18 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 09:02:17,679-Speed 3399.66 samples/sec Loss 0.9590 LearningRate 0.0004 Epoch: 18 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:20,692-Speed 3400.00 samples/sec Loss 1.1557 LearningRate 0.0004 Epoch: 18 Global Step: 94870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:23,712-Speed 3391.33 samples/sec Loss 1.0812 LearningRate 0.0004 Epoch: 18 Global Step: 94880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:26,733-Speed 3390.77 samples/sec Loss 1.0398 LearningRate 0.0004 Epoch: 18 Global Step: 94890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:02:29,743-Speed 3402.46 samples/sec Loss 1.0868 LearningRate 0.0004 Epoch: 18 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:32,765-Speed 3388.75 samples/sec Loss 1.0765 LearningRate 0.0004 Epoch: 18 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:35,779-Speed 3399.35 samples/sec Loss 1.0913 LearningRate 0.0004 Epoch: 18 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:38,824-Speed 3363.64 samples/sec Loss 1.0335 LearningRate 0.0004 Epoch: 18 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:41,854-Speed 3380.08 samples/sec Loss 1.0154 LearningRate 0.0004 Epoch: 18 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:44,868-Speed 3399.46 samples/sec Loss 1.0505 LearningRate 0.0004 Epoch: 18 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:47,884-Speed 3395.41 samples/sec Loss 1.1120 LearningRate 0.0004 Epoch: 18 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:50,895-Speed 3402.16 samples/sec Loss 1.0129 LearningRate 0.0004 Epoch: 18 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:53,921-Speed 3384.03 samples/sec Loss 
1.0371 LearningRate 0.0004 Epoch: 18 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:56,948-Speed 3383.65 samples/sec Loss 1.0915 LearningRate 0.0004 Epoch: 18 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:02:59,949-Speed 3413.72 samples/sec Loss 1.0145 LearningRate 0.0004 Epoch: 18 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:02,970-Speed 3390.69 samples/sec Loss 1.0955 LearningRate 0.0004 Epoch: 18 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:05,992-Speed 3390.42 samples/sec Loss 1.0316 LearningRate 0.0004 Epoch: 18 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:09,001-Speed 3403.08 samples/sec Loss 1.0601 LearningRate 0.0004 Epoch: 18 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:12,022-Speed 3391.04 samples/sec Loss 1.0733 LearningRate 0.0004 Epoch: 18 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:15,036-Speed 3398.25 samples/sec Loss 1.0653 LearningRate 0.0004 Epoch: 18 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:18,060-Speed 3387.04 samples/sec Loss 1.0970 LearningRate 0.0004 Epoch: 18 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:21,080-Speed 3391.98 samples/sec Loss 1.0245 LearningRate 0.0004 Epoch: 18 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:24,113-Speed 3376.75 samples/sec Loss 1.1493 LearningRate 0.0004 Epoch: 18 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:27,131-Speed 3392.92 samples/sec Loss 1.0852 LearningRate 0.0004 Epoch: 18 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:30,133-Speed 3412.74 samples/sec Loss 1.0294 LearningRate 0.0004 Epoch: 18 Global Step: 95100 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:33,145-Speed 3401.04 samples/sec Loss 1.1876 LearningRate 0.0004 Epoch: 18 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:36,168-Speed 3388.04 samples/sec Loss 1.1260 LearningRate 0.0004 Epoch: 18 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:39,185-Speed 3394.79 samples/sec Loss 1.1275 LearningRate 0.0004 Epoch: 18 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:42,199-Speed 3398.47 samples/sec Loss 1.0762 LearningRate 0.0004 Epoch: 18 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:45,210-Speed 3401.89 samples/sec Loss 1.0620 LearningRate 0.0004 Epoch: 18 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:48,223-Speed 3398.73 samples/sec Loss 1.1781 LearningRate 0.0004 Epoch: 18 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:51,241-Speed 3394.54 samples/sec Loss 1.1613 LearningRate 0.0004 Epoch: 18 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:54,276-Speed 3374.12 samples/sec Loss 1.0521 LearningRate 0.0003 Epoch: 18 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:03:57,299-Speed 3388.26 samples/sec Loss 1.1169 LearningRate 0.0003 Epoch: 18 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:00,300-Speed 3413.01 samples/sec Loss 1.1482 LearningRate 0.0003 Epoch: 18 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:03,461-Speed 3240.24 samples/sec Loss 1.1710 LearningRate 0.0003 Epoch: 18 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:06,481-Speed 3392.81 samples/sec Loss 1.0263 LearningRate 0.0003 Epoch: 18 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 09:04:09,526-Speed 3362.57 samples/sec Loss 1.1391 LearningRate 0.0003 Epoch: 18 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:12,565-Speed 3370.54 samples/sec Loss 1.0407 LearningRate 0.0003 Epoch: 18 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:15,586-Speed 3390.81 samples/sec Loss 1.0145 LearningRate 0.0003 Epoch: 18 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:18,604-Speed 3393.48 samples/sec Loss 1.0056 LearningRate 0.0003 Epoch: 18 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:21,617-Speed 3399.56 samples/sec Loss 1.1074 LearningRate 0.0003 Epoch: 18 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:24,665-Speed 3360.29 samples/sec Loss 1.0595 LearningRate 0.0003 Epoch: 18 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:04:27,675-Speed 3403.57 samples/sec Loss 1.1080 LearningRate 0.0003 Epoch: 18 Global Step: 95290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:30,688-Speed 3399.19 samples/sec Loss 1.0418 LearningRate 0.0003 Epoch: 18 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:33,701-Speed 3399.95 samples/sec Loss 1.0239 LearningRate 0.0003 Epoch: 18 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:36,719-Speed 3393.41 samples/sec Loss 1.0068 LearningRate 0.0003 Epoch: 18 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:39,753-Speed 3375.57 samples/sec Loss 1.0441 LearningRate 0.0003 Epoch: 18 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:42,775-Speed 3389.87 samples/sec Loss 1.1489 LearningRate 0.0003 Epoch: 18 Global Step: 95340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:45,786-Speed 3401.60 samples/sec Loss 
1.0532 LearningRate 0.0003 Epoch: 18 Global Step: 95350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:48,808-Speed 3388.48 samples/sec Loss 1.0841 LearningRate 0.0003 Epoch: 18 Global Step: 95360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:51,829-Speed 3391.89 samples/sec Loss 0.9938 LearningRate 0.0003 Epoch: 18 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:54,848-Speed 3392.62 samples/sec Loss 1.0629 LearningRate 0.0003 Epoch: 18 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:04:57,861-Speed 3399.37 samples/sec Loss 1.0370 LearningRate 0.0003 Epoch: 18 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:00,872-Speed 3402.08 samples/sec Loss 1.0187 LearningRate 0.0003 Epoch: 18 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:03,891-Speed 3392.29 samples/sec Loss 1.1034 LearningRate 0.0003 Epoch: 18 Global Step: 95410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:06,907-Speed 3396.23 samples/sec Loss 1.0427 LearningRate 0.0003 Epoch: 18 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:09,929-Speed 3388.61 samples/sec Loss 1.1133 LearningRate 0.0003 Epoch: 18 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:12,945-Speed 3396.78 samples/sec Loss 1.1168 LearningRate 0.0003 Epoch: 18 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:15,959-Speed 3397.60 samples/sec Loss 1.1911 LearningRate 0.0003 Epoch: 18 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:18,987-Speed 3382.80 samples/sec Loss 1.1148 LearningRate 0.0003 Epoch: 18 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:22,013-Speed 3385.09 samples/sec Loss 1.0773 LearningRate 0.0003 Epoch: 18 Global Step: 95470 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:25,045-Speed 3377.89 samples/sec Loss 1.1005 LearningRate 0.0003 Epoch: 18 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:28,050-Speed 3408.30 samples/sec Loss 1.1612 LearningRate 0.0003 Epoch: 18 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:31,086-Speed 3374.60 samples/sec Loss 1.1381 LearningRate 0.0003 Epoch: 18 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:34,102-Speed 3395.73 samples/sec Loss 1.0780 LearningRate 0.0003 Epoch: 18 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:37,118-Speed 3395.90 samples/sec Loss 1.0552 LearningRate 0.0003 Epoch: 18 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:40,137-Speed 3393.08 samples/sec Loss 1.0729 LearningRate 0.0003 Epoch: 18 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:43,162-Speed 3385.25 samples/sec Loss 1.1469 LearningRate 0.0003 Epoch: 18 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:46,182-Speed 3392.12 samples/sec Loss 0.9925 LearningRate 0.0003 Epoch: 18 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:49,202-Speed 3391.80 samples/sec Loss 1.0105 LearningRate 0.0003 Epoch: 18 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:52,215-Speed 3399.00 samples/sec Loss 1.0639 LearningRate 0.0003 Epoch: 18 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:55,282-Speed 3339.62 samples/sec Loss 1.0604 LearningRate 0.0003 Epoch: 18 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:05:58,274-Speed 3423.63 samples/sec Loss 1.1456 LearningRate 0.0003 Epoch: 18 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 09:06:01,289-Speed 3397.34 samples/sec Loss 1.0473 LearningRate 0.0003 Epoch: 18 Global Step: 95600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:04,307-Speed 3393.96 samples/sec Loss 1.0827 LearningRate 0.0003 Epoch: 18 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:07,325-Speed 3393.53 samples/sec Loss 0.9952 LearningRate 0.0003 Epoch: 18 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:10,342-Speed 3395.00 samples/sec Loss 0.9496 LearningRate 0.0003 Epoch: 18 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:13,348-Speed 3407.50 samples/sec Loss 1.1354 LearningRate 0.0003 Epoch: 18 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:16,391-Speed 3365.76 samples/sec Loss 1.1330 LearningRate 0.0003 Epoch: 18 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:19,410-Speed 3392.90 samples/sec Loss 1.0368 LearningRate 0.0003 Epoch: 18 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:22,421-Speed 3402.23 samples/sec Loss 1.0709 LearningRate 0.0003 Epoch: 18 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:25,436-Speed 3396.64 samples/sec Loss 1.0738 LearningRate 0.0003 Epoch: 18 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:28,452-Speed 3396.36 samples/sec Loss 1.0860 LearningRate 0.0003 Epoch: 18 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:31,458-Speed 3407.23 samples/sec Loss 1.0284 LearningRate 0.0003 Epoch: 18 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:34,495-Speed 3372.62 samples/sec Loss 1.1636 LearningRate 0.0003 Epoch: 18 Global Step: 95710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:37,519-Speed 3387.75 samples/sec Loss 
1.0496 LearningRate 0.0003 Epoch: 18 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:40,531-Speed 3400.63 samples/sec Loss 1.0507 LearningRate 0.0003 Epoch: 18 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:06:43,547-Speed 3395.95 samples/sec Loss 1.1413 LearningRate 0.0003 Epoch: 18 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:46,567-Speed 3392.22 samples/sec Loss 1.1856 LearningRate 0.0003 Epoch: 18 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:49,587-Speed 3390.94 samples/sec Loss 1.0560 LearningRate 0.0003 Epoch: 18 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:52,708-Speed 3281.71 samples/sec Loss 1.1247 LearningRate 0.0003 Epoch: 18 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:55,727-Speed 3392.70 samples/sec Loss 1.0836 LearningRate 0.0003 Epoch: 18 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:06:58,747-Speed 3391.31 samples/sec Loss 1.0419 LearningRate 0.0003 Epoch: 18 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:01,768-Speed 3390.49 samples/sec Loss 1.0826 LearningRate 0.0003 Epoch: 18 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:04,796-Speed 3382.91 samples/sec Loss 1.1263 LearningRate 0.0003 Epoch: 18 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:07,813-Speed 3395.60 samples/sec Loss 1.1017 LearningRate 0.0003 Epoch: 18 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:10,802-Speed 3426.83 samples/sec Loss 1.0606 LearningRate 0.0003 Epoch: 18 Global Step: 95830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:13,814-Speed 3400.09 samples/sec Loss 1.0048 LearningRate 0.0003 Epoch: 18 Global Step: 95840 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:16,829-Speed 3397.83 samples/sec Loss 1.0406 LearningRate 0.0003 Epoch: 18 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:19,842-Speed 3399.04 samples/sec Loss 1.0877 LearningRate 0.0003 Epoch: 18 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:22,856-Speed 3398.08 samples/sec Loss 1.1388 LearningRate 0.0003 Epoch: 18 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:25,869-Speed 3399.54 samples/sec Loss 1.1447 LearningRate 0.0003 Epoch: 18 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:28,885-Speed 3395.72 samples/sec Loss 1.1014 LearningRate 0.0003 Epoch: 18 Global Step: 95890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:31,908-Speed 3388.11 samples/sec Loss 1.0975 LearningRate 0.0003 Epoch: 18 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:34,939-Speed 3380.05 samples/sec Loss 1.1125 LearningRate 0.0003 Epoch: 18 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:37,962-Speed 3387.92 samples/sec Loss 1.0982 LearningRate 0.0003 Epoch: 18 Global Step: 95920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:07:40,978-Speed 3396.89 samples/sec Loss 0.9994 LearningRate 0.0003 Epoch: 18 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:44,003-Speed 3385.83 samples/sec Loss 1.1448 LearningRate 0.0003 Epoch: 18 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:47,025-Speed 3389.67 samples/sec Loss 1.0878 LearningRate 0.0003 Epoch: 18 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:07:50,034-Speed 3402.84 samples/sec Loss 1.1104 LearningRate 0.0003 Epoch: 18 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 09:07:53,134-Speed 3304.60 samples/sec Loss 1.0581 LearningRate 0.0003 Epoch: 18 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:07:56,187-Speed 3354.24 samples/sec Loss 1.0635 LearningRate 0.0003 Epoch: 18 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:07:59,238-Speed 3357.73 samples/sec Loss 1.0586 LearningRate 0.0003 Epoch: 18 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:08:02,250-Speed 3401.37 samples/sec Loss 1.0423 LearningRate 0.0003 Epoch: 18 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:08:46,221-[lfw][96000]XNorm: 22.269381
Training: 2022-04-11 09:08:46,222-[lfw][96000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:08:46,222-[lfw][96000]Accuracy-Highest: 0.99850
Training: 2022-04-11 09:09:37,551-[cfp_fp][96000]XNorm: 22.442077
Training: 2022-04-11 09:09:37,551-[cfp_fp][96000]Accuracy-Flip: 0.98943+-0.00539
Training: 2022-04-11 09:09:37,552-[cfp_fp][96000]Accuracy-Highest: 0.98943
Training: 2022-04-11 09:10:21,666-[agedb_30][96000]XNorm: 22.679242
Training: 2022-04-11 09:10:21,667-[agedb_30][96000]Accuracy-Flip: 0.98400+-0.00688
Training: 2022-04-11 09:10:21,667-[agedb_30][96000]Accuracy-Highest: 0.98550
Training: 2022-04-11 09:10:24,666-Speed 71.90 samples/sec Loss 1.0744 LearningRate 0.0003 Epoch: 18 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:10:27,670-Speed 3409.69 samples/sec Loss 1.0556 LearningRate 0.0003 Epoch: 18 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:10:30,640-Speed 3448.40 samples/sec Loss 1.0444 LearningRate 0.0003 Epoch: 18 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:10:33,635-Speed 3419.98 samples/sec Loss 1.0770 LearningRate 0.0003 Epoch: 18 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 09:10:36,631-Speed 3418.43
samples/sec Loss 1.1678 LearningRate 0.0003 Epoch: 18 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:10:39,643-Speed 3400.69 samples/sec Loss 1.0820 LearningRate 0.0003 Epoch: 18 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:10:42,647-Speed 3410.39 samples/sec Loss 1.0032 LearningRate 0.0003 Epoch: 18 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:10:45,633-Speed 3429.92 samples/sec Loss 1.0342 LearningRate 0.0003 Epoch: 18 Global Step: 96080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:10:48,629-Speed 3419.16 samples/sec Loss 1.0957 LearningRate 0.0003 Epoch: 18 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:10:51,726-Speed 3306.71 samples/sec Loss 1.0103 LearningRate 0.0003 Epoch: 18 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:03,906-Speed 840.83 samples/sec Loss 1.0279 LearningRate 0.0002 Epoch: 19 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:07,078-Speed 3229.52 samples/sec Loss 0.9468 LearningRate 0.0002 Epoch: 19 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:10,072-Speed 3420.43 samples/sec Loss 1.0121 LearningRate 0.0002 Epoch: 19 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:13,074-Speed 3412.40 samples/sec Loss 0.9285 LearningRate 0.0002 Epoch: 19 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:16,088-Speed 3397.93 samples/sec Loss 0.9187 LearningRate 0.0002 Epoch: 19 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:19,090-Speed 3412.86 samples/sec Loss 0.8867 LearningRate 0.0002 Epoch: 19 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:22,089-Speed 3414.22 samples/sec Loss 0.9566 LearningRate 0.0002 Epoch: 19 
Global Step: 96170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:25,101-Speed 3401.12 samples/sec Loss 0.9586 LearningRate 0.0002 Epoch: 19 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:28,130-Speed 3382.05 samples/sec Loss 0.9255 LearningRate 0.0002 Epoch: 19 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:31,136-Speed 3406.92 samples/sec Loss 0.8500 LearningRate 0.0002 Epoch: 19 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:34,146-Speed 3403.90 samples/sec Loss 0.8934 LearningRate 0.0002 Epoch: 19 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:37,156-Speed 3401.82 samples/sec Loss 0.8999 LearningRate 0.0002 Epoch: 19 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:40,159-Speed 3410.79 samples/sec Loss 0.8705 LearningRate 0.0002 Epoch: 19 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:43,164-Speed 3408.37 samples/sec Loss 0.8308 LearningRate 0.0002 Epoch: 19 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 09:11:46,154-Speed 3426.36 samples/sec Loss 0.9282 LearningRate 0.0002 Epoch: 19 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:49,171-Speed 3394.62 samples/sec Loss 0.9264 LearningRate 0.0002 Epoch: 19 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:52,224-Speed 3355.25 samples/sec Loss 0.9794 LearningRate 0.0002 Epoch: 19 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:55,237-Speed 3399.45 samples/sec Loss 0.9590 LearningRate 0.0002 Epoch: 19 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:11:58,247-Speed 3402.68 samples/sec Loss 0.8906 LearningRate 0.0002 Epoch: 19 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 09:12:01,257-Speed 3402.75 samples/sec Loss 0.9315 LearningRate 0.0002 Epoch: 19 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:12:04,266-Speed 3404.44 samples/sec Loss 0.9217 LearningRate 0.0002 Epoch: 19 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:12:07,271-Speed 3408.41 samples/sec Loss 0.9779 LearningRate 0.0002 Epoch: 19 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 09:12:10,278-Speed 3406.36 samples/sec Loss 0.9449 LearningRate 0.0002 Epoch: 19 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:12:13,296-Speed 3394.18 samples/sec Loss 0.9027 LearningRate 0.0002 Epoch: 19 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:12:16,327-Speed 3378.21 samples/sec Loss 0.9467 LearningRate 0.0002 Epoch: 19 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:19,336-Speed 3404.98 samples/sec Loss 0.9227 LearningRate 0.0002 Epoch: 19 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:22,342-Speed 3407.36 samples/sec Loss 0.9339 LearningRate 0.0002 Epoch: 19 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:25,356-Speed 3398.32 samples/sec Loss 0.8804 LearningRate 0.0002 Epoch: 19 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:28,461-Speed 3299.49 samples/sec Loss 0.8729 LearningRate 0.0002 Epoch: 19 Global Step: 96390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:31,514-Speed 3355.48 samples/sec Loss 0.9932 LearningRate 0.0002 Epoch: 19 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:34,515-Speed 3413.01 samples/sec Loss 0.9415 LearningRate 0.0002 Epoch: 19 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:37,523-Speed 3404.68 
samples/sec Loss 0.9586 LearningRate 0.0002 Epoch: 19 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:40,570-Speed 3361.51 samples/sec Loss 0.9786 LearningRate 0.0002 Epoch: 19 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:43,572-Speed 3412.06 samples/sec Loss 0.9132 LearningRate 0.0002 Epoch: 19 Global Step: 96440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:46,567-Speed 3419.39 samples/sec Loss 0.8701 LearningRate 0.0002 Epoch: 19 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:49,574-Speed 3407.54 samples/sec Loss 0.8504 LearningRate 0.0002 Epoch: 19 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:52,577-Speed 3409.96 samples/sec Loss 0.9865 LearningRate 0.0002 Epoch: 19 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:55,583-Speed 3407.66 samples/sec Loss 0.9856 LearningRate 0.0002 Epoch: 19 Global Step: 96480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:12:58,594-Speed 3401.71 samples/sec Loss 0.8549 LearningRate 0.0002 Epoch: 19 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:01,616-Speed 3388.63 samples/sec Loss 0.9229 LearningRate 0.0002 Epoch: 19 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:04,625-Speed 3404.53 samples/sec Loss 0.9237 LearningRate 0.0002 Epoch: 19 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:07,632-Speed 3405.65 samples/sec Loss 0.8948 LearningRate 0.0002 Epoch: 19 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:10,643-Speed 3401.74 samples/sec Loss 1.0667 LearningRate 0.0002 Epoch: 19 Global Step: 96530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:13,680-Speed 3372.88 samples/sec Loss 0.9010 LearningRate 0.0002 Epoch: 19 
Global Step: 96540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:16,675-Speed 3420.01 samples/sec Loss 1.0138 LearningRate 0.0002 Epoch: 19 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:19,695-Speed 3392.19 samples/sec Loss 1.0427 LearningRate 0.0002 Epoch: 19 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:22,699-Speed 3409.30 samples/sec Loss 0.9623 LearningRate 0.0002 Epoch: 19 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:25,703-Speed 3409.86 samples/sec Loss 0.8811 LearningRate 0.0002 Epoch: 19 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:28,711-Speed 3405.40 samples/sec Loss 0.9198 LearningRate 0.0002 Epoch: 19 Global Step: 96590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:13:31,702-Speed 3423.38 samples/sec Loss 0.8726 LearningRate 0.0002 Epoch: 19 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:34,712-Speed 3403.54 samples/sec Loss 1.0044 LearningRate 0.0002 Epoch: 19 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:37,716-Speed 3409.23 samples/sec Loss 0.9115 LearningRate 0.0002 Epoch: 19 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:40,728-Speed 3400.71 samples/sec Loss 0.8478 LearningRate 0.0002 Epoch: 19 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:43,732-Speed 3410.22 samples/sec Loss 0.9630 LearningRate 0.0002 Epoch: 19 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:46,739-Speed 3406.41 samples/sec Loss 0.9256 LearningRate 0.0002 Epoch: 19 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:49,744-Speed 3408.59 samples/sec Loss 0.9525 LearningRate 0.0002 Epoch: 19 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 09:13:52,749-Speed 3408.17 samples/sec Loss 0.9491 LearningRate 0.0002 Epoch: 19 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:55,760-Speed 3401.99 samples/sec Loss 0.9441 LearningRate 0.0002 Epoch: 19 Global Step: 96680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:13:58,764-Speed 3408.91 samples/sec Loss 0.9445 LearningRate 0.0002 Epoch: 19 Global Step: 96690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:01,800-Speed 3373.95 samples/sec Loss 0.8916 LearningRate 0.0002 Epoch: 19 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:04,810-Speed 3402.63 samples/sec Loss 0.9764 LearningRate 0.0002 Epoch: 19 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:07,824-Speed 3399.18 samples/sec Loss 0.9175 LearningRate 0.0002 Epoch: 19 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:10,832-Speed 3404.98 samples/sec Loss 0.9888 LearningRate 0.0002 Epoch: 19 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:13,886-Speed 3353.48 samples/sec Loss 0.9011 LearningRate 0.0002 Epoch: 19 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:16,907-Speed 3390.60 samples/sec Loss 0.8989 LearningRate 0.0002 Epoch: 19 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:19,918-Speed 3401.81 samples/sec Loss 0.9884 LearningRate 0.0002 Epoch: 19 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:22,940-Speed 3388.96 samples/sec Loss 0.8590 LearningRate 0.0002 Epoch: 19 Global Step: 96770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:25,958-Speed 3394.44 samples/sec Loss 1.0120 LearningRate 0.0002 Epoch: 19 Global Step: 96780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:28,967-Speed 3404.32 
samples/sec Loss 0.9700 LearningRate 0.0002 Epoch: 19 Global Step: 96790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:31,969-Speed 3411.46 samples/sec Loss 0.9183 LearningRate 0.0002 Epoch: 19 Global Step: 96800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:34,976-Speed 3406.71 samples/sec Loss 0.9458 LearningRate 0.0002 Epoch: 19 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:14:37,982-Speed 3407.30 samples/sec Loss 0.9872 LearningRate 0.0002 Epoch: 19 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:40,999-Speed 3395.05 samples/sec Loss 0.8750 LearningRate 0.0002 Epoch: 19 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:44,046-Speed 3361.66 samples/sec Loss 0.9312 LearningRate 0.0002 Epoch: 19 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:47,052-Speed 3407.08 samples/sec Loss 0.9929 LearningRate 0.0002 Epoch: 19 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:50,061-Speed 3404.46 samples/sec Loss 0.9941 LearningRate 0.0002 Epoch: 19 Global Step: 96860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:53,088-Speed 3383.71 samples/sec Loss 0.9375 LearningRate 0.0002 Epoch: 19 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:56,099-Speed 3400.90 samples/sec Loss 0.8960 LearningRate 0.0002 Epoch: 19 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:14:59,116-Speed 3395.14 samples/sec Loss 0.9910 LearningRate 0.0002 Epoch: 19 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:15:02,132-Speed 3396.63 samples/sec Loss 0.9238 LearningRate 0.0002 Epoch: 19 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:15:05,177-Speed 3363.43 samples/sec Loss 1.0153 LearningRate 0.0002 Epoch: 19 
Global Step: 96910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:15:08,190-Speed 3399.89 samples/sec Loss 1.0067 LearningRate 0.0002 Epoch: 19 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:11,219-Speed 3380.72 samples/sec Loss 1.0099 LearningRate 0.0002 Epoch: 19 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:14,237-Speed 3394.41 samples/sec Loss 0.9483 LearningRate 0.0002 Epoch: 19 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:17,243-Speed 3407.44 samples/sec Loss 0.8893 LearningRate 0.0002 Epoch: 19 Global Step: 96950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:20,266-Speed 3388.20 samples/sec Loss 0.9466 LearningRate 0.0002 Epoch: 19 Global Step: 96960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:23,275-Speed 3404.12 samples/sec Loss 0.9797 LearningRate 0.0002 Epoch: 19 Global Step: 96970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:26,286-Speed 3401.06 samples/sec Loss 0.8678 LearningRate 0.0002 Epoch: 19 Global Step: 96980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:29,299-Speed 3400.40 samples/sec Loss 0.9483 LearningRate 0.0002 Epoch: 19 Global Step: 96990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:32,311-Speed 3400.98 samples/sec Loss 0.9644 LearningRate 0.0002 Epoch: 19 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:35,333-Speed 3388.27 samples/sec Loss 1.0240 LearningRate 0.0002 Epoch: 19 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:38,329-Speed 3419.92 samples/sec Loss 0.9197 LearningRate 0.0002 Epoch: 19 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:41,344-Speed 3397.07 samples/sec Loss 0.9874 LearningRate 0.0002 Epoch: 19 Global Step: 97030 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-11 09:15:44,367-Speed 3388.07 samples/sec Loss 0.9417 LearningRate 0.0002 Epoch: 19 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:47,383-Speed 3396.22 samples/sec Loss 0.9329 LearningRate 0.0002 Epoch: 19 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:50,396-Speed 3399.06 samples/sec Loss 0.8961 LearningRate 0.0002 Epoch: 19 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:53,421-Speed 3386.07 samples/sec Loss 0.9989 LearningRate 0.0002 Epoch: 19 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:56,436-Speed 3396.43 samples/sec Loss 0.9683 LearningRate 0.0002 Epoch: 19 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:15:59,456-Speed 3392.32 samples/sec Loss 0.9939 LearningRate 0.0002 Epoch: 19 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:02,471-Speed 3397.40 samples/sec Loss 0.9089 LearningRate 0.0002 Epoch: 19 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:05,493-Speed 3389.52 samples/sec Loss 1.0183 LearningRate 0.0002 Epoch: 19 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:08,487-Speed 3421.51 samples/sec Loss 1.0066 LearningRate 0.0002 Epoch: 19 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:11,505-Speed 3393.27 samples/sec Loss 1.0003 LearningRate 0.0002 Epoch: 19 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:14,552-Speed 3361.78 samples/sec Loss 0.9928 LearningRate 0.0002 Epoch: 19 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:17,564-Speed 3400.73 samples/sec Loss 0.9115 LearningRate 0.0002 Epoch: 19 Global Step: 97150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:20,594-Speed 3380.35 
samples/sec Loss 0.9787 LearningRate 0.0002 Epoch: 19 Global Step: 97160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:23,610-Speed 3396.33 samples/sec Loss 0.8785 LearningRate 0.0002 Epoch: 19 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:26,627-Speed 3394.46 samples/sec Loss 0.9163 LearningRate 0.0002 Epoch: 19 Global Step: 97180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:29,639-Speed 3401.06 samples/sec Loss 1.0230 LearningRate 0.0002 Epoch: 19 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:32,657-Speed 3393.99 samples/sec Loss 0.9265 LearningRate 0.0002 Epoch: 19 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:35,670-Speed 3399.60 samples/sec Loss 0.9268 LearningRate 0.0002 Epoch: 19 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:38,692-Speed 3389.64 samples/sec Loss 0.9841 LearningRate 0.0002 Epoch: 19 Global Step: 97220 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:16:41,693-Speed 3412.76 samples/sec Loss 0.9469 LearningRate 0.0002 Epoch: 19 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:44,716-Speed 3388.30 samples/sec Loss 0.9163 LearningRate 0.0002 Epoch: 19 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:47,731-Speed 3396.92 samples/sec Loss 0.8405 LearningRate 0.0001 Epoch: 19 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:50,753-Speed 3388.73 samples/sec Loss 0.9545 LearningRate 0.0001 Epoch: 19 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:53,767-Speed 3399.31 samples/sec Loss 0.8797 LearningRate 0.0001 Epoch: 19 Global Step: 97270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:56,789-Speed 3388.77 samples/sec Loss 0.8880 LearningRate 0.0001 Epoch: 19 
Global Step: 97280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:16:59,805-Speed 3396.84 samples/sec Loss 0.9742 LearningRate 0.0001 Epoch: 19 Global Step: 97290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:02,849-Speed 3365.10 samples/sec Loss 0.9640 LearningRate 0.0001 Epoch: 19 Global Step: 97300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:05,874-Speed 3385.27 samples/sec Loss 0.9167 LearningRate 0.0001 Epoch: 19 Global Step: 97310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:08,891-Speed 3395.56 samples/sec Loss 0.9266 LearningRate 0.0001 Epoch: 19 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:11,895-Speed 3409.39 samples/sec Loss 1.0009 LearningRate 0.0001 Epoch: 19 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:14,907-Speed 3400.94 samples/sec Loss 0.8761 LearningRate 0.0001 Epoch: 19 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:17,922-Speed 3396.91 samples/sec Loss 0.9922 LearningRate 0.0001 Epoch: 19 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:20,936-Speed 3398.08 samples/sec Loss 0.9534 LearningRate 0.0001 Epoch: 19 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:23,971-Speed 3375.46 samples/sec Loss 0.9479 LearningRate 0.0001 Epoch: 19 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:27,027-Speed 3351.89 samples/sec Loss 0.9020 LearningRate 0.0001 Epoch: 19 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:30,050-Speed 3387.91 samples/sec Loss 0.8778 LearningRate 0.0001 Epoch: 19 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:33,069-Speed 3393.55 samples/sec Loss 0.9756 LearningRate 0.0001 Epoch: 19 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 09:17:36,087-Speed 3393.03 samples/sec Loss 0.9760 LearningRate 0.0001 Epoch: 19 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:39,099-Speed 3400.85 samples/sec Loss 0.9176 LearningRate 0.0001 Epoch: 19 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:42,109-Speed 3403.15 samples/sec Loss 0.9384 LearningRate 0.0001 Epoch: 19 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:45,125-Speed 3395.47 samples/sec Loss 0.9558 LearningRate 0.0001 Epoch: 19 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:48,140-Speed 3397.32 samples/sec Loss 0.9643 LearningRate 0.0001 Epoch: 19 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:51,160-Speed 3391.14 samples/sec Loss 0.8869 LearningRate 0.0001 Epoch: 19 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:17:54,174-Speed 3398.42 samples/sec Loss 0.9561 LearningRate 0.0001 Epoch: 19 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:17:57,186-Speed 3400.88 samples/sec Loss 0.9285 LearningRate 0.0001 Epoch: 19 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:00,201-Speed 3397.44 samples/sec Loss 0.9136 LearningRate 0.0001 Epoch: 19 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:03,218-Speed 3394.79 samples/sec Loss 0.9799 LearningRate 0.0001 Epoch: 19 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:06,232-Speed 3398.97 samples/sec Loss 0.8577 LearningRate 0.0001 Epoch: 19 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:09,251-Speed 3393.50 samples/sec Loss 1.0183 LearningRate 0.0001 Epoch: 19 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:12,264-Speed 3398.40 
samples/sec Loss 0.9680 LearningRate 0.0001 Epoch: 19 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:15,283-Speed 3393.62 samples/sec Loss 0.8319 LearningRate 0.0001 Epoch: 19 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:18,344-Speed 3346.28 samples/sec Loss 0.8880 LearningRate 0.0001 Epoch: 19 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:21,360-Speed 3396.29 samples/sec Loss 0.8991 LearningRate 0.0001 Epoch: 19 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:24,375-Speed 3396.91 samples/sec Loss 0.9171 LearningRate 0.0001 Epoch: 19 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:27,389-Speed 3398.10 samples/sec Loss 0.9749 LearningRate 0.0001 Epoch: 19 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:30,402-Speed 3399.91 samples/sec Loss 0.9032 LearningRate 0.0001 Epoch: 19 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:33,417-Speed 3397.56 samples/sec Loss 0.9170 LearningRate 0.0001 Epoch: 19 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:36,449-Speed 3377.52 samples/sec Loss 0.9291 LearningRate 0.0001 Epoch: 19 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:39,518-Speed 3337.91 samples/sec Loss 0.8684 LearningRate 0.0001 Epoch: 19 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:42,535-Speed 3394.64 samples/sec Loss 0.9586 LearningRate 0.0001 Epoch: 19 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:45,558-Speed 3387.99 samples/sec Loss 1.0078 LearningRate 0.0001 Epoch: 19 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:48,572-Speed 3398.64 samples/sec Loss 0.9140 LearningRate 0.0001 Epoch: 19 
Global Step: 97650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:51,596-Speed 3387.49 samples/sec Loss 0.9323 LearningRate 0.0001 Epoch: 19 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:54,594-Speed 3416.38 samples/sec Loss 0.9259 LearningRate 0.0001 Epoch: 19 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:18:57,593-Speed 3415.52 samples/sec Loss 0.8716 LearningRate 0.0001 Epoch: 19 Global Step: 97680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:00,609-Speed 3396.20 samples/sec Loss 0.9198 LearningRate 0.0001 Epoch: 19 Global Step: 97690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:03,650-Speed 3367.83 samples/sec Loss 0.9470 LearningRate 0.0001 Epoch: 19 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:06,673-Speed 3388.36 samples/sec Loss 0.9561 LearningRate 0.0001 Epoch: 19 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:09,694-Speed 3391.21 samples/sec Loss 0.9151 LearningRate 0.0001 Epoch: 19 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:12,711-Speed 3395.20 samples/sec Loss 0.9156 LearningRate 0.0001 Epoch: 19 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:15,745-Speed 3376.06 samples/sec Loss 0.9707 LearningRate 0.0001 Epoch: 19 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:18,759-Speed 3400.12 samples/sec Loss 0.9406 LearningRate 0.0001 Epoch: 19 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:21,788-Speed 3381.97 samples/sec Loss 1.0063 LearningRate 0.0001 Epoch: 19 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:19:24,800-Speed 3400.99 samples/sec Loss 0.9797 LearningRate 0.0001 Epoch: 19 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 09:19:27,820-Speed 3391.64 samples/sec Loss 0.9540 LearningRate 0.0001 Epoch: 19 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:30,843-Speed 3388.06 samples/sec Loss 0.9982 LearningRate 0.0001 Epoch: 19 Global Step: 97790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:33,870-Speed 3384.01 samples/sec Loss 0.9542 LearningRate 0.0001 Epoch: 19 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:36,911-Speed 3367.14 samples/sec Loss 0.8779 LearningRate 0.0001 Epoch: 19 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:39,935-Speed 3387.74 samples/sec Loss 0.9703 LearningRate 0.0001 Epoch: 19 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:42,978-Speed 3365.57 samples/sec Loss 0.9279 LearningRate 0.0001 Epoch: 19 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:45,990-Speed 3400.91 samples/sec Loss 0.9863 LearningRate 0.0001 Epoch: 19 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:49,010-Speed 3391.46 samples/sec Loss 1.0200 LearningRate 0.0001 Epoch: 19 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:52,046-Speed 3373.91 samples/sec Loss 0.9333 LearningRate 0.0001 Epoch: 19 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:55,066-Speed 3392.05 samples/sec Loss 0.9257 LearningRate 0.0001 Epoch: 19 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:19:58,064-Speed 3416.84 samples/sec Loss 0.9376 LearningRate 0.0001 Epoch: 19 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:01,079-Speed 3396.26 samples/sec Loss 0.9141 LearningRate 0.0001 Epoch: 19 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:04,103-Speed 3387.27 
samples/sec Loss 0.9369 LearningRate 0.0001 Epoch: 19 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:07,127-Speed 3387.75 samples/sec Loss 0.9113 LearningRate 0.0001 Epoch: 19 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:10,144-Speed 3394.20 samples/sec Loss 0.9176 LearningRate 0.0001 Epoch: 19 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:13,171-Speed 3383.63 samples/sec Loss 0.9181 LearningRate 0.0001 Epoch: 19 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:16,220-Speed 3359.74 samples/sec Loss 0.9541 LearningRate 0.0001 Epoch: 19 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:19,241-Speed 3391.35 samples/sec Loss 0.9354 LearningRate 0.0001 Epoch: 19 Global Step: 97950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:22,256-Speed 3396.35 samples/sec Loss 0.9377 LearningRate 0.0001 Epoch: 19 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:25,271-Speed 3397.46 samples/sec Loss 0.9730 LearningRate 0.0001 Epoch: 19 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:28,287-Speed 3396.94 samples/sec Loss 0.9343 LearningRate 0.0001 Epoch: 19 Global Step: 97980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:20:31,282-Speed 3418.65 samples/sec Loss 0.9609 LearningRate 0.0001 Epoch: 19 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:20:34,304-Speed 3390.47 samples/sec Loss 0.9405 LearningRate 0.0001 Epoch: 19 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:21:18,614-[lfw][98000]XNorm: 22.098909 Training: 2022-04-11 09:21:18,615-[lfw][98000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 09:21:18,615-[lfw][98000]Accuracy-Highest: 0.99850 Training: 2022-04-11 
09:22:10,011-[cfp_fp][98000]XNorm: 22.369260 Training: 2022-04-11 09:22:10,011-[cfp_fp][98000]Accuracy-Flip: 0.98986+-0.00573 Training: 2022-04-11 09:22:10,012-[cfp_fp][98000]Accuracy-Highest: 0.98986 Training: 2022-04-11 09:22:54,072-[agedb_30][98000]XNorm: 22.539158 Training: 2022-04-11 09:22:54,073-[agedb_30][98000]Accuracy-Flip: 0.98517+-0.00728 Training: 2022-04-11 09:22:54,073-[agedb_30][98000]Accuracy-Highest: 0.98550 Training: 2022-04-11 09:22:57,072-Speed 71.72 samples/sec Loss 0.9238 LearningRate 0.0001 Epoch: 19 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:00,057-Speed 3431.97 samples/sec Loss 0.9675 LearningRate 0.0001 Epoch: 19 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:03,105-Speed 3359.67 samples/sec Loss 0.9578 LearningRate 0.0001 Epoch: 19 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:06,161-Speed 3352.48 samples/sec Loss 0.9713 LearningRate 0.0001 Epoch: 19 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:09,145-Speed 3431.96 samples/sec Loss 0.8549 LearningRate 0.0001 Epoch: 19 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:12,142-Speed 3418.17 samples/sec Loss 0.9819 LearningRate 0.0001 Epoch: 19 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:15,144-Speed 3411.87 samples/sec Loss 0.8641 LearningRate 0.0001 Epoch: 19 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:18,141-Speed 3417.25 samples/sec Loss 0.8440 LearningRate 0.0001 Epoch: 19 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:21,134-Speed 3422.97 samples/sec Loss 0.9525 LearningRate 0.0001 Epoch: 19 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:24,127-Speed 3422.70 samples/sec Loss 0.9672 LearningRate 0.0001 Epoch: 19 Global Step: 
98100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:27,130-Speed 3410.09 samples/sec Loss 0.9190 LearningRate 0.0001 Epoch: 19 Global Step: 98110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:30,131-Speed 3413.71 samples/sec Loss 0.9625 LearningRate 0.0001 Epoch: 19 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:33,129-Speed 3416.79 samples/sec Loss 0.9255 LearningRate 0.0001 Epoch: 19 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:36,129-Speed 3413.92 samples/sec Loss 0.8783 LearningRate 0.0001 Epoch: 19 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:39,130-Speed 3412.94 samples/sec Loss 0.9323 LearningRate 0.0001 Epoch: 19 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:42,132-Speed 3413.25 samples/sec Loss 0.9394 LearningRate 0.0001 Epoch: 19 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:45,141-Speed 3403.79 samples/sec Loss 0.9603 LearningRate 0.0001 Epoch: 19 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:48,142-Speed 3412.01 samples/sec Loss 0.9285 LearningRate 0.0001 Epoch: 19 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:23:51,128-Speed 3430.40 samples/sec Loss 0.9461 LearningRate 0.0001 Epoch: 19 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:54,136-Speed 3404.90 samples/sec Loss 0.9514 LearningRate 0.0001 Epoch: 19 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:23:57,144-Speed 3407.50 samples/sec Loss 0.9445 LearningRate 0.0001 Epoch: 19 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:00,148-Speed 3408.91 samples/sec Loss 1.0286 LearningRate 0.0001 Epoch: 19 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 09:24:03,154-Speed 3406.88 samples/sec Loss 0.8892 LearningRate 0.0001 Epoch: 19 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:06,188-Speed 3376.34 samples/sec Loss 0.9331 LearningRate 0.0001 Epoch: 19 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:09,196-Speed 3405.96 samples/sec Loss 0.9671 LearningRate 0.0001 Epoch: 19 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:12,229-Speed 3377.12 samples/sec Loss 0.9529 LearningRate 0.0001 Epoch: 19 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:15,367-Speed 3263.76 samples/sec Loss 0.9886 LearningRate 0.0001 Epoch: 19 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:18,385-Speed 3394.07 samples/sec Loss 0.9573 LearningRate 0.0001 Epoch: 19 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:24:21,396-Speed 3402.06 samples/sec Loss 0.9735 LearningRate 0.0001 Epoch: 19 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:24,401-Speed 3407.73 samples/sec Loss 0.9160 LearningRate 0.0001 Epoch: 19 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:27,478-Speed 3328.89 samples/sec Loss 0.8619 LearningRate 0.0001 Epoch: 19 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:30,491-Speed 3400.23 samples/sec Loss 0.9473 LearningRate 0.0001 Epoch: 19 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:33,501-Speed 3402.15 samples/sec Loss 0.9588 LearningRate 0.0001 Epoch: 19 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:36,547-Speed 3362.93 samples/sec Loss 0.8945 LearningRate 0.0001 Epoch: 19 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:39,558-Speed 3402.31 
samples/sec Loss 0.9011 LearningRate 0.0001 Epoch: 19 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:42,577-Speed 3392.43 samples/sec Loss 0.9349 LearningRate 0.0001 Epoch: 19 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:45,590-Speed 3399.20 samples/sec Loss 0.8741 LearningRate 0.0001 Epoch: 19 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:48,620-Speed 3380.45 samples/sec Loss 1.0406 LearningRate 0.0001 Epoch: 19 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:51,645-Speed 3386.33 samples/sec Loss 0.9111 LearningRate 0.0001 Epoch: 19 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:54,679-Speed 3375.80 samples/sec Loss 1.0282 LearningRate 0.0001 Epoch: 19 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:24:57,694-Speed 3397.48 samples/sec Loss 0.9186 LearningRate 0.0001 Epoch: 19 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:00,702-Speed 3404.54 samples/sec Loss 0.9206 LearningRate 0.0001 Epoch: 19 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:03,791-Speed 3316.12 samples/sec Loss 0.9057 LearningRate 0.0001 Epoch: 19 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:06,901-Speed 3293.76 samples/sec Loss 0.9789 LearningRate 0.0001 Epoch: 19 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:09,919-Speed 3394.06 samples/sec Loss 0.9632 LearningRate 0.0001 Epoch: 19 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:12,930-Speed 3401.46 samples/sec Loss 1.0140 LearningRate 0.0001 Epoch: 19 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:15,946-Speed 3396.89 samples/sec Loss 1.0269 LearningRate 0.0001 Epoch: 19 
Global Step: 98470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:18,964-Speed 3393.22 samples/sec Loss 0.9304 LearningRate 0.0001 Epoch: 19 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:21,966-Speed 3411.97 samples/sec Loss 0.8376 LearningRate 0.0001 Epoch: 19 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:24,973-Speed 3405.74 samples/sec Loss 0.9310 LearningRate 0.0001 Epoch: 19 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:27,980-Speed 3407.01 samples/sec Loss 0.9430 LearningRate 0.0001 Epoch: 19 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:30,989-Speed 3403.82 samples/sec Loss 0.9041 LearningRate 0.0001 Epoch: 19 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:34,002-Speed 3398.69 samples/sec Loss 1.0325 LearningRate 0.0001 Epoch: 19 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:25:37,007-Speed 3409.96 samples/sec Loss 0.9460 LearningRate 0.0001 Epoch: 19 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:40,021-Speed 3398.13 samples/sec Loss 0.9158 LearningRate 0.0001 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:43,025-Speed 3409.25 samples/sec Loss 0.9193 LearningRate 0.0001 Epoch: 19 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:46,032-Speed 3406.64 samples/sec Loss 0.8878 LearningRate 0.0001 Epoch: 19 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:49,041-Speed 3403.93 samples/sec Loss 1.0525 LearningRate 0.0001 Epoch: 19 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:52,050-Speed 3404.91 samples/sec Loss 0.9010 LearningRate 0.0001 Epoch: 19 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-11 09:25:55,121-Speed 3335.59 samples/sec Loss 0.9888 LearningRate 0.0001 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:25:58,133-Speed 3400.59 samples/sec Loss 0.9210 LearningRate 0.0001 Epoch: 19 Global Step: 98610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:01,119-Speed 3430.27 samples/sec Loss 0.9362 LearningRate 0.0001 Epoch: 19 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:04,131-Speed 3399.72 samples/sec Loss 0.9207 LearningRate 0.0001 Epoch: 19 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:07,147-Speed 3396.68 samples/sec Loss 1.0382 LearningRate 0.0001 Epoch: 19 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:10,160-Speed 3400.54 samples/sec Loss 1.0041 LearningRate 0.0001 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:13,214-Speed 3353.27 samples/sec Loss 0.8792 LearningRate 0.0001 Epoch: 19 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:16,224-Speed 3403.02 samples/sec Loss 0.9341 LearningRate 0.0001 Epoch: 19 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:19,236-Speed 3400.03 samples/sec Loss 0.9655 LearningRate 0.0001 Epoch: 19 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:22,242-Speed 3408.15 samples/sec Loss 0.9783 LearningRate 0.0001 Epoch: 19 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:25,297-Speed 3351.89 samples/sec Loss 0.9373 LearningRate 0.0001 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:28,310-Speed 3400.16 samples/sec Loss 0.9283 LearningRate 0.0001 Epoch: 19 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:26:31,319-Speed 3403.76 
samples/sec Loss 0.9363 LearningRate 0.0001 Epoch: 19 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:34,334-Speed 3397.45 samples/sec Loss 1.0680 LearningRate 0.0001 Epoch: 19 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:37,349-Speed 3397.96 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:40,386-Speed 3372.21 samples/sec Loss 1.0082 LearningRate 0.0001 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:43,395-Speed 3403.35 samples/sec Loss 0.9346 LearningRate 0.0001 Epoch: 19 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:46,403-Speed 3405.79 samples/sec Loss 0.9442 LearningRate 0.0001 Epoch: 19 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:49,427-Speed 3386.89 samples/sec Loss 0.9642 LearningRate 0.0001 Epoch: 19 Global Step: 98780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:52,439-Speed 3400.21 samples/sec Loss 0.9331 LearningRate 0.0001 Epoch: 19 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:55,450-Speed 3401.97 samples/sec Loss 0.9039 LearningRate 0.0001 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:26:58,463-Speed 3398.96 samples/sec Loss 0.9513 LearningRate 0.0001 Epoch: 19 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:01,484-Speed 3391.35 samples/sec Loss 0.9288 LearningRate 0.0001 Epoch: 19 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:04,498-Speed 3397.85 samples/sec Loss 0.9680 LearningRate 0.0001 Epoch: 19 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:07,522-Speed 3387.12 samples/sec Loss 0.8823 LearningRate 0.0001 Epoch: 19 
Global Step: 98840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:10,548-Speed 3385.29 samples/sec Loss 0.9618 LearningRate 0.0001 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:13,566-Speed 3393.64 samples/sec Loss 0.9393 LearningRate 0.0001 Epoch: 19 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:16,559-Speed 3422.65 samples/sec Loss 1.0388 LearningRate 0.0001 Epoch: 19 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:19,600-Speed 3368.41 samples/sec Loss 0.9560 LearningRate 0.0001 Epoch: 19 Global Step: 98880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:22,616-Speed 3395.41 samples/sec Loss 0.8986 LearningRate 0.0001 Epoch: 19 Global Step: 98890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:25,624-Speed 3404.99 samples/sec Loss 0.9668 LearningRate 0.0000 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:28,649-Speed 3386.17 samples/sec Loss 0.9684 LearningRate 0.0000 Epoch: 19 Global Step: 98910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:31,668-Speed 3392.72 samples/sec Loss 0.9299 LearningRate 0.0000 Epoch: 19 Global Step: 98920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:34,676-Speed 3405.16 samples/sec Loss 0.9246 LearningRate 0.0000 Epoch: 19 Global Step: 98930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:37,700-Speed 3387.66 samples/sec Loss 0.9591 LearningRate 0.0000 Epoch: 19 Global Step: 98940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:40,713-Speed 3398.95 samples/sec Loss 0.9892 LearningRate 0.0000 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:27:43,723-Speed 3403.52 samples/sec Loss 0.9366 LearningRate 0.0000 Epoch: 19 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 09:27:46,742-Speed 3392.55 samples/sec Loss 0.9409 LearningRate 0.0000 Epoch: 19 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:49,783-Speed 3367.87 samples/sec Loss 0.9604 LearningRate 0.0000 Epoch: 19 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:52,799-Speed 3396.26 samples/sec Loss 0.9496 LearningRate 0.0000 Epoch: 19 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:55,814-Speed 3397.57 samples/sec Loss 1.0157 LearningRate 0.0000 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:27:58,824-Speed 3402.86 samples/sec Loss 0.9738 LearningRate 0.0000 Epoch: 19 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:01,839-Speed 3396.89 samples/sec Loss 0.9773 LearningRate 0.0000 Epoch: 19 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:04,860-Speed 3391.19 samples/sec Loss 0.9722 LearningRate 0.0000 Epoch: 19 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:07,882-Speed 3389.46 samples/sec Loss 0.9405 LearningRate 0.0000 Epoch: 19 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:10,893-Speed 3401.34 samples/sec Loss 0.9069 LearningRate 0.0000 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:13,926-Speed 3377.53 samples/sec Loss 0.9973 LearningRate 0.0000 Epoch: 19 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:16,962-Speed 3373.73 samples/sec Loss 0.9320 LearningRate 0.0000 Epoch: 19 Global Step: 99070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:28:19,968-Speed 3406.86 samples/sec Loss 0.9102 LearningRate 0.0000 Epoch: 19 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:22,980-Speed 3400.46 
samples/sec Loss 1.0029 LearningRate 0.0000 Epoch: 19 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:26,002-Speed 3389.83 samples/sec Loss 0.9574 LearningRate 0.0000 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:29,023-Speed 3389.81 samples/sec Loss 0.9679 LearningRate 0.0000 Epoch: 19 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:32,040-Speed 3396.07 samples/sec Loss 0.9062 LearningRate 0.0000 Epoch: 19 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:35,076-Speed 3373.82 samples/sec Loss 0.9821 LearningRate 0.0000 Epoch: 19 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:38,136-Speed 3346.97 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:41,193-Speed 3350.43 samples/sec Loss 0.8960 LearningRate 0.0000 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:44,228-Speed 3375.19 samples/sec Loss 0.8948 LearningRate 0.0000 Epoch: 19 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:47,252-Speed 3387.00 samples/sec Loss 0.9516 LearningRate 0.0000 Epoch: 19 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:50,247-Speed 3419.77 samples/sec Loss 0.9877 LearningRate 0.0000 Epoch: 19 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:53,286-Speed 3370.16 samples/sec Loss 0.9073 LearningRate 0.0000 Epoch: 19 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:56,331-Speed 3364.00 samples/sec Loss 0.8788 LearningRate 0.0000 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:28:59,345-Speed 3398.48 samples/sec Loss 0.9206 LearningRate 0.0000 Epoch: 19 
Global Step: 99210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:02,382-Speed 3373.17 samples/sec Loss 0.9614 LearningRate 0.0000 Epoch: 19 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:05,458-Speed 3330.32 samples/sec Loss 0.9602 LearningRate 0.0000 Epoch: 19 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:08,502-Speed 3364.33 samples/sec Loss 0.8629 LearningRate 0.0000 Epoch: 19 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:11,525-Speed 3388.22 samples/sec Loss 0.8969 LearningRate 0.0000 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:14,606-Speed 3324.37 samples/sec Loss 0.9541 LearningRate 0.0000 Epoch: 19 Global Step: 99260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:17,618-Speed 3401.25 samples/sec Loss 0.9980 LearningRate 0.0000 Epoch: 19 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:20,630-Speed 3399.62 samples/sec Loss 0.9281 LearningRate 0.0000 Epoch: 19 Global Step: 99280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:29:23,627-Speed 3417.60 samples/sec Loss 0.8976 LearningRate 0.0000 Epoch: 19 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:26,642-Speed 3398.40 samples/sec Loss 0.9155 LearningRate 0.0000 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:29,660-Speed 3392.98 samples/sec Loss 0.8570 LearningRate 0.0000 Epoch: 19 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:32,681-Speed 3391.63 samples/sec Loss 0.9645 LearningRate 0.0000 Epoch: 19 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:35,701-Speed 3390.90 samples/sec Loss 0.9438 LearningRate 0.0000 Epoch: 19 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-11 09:29:38,774-Speed 3332.94 samples/sec Loss 0.9603 LearningRate 0.0000 Epoch: 19 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:41,815-Speed 3367.99 samples/sec Loss 0.9515 LearningRate 0.0000 Epoch: 19 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:29:44,821-Speed 3407.28 samples/sec Loss 0.9083 LearningRate 0.0000 Epoch: 19 Global Step: 99360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:29:47,860-Speed 3371.77 samples/sec Loss 0.9976 LearningRate 0.0000 Epoch: 19 Global Step: 99370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:29:50,900-Speed 3369.28 samples/sec Loss 0.9275 LearningRate 0.0000 Epoch: 19 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:29:53,924-Speed 3387.65 samples/sec Loss 0.9560 LearningRate 0.0000 Epoch: 19 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:29:56,940-Speed 3396.70 samples/sec Loss 0.9439 LearningRate 0.0000 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:29:59,958-Speed 3393.61 samples/sec Loss 0.9191 LearningRate 0.0000 Epoch: 19 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:30:02,996-Speed 3370.87 samples/sec Loss 0.9222 LearningRate 0.0000 Epoch: 19 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:30:06,039-Speed 3366.56 samples/sec Loss 0.8746 LearningRate 0.0000 Epoch: 19 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:30:09,055-Speed 3396.06 samples/sec Loss 0.9368 LearningRate 0.0000 Epoch: 19 Global Step: 99440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:30:12,081-Speed 3384.62 samples/sec Loss 0.9040 LearningRate 0.0000 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:30:15,115-Speed 3375.72 
samples/sec Loss 0.9662 LearningRate 0.0000 Epoch: 19 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:18,134-Speed 3392.41 samples/sec Loss 0.9499 LearningRate 0.0000 Epoch: 19 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:21,152-Speed 3394.55 samples/sec Loss 0.9763 LearningRate 0.0000 Epoch: 19 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:24,161-Speed 3403.52 samples/sec Loss 0.9598 LearningRate 0.0000 Epoch: 19 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:27,181-Speed 3392.46 samples/sec Loss 0.9027 LearningRate 0.0000 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:30,191-Speed 3402.06 samples/sec Loss 0.9501 LearningRate 0.0000 Epoch: 19 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:33,204-Speed 3399.21 samples/sec Loss 1.0037 LearningRate 0.0000 Epoch: 19 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:36,238-Speed 3376.39 samples/sec Loss 0.9843 LearningRate 0.0000 Epoch: 19 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:39,298-Speed 3347.27 samples/sec Loss 0.9192 LearningRate 0.0000 Epoch: 19 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:42,315-Speed 3395.05 samples/sec Loss 0.9427 LearningRate 0.0000 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:45,307-Speed 3423.06 samples/sec Loss 0.9321 LearningRate 0.0000 Epoch: 19 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:48,334-Speed 3384.39 samples/sec Loss 0.9552 LearningRate 0.0000 Epoch: 19 Global Step: 99570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:51,357-Speed 3388.18 samples/sec Loss 0.9047 LearningRate 0.0000 Epoch: 19 
Global Step: 99580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:54,381-Speed 3387.31 samples/sec Loss 0.9388 LearningRate 0.0000 Epoch: 19 Global Step: 99590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:30:57,398-Speed 3394.58 samples/sec Loss 0.9349 LearningRate 0.0000 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:00,502-Speed 3300.28 samples/sec Loss 0.9102 LearningRate 0.0000 Epoch: 19 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:03,576-Speed 3331.67 samples/sec Loss 1.0125 LearningRate 0.0000 Epoch: 19 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:06,679-Speed 3300.48 samples/sec Loss 0.9832 LearningRate 0.0000 Epoch: 19 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:09,696-Speed 3395.39 samples/sec Loss 0.9321 LearningRate 0.0000 Epoch: 19 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:12,708-Speed 3401.20 samples/sec Loss 0.9047 LearningRate 0.0000 Epoch: 19 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:15,708-Speed 3414.17 samples/sec Loss 0.9448 LearningRate 0.0000 Epoch: 19 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:18,728-Speed 3390.88 samples/sec Loss 1.0237 LearningRate 0.0000 Epoch: 19 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:21,740-Speed 3401.49 samples/sec Loss 0.9390 LearningRate 0.0000 Epoch: 19 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:24,756-Speed 3395.78 samples/sec Loss 0.9053 LearningRate 0.0000 Epoch: 19 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:27,775-Speed 3393.30 samples/sec Loss 0.9115 LearningRate 0.0000 Epoch: 19 Global Step: 99700 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-11 09:31:30,794-Speed 3392.14 samples/sec Loss 0.9839 LearningRate 0.0000 Epoch: 19 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:33,810-Speed 3395.38 samples/sec Loss 0.9256 LearningRate 0.0000 Epoch: 19 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:36,826-Speed 3397.14 samples/sec Loss 0.9657 LearningRate 0.0000 Epoch: 19 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:39,838-Speed 3400.43 samples/sec Loss 0.9455 LearningRate 0.0000 Epoch: 19 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:42,853-Speed 3397.39 samples/sec Loss 0.8892 LearningRate 0.0000 Epoch: 19 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:45,870-Speed 3395.38 samples/sec Loss 0.8274 LearningRate 0.0000 Epoch: 19 Global Step: 99760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:31:48,868-Speed 3416.23 samples/sec Loss 0.9060 LearningRate 0.0000 Epoch: 19 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:51,883-Speed 3397.76 samples/sec Loss 0.9444 LearningRate 0.0000 Epoch: 19 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:54,921-Speed 3371.37 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 19 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:31:57,941-Speed 3391.11 samples/sec Loss 0.9192 LearningRate 0.0000 Epoch: 19 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:00,958-Speed 3395.14 samples/sec Loss 0.9567 LearningRate 0.0000 Epoch: 19 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:03,977-Speed 3392.64 samples/sec Loss 0.9253 LearningRate 0.0000 Epoch: 19 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:06,996-Speed 3392.81 
samples/sec Loss 0.9064 LearningRate 0.0000 Epoch: 19 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:10,009-Speed 3400.38 samples/sec Loss 0.9763 LearningRate 0.0000 Epoch: 19 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:13,027-Speed 3394.13 samples/sec Loss 0.8494 LearningRate 0.0000 Epoch: 19 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:16,041-Speed 3398.73 samples/sec Loss 0.9918 LearningRate 0.0000 Epoch: 19 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:19,046-Speed 3408.31 samples/sec Loss 0.9489 LearningRate 0.0000 Epoch: 19 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:22,053-Speed 3406.05 samples/sec Loss 0.8741 LearningRate 0.0000 Epoch: 19 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:25,066-Speed 3398.99 samples/sec Loss 0.9388 LearningRate 0.0000 Epoch: 19 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:28,081-Speed 3397.71 samples/sec Loss 0.9625 LearningRate 0.0000 Epoch: 19 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:31,093-Speed 3400.61 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 19 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:32:34,086-Speed 3421.94 samples/sec Loss 1.0179 LearningRate 0.0000 Epoch: 19 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:37,101-Speed 3397.47 samples/sec Loss 0.9251 LearningRate 0.0000 Epoch: 19 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:40,128-Speed 3384.03 samples/sec Loss 0.9257 LearningRate 0.0000 Epoch: 19 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:43,141-Speed 3399.40 samples/sec Loss 0.9082 LearningRate 0.0000 Epoch: 19 
Global Step: 99950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:46,159-Speed 3394.50 samples/sec Loss 0.8859 LearningRate 0.0000 Epoch: 19 Global Step: 99960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:49,172-Speed 3398.52 samples/sec Loss 0.9825 LearningRate 0.0000 Epoch: 19 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:52,197-Speed 3386.98 samples/sec Loss 0.9582 LearningRate 0.0000 Epoch: 19 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:55,211-Speed 3398.07 samples/sec Loss 0.8966 LearningRate 0.0000 Epoch: 19 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:32:58,234-Speed 3388.10 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 19 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:33:42,435-[lfw][100000]XNorm: 22.076277 Training: 2022-04-11 09:33:42,435-[lfw][100000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 09:33:42,436-[lfw][100000]Accuracy-Highest: 0.99850 Training: 2022-04-11 09:34:33,652-[cfp_fp][100000]XNorm: 22.350055 Training: 2022-04-11 09:34:33,653-[cfp_fp][100000]Accuracy-Flip: 0.98929+-0.00547 Training: 2022-04-11 09:34:33,653-[cfp_fp][100000]Accuracy-Highest: 0.98986 Training: 2022-04-11 09:35:17,945-[agedb_30][100000]XNorm: 22.507673 Training: 2022-04-11 09:35:17,945-[agedb_30][100000]Accuracy-Flip: 0.98550+-0.00683 Training: 2022-04-11 09:35:17,946-[agedb_30][100000]Accuracy-Highest: 0.98550 Training: 2022-04-11 09:35:20,950-Speed 71.75 samples/sec Loss 0.9284 LearningRate 0.0000 Epoch: 19 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:35:24,000-Speed 3358.19 samples/sec Loss 0.9416 LearningRate 0.0000 Epoch: 19 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:27,028-Speed 3382.33 samples/sec Loss 0.9557 LearningRate 0.0000 Epoch: 19 Global Step: 100030 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:30,042-Speed 3398.70 samples/sec Loss 1.0298 LearningRate 0.0000 Epoch: 19 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:33,038-Speed 3419.55 samples/sec Loss 0.9190 LearningRate 0.0000 Epoch: 19 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:36,041-Speed 3410.41 samples/sec Loss 0.8940 LearningRate 0.0000 Epoch: 19 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:39,056-Speed 3397.62 samples/sec Loss 0.9254 LearningRate 0.0000 Epoch: 19 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:42,059-Speed 3411.11 samples/sec Loss 0.9074 LearningRate 0.0000 Epoch: 19 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:45,059-Speed 3414.36 samples/sec Loss 0.9086 LearningRate 0.0000 Epoch: 19 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:48,077-Speed 3393.72 samples/sec Loss 0.9233 LearningRate 0.0000 Epoch: 19 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:51,079-Speed 3411.84 samples/sec Loss 0.9681 LearningRate 0.0000 Epoch: 19 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:35:54,083-Speed 3409.30 samples/sec Loss 1.0167 LearningRate 0.0000 Epoch: 19 Global Step: 100120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:35:57,064-Speed 3436.40 samples/sec Loss 1.0122 LearningRate 0.0000 Epoch: 19 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:00,067-Speed 3411.36 samples/sec Loss 0.9070 LearningRate 0.0000 Epoch: 19 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:03,068-Speed 3412.95 samples/sec Loss 0.9090 LearningRate 0.0000 Epoch: 19 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 09:36:06,083-Speed 3396.57 samples/sec Loss 0.9611 LearningRate 0.0000 Epoch: 19 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:09,094-Speed 3402.42 samples/sec Loss 0.9036 LearningRate 0.0000 Epoch: 19 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:12,109-Speed 3397.51 samples/sec Loss 0.9481 LearningRate 0.0000 Epoch: 19 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:15,115-Speed 3406.89 samples/sec Loss 1.0183 LearningRate 0.0000 Epoch: 19 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:18,097-Speed 3434.57 samples/sec Loss 0.9669 LearningRate 0.0000 Epoch: 19 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:21,098-Speed 3413.10 samples/sec Loss 0.9341 LearningRate 0.0000 Epoch: 19 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:24,149-Speed 3356.87 samples/sec Loss 0.9040 LearningRate 0.0000 Epoch: 19 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:27,160-Speed 3401.97 samples/sec Loss 0.9640 LearningRate 0.0000 Epoch: 19 Global Step: 100230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:30,178-Speed 3394.58 samples/sec Loss 0.9974 LearningRate 0.0000 Epoch: 19 Global Step: 100240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:33,178-Speed 3414.44 samples/sec Loss 0.9363 LearningRate 0.0000 Epoch: 19 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:36,181-Speed 3409.81 samples/sec Loss 0.9194 LearningRate 0.0000 Epoch: 19 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:39,190-Speed 3404.89 samples/sec Loss 0.9072 LearningRate 0.0000 Epoch: 19 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:42,193-Speed 3410.56 
samples/sec Loss 0.9098 LearningRate 0.0000 Epoch: 19 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:45,197-Speed 3409.67 samples/sec Loss 0.9301 LearningRate 0.0000 Epoch: 19 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:36:48,207-Speed 3402.93 samples/sec Loss 0.9465 LearningRate 0.0000 Epoch: 19 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:51,217-Speed 3402.57 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:54,221-Speed 3410.82 samples/sec Loss 0.9577 LearningRate 0.0000 Epoch: 19 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:36:57,225-Speed 3409.43 samples/sec Loss 1.0226 LearningRate 0.0000 Epoch: 19 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:00,228-Speed 3411.79 samples/sec Loss 0.8721 LearningRate 0.0000 Epoch: 19 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:03,243-Speed 3396.82 samples/sec Loss 0.9601 LearningRate 0.0000 Epoch: 19 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:06,255-Speed 3400.34 samples/sec Loss 0.9460 LearningRate 0.0000 Epoch: 19 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:09,275-Speed 3391.35 samples/sec Loss 0.9599 LearningRate 0.0000 Epoch: 19 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:12,285-Speed 3403.24 samples/sec Loss 1.0474 LearningRate 0.0000 Epoch: 19 Global Step: 100380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:15,296-Speed 3401.66 samples/sec Loss 0.9770 LearningRate 0.0000 Epoch: 19 Global Step: 100390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:18,303-Speed 3406.73 samples/sec Loss 0.9273 LearningRate 0.0000 
Epoch: 19 Global Step: 100400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:21,309-Speed 3407.05 samples/sec Loss 0.8961 LearningRate 0.0000 Epoch: 19 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:24,313-Speed 3409.46 samples/sec Loss 0.9129 LearningRate 0.0000 Epoch: 19 Global Step: 100420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:27,330-Speed 3395.39 samples/sec Loss 0.9632 LearningRate 0.0000 Epoch: 19 Global Step: 100430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:37:30,324-Speed 3421.51 samples/sec Loss 0.9396 LearningRate 0.0000 Epoch: 19 Global Step: 100440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:33,334-Speed 3402.50 samples/sec Loss 0.9458 LearningRate 0.0000 Epoch: 19 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:36,343-Speed 3403.54 samples/sec Loss 0.9397 LearningRate 0.0000 Epoch: 19 Global Step: 100460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:39,350-Speed 3406.57 samples/sec Loss 1.0101 LearningRate 0.0000 Epoch: 19 Global Step: 100470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:42,357-Speed 3406.13 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 19 Global Step: 100480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:45,363-Speed 3407.33 samples/sec Loss 0.8953 LearningRate 0.0000 Epoch: 19 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:48,371-Speed 3405.97 samples/sec Loss 0.8728 LearningRate 0.0000 Epoch: 19 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:51,393-Speed 3389.25 samples/sec Loss 0.9537 LearningRate 0.0000 Epoch: 19 Global Step: 100510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:54,420-Speed 3383.65 samples/sec Loss 0.9204 LearningRate 0.0000 Epoch: 19 Global Step: 100520 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 09:37:57,428-Speed 3405.03 samples/sec Loss 0.8309 LearningRate 0.0000 Epoch: 19 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 09:38:00,438-Speed 3402.68 samples/sec Loss 0.9710 LearningRate 0.0000 Epoch: 19 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:03,457-Speed 3393.49 samples/sec Loss 0.9180 LearningRate 0.0000 Epoch: 19 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:06,502-Speed 3363.40 samples/sec Loss 0.9362 LearningRate 0.0000 Epoch: 19 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:09,521-Speed 3393.01 samples/sec Loss 0.8906 LearningRate 0.0000 Epoch: 19 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:12,524-Speed 3410.57 samples/sec Loss 0.9414 LearningRate 0.0000 Epoch: 19 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:15,534-Speed 3403.29 samples/sec Loss 0.9649 LearningRate 0.0000 Epoch: 19 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:18,543-Speed 3404.02 samples/sec Loss 0.9452 LearningRate 0.0000 Epoch: 19 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:21,551-Speed 3404.53 samples/sec Loss 0.9395 LearningRate 0.0000 Epoch: 19 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:24,558-Speed 3406.46 samples/sec Loss 0.9982 LearningRate 0.0000 Epoch: 19 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:27,581-Speed 3388.85 samples/sec Loss 0.9475 LearningRate 0.0000 Epoch: 19 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:30,569-Speed 3427.92 samples/sec Loss 0.9168 LearningRate 0.0000 Epoch: 19 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 09:38:33,578-Speed 3404.02 samples/sec Loss 0.9311 LearningRate 0.0000 Epoch: 19 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:36,583-Speed 3407.89 samples/sec Loss 0.9088 LearningRate 0.0000 Epoch: 19 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:39,596-Speed 3399.53 samples/sec Loss 0.9356 LearningRate 0.0000 Epoch: 19 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:42,611-Speed 3397.51 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 19 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:45,630-Speed 3393.68 samples/sec Loss 0.8974 LearningRate 0.0000 Epoch: 19 Global Step: 100690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:48,637-Speed 3405.51 samples/sec Loss 0.9758 LearningRate 0.0000 Epoch: 19 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:51,644-Speed 3406.96 samples/sec Loss 0.8962 LearningRate 0.0000 Epoch: 19 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:54,663-Speed 3392.78 samples/sec Loss 1.0226 LearningRate 0.0000 Epoch: 19 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:38:57,667-Speed 3408.78 samples/sec Loss 0.8879 LearningRate 0.0000 Epoch: 19 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:00,671-Speed 3410.51 samples/sec Loss 0.8932 LearningRate 0.0000 Epoch: 19 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:03,813-Speed 3259.75 samples/sec Loss 0.8953 LearningRate 0.0000 Epoch: 19 Global Step: 100750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:06,859-Speed 3361.99 samples/sec Loss 0.9205 LearningRate 0.0000 Epoch: 19 Global Step: 100760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:09,865-Speed 3407.51 
samples/sec Loss 0.9130 LearningRate 0.0000 Epoch: 19 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:12,882-Speed 3395.63 samples/sec Loss 0.9892 LearningRate 0.0000 Epoch: 19 Global Step: 100780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:15,888-Speed 3407.43 samples/sec Loss 0.9326 LearningRate 0.0000 Epoch: 19 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:18,905-Speed 3394.31 samples/sec Loss 0.9500 LearningRate 0.0000 Epoch: 19 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:21,923-Speed 3394.37 samples/sec Loss 0.9113 LearningRate 0.0000 Epoch: 19 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:24,948-Speed 3385.74 samples/sec Loss 0.9228 LearningRate 0.0000 Epoch: 19 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:27,977-Speed 3381.82 samples/sec Loss 0.9844 LearningRate 0.0000 Epoch: 19 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:30,982-Speed 3408.87 samples/sec Loss 0.9045 LearningRate 0.0000 Epoch: 19 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:33,991-Speed 3403.98 samples/sec Loss 0.9299 LearningRate 0.0000 Epoch: 19 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:37,004-Speed 3398.76 samples/sec Loss 0.9090 LearningRate 0.0000 Epoch: 19 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:40,014-Speed 3403.75 samples/sec Loss 0.8382 LearningRate 0.0000 Epoch: 19 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:43,026-Speed 3400.55 samples/sec Loss 0.9167 LearningRate 0.0000 Epoch: 19 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:46,046-Speed 3391.44 samples/sec Loss 0.9142 LearningRate 0.0000 
Epoch: 19 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:49,064-Speed 3393.13 samples/sec Loss 0.9096 LearningRate 0.0000 Epoch: 19 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:52,076-Speed 3400.45 samples/sec Loss 0.9084 LearningRate 0.0000 Epoch: 19 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:55,086-Speed 3403.99 samples/sec Loss 0.9571 LearningRate 0.0000 Epoch: 19 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:39:58,100-Speed 3397.98 samples/sec Loss 0.9287 LearningRate 0.0000 Epoch: 19 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:01,106-Speed 3407.34 samples/sec Loss 0.8343 LearningRate 0.0000 Epoch: 19 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:04,117-Speed 3402.17 samples/sec Loss 0.8951 LearningRate 0.0000 Epoch: 19 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:07,129-Speed 3399.99 samples/sec Loss 0.9420 LearningRate 0.0000 Epoch: 19 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:10,137-Speed 3405.57 samples/sec Loss 0.9020 LearningRate 0.0000 Epoch: 19 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:13,148-Speed 3402.53 samples/sec Loss 0.9236 LearningRate 0.0000 Epoch: 19 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:16,199-Speed 3357.21 samples/sec Loss 0.9573 LearningRate 0.0000 Epoch: 19 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:19,263-Speed 3342.34 samples/sec Loss 0.9286 LearningRate 0.0000 Epoch: 19 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:22,276-Speed 3399.50 samples/sec Loss 0.9515 LearningRate 0.0000 Epoch: 19 Global Step: 101010 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:25,288-Speed 3400.78 samples/sec Loss 0.9158 LearningRate 0.0000 Epoch: 19 Global Step: 101020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:28,328-Speed 3368.60 samples/sec Loss 0.8932 LearningRate 0.0000 Epoch: 19 Global Step: 101030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:31,341-Speed 3399.85 samples/sec Loss 0.9671 LearningRate 0.0000 Epoch: 19 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 09:40:34,345-Speed 3410.27 samples/sec Loss 0.8500 LearningRate 0.0000 Epoch: 19 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:37,385-Speed 3369.19 samples/sec Loss 0.9696 LearningRate 0.0000 Epoch: 19 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:40,438-Speed 3355.85 samples/sec Loss 0.9420 LearningRate 0.0000 Epoch: 19 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:43,449-Speed 3401.25 samples/sec Loss 1.0150 LearningRate 0.0000 Epoch: 19 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:46,457-Speed 3405.35 samples/sec Loss 0.9804 LearningRate 0.0000 Epoch: 19 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:49,468-Speed 3401.80 samples/sec Loss 0.9802 LearningRate 0.0000 Epoch: 19 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:52,490-Speed 3388.49 samples/sec Loss 0.9941 LearningRate 0.0000 Epoch: 19 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:55,511-Speed 3390.94 samples/sec Loss 0.8778 LearningRate 0.0000 Epoch: 19 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:40:58,529-Speed 3393.66 samples/sec Loss 0.8918 LearningRate 0.0000 Epoch: 19 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 09:41:01,549-Speed 3391.93 samples/sec Loss 0.9865 LearningRate 0.0000 Epoch: 19 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:41:04,623-Speed 3332.65 samples/sec Loss 0.9510 LearningRate 0.0000 Epoch: 19 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 09:41:07,635-Speed 3399.95 samples/sec Loss 0.9520 LearningRate 0.0000 Epoch: 19 Global Step: 101160 Fp16 Grad Scale: 65536 Required: -0 hours