Training: 2022-04-11 10:17:26,645-rank_id: 0
Training: 2022-04-11 10:17:39,953-: margin_list [1.0, 0.5, 0.0]
Training: 2022-04-11 10:17:39,954-: network r50
Training: 2022-04-11 10:17:39,954-: resume False
Training: 2022-04-11 10:17:39,954-: output work_dirs/ms1mv3_r50
Training: 2022-04-11 10:17:39,954-: embedding_size 512
Training: 2022-04-11 10:17:39,954-: sample_rate 1.0
Training: 2022-04-11 10:17:39,954-: interclass_filtering_threshold0
Training: 2022-04-11 10:17:39,954-: fp16 True
Training: 2022-04-11 10:17:39,954-: batch_size 128
Training: 2022-04-11 10:17:39,954-: optimizer sgd
Training: 2022-04-11 10:17:39,954-: lr 0.1
Training: 2022-04-11 10:17:39,954-: momentum 0.9
Training: 2022-04-11 10:17:39,955-: weight_decay 0.0005
Training: 2022-04-11 10:17:39,955-: verbose 2000
Training: 2022-04-11 10:17:39,955-: frequent 10
Training: 2022-04-11 10:17:39,955-: dali False
Training: 2022-04-11 10:17:39,955-: rec /train_tmp/ms1m-retinaface-t1
Training: 2022-04-11 10:17:39,955-: num_classes 93431
Training: 2022-04-11 10:17:39,955-: num_image 5179510
Training: 2022-04-11 10:17:39,955-: num_epoch 20
Training: 2022-04-11 10:17:39,955-: warmup_epoch 0
Training: 2022-04-11 10:17:39,955-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-11 10:17:39,955-: total_batch_size 1024
Training: 2022-04-11 10:17:39,955-: warmup_step 0
Training: 2022-04-11 10:17:39,955-: total_step 101160
Training: 2022-04-11 10:18:47,693-Reducer buckets have been rebuilt in this iteration.
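The derived header values (total_batch_size, total_step) and the LearningRate column of the step entries can be cross-checked from the logged settings. The sketch below is a reconstruction under stated assumptions, not the training script's code: it assumes total_step is num_image // total_batch_size * num_epoch, that the worker count is total_batch_size / batch_size, and that the learning rate follows a power-2 polynomial decay with no warmup. These assumptions happen to reproduce the logged numbers at the steps checked; the function names are hypothetical.

```python
# Cross-check of values logged in the header. Every formula here is an
# assumption inferred from the logged numbers, not taken from the
# training code itself.

BATCH_SIZE = 128          # per-worker batch size (logged)
TOTAL_BATCH_SIZE = 1024   # logged
NUM_IMAGE = 5179510       # logged
NUM_EPOCH = 20            # logged
BASE_LR = 0.1             # logged


def implied_world_size() -> int:
    """Number of workers implied by the two logged batch sizes."""
    return TOTAL_BATCH_SIZE // BATCH_SIZE


def total_steps() -> int:
    """Assumed rule: whole batches per epoch, times the epoch count."""
    return NUM_IMAGE // TOTAL_BATCH_SIZE * NUM_EPOCH


def poly_lr(step: int, total: int, base_lr: float = BASE_LR,
            power: float = 2.0) -> float:
    """Assumed schedule: polynomial decay to zero, no warmup."""
    return base_lr * (1.0 - step / total) ** power


if __name__ == "__main__":
    total = total_steps()
    print("workers:", implied_world_size())
    print("total_step:", total)
    for step in (20, 30, 2000, 2020):
        # Format to four decimals, as the log does.
        print(step, f"{poly_lr(step, total):.4f}")
```

Under these assumptions the computed total_step matches the logged 101160, and the four-decimal learning rates at steps 20, 30, 2000 and 2020 match the LearningRate values in the corresponding step entries.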
Training: 2022-04-11 10:18:51,063-Speed 5516.93 samples/sec Loss 46.4909 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-04-11 10:18:52,897-Speed 5584.33 samples/sec Loss 47.7616 LearningRate 0.0999 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-04-11 10:18:54,750-Speed 5529.89 samples/sec Loss 48.9438 LearningRate 0.0999 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-04-11 10:18:56,555-Speed 5677.36 samples/sec Loss 47.4847 LearningRate 0.0999 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-04-11 10:18:58,388-Speed 5589.57 samples/sec Loss 47.7831 LearningRate 0.0999 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 10 hours
Training: 2022-04-11 10:19:00,209-Speed 5626.29 samples/sec Loss 47.4355 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 9 hours
Training: 2022-04-11 10:19:02,042-Speed 5587.78 samples/sec Loss 47.4098 LearningRate 0.0998 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-04-11 10:19:03,851-Speed 5666.01 samples/sec Loss 46.9978 LearningRate 0.0998 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-04-11 10:19:05,670-Speed 5632.45 samples/sec Loss 46.8949 LearningRate 0.0998 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 8 hours
Training: 2022-04-11 10:19:07,493-Speed 5618.04 samples/sec Loss 46.5454 LearningRate 0.0998 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-11 10:19:09,309-Speed 5643.97 samples/sec Loss 46.6333 LearningRate 0.0998 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:11,127-Speed 5636.34 samples/sec Loss 46.4097 LearningRate 0.0997 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:12,938-Speed 5657.22 samples/sec Loss 46.2107 LearningRate 0.0997 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:14,760-Speed 5621.95 samples/sec Loss 46.0634 LearningRate 0.0997 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:16,563-Speed 5680.73 samples/sec Loss 45.8538 LearningRate 0.0997 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:18,397-Speed 5588.11 samples/sec Loss 45.6578 LearningRate 0.0997 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:20,208-Speed 5659.81 samples/sec Loss 45.4912 LearningRate 0.0996 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:22,031-Speed 5620.45 samples/sec Loss 45.3730 LearningRate 0.0996 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-11 10:19:23,882-Speed 5535.05 samples/sec Loss 45.0480 LearningRate 0.0996 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:19:25,728-Speed 5549.95 samples/sec Loss 45.0384 LearningRate 0.0996 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:27,592-Speed 5495.80 samples/sec Loss 44.6643 LearningRate 0.0996 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:29,427-Speed 5584.33 samples/sec Loss 44.5782 LearningRate 0.0995 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:31,237-Speed 5661.30 samples/sec Loss 44.4282 LearningRate 0.0995 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:33,043-Speed 5674.77 samples/sec Loss 44.2476 LearningRate 0.0995 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:34,854-Speed 5655.25 samples/sec Loss 44.1646 LearningRate 0.0995 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:36,702-Speed 5544.66 samples/sec Loss 43.8462 LearningRate 0.0995 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:38,514-Speed 5653.97 samples/sec Loss 43.6716 LearningRate 0.0994 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:40,360-Speed 5549.07 samples/sec Loss 43.5314 LearningRate 0.0994 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:42,194-Speed 5587.34 samples/sec Loss 43.3533 LearningRate 0.0994 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:19:43,998-Speed 5679.95 samples/sec Loss 43.2353 LearningRate 0.0994 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:45,810-Speed 5653.80 samples/sec Loss 43.0359 LearningRate 0.0994 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:47,613-Speed 5684.82 samples/sec Loss 42.8901 LearningRate 0.0993 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:49,446-Speed 5588.53 samples/sec Loss 42.7585 LearningRate 0.0993 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:51,250-Speed 5678.35 samples/sec Loss 42.5605 LearningRate 0.0993 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:53,050-Speed 5692.07 samples/sec Loss 42.3554 LearningRate 0.0993 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:54,854-Speed 5679.23 samples/sec Loss 42.2043 LearningRate 0.0993 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:56,650-Speed 5703.35 samples/sec Loss 42.0335 LearningRate 0.0993 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:19:58,466-Speed 5643.58 samples/sec Loss 41.7946 LearningRate 0.0992 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:00,264-Speed 5698.82 samples/sec Loss 41.8366 LearningRate 0.0992 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:02,081-Speed 5637.54 samples/sec Loss 41.6395 LearningRate 0.0992 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:03,889-Speed 5668.54 samples/sec Loss 41.3742 LearningRate 0.0992 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:05,715-Speed 5610.14 samples/sec Loss 41.3265 LearningRate 0.0992 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:07,508-Speed 5711.38 samples/sec Loss 41.2555 LearningRate 0.0991 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:09,306-Speed 5699.33 samples/sec Loss 41.0300 LearningRate 0.0991 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:11,100-Speed 5712.01 samples/sec Loss 40.8966 LearningRate 0.0991 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:12,909-Speed 5664.07 samples/sec Loss 40.7091 LearningRate 0.0991 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:14,729-Speed 5630.23 samples/sec Loss 40.4529 LearningRate 0.0991 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:16,526-Speed 5701.34 samples/sec Loss 40.3084 LearningRate 0.0990 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:18,320-Speed 5710.11 samples/sec Loss 40.1962 LearningRate 0.0990 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:20,136-Speed 5641.95 samples/sec Loss 39.9974 LearningRate 0.0990 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 262144 Required: 6 hours
Training: 2022-04-11 10:20:21,935-Speed 5694.53 samples/sec Loss 39.8377 LearningRate 0.0990 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 262144 Required: 6 hours
Training: 2022-04-11 10:20:23,739-Speed 5680.85 samples/sec Loss 39.7291 LearningRate 0.0990 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:25,537-Speed 5699.58 samples/sec Loss 39.6395 LearningRate 0.0989 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:27,358-Speed 5623.74 samples/sec Loss 39.4790 LearningRate 0.0989 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:29,151-Speed 5714.80 samples/sec Loss 39.4034 LearningRate 0.0989 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:30,957-Speed 5674.18 samples/sec Loss 39.1978 LearningRate 0.0989 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:32,753-Speed 5706.43 samples/sec Loss 38.9289 LearningRate 0.0989 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:34,557-Speed 5677.43 samples/sec Loss 38.6782 LearningRate 0.0988 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:36,378-Speed 5626.01 samples/sec Loss 38.6011 LearningRate 0.0988 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:38,174-Speed 5704.00 samples/sec Loss 38.4482 LearningRate 0.0988 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:39,979-Speed 5676.18 samples/sec Loss 38.3041 LearningRate 0.0988 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:20:41,778-Speed 5692.56 samples/sec Loss 38.1693 LearningRate 0.0988 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:43,580-Speed 5686.96 samples/sec Loss 38.0635 LearningRate 0.0987 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:45,402-Speed 5622.71 samples/sec Loss 37.8941 LearningRate 0.0987 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:47,248-Speed 5613.31 samples/sec Loss 37.7611 LearningRate 0.0987 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:49,058-Speed 5660.32 samples/sec Loss 37.6149 LearningRate 0.0987 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:50,853-Speed 5708.71 samples/sec Loss 37.3386 LearningRate 0.0987 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:52,666-Speed 5648.91 samples/sec Loss 37.1413 LearningRate 0.0986 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:54,463-Speed 5702.43 samples/sec Loss 36.9727 LearningRate 0.0986 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:56,274-Speed 5659.32 samples/sec Loss 36.8127 LearningRate 0.0986 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:58,072-Speed 5698.27 samples/sec Loss 36.6828 LearningRate 0.0986 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:20:59,885-Speed 5649.51 samples/sec Loss 36.5986 LearningRate 0.0986 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 10:21:01,681-Speed 5706.18 samples/sec Loss 36.3823 LearningRate 0.0985 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:03,517-Speed 5580.56 samples/sec Loss 36.2943 LearningRate 0.0985 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:05,316-Speed 5693.28 samples/sec Loss 36.0890 LearningRate 0.0985 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:07,128-Speed 5654.85 samples/sec Loss 35.8788 LearningRate 0.0985 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:08,923-Speed 5707.71 samples/sec Loss 35.8505 LearningRate 0.0985 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:10,732-Speed 5666.43 samples/sec Loss 35.6045 LearningRate 0.0984 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:12,538-Speed 5670.39 samples/sec Loss 35.4374 LearningRate 0.0984 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:14,347-Speed 5664.52 samples/sec Loss 35.2312 LearningRate 0.0984 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:16,151-Speed 5678.98 samples/sec Loss 35.0735 LearningRate 0.0984 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:17,979-Speed 5607.07 samples/sec Loss 35.0335 LearningRate 0.0984 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:19,766-Speed 5732.24 samples/sec Loss 34.7683 LearningRate 0.0983 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:21,560-Speed 5709.84 samples/sec Loss 34.7170 LearningRate 0.0983 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:23,371-Speed 5660.20 samples/sec Loss 34.6129 LearningRate 0.0983 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:25,169-Speed 5698.14 samples/sec Loss 34.2699 LearningRate 0.0983 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:26,983-Speed 5645.66 samples/sec Loss 34.1601 LearningRate 0.0983 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:28,785-Speed 5688.72 samples/sec Loss 33.9710 LearningRate 0.0982 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:30,603-Speed 5635.10 samples/sec Loss 33.7977 LearningRate 0.0982 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:32,397-Speed 5710.21 samples/sec Loss 33.6662 LearningRate 0.0982 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:34,190-Speed 5715.68 samples/sec Loss 33.5587 LearningRate 0.0982 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:35,996-Speed 5673.12 samples/sec Loss 33.2995 LearningRate 0.0982 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:37,791-Speed 5707.13 samples/sec Loss 33.2040 LearningRate 0.0982 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 10:21:39,600-Speed 5664.66 samples/sec Loss 33.0864 LearningRate 0.0981 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:41,409-Speed 5660.71 samples/sec Loss 32.8163 LearningRate 0.0981 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:43,224-Speed 5648.44 samples/sec Loss 32.6597 LearningRate 0.0981 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:45,024-Speed 5690.04 samples/sec Loss 32.6784 LearningRate 0.0981 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:46,823-Speed 5695.66 samples/sec Loss 32.3713 LearningRate 0.0981 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:48,638-Speed 5645.27 samples/sec Loss 32.2157 LearningRate 0.0980 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:50,437-Speed 5693.12 samples/sec Loss 32.0790 LearningRate 0.0980 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:52,256-Speed 5683.34 samples/sec Loss 31.9050 LearningRate 0.0980 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:54,076-Speed 5629.71 samples/sec Loss 31.7893 LearningRate 0.0980 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:55,875-Speed 5695.23 samples/sec Loss 31.5327 LearningRate 0.0980 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:57,665-Speed 5722.37 samples/sec Loss 31.3468 LearningRate 0.0979 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:21:59,483-Speed 5637.27 samples/sec Loss 31.4009 LearningRate 0.0979 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:01,284-Speed 5689.12 samples/sec Loss 30.9880 LearningRate 0.0979 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:03,094-Speed 5658.68 samples/sec Loss 30.8924 LearningRate 0.0979 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:04,893-Speed 5697.56 samples/sec Loss 30.6821 LearningRate 0.0979 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:06,707-Speed 5647.69 samples/sec Loss 30.6243 LearningRate 0.0978 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:08,503-Speed 5706.15 samples/sec Loss 30.4638 LearningRate 0.0978 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:10,314-Speed 5655.32 samples/sec Loss 30.2822 LearningRate 0.0978 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:12,138-Speed 5617.76 samples/sec Loss 29.8347 LearningRate 0.0978 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:13,954-Speed 5640.17 samples/sec Loss 30.1160 LearningRate 0.0978 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:15,763-Speed 5664.17 samples/sec Loss 29.8251 LearningRate 0.0977 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:17,568-Speed 5675.98 samples/sec Loss 29.7573 LearningRate 0.0977 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:19,401-Speed 5589.08 samples/sec Loss 29.4313 LearningRate 0.0977 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:21,199-Speed 5700.19 samples/sec Loss 29.3663 LearningRate 0.0977 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:23,014-Speed 5643.94 samples/sec Loss 29.2106 LearningRate 0.0977 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:24,828-Speed 5647.70 samples/sec Loss 28.8505 LearningRate 0.0976 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:26,666-Speed 5574.80 samples/sec Loss 28.8590 LearningRate 0.0976 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:28,473-Speed 5670.50 samples/sec Loss 28.8144 LearningRate 0.0976 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:30,293-Speed 5629.30 samples/sec Loss 28.7098 LearningRate 0.0976 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:32,091-Speed 5697.73 samples/sec Loss 28.4814 LearningRate 0.0976 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:33,896-Speed 5677.03 samples/sec Loss 28.4194 LearningRate 0.0975 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:35,748-Speed 5530.42 samples/sec Loss 28.1322 LearningRate 0.0975 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:37,585-Speed 5576.20 samples/sec Loss 28.0226 LearningRate 0.0975 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:39,420-Speed 5586.89 samples/sec Loss 28.0697 LearningRate 0.0975 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:41,236-Speed 5638.19 samples/sec Loss 27.8822 LearningRate 0.0975 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:43,036-Speed 5694.25 samples/sec Loss 27.7145 LearningRate 0.0974 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:44,837-Speed 5687.94 samples/sec Loss 27.7751 LearningRate 0.0974 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:46,658-Speed 5627.21 samples/sec Loss 27.3507 LearningRate 0.0974 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:48,456-Speed 5699.84 samples/sec Loss 27.2585 LearningRate 0.0974 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:50,277-Speed 5624.16 samples/sec Loss 27.2788 LearningRate 0.0974 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:52,092-Speed 5645.51 samples/sec Loss 26.8259 LearningRate 0.0973 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 10:22:53,917-Speed 5613.98 samples/sec Loss 26.7544 LearningRate 0.0973 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:55,719-Speed 5684.27 samples/sec Loss 26.6459 LearningRate 0.0973 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:57,517-Speed 5698.19 samples/sec Loss 26.6205 LearningRate 0.0973 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:22:59,330-Speed 5652.76 samples/sec Loss 26.3864 LearningRate 0.0973 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:01,150-Speed 5627.31 samples/sec Loss 26.3627 LearningRate 0.0973 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:02,977-Speed 5609.69 samples/sec Loss 26.3985 LearningRate 0.0972 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:04,781-Speed 5678.04 samples/sec Loss 26.1920 LearningRate 0.0972 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:06,593-Speed 5653.96 samples/sec Loss 26.1284 LearningRate 0.0972 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:08,414-Speed 5627.57 samples/sec Loss 26.1102 LearningRate 0.0972 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:10,210-Speed 5703.54 samples/sec Loss 25.8820 LearningRate 0.0972 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:12,022-Speed 5654.75 samples/sec Loss 25.8636 LearningRate 0.0971 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:13,847-Speed 5616.44 samples/sec Loss 25.2601 LearningRate 0.0971 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:15,654-Speed 5669.26 samples/sec Loss 25.2155 LearningRate 0.0971 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:17,473-Speed 5632.37 samples/sec Loss 25.0876 LearningRate 0.0971 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:19,287-Speed 5647.31 samples/sec Loss 25.2609 LearningRate 0.0971 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:21,102-Speed 5644.65 samples/sec Loss 24.8778 LearningRate 0.0970 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:22,901-Speed 5695.58 samples/sec Loss 24.7665 LearningRate 0.0970 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:24,697-Speed 5702.74 samples/sec Loss 24.8773 LearningRate 0.0970 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:26,524-Speed 5608.65 samples/sec Loss 24.9178 LearningRate 0.0970 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:28,321-Speed 5703.35 samples/sec Loss 24.6759 LearningRate 0.0970 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:30,130-Speed 5662.23 samples/sec Loss 24.5354 LearningRate 0.0969 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:31,928-Speed 5699.28 samples/sec Loss 24.4808 LearningRate 0.0969 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:33,741-Speed 5651.40 samples/sec Loss 24.2417 LearningRate 0.0969 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:35,544-Speed 5682.56 samples/sec Loss 24.0406 LearningRate 0.0969 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:37,354-Speed 5658.04 samples/sec Loss 24.0455 LearningRate 0.0969 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:39,176-Speed 5625.50 samples/sec Loss 24.0369 LearningRate 0.0968 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:41,000-Speed 5616.97 samples/sec Loss 24.0231 LearningRate 0.0968 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:42,827-Speed 5606.98 samples/sec Loss 23.9764 LearningRate 0.0968 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:44,624-Speed 5703.12 samples/sec Loss 23.6806 LearningRate 0.0968 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:46,442-Speed 5635.10 samples/sec Loss 23.7135 LearningRate 0.0968 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:48,254-Speed 5654.70 samples/sec Loss 23.4787 LearningRate 0.0967 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 10:23:50,067-Speed 5651.36 samples/sec Loss 23.4784 LearningRate 0.0967 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:23:51,884-Speed 5638.47 samples/sec Loss 23.3026 LearningRate 0.0967 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:23:53,687-Speed 5683.30 samples/sec Loss 23.1758 LearningRate 0.0967 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:23:55,495-Speed 5667.00 samples/sec Loss 23.4219 LearningRate 0.0967 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:23:57,298-Speed 5680.65 samples/sec Loss 23.2719 LearningRate 0.0966 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:23:59,106-Speed 5667.65 samples/sec Loss 23.0730 LearningRate 0.0966 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:00,909-Speed 5684.11 samples/sec Loss 22.7610 LearningRate 0.0966 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:02,725-Speed 5640.99 samples/sec Loss 22.7564 LearningRate 0.0966 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:04,528-Speed 5682.58 samples/sec Loss 22.6443 LearningRate 0.0966 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:06,359-Speed 5593.19 samples/sec Loss 22.8637 LearningRate 0.0966 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:08,161-Speed 5687.03 samples/sec Loss 22.4802 LearningRate 0.0965 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 10:24:09,971-Speed 5660.80 samples/sec Loss 22.2499 LearningRate 0.0965 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:11,774-Speed 5683.84 samples/sec Loss 22.2451 LearningRate 0.0965 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:13,598-Speed 5616.67 samples/sec Loss 22.4408 LearningRate 0.0965 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:15,404-Speed 5672.11 samples/sec Loss 22.2847 LearningRate 0.0965 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:17,235-Speed 5596.39 samples/sec Loss 22.0673 LearningRate 0.0964 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:19,038-Speed 5683.79 samples/sec Loss 22.1250 LearningRate 0.0964 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:20,847-Speed 5661.35 samples/sec Loss 21.9384 LearningRate 0.0964 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:22,653-Speed 5674.06 samples/sec Loss 21.9013 LearningRate 0.0964 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:24,483-Speed 5596.88 samples/sec Loss 21.8593 LearningRate 0.0964 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:26,290-Speed 5672.17 samples/sec Loss 21.7724 LearningRate 0.0963 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:28,114-Speed 5616.50 samples/sec Loss 21.7505 LearningRate 0.0963 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 10:24:29,929-Speed 5643.44 samples/sec Loss 21.6520 LearningRate 0.0963 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:31,744-Speed 5645.07 samples/sec Loss 21.5013 LearningRate 0.0963 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:33,559-Speed 5644.86 samples/sec Loss 21.3828 LearningRate 0.0963 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:35,381-Speed 5622.95 samples/sec Loss 21.2292 LearningRate 0.0962 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:37,189-Speed 5665.88 samples/sec Loss 21.3528 LearningRate 0.0962 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:39,012-Speed 5621.17 samples/sec Loss 21.3797 LearningRate 0.0962 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:40,835-Speed 5621.43 samples/sec Loss 21.2023 LearningRate 0.0962 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:42,645-Speed 5660.21 samples/sec Loss 21.0634 LearningRate 0.0962 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:44,445-Speed 5688.78 samples/sec Loss 20.9776 LearningRate 0.0961 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 10:24:46,252-Speed 5671.47 samples/sec Loss 20.9944 LearningRate 0.0961 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-11 10:24:48,053-Speed 5691.88 samples/sec Loss 20.9149 LearningRate 0.0961 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-11 10:24:49,851-Speed 5697.41 samples/sec Loss 20.7147 LearningRate 0.0961 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-11 10:25:17,220-[lfw][2000]XNorm: 22.104179
Training: 2022-04-11 10:25:17,220-[lfw][2000]Accuracy-Flip: 0.98233+-0.00429
Training: 2022-04-11 10:25:17,221-[lfw][2000]Accuracy-Highest: 0.98233
Training: 2022-04-11 10:25:48,545-[cfp_fp][2000]XNorm: 19.029918
Training: 2022-04-11 10:25:48,545-[cfp_fp][2000]Accuracy-Flip: 0.78843+-0.01866
Training: 2022-04-11 10:25:48,546-[cfp_fp][2000]Accuracy-Highest: 0.78843
Training: 2022-04-11 10:26:15,466-[agedb_30][2000]XNorm: 21.376326
Training: 2022-04-11 10:26:15,466-[agedb_30][2000]Accuracy-Flip: 0.88083+-0.02024
Training: 2022-04-11 10:26:15,467-[agedb_30][2000]Accuracy-Highest: 0.88083
Training: 2022-04-11 10:26:17,293-Speed 117.11 samples/sec Loss 20.7624 LearningRate 0.0961 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:19,124-Speed 5593.35 samples/sec Loss 20.6601 LearningRate 0.0960 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:20,920-Speed 5702.51 samples/sec Loss 20.5377 LearningRate 0.0960 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:22,722-Speed 5686.54 samples/sec Loss 20.7418 LearningRate 0.0960 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:24,526-Speed 5678.60 samples/sec Loss 20.5106 LearningRate 0.0960 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:26,341-Speed 5644.01 samples/sec Loss 20.4104 LearningRate 0.0960 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:28,152-Speed 5657.41 samples/sec Loss 20.5386 LearningRate 0.0959 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-11 10:26:29,961-Speed 5665.21 samples/sec Loss 20.3654 LearningRate 0.0959 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:31,759-Speed 5697.23 samples/sec Loss 20.0453 LearningRate 0.0959 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:33,580-Speed 5623.89 samples/sec Loss 20.2358 LearningRate 0.0959 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:35,381-Speed 5690.06 samples/sec Loss 19.9373 LearningRate 0.0959 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:37,193-Speed 5650.40 samples/sec Loss 20.2047 LearningRate 0.0959 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:39,015-Speed 5626.84 samples/sec Loss 19.9315 LearningRate 0.0958 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:40,832-Speed 5637.66 samples/sec Loss 20.0434 LearningRate 0.0958 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:42,633-Speed 5686.89 samples/sec Loss 19.8979 LearningRate 0.0958 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:44,434-Speed 5689.14 samples/sec Loss 19.7741 LearningRate 0.0958 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:46,239-Speed 5678.06 samples/sec Loss 19.5920 LearningRate 0.0958 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-11 10:26:48,060-Speed 5625.33 samples/sec Loss 19.8045 LearningRate 0.0957 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:26:49,860-Speed 5692.10 samples/sec Loss 19.4399 LearningRate 0.0957 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 10:26:51,680-Speed 5627.83 samples/sec Loss 19.6926 LearningRate 0.0957 Epoch: 0 Global 
Step: 2200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:26:53,479-Speed 5694.70 samples/sec Loss 19.7728 LearningRate 0.0957 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:26:55,279-Speed 5692.64 samples/sec Loss 19.5057 LearningRate 0.0957 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:26:57,075-Speed 5702.14 samples/sec Loss 19.4345 LearningRate 0.0956 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:26:58,874-Speed 5696.22 samples/sec Loss 19.3661 LearningRate 0.0956 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:00,704-Speed 5599.19 samples/sec Loss 19.4497 LearningRate 0.0956 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:02,513-Speed 5664.18 samples/sec Loss 19.4401 LearningRate 0.0956 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:04,325-Speed 5651.25 samples/sec Loss 19.3271 LearningRate 0.0956 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:06,117-Speed 5719.27 samples/sec Loss 19.3011 LearningRate 0.0955 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:07,936-Speed 5632.05 samples/sec Loss 19.3059 LearningRate 0.0955 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:09,737-Speed 5689.94 samples/sec Loss 19.1645 LearningRate 0.0955 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:11,578-Speed 5563.22 samples/sec Loss 19.0775 LearningRate 0.0955 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:13,380-Speed 5685.62 samples/sec Loss 19.0453 LearningRate 0.0955 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 10:27:15,203-Speed 5619.51 samples/sec Loss 19.1023 LearningRate 0.0954 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:17,011-Speed 5667.10 samples/sec Loss 19.0684 LearningRate 0.0954 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:18,843-Speed 5593.29 samples/sec Loss 18.8219 LearningRate 0.0954 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:20,640-Speed 5701.81 samples/sec Loss 18.9022 LearningRate 0.0954 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:22,440-Speed 5689.72 samples/sec Loss 18.7882 LearningRate 0.0954 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:27:24,240-Speed 5692.55 samples/sec Loss 18.8524 LearningRate 0.0953 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:26,040-Speed 5689.95 samples/sec Loss 18.5639 LearningRate 0.0953 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:27,864-Speed 5617.60 samples/sec Loss 18.6735 LearningRate 0.0953 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:29,666-Speed 5685.06 samples/sec Loss 18.7788 LearningRate 0.0953 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:31,466-Speed 5693.08 samples/sec Loss 18.7434 LearningRate 0.0953 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:33,284-Speed 5632.91 samples/sec Loss 18.7743 LearningRate 0.0953 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:35,085-Speed 5688.02 samples/sec Loss 18.5661 LearningRate 0.0952 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:36,895-Speed 5661.73 samples/sec 
Loss 18.4116 LearningRate 0.0952 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:38,799-Speed 5379.67 samples/sec Loss 18.2477 LearningRate 0.0952 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:40,632-Speed 5589.32 samples/sec Loss 18.3961 LearningRate 0.0952 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:42,440-Speed 5666.78 samples/sec Loss 18.4272 LearningRate 0.0952 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:44,243-Speed 5679.13 samples/sec Loss 18.1715 LearningRate 0.0951 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:46,047-Speed 5679.42 samples/sec Loss 18.3040 LearningRate 0.0951 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:47,864-Speed 5639.15 samples/sec Loss 18.0568 LearningRate 0.0951 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:49,736-Speed 5473.14 samples/sec Loss 18.1629 LearningRate 0.0951 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:51,554-Speed 5633.34 samples/sec Loss 18.0580 LearningRate 0.0951 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:53,379-Speed 5613.23 samples/sec Loss 18.0795 LearningRate 0.0950 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:55,183-Speed 5679.52 samples/sec Loss 18.2306 LearningRate 0.0950 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:57,012-Speed 5603.53 samples/sec Loss 18.1229 LearningRate 0.0950 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:27:58,814-Speed 5684.06 samples/sec Loss 18.0125 LearningRate 0.0950 Epoch: 0 Global Step: 
2570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:00,623-Speed 5665.61 samples/sec Loss 18.0086 LearningRate 0.0950 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:02,454-Speed 5594.78 samples/sec Loss 18.0573 LearningRate 0.0949 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:04,261-Speed 5667.99 samples/sec Loss 17.9344 LearningRate 0.0949 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:06,066-Speed 5677.94 samples/sec Loss 17.9421 LearningRate 0.0949 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:07,878-Speed 5653.10 samples/sec Loss 17.7839 LearningRate 0.0949 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:09,678-Speed 5693.27 samples/sec Loss 17.8511 LearningRate 0.0949 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:11,478-Speed 5689.93 samples/sec Loss 17.9065 LearningRate 0.0948 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:13,282-Speed 5679.01 samples/sec Loss 17.8711 LearningRate 0.0948 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:15,104-Speed 5622.01 samples/sec Loss 17.7877 LearningRate 0.0948 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:16,907-Speed 5681.33 samples/sec Loss 17.7784 LearningRate 0.0948 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:18,716-Speed 5663.20 samples/sec Loss 17.9212 LearningRate 0.0948 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:28:20,521-Speed 5677.89 samples/sec Loss 17.6326 LearningRate 0.0948 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 10:28:22,349-Speed 5604.34 samples/sec Loss 17.5514 LearningRate 0.0947 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:24,226-Speed 5455.67 samples/sec Loss 17.5808 LearningRate 0.0947 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:26,073-Speed 5549.58 samples/sec Loss 17.4657 LearningRate 0.0947 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:27,879-Speed 5672.15 samples/sec Loss 17.3828 LearningRate 0.0947 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:29,713-Speed 5587.40 samples/sec Loss 17.3145 LearningRate 0.0947 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:31,516-Speed 5680.80 samples/sec Loss 17.5230 LearningRate 0.0946 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:33,322-Speed 5670.54 samples/sec Loss 17.3036 LearningRate 0.0946 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:35,139-Speed 5640.43 samples/sec Loss 17.1966 LearningRate 0.0946 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:36,941-Speed 5684.67 samples/sec Loss 17.4028 LearningRate 0.0946 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:38,767-Speed 5610.31 samples/sec Loss 17.5928 LearningRate 0.0946 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:40,570-Speed 5683.33 samples/sec Loss 17.1422 LearningRate 0.0945 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:42,398-Speed 5602.24 samples/sec Loss 17.1664 LearningRate 0.0945 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:44,217-Speed 5635.03 
samples/sec Loss 17.2146 LearningRate 0.0945 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:46,039-Speed 5623.34 samples/sec Loss 17.2919 LearningRate 0.0945 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:47,860-Speed 5623.83 samples/sec Loss 16.9046 LearningRate 0.0945 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:49,678-Speed 5635.70 samples/sec Loss 17.1070 LearningRate 0.0944 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:51,505-Speed 5609.23 samples/sec Loss 17.1812 LearningRate 0.0944 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:53,314-Speed 5662.79 samples/sec Loss 16.9266 LearningRate 0.0944 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:55,126-Speed 5654.26 samples/sec Loss 17.0160 LearningRate 0.0944 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:56,931-Speed 5676.89 samples/sec Loss 17.0089 LearningRate 0.0944 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:28:58,757-Speed 5612.08 samples/sec Loss 17.2049 LearningRate 0.0943 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:00,559-Speed 5684.92 samples/sec Loss 17.0994 LearningRate 0.0943 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:02,377-Speed 5635.19 samples/sec Loss 16.9177 LearningRate 0.0943 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:04,192-Speed 5643.56 samples/sec Loss 16.8105 LearningRate 0.0943 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:06,014-Speed 5622.47 samples/sec Loss 16.8842 LearningRate 0.0943 Epoch: 0 
Global Step: 2940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:07,826-Speed 5653.67 samples/sec Loss 16.9343 LearningRate 0.0943 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:09,627-Speed 5688.39 samples/sec Loss 16.7080 LearningRate 0.0942 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:11,454-Speed 5609.11 samples/sec Loss 16.8043 LearningRate 0.0942 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:13,263-Speed 5662.99 samples/sec Loss 16.7278 LearningRate 0.0942 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:15,081-Speed 5634.81 samples/sec Loss 16.5420 LearningRate 0.0942 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:16,890-Speed 5663.92 samples/sec Loss 16.9258 LearningRate 0.0942 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:18,715-Speed 5612.62 samples/sec Loss 16.5707 LearningRate 0.0941 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:29:20,532-Speed 5638.03 samples/sec Loss 16.5548 LearningRate 0.0941 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:22,344-Speed 5655.88 samples/sec Loss 16.5275 LearningRate 0.0941 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:24,146-Speed 5684.69 samples/sec Loss 16.6147 LearningRate 0.0941 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:25,958-Speed 5654.26 samples/sec Loss 16.6600 LearningRate 0.0941 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:27,758-Speed 5689.52 samples/sec Loss 16.7889 LearningRate 0.0940 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 10:29:29,563-Speed 5681.31 samples/sec Loss 16.5736 LearningRate 0.0940 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:31,381-Speed 5638.44 samples/sec Loss 16.6153 LearningRate 0.0940 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:33,182-Speed 5686.47 samples/sec Loss 16.4371 LearningRate 0.0940 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:35,024-Speed 5559.71 samples/sec Loss 16.4638 LearningRate 0.0940 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:36,837-Speed 5650.73 samples/sec Loss 16.4705 LearningRate 0.0939 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:38,650-Speed 5652.17 samples/sec Loss 16.3947 LearningRate 0.0939 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:40,467-Speed 5638.68 samples/sec Loss 16.5907 LearningRate 0.0939 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:42,296-Speed 5600.07 samples/sec Loss 16.3097 LearningRate 0.0939 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:44,126-Speed 5600.90 samples/sec Loss 16.4525 LearningRate 0.0939 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:45,927-Speed 5689.51 samples/sec Loss 16.4288 LearningRate 0.0939 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:47,755-Speed 5605.11 samples/sec Loss 16.1946 LearningRate 0.0938 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:49,569-Speed 5648.88 samples/sec Loss 16.2788 LearningRate 0.0938 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:51,378-Speed 5662.51 
samples/sec Loss 16.2976 LearningRate 0.0938 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:53,207-Speed 5600.47 samples/sec Loss 16.2102 LearningRate 0.0938 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:55,012-Speed 5674.41 samples/sec Loss 16.0012 LearningRate 0.0938 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:56,836-Speed 5618.29 samples/sec Loss 16.1334 LearningRate 0.0937 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:29:58,647-Speed 5657.22 samples/sec Loss 16.0630 LearningRate 0.0937 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:00,463-Speed 5642.79 samples/sec Loss 15.8742 LearningRate 0.0937 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:02,295-Speed 5589.75 samples/sec Loss 15.9853 LearningRate 0.0937 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:04,109-Speed 5647.51 samples/sec Loss 16.0923 LearningRate 0.0937 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:05,914-Speed 5676.75 samples/sec Loss 16.1936 LearningRate 0.0936 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:07,741-Speed 5605.60 samples/sec Loss 15.9101 LearningRate 0.0936 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:09,563-Speed 5627.82 samples/sec Loss 16.0635 LearningRate 0.0936 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:11,372-Speed 5663.23 samples/sec Loss 16.1890 LearningRate 0.0936 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:13,177-Speed 5674.74 samples/sec Loss 16.0598 LearningRate 0.0936 Epoch: 0 
Global Step: 3310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:14,997-Speed 5627.49 samples/sec Loss 15.9004 LearningRate 0.0935 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:16,807-Speed 5662.53 samples/sec Loss 15.8386 LearningRate 0.0935 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:18,645-Speed 5576.50 samples/sec Loss 15.7893 LearningRate 0.0935 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:20,451-Speed 5670.59 samples/sec Loss 15.8763 LearningRate 0.0935 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:22,280-Speed 5602.75 samples/sec Loss 15.7363 LearningRate 0.0935 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:24,086-Speed 5672.06 samples/sec Loss 15.8609 LearningRate 0.0934 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:25,913-Speed 5608.90 samples/sec Loss 15.9805 LearningRate 0.0934 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:27,725-Speed 5653.60 samples/sec Loss 15.8404 LearningRate 0.0934 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:29,547-Speed 5624.70 samples/sec Loss 15.8264 LearningRate 0.0934 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:31,352-Speed 5673.39 samples/sec Loss 15.9259 LearningRate 0.0934 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:33,159-Speed 5671.40 samples/sec Loss 15.6376 LearningRate 0.0934 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:34,977-Speed 5634.64 samples/sec Loss 15.5486 LearningRate 0.0933 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 10:30:36,780-Speed 5680.82 samples/sec Loss 15.6729 LearningRate 0.0933 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:38,625-Speed 5551.79 samples/sec Loss 15.6851 LearningRate 0.0933 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:40,436-Speed 5660.08 samples/sec Loss 15.6951 LearningRate 0.0933 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:42,264-Speed 5603.43 samples/sec Loss 15.8545 LearningRate 0.0933 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:44,079-Speed 5645.46 samples/sec Loss 15.6127 LearningRate 0.0932 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:45,886-Speed 5669.94 samples/sec Loss 15.8901 LearningRate 0.0932 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:47,697-Speed 5657.51 samples/sec Loss 15.4876 LearningRate 0.0932 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:49,551-Speed 5524.10 samples/sec Loss 15.5882 LearningRate 0.0932 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:51,356-Speed 5676.97 samples/sec Loss 15.4385 LearningRate 0.0932 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:53,180-Speed 5615.49 samples/sec Loss 15.4648 LearningRate 0.0931 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:54,985-Speed 5676.34 samples/sec Loss 15.4455 LearningRate 0.0931 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:56,821-Speed 5582.30 samples/sec Loss 15.3902 LearningRate 0.0931 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:30:58,629-Speed 5663.80 
samples/sec Loss 15.4605 LearningRate 0.0931 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:00,458-Speed 5601.99 samples/sec Loss 15.3243 LearningRate 0.0931 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:02,261-Speed 5682.83 samples/sec Loss 15.5163 LearningRate 0.0930 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:04,075-Speed 5648.90 samples/sec Loss 15.2501 LearningRate 0.0930 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:05,903-Speed 5603.94 samples/sec Loss 15.3311 LearningRate 0.0930 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:07,733-Speed 5598.13 samples/sec Loss 15.5576 LearningRate 0.0930 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:09,547-Speed 5649.84 samples/sec Loss 15.3339 LearningRate 0.0930 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:11,383-Speed 5578.31 samples/sec Loss 15.4005 LearningRate 0.0930 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:13,206-Speed 5619.22 samples/sec Loss 15.5160 LearningRate 0.0929 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:15,049-Speed 5558.55 samples/sec Loss 15.4700 LearningRate 0.0929 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:16,860-Speed 5660.99 samples/sec Loss 15.3727 LearningRate 0.0929 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:18,695-Speed 5580.13 samples/sec Loss 15.0632 LearningRate 0.0929 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:20,509-Speed 5649.14 samples/sec Loss 15.4167 LearningRate 0.0929 Epoch: 0 Global 
Step: 3680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:22,317-Speed 5665.78 samples/sec Loss 15.2029 LearningRate 0.0928 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:24,150-Speed 5588.38 samples/sec Loss 15.0555 LearningRate 0.0928 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:25,982-Speed 5591.26 samples/sec Loss 15.2378 LearningRate 0.0928 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:27,797-Speed 5646.13 samples/sec Loss 15.1146 LearningRate 0.0928 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:29,611-Speed 5648.06 samples/sec Loss 15.3051 LearningRate 0.0928 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:31,427-Speed 5639.79 samples/sec Loss 15.2899 LearningRate 0.0927 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:33,239-Speed 5652.89 samples/sec Loss 15.2427 LearningRate 0.0927 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:35,044-Speed 5677.37 samples/sec Loss 15.2869 LearningRate 0.0927 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:36,872-Speed 5604.81 samples/sec Loss 15.0408 LearningRate 0.0927 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:38,691-Speed 5631.34 samples/sec Loss 15.0831 LearningRate 0.0927 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:40,493-Speed 5686.91 samples/sec Loss 14.9994 LearningRate 0.0926 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:42,300-Speed 5666.19 samples/sec Loss 14.7811 LearningRate 0.0926 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 10:31:44,135-Speed 5583.51 samples/sec Loss 15.1221 LearningRate 0.0926 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:45,941-Speed 5672.74 samples/sec Loss 14.9768 LearningRate 0.0926 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:47,753-Speed 5657.11 samples/sec Loss 15.0303 LearningRate 0.0926 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:31:49,561-Speed 5664.67 samples/sec Loss 15.0189 LearningRate 0.0926 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:51,368-Speed 5669.61 samples/sec Loss 15.0276 LearningRate 0.0925 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:53,234-Speed 5491.54 samples/sec Loss 15.2168 LearningRate 0.0925 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:55,059-Speed 5613.11 samples/sec Loss 14.9617 LearningRate 0.0925 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:56,872-Speed 5648.11 samples/sec Loss 15.1094 LearningRate 0.0925 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:31:58,699-Speed 5609.44 samples/sec Loss 14.8623 LearningRate 0.0925 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:00,503-Speed 5680.56 samples/sec Loss 15.1127 LearningRate 0.0924 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:02,347-Speed 5556.48 samples/sec Loss 15.0681 LearningRate 0.0924 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:04,168-Speed 5622.36 samples/sec Loss 14.9873 LearningRate 0.0924 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:05,989-Speed 5627.90 samples/sec Loss 
14.8364 LearningRate 0.0924 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:07,795-Speed 5671.93 samples/sec Loss 14.9906 LearningRate 0.0924 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:09,630-Speed 5585.88 samples/sec Loss 15.0032 LearningRate 0.0923 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:11,452-Speed 5622.42 samples/sec Loss 14.8255 LearningRate 0.0923 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:13,262-Speed 5660.35 samples/sec Loss 15.0347 LearningRate 0.0923 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:15,079-Speed 5636.48 samples/sec Loss 14.8245 LearningRate 0.0923 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:16,891-Speed 5656.11 samples/sec Loss 15.0418 LearningRate 0.0923 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:32:18,717-Speed 5608.87 samples/sec Loss 14.7038 LearningRate 0.0922 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:32:46,140-[lfw][4000]XNorm: 23.260032 Training: 2022-04-11 10:32:46,141-[lfw][4000]Accuracy-Flip: 0.99183+-0.00431 Training: 2022-04-11 10:32:46,141-[lfw][4000]Accuracy-Highest: 0.99183 Training: 2022-04-11 10:33:17,820-[cfp_fp][4000]XNorm: 20.164622 Training: 2022-04-11 10:33:17,821-[cfp_fp][4000]Accuracy-Flip: 0.89571+-0.01525 Training: 2022-04-11 10:33:17,822-[cfp_fp][4000]Accuracy-Highest: 0.89571 Training: 2022-04-11 10:33:44,834-[agedb_30][4000]XNorm: 22.959244 Training: 2022-04-11 10:33:44,834-[agedb_30][4000]Accuracy-Flip: 0.93717+-0.01436 Training: 2022-04-11 10:33:44,835-[agedb_30][4000]Accuracy-Highest: 0.93717 Training: 2022-04-11 10:33:46,674-Speed 116.42 samples/sec Loss 14.8352 LearningRate 0.0922 Epoch: 0 Global Step: 
4010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:48,525-Speed 5532.77 samples/sec Loss 14.7560 LearningRate 0.0922 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:50,337-Speed 5654.31 samples/sec Loss 15.0010 LearningRate 0.0922 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:52,146-Speed 5662.88 samples/sec Loss 14.6601 LearningRate 0.0922 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:53,947-Speed 5688.89 samples/sec Loss 14.6642 LearningRate 0.0922 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:55,742-Speed 5705.81 samples/sec Loss 14.9258 LearningRate 0.0921 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:57,538-Speed 5704.88 samples/sec Loss 14.6584 LearningRate 0.0921 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:33:59,367-Speed 5600.43 samples/sec Loss 14.6031 LearningRate 0.0921 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:34:01,166-Speed 5695.34 samples/sec Loss 14.8554 LearningRate 0.0921 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:34:02,990-Speed 5617.01 samples/sec Loss 14.5310 LearningRate 0.0921 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:04,796-Speed 5673.33 samples/sec Loss 14.6064 LearningRate 0.0920 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:06,592-Speed 5703.79 samples/sec Loss 14.6974 LearningRate 0.0920 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:08,403-Speed 5657.59 samples/sec Loss 14.7565 LearningRate 0.0920 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:34:10,199-Speed 5704.49 samples/sec Loss 14.7896 LearningRate 0.0920 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:12,032-Speed 5587.52 samples/sec Loss 14.7258 LearningRate 0.0920 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:13,829-Speed 5702.00 samples/sec Loss 14.5509 LearningRate 0.0919 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:15,655-Speed 5611.41 samples/sec Loss 14.6705 LearningRate 0.0919 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:17,490-Speed 5585.81 samples/sec Loss 14.6242 LearningRate 0.0919 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:19,305-Speed 5643.38 samples/sec Loss 14.5372 LearningRate 0.0919 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:21,107-Speed 5684.13 samples/sec Loss 14.6084 LearningRate 0.0919 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:22,922-Speed 5643.81 samples/sec Loss 14.6871 LearningRate 0.0918 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:24,741-Speed 5635.35 samples/sec Loss 14.6208 LearningRate 0.0918 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:26,552-Speed 5656.35 samples/sec Loss 14.5753 LearningRate 0.0918 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:28,380-Speed 5602.93 samples/sec Loss 14.4113 LearningRate 0.0918 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:30,197-Speed 5639.35 samples/sec Loss 14.5555 LearningRate 0.0918 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:31,999-Speed 5687.05 samples/sec Loss 
14.5779 LearningRate 0.0918 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:33,821-Speed 5621.21 samples/sec Loss 14.5375 LearningRate 0.0917 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:35,632-Speed 5659.45 samples/sec Loss 14.4429 LearningRate 0.0917 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:37,479-Speed 5547.29 samples/sec Loss 14.5675 LearningRate 0.0917 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:39,283-Speed 5678.61 samples/sec Loss 14.5345 LearningRate 0.0917 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:41,089-Speed 5671.76 samples/sec Loss 14.3902 LearningRate 0.0917 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:42,921-Speed 5593.55 samples/sec Loss 14.5561 LearningRate 0.0916 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:44,719-Speed 5695.47 samples/sec Loss 14.4433 LearningRate 0.0916 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:46,543-Speed 5619.58 samples/sec Loss 14.2751 LearningRate 0.0916 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:48,352-Speed 5664.26 samples/sec Loss 14.3524 LearningRate 0.0916 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:50,157-Speed 5679.17 samples/sec Loss 14.2299 LearningRate 0.0916 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:52,018-Speed 5505.20 samples/sec Loss 14.4516 LearningRate 0.0915 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:53,816-Speed 5697.98 samples/sec Loss 14.5251 LearningRate 0.0915 Epoch: 0 Global Step: 4380 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:55,635-Speed 5631.22 samples/sec Loss 14.5187 LearningRate 0.0915 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:57,452-Speed 5641.58 samples/sec Loss 14.1329 LearningRate 0.0915 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:34:59,273-Speed 5631.16 samples/sec Loss 14.6178 LearningRate 0.0915 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:01,073-Speed 5692.29 samples/sec Loss 14.4046 LearningRate 0.0915 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:02,875-Speed 5683.11 samples/sec Loss 14.2910 LearningRate 0.0914 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:04,701-Speed 5612.41 samples/sec Loss 14.4359 LearningRate 0.0914 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:06,522-Speed 5624.14 samples/sec Loss 14.4591 LearningRate 0.0914 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:08,324-Speed 5686.29 samples/sec Loss 14.3280 LearningRate 0.0914 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:10,140-Speed 5643.51 samples/sec Loss 14.3956 LearningRate 0.0914 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:11,978-Speed 5573.11 samples/sec Loss 14.1635 LearningRate 0.0913 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:13,792-Speed 5646.27 samples/sec Loss 14.3969 LearningRate 0.0913 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:15,590-Speed 5697.12 samples/sec Loss 14.1759 LearningRate 0.0913 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
10:35:17,410-Speed 5629.75 samples/sec Loss 14.2035 LearningRate 0.0913 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:19,212-Speed 5687.25 samples/sec Loss 14.1490 LearningRate 0.0913 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:21,032-Speed 5629.52 samples/sec Loss 14.2135 LearningRate 0.0912 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:35:22,847-Speed 5642.70 samples/sec Loss 14.2401 LearningRate 0.0912 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:24,651-Speed 5677.25 samples/sec Loss 14.1957 LearningRate 0.0912 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:26,459-Speed 5668.16 samples/sec Loss 14.2022 LearningRate 0.0912 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:28,257-Speed 5696.28 samples/sec Loss 14.2901 LearningRate 0.0912 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:30,083-Speed 5611.11 samples/sec Loss 14.0914 LearningRate 0.0912 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:31,895-Speed 5654.71 samples/sec Loss 14.0757 LearningRate 0.0911 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:33,701-Speed 5672.81 samples/sec Loss 14.0173 LearningRate 0.0911 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:35,519-Speed 5635.46 samples/sec Loss 13.9970 LearningRate 0.0911 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:37,349-Speed 5596.93 samples/sec Loss 14.0917 LearningRate 0.0911 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:39,152-Speed 5683.24 samples/sec Loss 14.0596 
LearningRate 0.0911 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:40,963-Speed 5657.72 samples/sec Loss 14.0620 LearningRate 0.0910 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:42,778-Speed 5643.48 samples/sec Loss 14.1978 LearningRate 0.0910 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:44,588-Speed 5657.28 samples/sec Loss 14.0539 LearningRate 0.0910 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:46,407-Speed 5633.86 samples/sec Loss 14.0892 LearningRate 0.0910 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:48,227-Speed 5628.89 samples/sec Loss 14.1379 LearningRate 0.0910 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:50,041-Speed 5647.49 samples/sec Loss 14.2191 LearningRate 0.0909 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:51,842-Speed 5685.93 samples/sec Loss 14.0157 LearningRate 0.0909 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:53,642-Speed 5692.82 samples/sec Loss 14.0730 LearningRate 0.0909 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:55,443-Speed 5685.79 samples/sec Loss 14.1903 LearningRate 0.0909 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:57,243-Speed 5692.23 samples/sec Loss 13.9665 LearningRate 0.0909 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:35:59,058-Speed 5643.17 samples/sec Loss 14.1053 LearningRate 0.0908 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:00,851-Speed 5713.54 samples/sec Loss 13.9622 LearningRate 0.0908 Epoch: 0 Global Step: 4750 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:02,668-Speed 5639.73 samples/sec Loss 14.0518 LearningRate 0.0908 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:04,501-Speed 5589.08 samples/sec Loss 14.0174 LearningRate 0.0908 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:06,312-Speed 5656.26 samples/sec Loss 13.9407 LearningRate 0.0908 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:08,145-Speed 5587.35 samples/sec Loss 14.1331 LearningRate 0.0908 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:09,988-Speed 5559.84 samples/sec Loss 14.0537 LearningRate 0.0907 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:11,790-Speed 5685.79 samples/sec Loss 13.8678 LearningRate 0.0907 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:13,591-Speed 5687.21 samples/sec Loss 13.8339 LearningRate 0.0907 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:15,389-Speed 5698.35 samples/sec Loss 13.9297 LearningRate 0.0907 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:17,198-Speed 5663.00 samples/sec Loss 13.9346 LearningRate 0.0907 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:19,007-Speed 5662.31 samples/sec Loss 13.8631 LearningRate 0.0906 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:20,810-Speed 5681.41 samples/sec Loss 13.7722 LearningRate 0.0906 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:22,615-Speed 5675.55 samples/sec Loss 13.6137 LearningRate 0.0906 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:36:24,420-Speed 5675.64 samples/sec Loss 13.6963 LearningRate 0.0906 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:26,229-Speed 5661.99 samples/sec Loss 14.0454 LearningRate 0.0906 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:28,030-Speed 5687.19 samples/sec Loss 13.8097 LearningRate 0.0905 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:29,839-Speed 5663.84 samples/sec Loss 14.0158 LearningRate 0.0905 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:31,654-Speed 5643.56 samples/sec Loss 13.6022 LearningRate 0.0905 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:33,463-Speed 5662.89 samples/sec Loss 13.8251 LearningRate 0.0905 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:35,281-Speed 5634.20 samples/sec Loss 13.6397 LearningRate 0.0905 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:37,099-Speed 5635.56 samples/sec Loss 13.7075 LearningRate 0.0905 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:38,899-Speed 5692.91 samples/sec Loss 13.8186 LearningRate 0.0904 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:40,725-Speed 5610.90 samples/sec Loss 13.9245 LearningRate 0.0904 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:42,536-Speed 5655.81 samples/sec Loss 13.7158 LearningRate 0.0904 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:36:44,339-Speed 5682.37 samples/sec Loss 13.8730 LearningRate 0.0904 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:46,151-Speed 5652.60 samples/sec Loss 13.7772 
LearningRate 0.0904 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:47,959-Speed 5667.94 samples/sec Loss 13.9282 LearningRate 0.0903 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:49,783-Speed 5617.14 samples/sec Loss 13.7227 LearningRate 0.0903 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:51,587-Speed 5678.14 samples/sec Loss 13.9853 LearningRate 0.0903 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:53,408-Speed 5623.99 samples/sec Loss 13.6879 LearningRate 0.0903 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:36:55,289-Speed 5446.22 samples/sec Loss 13.7270 LearningRate 0.0903 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:37:06,416-Speed 920.43 samples/sec Loss 13.5324 LearningRate 0.0902 Epoch: 1 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:37:08,279-Speed 5502.00 samples/sec Loss 12.7101 LearningRate 0.0902 Epoch: 1 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:37:10,119-Speed 5568.73 samples/sec Loss 12.6316 LearningRate 0.0902 Epoch: 1 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:37:12,312-Speed 4672.67 samples/sec Loss 12.7901 LearningRate 0.0902 Epoch: 1 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:14,183-Speed 5474.43 samples/sec Loss 12.9441 LearningRate 0.0902 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:16,007-Speed 5620.51 samples/sec Loss 12.5830 LearningRate 0.0902 Epoch: 1 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:17,866-Speed 5512.19 samples/sec Loss 12.7615 LearningRate 0.0901 Epoch: 1 Global Step: 5120 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 10:37:19,686-Speed 5628.77 samples/sec Loss 12.8707 LearningRate 0.0901 Epoch: 1 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:21,491-Speed 5674.53 samples/sec Loss 12.9579 LearningRate 0.0901 Epoch: 1 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:23,361-Speed 5482.46 samples/sec Loss 12.5509 LearningRate 0.0901 Epoch: 1 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:25,182-Speed 5624.37 samples/sec Loss 13.0103 LearningRate 0.0901 Epoch: 1 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:27,073-Speed 5418.23 samples/sec Loss 12.8882 LearningRate 0.0900 Epoch: 1 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:28,903-Speed 5599.09 samples/sec Loss 12.8982 LearningRate 0.0900 Epoch: 1 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:30,745-Speed 5561.18 samples/sec Loss 12.7832 LearningRate 0.0900 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:32,552-Speed 5671.06 samples/sec Loss 12.9718 LearningRate 0.0900 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:34,374-Speed 5620.63 samples/sec Loss 13.0446 LearningRate 0.0900 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:36,211-Speed 5575.52 samples/sec Loss 12.9717 LearningRate 0.0899 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:38,111-Speed 5392.11 samples/sec Loss 12.9838 LearningRate 0.0899 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:39,927-Speed 5641.45 samples/sec Loss 12.8896 LearningRate 0.0899 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:37:41,826-Speed 5394.45 samples/sec Loss 12.9028 LearningRate 0.0899 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:43,632-Speed 5673.17 samples/sec Loss 13.1036 LearningRate 0.0899 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:45,443-Speed 5657.97 samples/sec Loss 12.9867 LearningRate 0.0899 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:47,251-Speed 5665.06 samples/sec Loss 12.9529 LearningRate 0.0898 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:49,055-Speed 5679.81 samples/sec Loss 13.0769 LearningRate 0.0898 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:50,871-Speed 5639.77 samples/sec Loss 13.0320 LearningRate 0.0898 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:52,725-Speed 5527.67 samples/sec Loss 12.9372 LearningRate 0.0898 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:54,558-Speed 5587.96 samples/sec Loss 13.1562 LearningRate 0.0898 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:56,410-Speed 5531.21 samples/sec Loss 13.1167 LearningRate 0.0897 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:37:58,233-Speed 5620.02 samples/sec Loss 13.0225 LearningRate 0.0897 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:00,064-Speed 5592.58 samples/sec Loss 12.9835 LearningRate 0.0897 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:01,885-Speed 5626.09 samples/sec Loss 13.1080 LearningRate 0.0897 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:03,714-Speed 5600.97 samples/sec Loss 12.9629 
LearningRate 0.0897 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:05,713-Speed 5126.55 samples/sec Loss 13.0872 LearningRate 0.0896 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:07,568-Speed 5524.43 samples/sec Loss 13.2431 LearningRate 0.0896 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:09,375-Speed 5666.66 samples/sec Loss 13.0025 LearningRate 0.0896 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:11,185-Speed 5661.89 samples/sec Loss 13.1376 LearningRate 0.0896 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:13,012-Speed 5607.12 samples/sec Loss 13.2745 LearningRate 0.0896 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:14,831-Speed 5632.28 samples/sec Loss 13.2431 LearningRate 0.0896 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:16,701-Speed 5479.26 samples/sec Loss 12.9731 LearningRate 0.0895 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:18,546-Speed 5551.75 samples/sec Loss 13.0090 LearningRate 0.0895 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:20,380-Speed 5585.57 samples/sec Loss 13.3556 LearningRate 0.0895 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:22,224-Speed 5557.75 samples/sec Loss 12.8313 LearningRate 0.0895 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:24,041-Speed 5637.47 samples/sec Loss 13.0894 LearningRate 0.0895 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:25,920-Speed 5450.83 samples/sec Loss 12.9563 LearningRate 0.0894 Epoch: 1 Global Step: 5490 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:27,794-Speed 5465.56 samples/sec Loss 13.0975 LearningRate 0.0894 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:29,606-Speed 5655.38 samples/sec Loss 13.0973 LearningRate 0.0894 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:31,431-Speed 5611.39 samples/sec Loss 13.1723 LearningRate 0.0894 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:33,233-Speed 5685.19 samples/sec Loss 12.7862 LearningRate 0.0894 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:35,039-Speed 5671.24 samples/sec Loss 13.0970 LearningRate 0.0893 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:36,892-Speed 5530.19 samples/sec Loss 12.9878 LearningRate 0.0893 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:38,698-Speed 5670.87 samples/sec Loss 13.0856 LearningRate 0.0893 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:40,510-Speed 5654.31 samples/sec Loss 13.0774 LearningRate 0.0893 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:42,350-Speed 5566.90 samples/sec Loss 13.1334 LearningRate 0.0893 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:44,152-Speed 5686.13 samples/sec Loss 13.1176 LearningRate 0.0893 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:45,972-Speed 5630.77 samples/sec Loss 12.8540 LearningRate 0.0892 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:47,774-Speed 5683.47 samples/sec Loss 13.2200 LearningRate 0.0892 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:38:49,581-Speed 5667.52 samples/sec Loss 13.0439 LearningRate 0.0892 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:51,387-Speed 5674.67 samples/sec Loss 12.9588 LearningRate 0.0892 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:53,191-Speed 5679.60 samples/sec Loss 13.1851 LearningRate 0.0892 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:54,999-Speed 5663.03 samples/sec Loss 13.0537 LearningRate 0.0891 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:38:56,823-Speed 5616.17 samples/sec Loss 12.9605 LearningRate 0.0891 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:38:58,662-Speed 5572.97 samples/sec Loss 13.1506 LearningRate 0.0891 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:00,465-Speed 5681.15 samples/sec Loss 13.0340 LearningRate 0.0891 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:02,299-Speed 5585.14 samples/sec Loss 13.0677 LearningRate 0.0891 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:04,108-Speed 5663.16 samples/sec Loss 13.0480 LearningRate 0.0890 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:05,914-Speed 5672.65 samples/sec Loss 13.0132 LearningRate 0.0890 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:07,729-Speed 5644.66 samples/sec Loss 13.2471 LearningRate 0.0890 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:09,556-Speed 5605.90 samples/sec Loss 13.0837 LearningRate 0.0890 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:11,375-Speed 5634.81 samples/sec Loss 13.2122 
LearningRate 0.0890 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:13,194-Speed 5631.08 samples/sec Loss 12.8343 LearningRate 0.0890 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:15,001-Speed 5669.84 samples/sec Loss 12.9690 LearningRate 0.0889 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:16,805-Speed 5675.81 samples/sec Loss 13.1013 LearningRate 0.0889 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:18,607-Speed 5686.87 samples/sec Loss 12.9236 LearningRate 0.0889 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:20,412-Speed 5674.20 samples/sec Loss 13.0280 LearningRate 0.0889 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:22,223-Speed 5660.01 samples/sec Loss 13.0497 LearningRate 0.0889 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:24,034-Speed 5655.50 samples/sec Loss 12.7674 LearningRate 0.0888 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:25,841-Speed 5670.48 samples/sec Loss 12.8418 LearningRate 0.0888 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:27,667-Speed 5610.12 samples/sec Loss 12.9545 LearningRate 0.0888 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:29,502-Speed 5583.74 samples/sec Loss 12.9573 LearningRate 0.0888 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:31,327-Speed 5612.29 samples/sec Loss 12.9757 LearningRate 0.0888 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:33,125-Speed 5697.97 samples/sec Loss 13.1358 LearningRate 0.0887 Epoch: 1 Global Step: 5860 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:34,944-Speed 5633.60 samples/sec Loss 13.0596 LearningRate 0.0887 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:36,750-Speed 5669.74 samples/sec Loss 12.9641 LearningRate 0.0887 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:38,581-Speed 5596.50 samples/sec Loss 13.0119 LearningRate 0.0887 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:40,390-Speed 5661.43 samples/sec Loss 13.1278 LearningRate 0.0887 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:42,227-Speed 5576.21 samples/sec Loss 12.9312 LearningRate 0.0887 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:44,036-Speed 5664.65 samples/sec Loss 13.0090 LearningRate 0.0886 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:45,839-Speed 5682.28 samples/sec Loss 13.0311 LearningRate 0.0886 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:47,647-Speed 5665.09 samples/sec Loss 13.0848 LearningRate 0.0886 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:49,457-Speed 5659.15 samples/sec Loss 12.9528 LearningRate 0.0886 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:39:51,264-Speed 5671.59 samples/sec Loss 13.0910 LearningRate 0.0886 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:53,087-Speed 5620.49 samples/sec Loss 12.9472 LearningRate 0.0885 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:54,902-Speed 5643.40 samples/sec Loss 12.9948 LearningRate 0.0885 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:39:56,706-Speed 5675.81 samples/sec Loss 12.7992 LearningRate 0.0885 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:39:58,519-Speed 5650.29 samples/sec Loss 12.9557 LearningRate 0.0885 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:40:25,899-[lfw][6000]XNorm: 21.248232 Training: 2022-04-11 10:40:25,900-[lfw][6000]Accuracy-Flip: 0.99300+-0.00407 Training: 2022-04-11 10:40:25,900-[lfw][6000]Accuracy-Highest: 0.99300 Training: 2022-04-11 10:40:57,496-[cfp_fp][6000]XNorm: 18.533061 Training: 2022-04-11 10:40:57,498-[cfp_fp][6000]Accuracy-Flip: 0.90586+-0.01327 Training: 2022-04-11 10:40:57,498-[cfp_fp][6000]Accuracy-Highest: 0.90586 Training: 2022-04-11 10:41:24,454-[agedb_30][6000]XNorm: 20.919483 Training: 2022-04-11 10:41:24,455-[agedb_30][6000]Accuracy-Flip: 0.95100+-0.01057 Training: 2022-04-11 10:41:24,455-[agedb_30][6000]Accuracy-Highest: 0.95100 Training: 2022-04-11 10:41:26,286-Speed 116.67 samples/sec Loss 13.0460 LearningRate 0.0885 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:28,112-Speed 5611.59 samples/sec Loss 13.0969 LearningRate 0.0885 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:29,953-Speed 5564.02 samples/sec Loss 13.0301 LearningRate 0.0884 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:31,783-Speed 5603.00 samples/sec Loss 12.8132 LearningRate 0.0884 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:33,619-Speed 5582.04 samples/sec Loss 13.1540 LearningRate 0.0884 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:35,417-Speed 5699.90 samples/sec Loss 12.7747 LearningRate 0.0884 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:37,239-Speed 5623.87 samples/sec Loss 
13.0897 LearningRate 0.0884 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:39,061-Speed 5622.17 samples/sec Loss 12.8382 LearningRate 0.0883 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:40,898-Speed 5578.71 samples/sec Loss 12.9663 LearningRate 0.0883 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:42,741-Speed 5559.85 samples/sec Loss 12.9375 LearningRate 0.0883 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:44,542-Speed 5689.28 samples/sec Loss 12.8034 LearningRate 0.0883 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:46,403-Speed 5506.95 samples/sec Loss 12.8811 LearningRate 0.0883 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:48,203-Speed 5689.19 samples/sec Loss 12.9183 LearningRate 0.0882 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:50,014-Speed 5657.44 samples/sec Loss 12.9950 LearningRate 0.0882 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:51,830-Speed 5640.71 samples/sec Loss 12.9172 LearningRate 0.0882 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:53,678-Speed 5546.40 samples/sec Loss 13.0348 LearningRate 0.0882 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:55,515-Speed 5578.48 samples/sec Loss 12.8712 LearningRate 0.0882 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:57,358-Speed 5558.06 samples/sec Loss 13.0438 LearningRate 0.0882 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:41:59,171-Speed 5653.57 samples/sec Loss 12.8557 LearningRate 0.0881 Epoch: 1 Global Step: 6190 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:00,993-Speed 5624.94 samples/sec Loss 12.9627 LearningRate 0.0881 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:02,829-Speed 5582.68 samples/sec Loss 12.9853 LearningRate 0.0881 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:04,642-Speed 5651.88 samples/sec Loss 12.7977 LearningRate 0.0881 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:06,473-Speed 5595.58 samples/sec Loss 12.8056 LearningRate 0.0881 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:08,282-Speed 5664.62 samples/sec Loss 12.7324 LearningRate 0.0880 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:10,103-Speed 5625.99 samples/sec Loss 12.8508 LearningRate 0.0880 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:11,915-Speed 5655.30 samples/sec Loss 12.9397 LearningRate 0.0880 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:13,731-Speed 5642.46 samples/sec Loss 12.9782 LearningRate 0.0880 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:15,545-Speed 5645.32 samples/sec Loss 12.8440 LearningRate 0.0880 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:17,389-Speed 5558.71 samples/sec Loss 12.7867 LearningRate 0.0880 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:19,205-Speed 5639.25 samples/sec Loss 13.0146 LearningRate 0.0879 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:21,030-Speed 5615.02 samples/sec Loss 12.5868 LearningRate 0.0879 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:42:22,836-Speed 5675.25 samples/sec Loss 12.9063 LearningRate 0.0879 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:24,680-Speed 5553.38 samples/sec Loss 12.8544 LearningRate 0.0879 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:26,501-Speed 5626.39 samples/sec Loss 12.8457 LearningRate 0.0879 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:28,309-Speed 5667.67 samples/sec Loss 12.6736 LearningRate 0.0878 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:30,131-Speed 5623.25 samples/sec Loss 12.7501 LearningRate 0.0878 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:31,937-Speed 5672.05 samples/sec Loss 12.7433 LearningRate 0.0878 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:33,757-Speed 5630.15 samples/sec Loss 12.7742 LearningRate 0.0878 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:35,612-Speed 5525.07 samples/sec Loss 12.9016 LearningRate 0.0878 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:37,417-Speed 5675.05 samples/sec Loss 12.8651 LearningRate 0.0877 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:39,237-Speed 5631.40 samples/sec Loss 12.8425 LearningRate 0.0877 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:41,071-Speed 5588.19 samples/sec Loss 12.8022 LearningRate 0.0877 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:42,886-Speed 5643.39 samples/sec Loss 12.7121 LearningRate 0.0877 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:44,698-Speed 5656.06 samples/sec Loss 
12.9270 LearningRate 0.0877 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:46,533-Speed 5581.95 samples/sec Loss 12.7209 LearningRate 0.0877 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:48,335-Speed 5684.45 samples/sec Loss 12.5311 LearningRate 0.0876 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:50,141-Speed 5675.03 samples/sec Loss 12.8062 LearningRate 0.0876 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:51,963-Speed 5621.83 samples/sec Loss 12.7824 LearningRate 0.0876 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:53,769-Speed 5673.79 samples/sec Loss 12.6871 LearningRate 0.0876 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:55,606-Speed 5576.86 samples/sec Loss 12.7557 LearningRate 0.0876 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:57,439-Speed 5588.68 samples/sec Loss 12.6582 LearningRate 0.0875 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:42:59,260-Speed 5627.20 samples/sec Loss 12.6049 LearningRate 0.0875 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:01,069-Speed 5665.11 samples/sec Loss 12.5169 LearningRate 0.0875 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:02,873-Speed 5679.92 samples/sec Loss 12.6284 LearningRate 0.0875 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:04,687-Speed 5645.57 samples/sec Loss 12.5024 LearningRate 0.0875 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:06,542-Speed 5522.80 samples/sec Loss 12.5904 LearningRate 0.0875 Epoch: 1 Global Step: 6560 
Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:43:08,364-Speed 5625.51 samples/sec Loss 12.6673 LearningRate 0.0874 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:10,173-Speed 5664.34 samples/sec Loss 12.5183 LearningRate 0.0874 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:11,978-Speed 5672.83 samples/sec Loss 12.6507 LearningRate 0.0874 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:13,809-Speed 5595.27 samples/sec Loss 12.6270 LearningRate 0.0874 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:15,613-Speed 5678.95 samples/sec Loss 12.9027 LearningRate 0.0874 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:17,437-Speed 5621.15 samples/sec Loss 12.6904 LearningRate 0.0873 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:19,242-Speed 5674.84 samples/sec Loss 12.6569 LearningRate 0.0873 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:21,048-Speed 5675.52 samples/sec Loss 12.7405 LearningRate 0.0873 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:22,852-Speed 5677.66 samples/sec Loss 12.7071 LearningRate 0.0873 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:24,668-Speed 5640.43 samples/sec Loss 12.4697 LearningRate 0.0873 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:26,469-Speed 5688.33 samples/sec Loss 12.6572 LearningRate 0.0872 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:28,278-Speed 5662.07 samples/sec Loss 12.5173 LearningRate 0.0872 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:43:30,138-Speed 5510.52 samples/sec Loss 12.7278 LearningRate 0.0872 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:31,946-Speed 5666.78 samples/sec Loss 12.7123 LearningRate 0.0872 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:33,747-Speed 5688.29 samples/sec Loss 12.6018 LearningRate 0.0872 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:35,554-Speed 5669.95 samples/sec Loss 12.5258 LearningRate 0.0872 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:37,399-Speed 5554.05 samples/sec Loss 12.6752 LearningRate 0.0871 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:39,235-Speed 5579.49 samples/sec Loss 12.4883 LearningRate 0.0871 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:41,056-Speed 5629.54 samples/sec Loss 12.5412 LearningRate 0.0871 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:42,866-Speed 5659.97 samples/sec Loss 12.6946 LearningRate 0.0871 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:44,665-Speed 5696.90 samples/sec Loss 12.7151 LearningRate 0.0871 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:46,485-Speed 5626.89 samples/sec Loss 12.7068 LearningRate 0.0870 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:48,323-Speed 5576.75 samples/sec Loss 12.4374 LearningRate 0.0870 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:50,127-Speed 5677.69 samples/sec Loss 12.5583 LearningRate 0.0870 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:52,043-Speed 5349.41 samples/sec Loss 
12.4958 LearningRate 0.0870 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:53,854-Speed 5656.83 samples/sec Loss 12.6356 LearningRate 0.0870 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:55,744-Speed 5418.51 samples/sec Loss 12.6588 LearningRate 0.0870 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:57,560-Speed 5642.88 samples/sec Loss 12.5232 LearningRate 0.0869 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:43:59,385-Speed 5616.78 samples/sec Loss 12.6561 LearningRate 0.0869 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:01,196-Speed 5658.35 samples/sec Loss 12.5035 LearningRate 0.0869 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:03,022-Speed 5610.92 samples/sec Loss 12.3713 LearningRate 0.0869 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:04,830-Speed 5666.64 samples/sec Loss 12.2574 LearningRate 0.0869 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:06,633-Speed 5680.86 samples/sec Loss 12.5321 LearningRate 0.0868 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:08,488-Speed 5525.28 samples/sec Loss 12.5137 LearningRate 0.0868 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:10,305-Speed 5641.13 samples/sec Loss 12.7288 LearningRate 0.0868 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:12,136-Speed 5592.79 samples/sec Loss 12.6058 LearningRate 0.0868 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:13,946-Speed 5661.42 samples/sec Loss 12.4628 LearningRate 0.0868 Epoch: 1 Global Step: 6930 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:15,760-Speed 5647.88 samples/sec Loss 12.3408 LearningRate 0.0867 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:17,589-Speed 5599.99 samples/sec Loss 12.5080 LearningRate 0.0867 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:19,392-Speed 5681.90 samples/sec Loss 12.5734 LearningRate 0.0867 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:21,222-Speed 5601.16 samples/sec Loss 12.5579 LearningRate 0.0867 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:23,027-Speed 5675.87 samples/sec Loss 12.4360 LearningRate 0.0867 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:24,874-Speed 5548.36 samples/sec Loss 12.3342 LearningRate 0.0867 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:26,699-Speed 5614.61 samples/sec Loss 12.6542 LearningRate 0.0866 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:28,509-Speed 5658.78 samples/sec Loss 12.4144 LearningRate 0.0866 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:30,322-Speed 5655.97 samples/sec Loss 12.5295 LearningRate 0.0866 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:32,142-Speed 5629.96 samples/sec Loss 12.3459 LearningRate 0.0866 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:33,950-Speed 5664.64 samples/sec Loss 12.4526 LearningRate 0.0866 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:35,800-Speed 5537.73 samples/sec Loss 12.1538 LearningRate 0.0865 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:44:37,635-Speed 5587.99 samples/sec Loss 12.4727 LearningRate 0.0865 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:39,445-Speed 5659.41 samples/sec Loss 12.4049 LearningRate 0.0865 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:41,300-Speed 5525.01 samples/sec Loss 12.3364 LearningRate 0.0865 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:43,128-Speed 5604.07 samples/sec Loss 12.4486 LearningRate 0.0865 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:44,943-Speed 5644.79 samples/sec Loss 12.5996 LearningRate 0.0865 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:46,804-Speed 5503.80 samples/sec Loss 12.4024 LearningRate 0.0864 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:48,635-Speed 5597.11 samples/sec Loss 12.5218 LearningRate 0.0864 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:50,463-Speed 5607.36 samples/sec Loss 12.4821 LearningRate 0.0864 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:44:52,283-Speed 5627.84 samples/sec Loss 12.3142 LearningRate 0.0864 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:44:54,101-Speed 5635.72 samples/sec Loss 12.6460 LearningRate 0.0864 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:44:55,938-Speed 5577.04 samples/sec Loss 12.4471 LearningRate 0.0863 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:44:57,761-Speed 5620.64 samples/sec Loss 12.3937 LearningRate 0.0863 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:44:59,607-Speed 5550.19 samples/sec Loss 
12.2504 LearningRate 0.0863 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:01,437-Speed 5600.62 samples/sec Loss 12.3219 LearningRate 0.0863 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:03,282-Speed 5551.71 samples/sec Loss 12.3793 LearningRate 0.0863 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:05,112-Speed 5601.33 samples/sec Loss 12.4543 LearningRate 0.0863 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:06,947-Speed 5581.79 samples/sec Loss 12.4835 LearningRate 0.0862 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:08,772-Speed 5614.02 samples/sec Loss 12.3443 LearningRate 0.0862 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:45:10,619-Speed 5546.65 samples/sec Loss 12.4225 LearningRate 0.0862 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:12,434-Speed 5646.38 samples/sec Loss 12.3596 LearningRate 0.0862 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:14,244-Speed 5661.60 samples/sec Loss 12.4606 LearningRate 0.0862 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:16,074-Speed 5597.92 samples/sec Loss 12.2994 LearningRate 0.0861 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:17,892-Speed 5635.56 samples/sec Loss 12.2949 LearningRate 0.0861 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:19,699-Speed 5667.83 samples/sec Loss 12.3504 LearningRate 0.0861 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:21,531-Speed 5594.74 samples/sec Loss 12.4878 LearningRate 0.0861 Epoch: 1 Global Step: 7300 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:23,335-Speed 5678.87 samples/sec Loss 12.2946 LearningRate 0.0861 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:25,148-Speed 5650.25 samples/sec Loss 12.2962 LearningRate 0.0861 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:26,964-Speed 5644.66 samples/sec Loss 12.3721 LearningRate 0.0860 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:28,809-Speed 5552.94 samples/sec Loss 12.3238 LearningRate 0.0860 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:30,631-Speed 5624.03 samples/sec Loss 12.3716 LearningRate 0.0860 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:32,436-Speed 5677.33 samples/sec Loss 12.1304 LearningRate 0.0860 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:34,272-Speed 5578.83 samples/sec Loss 12.3488 LearningRate 0.0860 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:36,088-Speed 5639.85 samples/sec Loss 12.3365 LearningRate 0.0859 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:37,902-Speed 5649.43 samples/sec Loss 12.3259 LearningRate 0.0859 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:39,706-Speed 5677.85 samples/sec Loss 12.3469 LearningRate 0.0859 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:41,557-Speed 5535.73 samples/sec Loss 12.2020 LearningRate 0.0859 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:43,367-Speed 5660.27 samples/sec Loss 12.3475 LearningRate 0.0859 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:45:45,203-Speed 5580.42 samples/sec Loss 12.2354 LearningRate 0.0858 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:47,010-Speed 5670.69 samples/sec Loss 12.2645 LearningRate 0.0858 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:48,825-Speed 5645.42 samples/sec Loss 12.1575 LearningRate 0.0858 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:50,667-Speed 5563.89 samples/sec Loss 12.3376 LearningRate 0.0858 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:52,491-Speed 5616.95 samples/sec Loss 12.3649 LearningRate 0.0858 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:54,314-Speed 5621.36 samples/sec Loss 12.0055 LearningRate 0.0858 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:56,122-Speed 5666.94 samples/sec Loss 12.4362 LearningRate 0.0857 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:57,947-Speed 5612.87 samples/sec Loss 12.3192 LearningRate 0.0857 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:45:59,759-Speed 5655.13 samples/sec Loss 12.2157 LearningRate 0.0857 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:01,632-Speed 5471.26 samples/sec Loss 12.0909 LearningRate 0.0857 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:03,457-Speed 5615.05 samples/sec Loss 12.1535 LearningRate 0.0857 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:05,262-Speed 5676.45 samples/sec Loss 12.1442 LearningRate 0.0856 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:07,082-Speed 5633.05 samples/sec Loss 
12.1462 LearningRate 0.0856 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:08,886-Speed 5679.87 samples/sec Loss 12.3226 LearningRate 0.0856 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:10,697-Speed 5655.26 samples/sec Loss 12.2491 LearningRate 0.0856 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:12,540-Speed 5559.37 samples/sec Loss 12.3216 LearningRate 0.0856 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:14,360-Speed 5628.88 samples/sec Loss 11.9703 LearningRate 0.0856 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:16,245-Speed 5436.97 samples/sec Loss 12.2993 LearningRate 0.0855 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:18,049-Speed 5681.05 samples/sec Loss 12.2175 LearningRate 0.0855 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:19,910-Speed 5504.44 samples/sec Loss 12.0743 LearningRate 0.0855 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:21,711-Speed 5688.55 samples/sec Loss 12.1409 LearningRate 0.0855 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:23,578-Speed 5486.86 samples/sec Loss 12.1249 LearningRate 0.0855 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:25,396-Speed 5636.32 samples/sec Loss 12.1659 LearningRate 0.0854 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:27,248-Speed 5593.80 samples/sec Loss 12.0944 LearningRate 0.0854 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:29,059-Speed 5659.30 samples/sec Loss 12.2074 LearningRate 0.0854 Epoch: 1 Global Step: 7670 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:30,904-Speed 5552.12 samples/sec Loss 12.1342 LearningRate 0.0854 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:32,708-Speed 5680.58 samples/sec Loss 12.1195 LearningRate 0.0854 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:34,530-Speed 5623.84 samples/sec Loss 12.2056 LearningRate 0.0854 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:36,362-Speed 5592.27 samples/sec Loss 12.0693 LearningRate 0.0853 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:46:38,227-Speed 5495.25 samples/sec Loss 11.9777 LearningRate 0.0853 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:40,040-Speed 5652.78 samples/sec Loss 12.0948 LearningRate 0.0853 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:41,859-Speed 5644.27 samples/sec Loss 12.0751 LearningRate 0.0853 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:43,722-Speed 5497.74 samples/sec Loss 12.0482 LearningRate 0.0853 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:45,529-Speed 5671.60 samples/sec Loss 12.1496 LearningRate 0.0852 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:47,410-Speed 5445.94 samples/sec Loss 12.0476 LearningRate 0.0852 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:49,219-Speed 5663.48 samples/sec Loss 12.2557 LearningRate 0.0852 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:51,097-Speed 5456.99 samples/sec Loss 12.2037 LearningRate 0.0852 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:46:52,903-Speed 5671.27 samples/sec Loss 12.0654 LearningRate 0.0852 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:54,731-Speed 5604.40 samples/sec Loss 12.1520 LearningRate 0.0852 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:56,544-Speed 5651.50 samples/sec Loss 11.9496 LearningRate 0.0851 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:46:58,359-Speed 5644.49 samples/sec Loss 12.0411 LearningRate 0.0851 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:00,215-Speed 5523.17 samples/sec Loss 12.2397 LearningRate 0.0851 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:02,030-Speed 5644.02 samples/sec Loss 11.9815 LearningRate 0.0851 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:03,868-Speed 5575.94 samples/sec Loss 12.1241 LearningRate 0.0851 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:05,675-Speed 5669.49 samples/sec Loss 12.0092 LearningRate 0.0850 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:07,485-Speed 5659.75 samples/sec Loss 11.9638 LearningRate 0.0850 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:09,336-Speed 5535.46 samples/sec Loss 11.8489 LearningRate 0.0850 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:11,160-Speed 5617.92 samples/sec Loss 12.1543 LearningRate 0.0850 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:12,970-Speed 5659.13 samples/sec Loss 12.0715 LearningRate 0.0850 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:14,842-Speed 5474.85 samples/sec Loss 12.2130 
LearningRate 0.0850 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:16,652-Speed 5658.82 samples/sec Loss 12.2111 LearningRate 0.0849 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:18,493-Speed 5568.09 samples/sec Loss 11.9880 LearningRate 0.0849 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:20,330-Speed 5578.05 samples/sec Loss 11.8472 LearningRate 0.0849 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:22,161-Speed 5595.92 samples/sec Loss 11.9506 LearningRate 0.0849 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:24,021-Speed 5507.71 samples/sec Loss 11.9863 LearningRate 0.0849 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:25,831-Speed 5661.04 samples/sec Loss 12.0470 LearningRate 0.0848 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:27,649-Speed 5635.62 samples/sec Loss 11.9003 LearningRate 0.0848 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:29,493-Speed 5556.93 samples/sec Loss 12.0393 LearningRate 0.0848 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:47:57,215-[lfw][8000]XNorm: 22.081268 Training: 2022-04-11 10:47:57,215-[lfw][8000]Accuracy-Flip: 0.99417+-0.00318 Training: 2022-04-11 10:47:57,216-[lfw][8000]Accuracy-Highest: 0.99417 Training: 2022-04-11 10:48:28,862-[cfp_fp][8000]XNorm: 19.295822 Training: 2022-04-11 10:48:28,863-[cfp_fp][8000]Accuracy-Flip: 0.92086+-0.01054 Training: 2022-04-11 10:48:28,863-[cfp_fp][8000]Accuracy-Highest: 0.92086 Training: 2022-04-11 10:48:56,125-[agedb_30][8000]XNorm: 21.211359 Training: 2022-04-11 10:48:56,126-[agedb_30][8000]Accuracy-Flip: 0.95567+-0.00854 Training: 2022-04-11 
10:48:56,127-[agedb_30][8000]Accuracy-Highest: 0.95567 Training: 2022-04-11 10:48:57,950-Speed 115.76 samples/sec Loss 12.0456 LearningRate 0.0848 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:48:59,756-Speed 5671.80 samples/sec Loss 12.0100 LearningRate 0.0848 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:01,555-Speed 5696.50 samples/sec Loss 12.1607 LearningRate 0.0848 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:03,369-Speed 5649.71 samples/sec Loss 12.1217 LearningRate 0.0847 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:05,175-Speed 5672.58 samples/sec Loss 11.9567 LearningRate 0.0847 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:06,978-Speed 5681.59 samples/sec Loss 12.0276 LearningRate 0.0847 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:08,820-Speed 5563.28 samples/sec Loss 11.9976 LearningRate 0.0847 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:10,634-Speed 5646.91 samples/sec Loss 11.7917 LearningRate 0.0847 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:12,452-Speed 5636.05 samples/sec Loss 12.0303 LearningRate 0.0846 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:14,266-Speed 5646.07 samples/sec Loss 12.0692 LearningRate 0.0846 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:16,068-Speed 5684.71 samples/sec Loss 11.8618 LearningRate 0.0846 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:17,911-Speed 5561.01 samples/sec Loss 11.9524 LearningRate 0.0846 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 10:49:19,710-Speed 5693.63 samples/sec Loss 12.0166 LearningRate 0.0846 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:21,536-Speed 5611.23 samples/sec Loss 11.9919 LearningRate 0.0846 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:23,343-Speed 5669.48 samples/sec Loss 11.9697 LearningRate 0.0845 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:25,150-Speed 5670.49 samples/sec Loss 11.8702 LearningRate 0.0845 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:26,972-Speed 5619.94 samples/sec Loss 11.8602 LearningRate 0.0845 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:28,793-Speed 5626.49 samples/sec Loss 11.9774 LearningRate 0.0845 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:30,658-Speed 5494.40 samples/sec Loss 11.8104 LearningRate 0.0845 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:32,458-Speed 5691.72 samples/sec Loss 12.1036 LearningRate 0.0844 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:34,296-Speed 5575.87 samples/sec Loss 12.0969 LearningRate 0.0844 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:36,105-Speed 5664.94 samples/sec Loss 12.0233 LearningRate 0.0844 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:49:37,922-Speed 5639.12 samples/sec Loss 11.9617 LearningRate 0.0844 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:39,780-Speed 5515.53 samples/sec Loss 12.1158 LearningRate 0.0844 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:41,589-Speed 5664.44 
samples/sec Loss 12.0268 LearningRate 0.0844 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:43,388-Speed 5695.07 samples/sec Loss 12.0212 LearningRate 0.0843 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:45,229-Speed 5565.56 samples/sec Loss 11.9674 LearningRate 0.0843 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:47,040-Speed 5658.13 samples/sec Loss 11.8822 LearningRate 0.0843 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:48,883-Speed 5556.46 samples/sec Loss 11.7937 LearningRate 0.0843 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:50,733-Speed 5539.92 samples/sec Loss 12.0021 LearningRate 0.0843 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:52,555-Speed 5623.53 samples/sec Loss 11.9657 LearningRate 0.0842 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:54,388-Speed 5587.68 samples/sec Loss 11.8800 LearningRate 0.0842 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:56,204-Speed 5640.33 samples/sec Loss 11.8922 LearningRate 0.0842 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:58,022-Speed 5637.09 samples/sec Loss 12.0319 LearningRate 0.0842 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:49:59,829-Speed 5668.36 samples/sec Loss 11.7271 LearningRate 0.0842 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:01,662-Speed 5588.79 samples/sec Loss 11.8530 LearningRate 0.0842 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:03,498-Speed 5581.12 samples/sec Loss 12.1727 LearningRate 0.0841 Epoch: 1 
Global Step: 8370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:05,310-Speed 5652.85 samples/sec Loss 11.8916 LearningRate 0.0841 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:07,202-Speed 5417.43 samples/sec Loss 11.8874 LearningRate 0.0841 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:09,013-Speed 5656.89 samples/sec Loss 11.7485 LearningRate 0.0841 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:10,837-Speed 5617.07 samples/sec Loss 11.9435 LearningRate 0.0841 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:12,678-Speed 5562.38 samples/sec Loss 11.9612 LearningRate 0.0840 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:14,488-Speed 5659.74 samples/sec Loss 11.8089 LearningRate 0.0840 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:16,307-Speed 5634.05 samples/sec Loss 11.9132 LearningRate 0.0840 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:18,160-Speed 5528.70 samples/sec Loss 11.7615 LearningRate 0.0840 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:19,991-Speed 5595.12 samples/sec Loss 11.9148 LearningRate 0.0840 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:21,834-Speed 5560.70 samples/sec Loss 11.9500 LearningRate 0.0840 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:23,671-Speed 5578.73 samples/sec Loss 11.9070 LearningRate 0.0839 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:25,478-Speed 5672.93 samples/sec Loss 11.8167 LearningRate 0.0839 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 10:50:27,301-Speed 5618.11 samples/sec Loss 11.7210 LearningRate 0.0839 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:29,106-Speed 5676.97 samples/sec Loss 11.8292 LearningRate 0.0839 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:30,908-Speed 5684.38 samples/sec Loss 11.7854 LearningRate 0.0839 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:32,719-Speed 5659.54 samples/sec Loss 11.8030 LearningRate 0.0838 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:34,557-Speed 5574.49 samples/sec Loss 11.7852 LearningRate 0.0838 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:36,383-Speed 5608.58 samples/sec Loss 11.8129 LearningRate 0.0838 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:38,230-Speed 5547.74 samples/sec Loss 12.0251 LearningRate 0.0838 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:40,042-Speed 5654.99 samples/sec Loss 11.8900 LearningRate 0.0838 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:41,896-Speed 5524.51 samples/sec Loss 11.8700 LearningRate 0.0838 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:43,722-Speed 5611.09 samples/sec Loss 11.8679 LearningRate 0.0837 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:45,545-Speed 5618.98 samples/sec Loss 11.8199 LearningRate 0.0837 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:50:47,353-Speed 5665.75 samples/sec Loss 12.0055 LearningRate 0.0837 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:49,170-Speed 5639.86 
samples/sec Loss 11.6771 LearningRate 0.0837 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:50,979-Speed 5662.52 samples/sec Loss 11.6362 LearningRate 0.0837 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:52,824-Speed 5552.78 samples/sec Loss 11.9845 LearningRate 0.0836 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:54,636-Speed 5654.75 samples/sec Loss 11.8356 LearningRate 0.0836 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:56,476-Speed 5567.36 samples/sec Loss 11.7226 LearningRate 0.0836 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:50:58,277-Speed 5689.86 samples/sec Loss 11.7836 LearningRate 0.0836 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:00,084-Speed 5667.02 samples/sec Loss 11.7587 LearningRate 0.0836 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:01,902-Speed 5637.70 samples/sec Loss 11.7364 LearningRate 0.0836 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:03,711-Speed 5663.03 samples/sec Loss 11.7948 LearningRate 0.0835 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:05,522-Speed 5655.29 samples/sec Loss 11.8346 LearningRate 0.0835 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:07,346-Speed 5617.43 samples/sec Loss 11.8069 LearningRate 0.0835 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:09,195-Speed 5543.54 samples/sec Loss 11.7656 LearningRate 0.0835 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:11,076-Speed 5444.77 samples/sec Loss 11.9411 LearningRate 0.0835 Epoch: 1 Global 
Step: 8740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:12,883-Speed 5668.84 samples/sec Loss 11.7921 LearningRate 0.0834 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:14,710-Speed 5607.56 samples/sec Loss 11.7172 LearningRate 0.0834 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:16,539-Speed 5602.03 samples/sec Loss 11.6484 LearningRate 0.0834 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:18,356-Speed 5636.92 samples/sec Loss 11.7092 LearningRate 0.0834 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:20,163-Speed 5671.24 samples/sec Loss 11.5859 LearningRate 0.0834 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:21,971-Speed 5666.44 samples/sec Loss 11.9443 LearningRate 0.0834 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:23,770-Speed 5693.49 samples/sec Loss 11.6350 LearningRate 0.0833 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:25,622-Speed 5533.72 samples/sec Loss 11.5946 LearningRate 0.0833 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:27,474-Speed 5531.25 samples/sec Loss 11.5152 LearningRate 0.0833 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:29,306-Speed 5591.30 samples/sec Loss 11.7577 LearningRate 0.0833 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:31,123-Speed 5639.76 samples/sec Loss 11.7672 LearningRate 0.0833 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:32,928-Speed 5675.42 samples/sec Loss 11.5403 LearningRate 0.0833 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 10:51:34,749-Speed 5626.40 samples/sec Loss 11.7925 LearningRate 0.0832 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:36,563-Speed 5649.60 samples/sec Loss 11.6250 LearningRate 0.0832 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:38,398-Speed 5581.37 samples/sec Loss 11.6479 LearningRate 0.0832 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:40,207-Speed 5665.97 samples/sec Loss 11.6031 LearningRate 0.0832 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:42,042-Speed 5582.22 samples/sec Loss 11.7884 LearningRate 0.0832 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:51:43,856-Speed 5648.32 samples/sec Loss 11.7479 LearningRate 0.0831 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:45,665-Speed 5659.92 samples/sec Loss 11.6054 LearningRate 0.0831 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:47,493-Speed 5607.08 samples/sec Loss 11.6829 LearningRate 0.0831 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:49,326-Speed 5587.33 samples/sec Loss 11.7054 LearningRate 0.0831 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:51:51,180-Speed 5526.19 samples/sec Loss 11.6280 LearningRate 0.0831 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:52,996-Speed 5643.68 samples/sec Loss 11.7370 LearningRate 0.0831 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:54,801-Speed 5674.34 samples/sec Loss 11.6904 LearningRate 0.0830 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:56,650-Speed 5543.21 
samples/sec Loss 11.6614 LearningRate 0.0830 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:51:58,507-Speed 5515.06 samples/sec Loss 11.6586 LearningRate 0.0830 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:00,336-Speed 5601.59 samples/sec Loss 11.8396 LearningRate 0.0830 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:02,151-Speed 5645.42 samples/sec Loss 11.5966 LearningRate 0.0830 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:03,996-Speed 5553.54 samples/sec Loss 11.6397 LearningRate 0.0829 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:05,800-Speed 5677.79 samples/sec Loss 11.9146 LearningRate 0.0829 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:07,633-Speed 5588.91 samples/sec Loss 11.5535 LearningRate 0.0829 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:09,438-Speed 5678.23 samples/sec Loss 11.5365 LearningRate 0.0829 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:11,249-Speed 5655.59 samples/sec Loss 11.6009 LearningRate 0.0829 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:13,071-Speed 5624.59 samples/sec Loss 11.6722 LearningRate 0.0829 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:14,892-Speed 5623.38 samples/sec Loss 11.7785 LearningRate 0.0828 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:16,739-Speed 5547.08 samples/sec Loss 11.5616 LearningRate 0.0828 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:18,543-Speed 5678.96 samples/sec Loss 11.6224 LearningRate 0.0828 Epoch: 1 Global 
Step: 9110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:20,352-Speed 5662.82 samples/sec Loss 11.5517 LearningRate 0.0828 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:22,215-Speed 5501.46 samples/sec Loss 11.7127 LearningRate 0.0828 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:24,017-Speed 5683.68 samples/sec Loss 11.6960 LearningRate 0.0827 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:25,955-Speed 5286.92 samples/sec Loss 11.5106 LearningRate 0.0827 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:27,753-Speed 5696.28 samples/sec Loss 11.6576 LearningRate 0.0827 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:29,596-Speed 5559.17 samples/sec Loss 11.5236 LearningRate 0.0827 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:31,436-Speed 5570.85 samples/sec Loss 11.4550 LearningRate 0.0827 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:33,235-Speed 5694.14 samples/sec Loss 11.5504 LearningRate 0.0827 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:35,040-Speed 5674.93 samples/sec Loss 11.5508 LearningRate 0.0826 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:36,858-Speed 5635.28 samples/sec Loss 11.7055 LearningRate 0.0826 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:38,663-Speed 5678.03 samples/sec Loss 11.5971 LearningRate 0.0826 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:40,474-Speed 5656.49 samples/sec Loss 11.5486 LearningRate 0.0826 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 10:52:42,321-Speed 5547.61 samples/sec Loss 11.4217 LearningRate 0.0826 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:44,127-Speed 5671.40 samples/sec Loss 11.6198 LearningRate 0.0825 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:52:46,003-Speed 5462.06 samples/sec Loss 11.5756 LearningRate 0.0825 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:47,809-Speed 5673.30 samples/sec Loss 11.4635 LearningRate 0.0825 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:49,631-Speed 5622.10 samples/sec Loss 11.3894 LearningRate 0.0825 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:51,533-Speed 5386.21 samples/sec Loss 11.5023 LearningRate 0.0825 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:53,348-Speed 5644.25 samples/sec Loss 11.4400 LearningRate 0.0825 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:55,159-Speed 5655.45 samples/sec Loss 11.6683 LearningRate 0.0824 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:56,995-Speed 5579.55 samples/sec Loss 11.5974 LearningRate 0.0824 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:52:58,799-Speed 5680.26 samples/sec Loss 11.4540 LearningRate 0.0824 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:00,658-Speed 5511.53 samples/sec Loss 11.6107 LearningRate 0.0824 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:02,475-Speed 5640.53 samples/sec Loss 11.5538 LearningRate 0.0824 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:04,301-Speed 5607.90 samples/sec Loss 
11.7580 LearningRate 0.0824 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:06,112-Speed 5658.52 samples/sec Loss 11.4134 LearningRate 0.0823 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:07,914-Speed 5685.28 samples/sec Loss 11.6036 LearningRate 0.0823 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:09,744-Speed 5600.27 samples/sec Loss 11.6929 LearningRate 0.0823 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:11,553-Speed 5661.52 samples/sec Loss 11.4477 LearningRate 0.0823 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:13,382-Speed 5601.95 samples/sec Loss 11.5334 LearningRate 0.0823 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:15,202-Speed 5629.53 samples/sec Loss 11.4329 LearningRate 0.0822 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:17,035-Speed 5586.82 samples/sec Loss 11.4974 LearningRate 0.0822 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:18,855-Speed 5627.83 samples/sec Loss 11.7482 LearningRate 0.0822 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:20,661-Speed 5673.59 samples/sec Loss 11.3903 LearningRate 0.0822 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:22,474-Speed 5652.82 samples/sec Loss 11.6052 LearningRate 0.0822 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:24,291-Speed 5640.54 samples/sec Loss 11.5193 LearningRate 0.0822 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:26,098-Speed 5667.45 samples/sec Loss 11.3829 LearningRate 0.0821 Epoch: 1 Global Step: 9480 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:27,926-Speed 5605.86 samples/sec Loss 11.4876 LearningRate 0.0821 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:29,760-Speed 5585.12 samples/sec Loss 11.5169 LearningRate 0.0821 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:31,582-Speed 5625.67 samples/sec Loss 11.4479 LearningRate 0.0821 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:33,402-Speed 5630.45 samples/sec Loss 11.4840 LearningRate 0.0821 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:35,209-Speed 5670.63 samples/sec Loss 11.5313 LearningRate 0.0820 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:37,028-Speed 5632.63 samples/sec Loss 11.4213 LearningRate 0.0820 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:38,843-Speed 5645.62 samples/sec Loss 11.3721 LearningRate 0.0820 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:40,686-Speed 5558.52 samples/sec Loss 11.5621 LearningRate 0.0820 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:42,494-Speed 5666.01 samples/sec Loss 11.2946 LearningRate 0.0820 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:44,296-Speed 5684.17 samples/sec Loss 11.5335 LearningRate 0.0820 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:46,129-Speed 5590.34 samples/sec Loss 11.8026 LearningRate 0.0819 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:47,929-Speed 5691.07 samples/sec Loss 11.5379 LearningRate 0.0819 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:53:49,733-Speed 5679.74 samples/sec Loss 11.3824 LearningRate 0.0819 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:51,577-Speed 5556.37 samples/sec Loss 11.3665 LearningRate 0.0819 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:53,389-Speed 5652.58 samples/sec Loss 11.2230 LearningRate 0.0819 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:55,215-Speed 5612.19 samples/sec Loss 11.4752 LearningRate 0.0818 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:57,024-Speed 5662.97 samples/sec Loss 11.4465 LearningRate 0.0818 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:53:58,824-Speed 5689.77 samples/sec Loss 11.4819 LearningRate 0.0818 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:00,662-Speed 5574.32 samples/sec Loss 11.3358 LearningRate 0.0818 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:02,469-Speed 5673.50 samples/sec Loss 11.3520 LearningRate 0.0818 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:04,272-Speed 5683.12 samples/sec Loss 11.4599 LearningRate 0.0818 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:06,097-Speed 5611.86 samples/sec Loss 11.5488 LearningRate 0.0817 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:07,897-Speed 5692.51 samples/sec Loss 11.3132 LearningRate 0.0817 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:09,745-Speed 5544.81 samples/sec Loss 11.3766 LearningRate 0.0817 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:11,549-Speed 5679.83 samples/sec Loss 
11.2326 LearningRate 0.0817 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:13,353-Speed 5677.26 samples/sec Loss 11.2633 LearningRate 0.0817 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:15,168-Speed 5642.72 samples/sec Loss 11.3867 LearningRate 0.0817 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:17,009-Speed 5565.65 samples/sec Loss 11.2646 LearningRate 0.0816 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:18,821-Speed 5655.31 samples/sec Loss 11.3250 LearningRate 0.0816 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:20,651-Speed 5599.80 samples/sec Loss 11.4040 LearningRate 0.0816 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:22,462-Speed 5663.67 samples/sec Loss 11.4125 LearningRate 0.0816 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:24,336-Speed 5465.37 samples/sec Loss 11.5447 LearningRate 0.0816 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:26,149-Speed 5651.61 samples/sec Loss 11.3842 LearningRate 0.0815 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:27,988-Speed 5569.54 samples/sec Loss 11.3801 LearningRate 0.0815 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:29,794-Speed 5673.73 samples/sec Loss 11.2870 LearningRate 0.0815 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:31,642-Speed 5543.84 samples/sec Loss 11.2840 LearningRate 0.0815 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:33,445-Speed 5681.10 samples/sec Loss 11.4703 LearningRate 0.0815 Epoch: 1 Global Step: 9850 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:35,290-Speed 5555.02 samples/sec Loss 11.5834 LearningRate 0.0815 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:54:37,101-Speed 5654.31 samples/sec Loss 11.3345 LearningRate 0.0814 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:38,923-Speed 5627.00 samples/sec Loss 11.3866 LearningRate 0.0814 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:40,730-Speed 5667.89 samples/sec Loss 11.4509 LearningRate 0.0814 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:42,555-Speed 5614.77 samples/sec Loss 11.3410 LearningRate 0.0814 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:44,358-Speed 5678.78 samples/sec Loss 11.4007 LearningRate 0.0814 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:46,193-Speed 5583.42 samples/sec Loss 11.2595 LearningRate 0.0813 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:48,008-Speed 5644.16 samples/sec Loss 11.2341 LearningRate 0.0813 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:49,856-Speed 5546.56 samples/sec Loss 11.5700 LearningRate 0.0813 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:51,675-Speed 5632.80 samples/sec Loss 11.3144 LearningRate 0.0813 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:53,504-Speed 5601.84 samples/sec Loss 11.3037 LearningRate 0.0813 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:55,382-Speed 5454.14 samples/sec Loss 11.2136 LearningRate 0.0813 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 10:54:57,187-Speed 5676.30 samples/sec Loss 11.3557 LearningRate 0.0812 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:54:59,009-Speed 5622.15 samples/sec Loss 11.4865 LearningRate 0.0812 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:55:00,821-Speed 5654.72 samples/sec Loss 11.2960 LearningRate 0.0812 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:55:28,458-[lfw][10000]XNorm: 21.465626 Training: 2022-04-11 10:55:28,458-[lfw][10000]Accuracy-Flip: 0.99483+-0.00293 Training: 2022-04-11 10:55:28,458-[lfw][10000]Accuracy-Highest: 0.99483 Training: 2022-04-11 10:55:59,773-[cfp_fp][10000]XNorm: 18.616215 Training: 2022-04-11 10:55:59,773-[cfp_fp][10000]Accuracy-Flip: 0.92457+-0.00973 Training: 2022-04-11 10:55:59,774-[cfp_fp][10000]Accuracy-Highest: 0.92457 Training: 2022-04-11 10:56:27,155-[agedb_30][10000]XNorm: 21.377548 Training: 2022-04-11 10:56:27,155-[agedb_30][10000]Accuracy-Flip: 0.96050+-0.00925 Training: 2022-04-11 10:56:27,155-[agedb_30][10000]Accuracy-Highest: 0.96050 Training: 2022-04-11 10:56:28,998-Speed 116.13 samples/sec Loss 11.2706 LearningRate 0.0812 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:30,809-Speed 5658.11 samples/sec Loss 11.2755 LearningRate 0.0812 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:32,605-Speed 5704.45 samples/sec Loss 11.2625 LearningRate 0.0812 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:34,470-Speed 5492.45 samples/sec Loss 11.4125 LearningRate 0.0811 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:36,274-Speed 5679.56 samples/sec Loss 11.3464 LearningRate 0.0811 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:38,102-Speed 
5606.33 samples/sec Loss 11.2382 LearningRate 0.0811 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:39,903-Speed 5688.29 samples/sec Loss 11.2890 LearningRate 0.0811 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:41,735-Speed 5593.77 samples/sec Loss 11.3047 LearningRate 0.0811 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:43,533-Speed 5695.60 samples/sec Loss 11.2983 LearningRate 0.0810 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:45,361-Speed 5605.30 samples/sec Loss 11.3921 LearningRate 0.0810 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:47,307-Speed 5264.46 samples/sec Loss 11.1143 LearningRate 0.0810 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:56:59,442-Speed 843.95 samples/sec Loss 10.7863 LearningRate 0.0810 Epoch: 2 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:01,292-Speed 5539.19 samples/sec Loss 10.4029 LearningRate 0.0810 Epoch: 2 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:03,151-Speed 5512.99 samples/sec Loss 10.6321 LearningRate 0.0810 Epoch: 2 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:04,981-Speed 5601.90 samples/sec Loss 10.6596 LearningRate 0.0809 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:06,918-Speed 5290.13 samples/sec Loss 10.2906 LearningRate 0.0809 Epoch: 2 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:08,909-Speed 5146.06 samples/sec Loss 10.3439 LearningRate 0.0809 Epoch: 2 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:10,757-Speed 5543.09 samples/sec Loss 10.5188 
LearningRate 0.0809 Epoch: 2 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:12,603-Speed 5549.34 samples/sec Loss 10.5784 LearningRate 0.0809 Epoch: 2 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:14,437-Speed 5621.25 samples/sec Loss 10.4105 LearningRate 0.0809 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:16,327-Speed 5420.76 samples/sec Loss 10.5836 LearningRate 0.0808 Epoch: 2 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:18,157-Speed 5598.95 samples/sec Loss 10.6532 LearningRate 0.0808 Epoch: 2 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:20,016-Speed 5509.26 samples/sec Loss 10.4835 LearningRate 0.0808 Epoch: 2 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:21,821-Speed 5678.31 samples/sec Loss 10.6118 LearningRate 0.0808 Epoch: 2 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:23,659-Speed 5574.13 samples/sec Loss 10.3883 LearningRate 0.0808 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:25,471-Speed 5656.72 samples/sec Loss 10.5154 LearningRate 0.0807 Epoch: 2 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:27,270-Speed 5693.93 samples/sec Loss 10.5790 LearningRate 0.0807 Epoch: 2 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:29,106-Speed 5581.75 samples/sec Loss 10.6977 LearningRate 0.0807 Epoch: 2 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:30,964-Speed 5512.02 samples/sec Loss 10.7059 LearningRate 0.0807 Epoch: 2 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:32,769-Speed 5676.88 samples/sec Loss 10.6855 LearningRate 0.0807 Epoch: 2 Global Step: 
10300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:34,599-Speed 5599.33 samples/sec Loss 10.6687 LearningRate 0.0807 Epoch: 2 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:36,431-Speed 5589.81 samples/sec Loss 10.6844 LearningRate 0.0806 Epoch: 2 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:38,265-Speed 5586.72 samples/sec Loss 10.6289 LearningRate 0.0806 Epoch: 2 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:40,078-Speed 5652.36 samples/sec Loss 10.7103 LearningRate 0.0806 Epoch: 2 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:41,921-Speed 5556.40 samples/sec Loss 10.8042 LearningRate 0.0806 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:43,760-Speed 5572.00 samples/sec Loss 10.7823 LearningRate 0.0806 Epoch: 2 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:45,555-Speed 5706.60 samples/sec Loss 10.6930 LearningRate 0.0805 Epoch: 2 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:47,388-Speed 5596.83 samples/sec Loss 10.7881 LearningRate 0.0805 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:49,191-Speed 5683.34 samples/sec Loss 10.6766 LearningRate 0.0805 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:51,003-Speed 5652.06 samples/sec Loss 10.7716 LearningRate 0.0805 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:52,904-Speed 5391.73 samples/sec Loss 10.8460 LearningRate 0.0805 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:54,759-Speed 5522.85 samples/sec Loss 10.7293 LearningRate 0.0805 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 10:57:56,596-Speed 5578.38 samples/sec Loss 10.8922 LearningRate 0.0804 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:57:58,441-Speed 5553.59 samples/sec Loss 10.7553 LearningRate 0.0804 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:00,247-Speed 5673.28 samples/sec Loss 10.6825 LearningRate 0.0804 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:02,108-Speed 5505.29 samples/sec Loss 10.6781 LearningRate 0.0804 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:03,925-Speed 5639.63 samples/sec Loss 10.6175 LearningRate 0.0804 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:05,771-Speed 5552.23 samples/sec Loss 10.7481 LearningRate 0.0804 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:07,592-Speed 5624.69 samples/sec Loss 10.5945 LearningRate 0.0803 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:09,433-Speed 5566.83 samples/sec Loss 10.7872 LearningRate 0.0803 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:11,248-Speed 5646.67 samples/sec Loss 10.7455 LearningRate 0.0803 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:13,075-Speed 5607.15 samples/sec Loss 10.6704 LearningRate 0.0803 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:14,890-Speed 5642.08 samples/sec Loss 10.8112 LearningRate 0.0803 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:16,726-Speed 5581.57 samples/sec Loss 10.9836 LearningRate 0.0802 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:58:18,578-Speed 5532.65 samples/sec Loss 10.9387 LearningRate 0.0802 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:20,461-Speed 5440.82 samples/sec Loss 10.8568 LearningRate 0.0802 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:22,285-Speed 5616.47 samples/sec Loss 10.8439 LearningRate 0.0802 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:24,124-Speed 5572.52 samples/sec Loss 10.9868 LearningRate 0.0802 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:25,959-Speed 5580.89 samples/sec Loss 10.8578 LearningRate 0.0802 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:27,773-Speed 5647.18 samples/sec Loss 10.7951 LearningRate 0.0801 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:29,601-Speed 5605.71 samples/sec Loss 10.8886 LearningRate 0.0801 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:31,419-Speed 5636.39 samples/sec Loss 10.9178 LearningRate 0.0801 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:33,237-Speed 5635.11 samples/sec Loss 10.8221 LearningRate 0.0801 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:35,048-Speed 5657.07 samples/sec Loss 10.8386 LearningRate 0.0801 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:36,851-Speed 5681.04 samples/sec Loss 11.0019 LearningRate 0.0801 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:38,704-Speed 5530.69 samples/sec Loss 10.7744 LearningRate 0.0800 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:40,514-Speed 5661.12 samples/sec Loss 
10.8629 LearningRate 0.0800 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:42,363-Speed 5539.29 samples/sec Loss 11.0091 LearningRate 0.0800 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:44,183-Speed 5630.50 samples/sec Loss 11.0246 LearningRate 0.0800 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:45,992-Speed 5663.94 samples/sec Loss 11.0382 LearningRate 0.0800 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:47,823-Speed 5593.63 samples/sec Loss 10.9172 LearningRate 0.0799 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:49,641-Speed 5634.82 samples/sec Loss 11.0299 LearningRate 0.0799 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:51,451-Speed 5659.80 samples/sec Loss 10.9407 LearningRate 0.0799 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:53,298-Speed 5548.65 samples/sec Loss 10.9434 LearningRate 0.0799 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:58:55,141-Speed 5557.94 samples/sec Loss 11.0094 LearningRate 0.0799 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:56,947-Speed 5672.70 samples/sec Loss 10.7443 LearningRate 0.0799 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:58:58,784-Speed 5576.04 samples/sec Loss 10.9317 LearningRate 0.0798 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:00,614-Speed 5599.86 samples/sec Loss 11.0744 LearningRate 0.0798 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:02,419-Speed 5678.03 samples/sec Loss 10.9402 LearningRate 0.0798 Epoch: 2 Global Step: 
10790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:04,223-Speed 5677.57 samples/sec Loss 10.8177 LearningRate 0.0798 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:06,065-Speed 5560.25 samples/sec Loss 10.8487 LearningRate 0.0798 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:07,884-Speed 5634.77 samples/sec Loss 11.1048 LearningRate 0.0798 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:09,740-Speed 5521.18 samples/sec Loss 10.9379 LearningRate 0.0797 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:11,618-Speed 5454.32 samples/sec Loss 10.8515 LearningRate 0.0797 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:13,428-Speed 5660.90 samples/sec Loss 10.8613 LearningRate 0.0797 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:15,259-Speed 5595.45 samples/sec Loss 11.0316 LearningRate 0.0797 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:17,138-Speed 5451.56 samples/sec Loss 10.7536 LearningRate 0.0797 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:18,959-Speed 5625.27 samples/sec Loss 10.7949 LearningRate 0.0796 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:20,805-Speed 5552.45 samples/sec Loss 10.8597 LearningRate 0.0796 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:22,615-Speed 5659.45 samples/sec Loss 10.8825 LearningRate 0.0796 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:24,483-Speed 5484.24 samples/sec Loss 11.0401 LearningRate 0.0796 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 10:59:26,300-Speed 5638.61 samples/sec Loss 10.8538 LearningRate 0.0796 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:28,131-Speed 5596.39 samples/sec Loss 10.8505 LearningRate 0.0796 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:29,983-Speed 5531.75 samples/sec Loss 11.0514 LearningRate 0.0795 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:31,833-Speed 5535.94 samples/sec Loss 10.8038 LearningRate 0.0795 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:33,641-Speed 5667.66 samples/sec Loss 10.9510 LearningRate 0.0795 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:35,471-Speed 5598.01 samples/sec Loss 10.8496 LearningRate 0.0795 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:37,279-Speed 5667.82 samples/sec Loss 10.9952 LearningRate 0.0795 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:39,114-Speed 5583.69 samples/sec Loss 10.9900 LearningRate 0.0795 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:40,930-Speed 5641.92 samples/sec Loss 10.9581 LearningRate 0.0794 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:42,784-Speed 5529.10 samples/sec Loss 11.0176 LearningRate 0.0794 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:44,592-Speed 5666.95 samples/sec Loss 11.0167 LearningRate 0.0794 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:46,417-Speed 5614.93 samples/sec Loss 11.0250 LearningRate 0.0794 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
10:59:48,233-Speed 5640.29 samples/sec Loss 10.9033 LearningRate 0.0794 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:50,044-Speed 5656.76 samples/sec Loss 11.0729 LearningRate 0.0793 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 10:59:51,920-Speed 5461.38 samples/sec Loss 10.8672 LearningRate 0.0793 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 10:59:53,735-Speed 5645.08 samples/sec Loss 10.8274 LearningRate 0.0793 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:59:55,627-Speed 5415.95 samples/sec Loss 11.0687 LearningRate 0.0793 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:59:57,440-Speed 5651.23 samples/sec Loss 10.8795 LearningRate 0.0793 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 10:59:59,273-Speed 5587.65 samples/sec Loss 10.8208 LearningRate 0.0793 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:01,098-Speed 5612.43 samples/sec Loss 10.8442 LearningRate 0.0792 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:02,950-Speed 5532.43 samples/sec Loss 10.9444 LearningRate 0.0792 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:04,764-Speed 5647.99 samples/sec Loss 10.9898 LearningRate 0.0792 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:06,586-Speed 5622.83 samples/sec Loss 10.8858 LearningRate 0.0792 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:08,415-Speed 5603.12 samples/sec Loss 10.9190 LearningRate 0.0792 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:10,228-Speed 5649.88 samples/sec Loss 10.9349 
LearningRate 0.0792 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:00:12,060-Speed 5594.64 samples/sec Loss 10.8825 LearningRate 0.0791 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:13,903-Speed 5556.74 samples/sec Loss 10.8739 LearningRate 0.0791 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:15,717-Speed 5650.09 samples/sec Loss 10.8690 LearningRate 0.0791 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:17,575-Speed 5512.69 samples/sec Loss 10.8599 LearningRate 0.0791 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:19,398-Speed 5622.21 samples/sec Loss 10.9907 LearningRate 0.0791 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:21,204-Speed 5669.87 samples/sec Loss 10.8815 LearningRate 0.0790 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:23,018-Speed 5650.64 samples/sec Loss 10.8990 LearningRate 0.0790 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:24,876-Speed 5514.93 samples/sec Loss 10.8905 LearningRate 0.0790 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:26,687-Speed 5657.17 samples/sec Loss 10.9895 LearningRate 0.0790 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:28,556-Speed 5483.14 samples/sec Loss 10.9642 LearningRate 0.0790 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:30,365-Speed 5663.12 samples/sec Loss 11.0745 LearningRate 0.0790 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:32,199-Speed 5585.12 samples/sec Loss 10.8470 LearningRate 0.0789 Epoch: 2 Global Step: 
11280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:34,009-Speed 5659.67 samples/sec Loss 10.9754 LearningRate 0.0789 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:35,815-Speed 5675.08 samples/sec Loss 10.8640 LearningRate 0.0789 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:37,678-Speed 5499.15 samples/sec Loss 10.8083 LearningRate 0.0789 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:39,498-Speed 5627.29 samples/sec Loss 10.8530 LearningRate 0.0789 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:41,363-Speed 5495.12 samples/sec Loss 10.8023 LearningRate 0.0789 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:43,187-Speed 5616.56 samples/sec Loss 11.0187 LearningRate 0.0788 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:45,042-Speed 5523.35 samples/sec Loss 10.9112 LearningRate 0.0788 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:46,864-Speed 5621.61 samples/sec Loss 10.8199 LearningRate 0.0788 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:48,683-Speed 5635.12 samples/sec Loss 10.7629 LearningRate 0.0788 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:50,506-Speed 5622.04 samples/sec Loss 10.7965 LearningRate 0.0788 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:52,316-Speed 5661.18 samples/sec Loss 11.0544 LearningRate 0.0787 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:54,184-Speed 5485.12 samples/sec Loss 11.0171 LearningRate 0.0787 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 11:00:56,003-Speed 5633.71 samples/sec Loss 10.9338 LearningRate 0.0787 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:57,808-Speed 5677.21 samples/sec Loss 11.0658 LearningRate 0.0787 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:00:59,660-Speed 5532.21 samples/sec Loss 10.8703 LearningRate 0.0787 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:01,471-Speed 5658.60 samples/sec Loss 10.7047 LearningRate 0.0787 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:03,301-Speed 5597.33 samples/sec Loss 11.0785 LearningRate 0.0786 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:05,132-Speed 5596.82 samples/sec Loss 10.9350 LearningRate 0.0786 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:06,978-Speed 5550.51 samples/sec Loss 10.7847 LearningRate 0.0786 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:08,782-Speed 5679.15 samples/sec Loss 11.0867 LearningRate 0.0786 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:10,596-Speed 5649.68 samples/sec Loss 10.9988 LearningRate 0.0786 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:12,447-Speed 5535.17 samples/sec Loss 10.9107 LearningRate 0.0786 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:14,263-Speed 5641.91 samples/sec Loss 11.0429 LearningRate 0.0785 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:16,108-Speed 5552.01 samples/sec Loss 10.9685 LearningRate 0.0785 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
11:01:17,920-Speed 5655.28 samples/sec Loss 10.7218 LearningRate 0.0785 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:19,788-Speed 5483.64 samples/sec Loss 10.8217 LearningRate 0.0785 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:21,592-Speed 5679.84 samples/sec Loss 10.7811 LearningRate 0.0785 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:23,397-Speed 5676.50 samples/sec Loss 10.7625 LearningRate 0.0785 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:25,228-Speed 5594.87 samples/sec Loss 10.7891 LearningRate 0.0784 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:27,032-Speed 5679.83 samples/sec Loss 10.7629 LearningRate 0.0784 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:28,842-Speed 5659.22 samples/sec Loss 10.7186 LearningRate 0.0784 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:30,691-Speed 5541.67 samples/sec Loss 10.9326 LearningRate 0.0784 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:32,505-Speed 5647.86 samples/sec Loss 10.9604 LearningRate 0.0784 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 11:01:34,315-Speed 5659.41 samples/sec Loss 10.7845 LearningRate 0.0783 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:36,151-Speed 5579.54 samples/sec Loss 10.8868 LearningRate 0.0783 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:37,959-Speed 5665.99 samples/sec Loss 10.7978 LearningRate 0.0783 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:39,763-Speed 5681.46 samples/sec Loss 10.8024 
LearningRate 0.0783 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:41,610-Speed 5545.66 samples/sec Loss 10.8556 LearningRate 0.0783 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:43,416-Speed 5671.12 samples/sec Loss 10.7521 LearningRate 0.0783 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:45,235-Speed 5631.70 samples/sec Loss 10.7035 LearningRate 0.0782 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:47,067-Speed 5593.93 samples/sec Loss 10.8777 LearningRate 0.0782 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:48,881-Speed 5648.66 samples/sec Loss 10.9212 LearningRate 0.0782 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:50,683-Speed 5684.99 samples/sec Loss 10.9078 LearningRate 0.0782 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:52,497-Speed 5646.69 samples/sec Loss 10.9283 LearningRate 0.0782 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:01:54,324-Speed 5606.75 samples/sec Loss 10.8292 LearningRate 0.0782 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:01:56,135-Speed 5655.59 samples/sec Loss 10.7382 LearningRate 0.0781 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:01:57,961-Speed 5611.10 samples/sec Loss 10.9490 LearningRate 0.0781 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:01:59,777-Speed 5641.02 samples/sec Loss 10.8564 LearningRate 0.0781 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:01,591-Speed 5658.19 samples/sec Loss 10.7119 LearningRate 0.0781 Epoch: 2 Global Step: 
11770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:03,423-Speed 5590.96 samples/sec Loss 10.6929 LearningRate 0.0781 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:05,241-Speed 5635.83 samples/sec Loss 10.8463 LearningRate 0.0780 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:07,048-Speed 5669.08 samples/sec Loss 10.7058 LearningRate 0.0780 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:08,886-Speed 5573.25 samples/sec Loss 10.9077 LearningRate 0.0780 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:10,691-Speed 5675.32 samples/sec Loss 10.8853 LearningRate 0.0780 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 11:02:12,488-Speed 5699.20 samples/sec Loss 10.9163 LearningRate 0.0780 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:14,326-Speed 5574.06 samples/sec Loss 10.7939 LearningRate 0.0780 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:16,141-Speed 5647.93 samples/sec Loss 10.9000 LearningRate 0.0779 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:18,000-Speed 5513.90 samples/sec Loss 10.7967 LearningRate 0.0779 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:19,808-Speed 5665.63 samples/sec Loss 10.9241 LearningRate 0.0779 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:21,630-Speed 5623.63 samples/sec Loss 10.6826 LearningRate 0.0779 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:23,439-Speed 5660.65 samples/sec Loss 10.7957 LearningRate 0.0779 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 11:02:25,269-Speed 5598.95 samples/sec Loss 10.9247 LearningRate 0.0779 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:27,084-Speed 5645.58 samples/sec Loss 10.7837 LearningRate 0.0778 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:28,894-Speed 5658.44 samples/sec Loss 10.8484 LearningRate 0.0778 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:30,708-Speed 5649.35 samples/sec Loss 10.8752 LearningRate 0.0778 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:32,578-Speed 5479.16 samples/sec Loss 10.6911 LearningRate 0.0778 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:02:34,390-Speed 5652.01 samples/sec Loss 10.6911 LearningRate 0.0778 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:36,234-Speed 5561.61 samples/sec Loss 10.5788 LearningRate 0.0778 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:38,070-Speed 5582.61 samples/sec Loss 10.9517 LearningRate 0.0777 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:39,912-Speed 5563.21 samples/sec Loss 10.6974 LearningRate 0.0777 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:41,731-Speed 5631.24 samples/sec Loss 10.7780 LearningRate 0.0777 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:02:43,574-Speed 5560.59 samples/sec Loss 10.8642 LearningRate 0.0777 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:03:10,848-[lfw][12000]XNorm: 21.607418 Training: 2022-04-11 11:03:10,849-[lfw][12000]Accuracy-Flip: 0.99667+-0.00289 Training: 2022-04-11 11:03:10,850-[lfw][12000]Accuracy-Highest: 
0.99667 Training: 2022-04-11 11:03:42,388-[cfp_fp][12000]XNorm: 18.734406 Training: 2022-04-11 11:03:42,389-[cfp_fp][12000]Accuracy-Flip: 0.91457+-0.01385 Training: 2022-04-11 11:03:42,390-[cfp_fp][12000]Accuracy-Highest: 0.92457 Training: 2022-04-11 11:04:09,618-[agedb_30][12000]XNorm: 21.070338 Training: 2022-04-11 11:04:09,619-[agedb_30][12000]Accuracy-Flip: 0.96283+-0.00949 Training: 2022-04-11 11:04:09,620-[agedb_30][12000]Accuracy-Highest: 0.96283 Training: 2022-04-11 11:04:11,463-Speed 116.51 samples/sec Loss 10.5584 LearningRate 0.0777 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:13,292-Speed 5601.12 samples/sec Loss 10.7694 LearningRate 0.0776 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:15,111-Speed 5632.40 samples/sec Loss 10.7634 LearningRate 0.0776 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:16,959-Speed 5544.97 samples/sec Loss 10.7593 LearningRate 0.0776 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:18,808-Speed 5542.37 samples/sec Loss 10.7825 LearningRate 0.0776 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:20,612-Speed 5681.39 samples/sec Loss 10.7645 LearningRate 0.0776 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:22,414-Speed 5684.52 samples/sec Loss 10.9004 LearningRate 0.0776 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:24,228-Speed 5644.91 samples/sec Loss 10.8093 LearningRate 0.0775 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:26,028-Speed 5692.32 samples/sec Loss 10.8178 LearningRate 0.0775 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:27,827-Speed 5696.35 samples/sec Loss 10.8346 
LearningRate 0.0775 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:29,626-Speed 5692.56 samples/sec Loss 10.7646 LearningRate 0.0775 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:31,449-Speed 5618.89 samples/sec Loss 10.7074 LearningRate 0.0775 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:33,246-Speed 5701.78 samples/sec Loss 10.8078 LearningRate 0.0775 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:35,053-Speed 5675.30 samples/sec Loss 10.7176 LearningRate 0.0774 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:36,858-Speed 5673.02 samples/sec Loss 10.8367 LearningRate 0.0774 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:38,700-Speed 5562.28 samples/sec Loss 10.7505 LearningRate 0.0774 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:40,534-Speed 5587.17 samples/sec Loss 10.7744 LearningRate 0.0774 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:42,388-Speed 5525.89 samples/sec Loss 10.8297 LearningRate 0.0774 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:44,216-Speed 5605.15 samples/sec Loss 10.8789 LearningRate 0.0774 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:46,017-Speed 5687.14 samples/sec Loss 10.8403 LearningRate 0.0773 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:47,857-Speed 5568.63 samples/sec Loss 10.8414 LearningRate 0.0773 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:49,661-Speed 5680.64 samples/sec Loss 10.6280 LearningRate 0.0773 Epoch: 2 Global Step: 
12220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:51,466-Speed 5675.08 samples/sec Loss 10.7909 LearningRate 0.0773 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:53,304-Speed 5573.08 samples/sec Loss 10.8833 LearningRate 0.0773 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:55,111-Speed 5668.91 samples/sec Loss 10.8863 LearningRate 0.0772 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:56,927-Speed 5642.22 samples/sec Loss 10.7653 LearningRate 0.0772 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:04:58,772-Speed 5553.84 samples/sec Loss 10.8695 LearningRate 0.0772 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:00,617-Speed 5552.13 samples/sec Loss 10.6295 LearningRate 0.0772 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:02,502-Speed 5434.05 samples/sec Loss 10.7332 LearningRate 0.0772 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:04,309-Speed 5670.58 samples/sec Loss 10.7263 LearningRate 0.0772 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:06,166-Speed 5518.26 samples/sec Loss 10.8730 LearningRate 0.0771 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:07,976-Speed 5660.99 samples/sec Loss 10.8595 LearningRate 0.0771 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:09,845-Speed 5481.50 samples/sec Loss 10.7656 LearningRate 0.0771 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:11,679-Speed 5586.54 samples/sec Loss 10.6923 LearningRate 0.0771 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 11:05:13,515-Speed 5579.82 samples/sec Loss 10.5370 LearningRate 0.0771 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:15,386-Speed 5473.69 samples/sec Loss 10.6060 LearningRate 0.0771 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:17,218-Speed 5593.43 samples/sec Loss 10.6867 LearningRate 0.0770 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:19,036-Speed 5637.59 samples/sec Loss 10.7949 LearningRate 0.0770 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:20,875-Speed 5568.03 samples/sec Loss 10.6406 LearningRate 0.0770 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:22,697-Speed 5622.59 samples/sec Loss 10.6963 LearningRate 0.0770 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:24,502-Speed 5676.99 samples/sec Loss 10.7042 LearningRate 0.0770 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:26,373-Speed 5477.35 samples/sec Loss 10.7649 LearningRate 0.0770 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:28,187-Speed 5647.06 samples/sec Loss 10.6857 LearningRate 0.0769 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:30,048-Speed 5506.62 samples/sec Loss 10.7394 LearningRate 0.0769 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:31,884-Speed 5578.19 samples/sec Loss 10.7291 LearningRate 0.0769 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:33,694-Speed 5662.34 samples/sec Loss 10.5138 LearningRate 0.0769 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
11:05:35,539-Speed 5554.29 samples/sec Loss 10.5487 LearningRate 0.0769 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:37,354-Speed 5645.44 samples/sec Loss 10.7627 LearningRate 0.0768 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:39,168-Speed 5647.99 samples/sec Loss 10.5869 LearningRate 0.0768 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:40,999-Speed 5595.18 samples/sec Loss 10.5757 LearningRate 0.0768 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:42,813-Speed 5649.26 samples/sec Loss 10.6799 LearningRate 0.0768 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:44,679-Speed 5490.79 samples/sec Loss 10.6053 LearningRate 0.0768 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:46,490-Speed 5655.34 samples/sec Loss 10.9026 LearningRate 0.0768 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:48,307-Speed 5639.66 samples/sec Loss 10.7052 LearningRate 0.0767 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:50,147-Speed 5568.33 samples/sec Loss 10.5894 LearningRate 0.0767 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 11:05:51,986-Speed 5569.53 samples/sec Loss 10.8885 LearningRate 0.0767 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:53,810-Speed 5617.92 samples/sec Loss 10.6221 LearningRate 0.0767 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:55,615-Speed 5678.00 samples/sec Loss 10.5591 LearningRate 0.0767 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:57,452-Speed 5574.60 samples/sec Loss 
10.5346 LearningRate 0.0767 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:05:59,262-Speed 5663.39 samples/sec Loss 10.6327 LearningRate 0.0766 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:01,111-Speed 5539.30 samples/sec Loss 10.5079 LearningRate 0.0766 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:02,913-Speed 5687.38 samples/sec Loss 10.4161 LearningRate 0.0766 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:04,778-Speed 5492.30 samples/sec Loss 10.6621 LearningRate 0.0766 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:06,598-Speed 5633.37 samples/sec Loss 10.5926 LearningRate 0.0766 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:08,417-Speed 5632.05 samples/sec Loss 10.7041 LearningRate 0.0766 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:10,220-Speed 5684.44 samples/sec Loss 10.6198 LearningRate 0.0765 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:12,056-Speed 5579.06 samples/sec Loss 10.4871 LearningRate 0.0765 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:13,887-Speed 5596.86 samples/sec Loss 10.6369 LearningRate 0.0765 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:15,717-Speed 5596.43 samples/sec Loss 10.5819 LearningRate 0.0765 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:17,541-Speed 5617.65 samples/sec Loss 10.7154 LearningRate 0.0765 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:19,377-Speed 5578.84 samples/sec Loss 10.6576 LearningRate 0.0765 Epoch: 2 Global 
Step: 12710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:21,225-Speed 5546.83 samples/sec Loss 10.5620 LearningRate 0.0764 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:23,032-Speed 5671.86 samples/sec Loss 10.5055 LearningRate 0.0764 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:24,886-Speed 5524.96 samples/sec Loss 10.4842 LearningRate 0.0764 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:26,707-Speed 5624.81 samples/sec Loss 10.5609 LearningRate 0.0764 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:28,600-Speed 5411.99 samples/sec Loss 10.7339 LearningRate 0.0764 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:30,447-Speed 5547.09 samples/sec Loss 10.6683 LearningRate 0.0763 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:32,275-Speed 5607.03 samples/sec Loss 10.6234 LearningRate 0.0763 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:34,147-Speed 5472.39 samples/sec Loss 10.4997 LearningRate 0.0763 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:35,950-Speed 5685.64 samples/sec Loss 10.6002 LearningRate 0.0763 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:37,755-Speed 5674.07 samples/sec Loss 10.6474 LearningRate 0.0763 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:39,623-Speed 5485.79 samples/sec Loss 10.4996 LearningRate 0.0763 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:41,464-Speed 5567.16 samples/sec Loss 10.5925 LearningRate 0.0762 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 11:06:43,313-Speed 5543.07 samples/sec Loss 10.6412 LearningRate 0.0762 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:45,143-Speed 5599.43 samples/sec Loss 10.6881 LearningRate 0.0762 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:46,979-Speed 5579.50 samples/sec Loss 10.6771 LearningRate 0.0762 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:48,839-Speed 5508.06 samples/sec Loss 10.5116 LearningRate 0.0762 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:50,654-Speed 5643.82 samples/sec Loss 10.5456 LearningRate 0.0762 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:52,495-Speed 5565.02 samples/sec Loss 10.6008 LearningRate 0.0761 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:54,333-Speed 5576.37 samples/sec Loss 10.2828 LearningRate 0.0761 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:56,153-Speed 5627.82 samples/sec Loss 10.6359 LearningRate 0.0761 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:57,989-Speed 5582.18 samples/sec Loss 10.6319 LearningRate 0.0761 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:06:59,811-Speed 5623.27 samples/sec Loss 10.6139 LearningRate 0.0761 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:07:01,667-Speed 5518.15 samples/sec Loss 10.5523 LearningRate 0.0761 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:03,562-Speed 5408.55 samples/sec Loss 10.5389 LearningRate 0.0760 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:07:05,373-Speed 5656.27 samples/sec Loss 10.4496 LearningRate 0.0760 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:07,193-Speed 5631.76 samples/sec Loss 10.5996 LearningRate 0.0760 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:09,017-Speed 5614.87 samples/sec Loss 10.4378 LearningRate 0.0760 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:10,844-Speed 5610.64 samples/sec Loss 10.6668 LearningRate 0.0760 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:12,696-Speed 5531.22 samples/sec Loss 10.3875 LearningRate 0.0759 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:14,509-Speed 5653.34 samples/sec Loss 10.5119 LearningRate 0.0759 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:16,341-Speed 5591.75 samples/sec Loss 10.6711 LearningRate 0.0759 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:18,187-Speed 5550.19 samples/sec Loss 10.6379 LearningRate 0.0759 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:20,005-Speed 5636.47 samples/sec Loss 10.5415 LearningRate 0.0759 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:21,830-Speed 5614.17 samples/sec Loss 10.5231 LearningRate 0.0759 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:23,655-Speed 5614.72 samples/sec Loss 10.6517 LearningRate 0.0758 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:25,478-Speed 5619.11 samples/sec Loss 10.5238 LearningRate 0.0758 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:27,283-Speed 5678.15 samples/sec Loss 
10.7000 LearningRate 0.0758 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:29,121-Speed 5573.42 samples/sec Loss 10.5555 LearningRate 0.0758 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:30,941-Speed 5631.49 samples/sec Loss 10.5287 LearningRate 0.0758 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:32,794-Speed 5532.15 samples/sec Loss 10.5723 LearningRate 0.0758 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:34,613-Speed 5629.03 samples/sec Loss 10.6088 LearningRate 0.0757 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:36,425-Speed 5655.35 samples/sec Loss 10.4620 LearningRate 0.0757 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:38,266-Speed 5564.04 samples/sec Loss 10.3533 LearningRate 0.0757 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:40,111-Speed 5554.17 samples/sec Loss 10.4939 LearningRate 0.0757 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:41,908-Speed 5700.70 samples/sec Loss 10.4757 LearningRate 0.0757 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:43,720-Speed 5655.43 samples/sec Loss 10.5212 LearningRate 0.0757 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:45,533-Speed 5649.47 samples/sec Loss 10.4663 LearningRate 0.0756 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:47,352-Speed 5633.29 samples/sec Loss 10.4633 LearningRate 0.0756 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:49,178-Speed 5607.83 samples/sec Loss 10.6121 LearningRate 0.0756 Epoch: 2 Global 
Step: 13200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:51,049-Speed 5475.35 samples/sec Loss 10.7067 LearningRate 0.0756 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:52,866-Speed 5640.14 samples/sec Loss 10.3911 LearningRate 0.0756 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:54,669-Speed 5681.84 samples/sec Loss 10.4250 LearningRate 0.0756 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:56,471-Speed 5684.03 samples/sec Loss 10.6248 LearningRate 0.0755 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:07:58,292-Speed 5625.24 samples/sec Loss 10.6641 LearningRate 0.0755 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:00,107-Speed 5644.14 samples/sec Loss 10.5558 LearningRate 0.0755 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 11:08:01,927-Speed 5628.95 samples/sec Loss 10.5418 LearningRate 0.0755 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:03,753-Speed 5609.83 samples/sec Loss 10.5623 LearningRate 0.0755 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:05,563-Speed 5661.71 samples/sec Loss 10.4062 LearningRate 0.0755 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:07,447-Speed 5436.11 samples/sec Loss 10.5161 LearningRate 0.0754 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:09,311-Speed 5497.84 samples/sec Loss 10.5949 LearningRate 0.0754 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:11,128-Speed 5640.03 samples/sec Loss 10.4365 LearningRate 0.0754 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 11:08:12,954-Speed 5609.24 samples/sec Loss 10.6075 LearningRate 0.0754 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:14,770-Speed 5643.28 samples/sec Loss 10.5141 LearningRate 0.0754 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:16,611-Speed 5563.01 samples/sec Loss 10.6235 LearningRate 0.0753 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:18,424-Speed 5651.52 samples/sec Loss 10.4983 LearningRate 0.0753 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:20,259-Speed 5585.17 samples/sec Loss 10.3953 LearningRate 0.0753 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:22,068-Speed 5664.11 samples/sec Loss 10.5441 LearningRate 0.0753 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:23,909-Speed 5564.66 samples/sec Loss 10.4686 LearningRate 0.0753 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:25,726-Speed 5637.36 samples/sec Loss 10.5450 LearningRate 0.0753 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:27,565-Speed 5571.01 samples/sec Loss 10.3557 LearningRate 0.0752 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:29,388-Speed 5621.15 samples/sec Loss 10.4981 LearningRate 0.0752 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:31,214-Speed 5615.91 samples/sec Loss 10.5017 LearningRate 0.0752 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:33,020-Speed 5675.26 samples/sec Loss 10.5129 LearningRate 0.0752 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:08:34,880-Speed 5506.23 samples/sec Loss 10.5099 LearningRate 0.0752 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:36,711-Speed 5596.63 samples/sec Loss 10.3272 LearningRate 0.0752 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:38,578-Speed 5485.94 samples/sec Loss 10.5943 LearningRate 0.0751 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:40,408-Speed 5598.40 samples/sec Loss 10.4270 LearningRate 0.0751 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:42,223-Speed 5647.19 samples/sec Loss 10.4879 LearningRate 0.0751 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:44,066-Speed 5559.56 samples/sec Loss 10.4478 LearningRate 0.0751 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:45,881-Speed 5643.61 samples/sec Loss 10.5355 LearningRate 0.0751 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:47,720-Speed 5571.21 samples/sec Loss 10.5185 LearningRate 0.0751 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:49,605-Speed 5434.07 samples/sec Loss 10.5718 LearningRate 0.0750 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:51,418-Speed 5651.27 samples/sec Loss 10.2559 LearningRate 0.0750 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:53,235-Speed 5640.32 samples/sec Loss 10.4348 LearningRate 0.0750 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:55,060-Speed 5612.90 samples/sec Loss 10.4105 LearningRate 0.0750 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:56,861-Speed 5689.62 samples/sec Loss 
10.3556 LearningRate 0.0750 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:08:58,666-Speed 5672.47 samples/sec Loss 10.4080 LearningRate 0.0750 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:00,476-Speed 5660.50 samples/sec Loss 10.3780 LearningRate 0.0749 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:02,302-Speed 5612.43 samples/sec Loss 10.3694 LearningRate 0.0749 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:04,119-Speed 5638.72 samples/sec Loss 10.2348 LearningRate 0.0749 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:05,958-Speed 5569.82 samples/sec Loss 10.2489 LearningRate 0.0749 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:07,752-Speed 5710.28 samples/sec Loss 10.2977 LearningRate 0.0749 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:09,621-Speed 5482.20 samples/sec Loss 10.4979 LearningRate 0.0749 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:11,436-Speed 5643.42 samples/sec Loss 10.5157 LearningRate 0.0748 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:13,273-Speed 5576.92 samples/sec Loss 10.4931 LearningRate 0.0748 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:15,086-Speed 5651.46 samples/sec Loss 10.4747 LearningRate 0.0748 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:16,913-Speed 5608.26 samples/sec Loss 10.4801 LearningRate 0.0748 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:18,745-Speed 5591.01 samples/sec Loss 10.4308 LearningRate 0.0748 Epoch: 2 Global Step: 
13690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:20,563-Speed 5633.45 samples/sec Loss 10.3758 LearningRate 0.0747 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:22,396-Speed 5589.32 samples/sec Loss 10.4958 LearningRate 0.0747 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:24,203-Speed 5670.76 samples/sec Loss 10.6315 LearningRate 0.0747 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:26,025-Speed 5623.90 samples/sec Loss 10.3470 LearningRate 0.0747 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:27,861-Speed 5580.26 samples/sec Loss 10.4949 LearningRate 0.0747 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:29,679-Speed 5635.81 samples/sec Loss 10.4033 LearningRate 0.0747 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:31,501-Speed 5623.40 samples/sec Loss 10.4730 LearningRate 0.0746 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:33,358-Speed 5517.20 samples/sec Loss 10.2792 LearningRate 0.0746 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:35,162-Speed 5679.93 samples/sec Loss 10.3923 LearningRate 0.0746 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:36,964-Speed 5683.82 samples/sec Loss 10.2999 LearningRate 0.0746 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:38,805-Speed 5566.76 samples/sec Loss 10.5138 LearningRate 0.0746 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:40,627-Speed 5623.40 samples/sec Loss 10.3753 LearningRate 0.0746 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 11:09:42,444-Speed 5638.41 samples/sec Loss 10.3568 LearningRate 0.0745 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:44,288-Speed 5555.78 samples/sec Loss 10.3638 LearningRate 0.0745 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:46,125-Speed 5578.06 samples/sec Loss 10.2708 LearningRate 0.0745 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:47,956-Speed 5594.46 samples/sec Loss 10.2901 LearningRate 0.0745 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:49,786-Speed 5599.77 samples/sec Loss 10.3214 LearningRate 0.0745 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:51,672-Speed 5431.49 samples/sec Loss 10.4779 LearningRate 0.0745 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:53,487-Speed 5647.72 samples/sec Loss 10.2223 LearningRate 0.0744 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:09:55,300-Speed 5648.84 samples/sec Loss 10.3197 LearningRate 0.0744 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:57,132-Speed 5593.32 samples/sec Loss 10.2944 LearningRate 0.0744 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:09:58,940-Speed 5666.95 samples/sec Loss 10.3363 LearningRate 0.0744 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:00,790-Speed 5537.65 samples/sec Loss 10.3287 LearningRate 0.0744 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:02,613-Speed 5620.06 samples/sec Loss 10.1891 LearningRate 0.0744 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:04,441-Speed 5605.14 
samples/sec Loss 10.3477 LearningRate 0.0743 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:06,256-Speed 5646.00 samples/sec Loss 10.3669 LearningRate 0.0743 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:08,092-Speed 5580.05 samples/sec Loss 10.4482 LearningRate 0.0743 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:09,912-Speed 5627.53 samples/sec Loss 10.4291 LearningRate 0.0743 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:11,719-Speed 5671.81 samples/sec Loss 10.4160 LearningRate 0.0743 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:13,524-Speed 5673.34 samples/sec Loss 10.3808 LearningRate 0.0743 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:15,349-Speed 5613.27 samples/sec Loss 10.2403 LearningRate 0.0742 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:10:42,570-[lfw][14000]XNorm: 21.175048 Training: 2022-04-11 11:10:42,571-[lfw][14000]Accuracy-Flip: 0.99517+-0.00353 Training: 2022-04-11 11:10:42,571-[lfw][14000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:11:13,888-[cfp_fp][14000]XNorm: 18.010395 Training: 2022-04-11 11:11:13,889-[cfp_fp][14000]Accuracy-Flip: 0.93243+-0.01148 Training: 2022-04-11 11:11:13,890-[cfp_fp][14000]Accuracy-Highest: 0.93243 Training: 2022-04-11 11:11:41,025-[agedb_30][14000]XNorm: 20.614065 Training: 2022-04-11 11:11:41,026-[agedb_30][14000]Accuracy-Flip: 0.96417+-0.01020 Training: 2022-04-11 11:11:41,027-[agedb_30][14000]Accuracy-Highest: 0.96417 Training: 2022-04-11 11:11:42,862-Speed 117.01 samples/sec Loss 10.4229 LearningRate 0.0742 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:11:44,659-Speed 5700.70 samples/sec Loss 10.2819 
LearningRate 0.0742 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:11:46,466-Speed 5668.08 samples/sec Loss 10.2335 LearningRate 0.0742 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 11:11:48,303-Speed 5576.98 samples/sec Loss 10.1398 LearningRate 0.0742 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:50,113-Speed 5660.73 samples/sec Loss 10.3161 LearningRate 0.0742 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:51,922-Speed 5665.16 samples/sec Loss 10.3416 LearningRate 0.0741 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:53,743-Speed 5625.40 samples/sec Loss 10.2850 LearningRate 0.0741 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:55,544-Speed 5688.65 samples/sec Loss 10.1804 LearningRate 0.0741 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:57,334-Speed 5723.81 samples/sec Loss 10.4727 LearningRate 0.0741 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:11:59,176-Speed 5561.23 samples/sec Loss 10.2654 LearningRate 0.0741 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:00,996-Speed 5629.12 samples/sec Loss 10.2326 LearningRate 0.0740 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:02,808-Speed 5654.46 samples/sec Loss 10.3686 LearningRate 0.0740 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:04,653-Speed 5553.13 samples/sec Loss 10.3205 LearningRate 0.0740 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:06,494-Speed 5565.05 samples/sec Loss 10.4062 LearningRate 0.0740 Epoch: 2 Global Step: 
14140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:08,316-Speed 5623.31 samples/sec Loss 10.1521 LearningRate 0.0740 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:10,130-Speed 5645.14 samples/sec Loss 10.5547 LearningRate 0.0740 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:11,961-Speed 5598.32 samples/sec Loss 10.3129 LearningRate 0.0739 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:13,791-Speed 5596.83 samples/sec Loss 10.2736 LearningRate 0.0739 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:15,650-Speed 5512.63 samples/sec Loss 10.4428 LearningRate 0.0739 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:17,479-Speed 5601.61 samples/sec Loss 10.3138 LearningRate 0.0739 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:19,302-Speed 5620.02 samples/sec Loss 10.3545 LearningRate 0.0739 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:21,119-Speed 5638.69 samples/sec Loss 10.2218 LearningRate 0.0739 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:22,958-Speed 5574.78 samples/sec Loss 10.3755 LearningRate 0.0738 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:24,822-Speed 5495.81 samples/sec Loss 10.3809 LearningRate 0.0738 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:26,641-Speed 5632.54 samples/sec Loss 10.2394 LearningRate 0.0738 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:28,462-Speed 5625.16 samples/sec Loss 10.2333 LearningRate 0.0738 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:12:30,296-Speed 5587.26 samples/sec Loss 10.2584 LearningRate 0.0738 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:32,102-Speed 5674.62 samples/sec Loss 10.2232 LearningRate 0.0738 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:33,937-Speed 5584.64 samples/sec Loss 10.1995 LearningRate 0.0737 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:35,759-Speed 5623.31 samples/sec Loss 10.4934 LearningRate 0.0737 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:37,611-Speed 5531.27 samples/sec Loss 10.3839 LearningRate 0.0737 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:39,411-Speed 5692.10 samples/sec Loss 10.5791 LearningRate 0.0737 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:41,217-Speed 5674.60 samples/sec Loss 10.4217 LearningRate 0.0737 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:43,038-Speed 5625.31 samples/sec Loss 10.4859 LearningRate 0.0737 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:44,845-Speed 5669.63 samples/sec Loss 10.3963 LearningRate 0.0736 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:46,658-Speed 5651.91 samples/sec Loss 10.2601 LearningRate 0.0736 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:48,528-Speed 5480.90 samples/sec Loss 10.5168 LearningRate 0.0736 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:50,336-Speed 5664.18 samples/sec Loss 10.3266 LearningRate 0.0736 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:52,152-Speed 5641.20 
samples/sec Loss 10.2659 LearningRate 0.0736 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:53,982-Speed 5600.83 samples/sec Loss 10.3944 LearningRate 0.0736 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:55,787-Speed 5673.84 samples/sec Loss 10.1857 LearningRate 0.0735 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:12:57,643-Speed 5521.92 samples/sec Loss 10.2655 LearningRate 0.0735 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:12:59,447-Speed 5677.81 samples/sec Loss 10.5424 LearningRate 0.0735 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:01,275-Speed 5605.86 samples/sec Loss 10.3504 LearningRate 0.0735 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:03,140-Speed 5493.84 samples/sec Loss 10.3526 LearningRate 0.0735 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:04,945-Speed 5674.73 samples/sec Loss 10.2592 LearningRate 0.0735 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:06,790-Speed 5551.95 samples/sec Loss 10.3171 LearningRate 0.0734 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:08,616-Speed 5612.60 samples/sec Loss 10.1961 LearningRate 0.0734 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:10,454-Speed 5574.31 samples/sec Loss 10.4714 LearningRate 0.0734 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:12,286-Speed 5591.90 samples/sec Loss 10.3372 LearningRate 0.0734 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:14,096-Speed 5661.67 samples/sec Loss 10.4409 LearningRate 0.0734 
Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:15,948-Speed 5530.47 samples/sec Loss 10.1532 LearningRate 0.0734 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:17,774-Speed 5612.99 samples/sec Loss 10.3296 LearningRate 0.0733 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:19,625-Speed 5533.64 samples/sec Loss 10.2730 LearningRate 0.0733 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:21,442-Speed 5639.54 samples/sec Loss 10.2714 LearningRate 0.0733 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:23,278-Speed 5580.83 samples/sec Loss 10.3900 LearningRate 0.0733 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:25,113-Speed 5582.84 samples/sec Loss 10.3890 LearningRate 0.0733 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:26,962-Speed 5541.91 samples/sec Loss 10.2635 LearningRate 0.0733 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:28,771-Speed 5663.27 samples/sec Loss 10.1485 LearningRate 0.0732 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:30,603-Speed 5591.05 samples/sec Loss 10.1749 LearningRate 0.0732 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:32,407-Speed 5682.14 samples/sec Loss 10.1698 LearningRate 0.0732 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:34,220-Speed 5650.06 samples/sec Loss 10.4196 LearningRate 0.0732 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 11:13:36,020-Speed 5692.17 samples/sec Loss 10.3980 LearningRate 0.0732 Epoch: 2 Global Step: 14630 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:37,844-Speed 5616.20 samples/sec Loss 10.3629 LearningRate 0.0732 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:39,652-Speed 5666.05 samples/sec Loss 10.3148 LearningRate 0.0731 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:41,499-Speed 5548.93 samples/sec Loss 10.2656 LearningRate 0.0731 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:13:43,306-Speed 5670.33 samples/sec Loss 10.3704 LearningRate 0.0731 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:45,134-Speed 5661.35 samples/sec Loss 10.1868 LearningRate 0.0731 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:46,968-Speed 5588.57 samples/sec Loss 10.2479 LearningRate 0.0731 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:48,811-Speed 5559.01 samples/sec Loss 10.2344 LearningRate 0.0730 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:50,643-Speed 5593.16 samples/sec Loss 10.2243 LearningRate 0.0730 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:52,483-Speed 5568.12 samples/sec Loss 10.2215 LearningRate 0.0730 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:54,344-Speed 5507.57 samples/sec Loss 10.4156 LearningRate 0.0730 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:56,149-Speed 5675.52 samples/sec Loss 10.3005 LearningRate 0.0730 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:13:57,966-Speed 5639.85 samples/sec Loss 10.2261 LearningRate 0.0730 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
11:13:59,797-Speed 5595.03 samples/sec Loss 10.3630 LearningRate 0.0729 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:01,660-Speed 5498.09 samples/sec Loss 10.2410 LearningRate 0.0729 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:03,493-Speed 5589.93 samples/sec Loss 10.2375 LearningRate 0.0729 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:05,306-Speed 5650.56 samples/sec Loss 10.1675 LearningRate 0.0729 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:07,130-Speed 5615.99 samples/sec Loss 10.1812 LearningRate 0.0729 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:08,936-Speed 5673.72 samples/sec Loss 10.4261 LearningRate 0.0729 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:10,779-Speed 5560.02 samples/sec Loss 10.0879 LearningRate 0.0728 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:12,610-Speed 5596.03 samples/sec Loss 10.1744 LearningRate 0.0728 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:14,421-Speed 5656.53 samples/sec Loss 10.1357 LearningRate 0.0728 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:16,259-Speed 5572.51 samples/sec Loss 10.0885 LearningRate 0.0728 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:18,075-Speed 5642.23 samples/sec Loss 10.1903 LearningRate 0.0728 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:19,922-Speed 5549.30 samples/sec Loss 10.0753 LearningRate 0.0728 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:21,731-Speed 5664.38 samples/sec Loss 
10.3372 LearningRate 0.0727 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:23,572-Speed 5565.60 samples/sec Loss 10.2383 LearningRate 0.0727 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:25,414-Speed 5561.86 samples/sec Loss 10.2270 LearningRate 0.0727 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:27,265-Speed 5534.97 samples/sec Loss 10.1695 LearningRate 0.0727 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:29,106-Speed 5567.42 samples/sec Loss 10.0158 LearningRate 0.0727 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:30,941-Speed 5584.52 samples/sec Loss 10.1865 LearningRate 0.0727 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:32,787-Speed 5551.74 samples/sec Loss 10.2817 LearningRate 0.0726 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:34,596-Speed 5662.62 samples/sec Loss 10.3195 LearningRate 0.0726 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:36,410-Speed 5648.28 samples/sec Loss 10.2143 LearningRate 0.0726 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:38,225-Speed 5645.10 samples/sec Loss 10.1532 LearningRate 0.0726 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:40,041-Speed 5642.75 samples/sec Loss 10.1277 LearningRate 0.0726 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:41,879-Speed 5575.09 samples/sec Loss 10.2220 LearningRate 0.0726 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:43,697-Speed 5635.90 samples/sec Loss 10.4588 LearningRate 0.0725 Epoch: 2 Global 
Step: 15000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:45,534-Speed 5577.13 samples/sec Loss 10.1825 LearningRate 0.0725 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:47,342-Speed 5666.41 samples/sec Loss 10.2895 LearningRate 0.0725 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:49,174-Speed 5593.06 samples/sec Loss 10.2441 LearningRate 0.0725 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:50,996-Speed 5623.14 samples/sec Loss 10.0314 LearningRate 0.0725 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:52,848-Speed 5531.00 samples/sec Loss 10.0158 LearningRate 0.0725 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:14:54,685-Speed 5579.24 samples/sec Loss 10.3277 LearningRate 0.0724 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:56,510-Speed 5614.67 samples/sec Loss 10.0066 LearningRate 0.0724 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:14:58,322-Speed 5654.72 samples/sec Loss 10.3099 LearningRate 0.0724 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:00,196-Speed 5466.25 samples/sec Loss 10.2446 LearningRate 0.0724 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:02,013-Speed 5639.31 samples/sec Loss 10.1341 LearningRate 0.0724 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:03,858-Speed 5553.26 samples/sec Loss 10.1426 LearningRate 0.0724 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:05,691-Speed 5589.05 samples/sec Loss 10.2103 LearningRate 0.0723 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-11 11:15:07,560-Speed 5481.79 samples/sec Loss 10.0046 LearningRate 0.0723 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:09,371-Speed 5657.57 samples/sec Loss 10.1051 LearningRate 0.0723 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:11,187-Speed 5643.27 samples/sec Loss 10.2427 LearningRate 0.0723 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:13,018-Speed 5597.62 samples/sec Loss 10.2983 LearningRate 0.0723 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:14,891-Speed 5471.45 samples/sec Loss 10.0217 LearningRate 0.0723 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:26,128-Speed 911.37 samples/sec Loss 9.7422 LearningRate 0.0722 Epoch: 3 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:27,999-Speed 5476.89 samples/sec Loss 9.2672 LearningRate 0.0722 Epoch: 3 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:29,828-Speed 5605.30 samples/sec Loss 9.2866 LearningRate 0.0722 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:31,684-Speed 5518.44 samples/sec Loss 9.1695 LearningRate 0.0722 Epoch: 3 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:33,513-Speed 5606.45 samples/sec Loss 9.2647 LearningRate 0.0722 Epoch: 3 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:35,511-Speed 5126.90 samples/sec Loss 9.2683 LearningRate 0.0722 Epoch: 3 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:37,341-Speed 5601.36 samples/sec Loss 9.3141 LearningRate 0.0721 Epoch: 3 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:39,188-Speed 
5547.50 samples/sec Loss 9.2501 LearningRate 0.0721 Epoch: 3 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:40,991-Speed 5683.73 samples/sec Loss 9.5842 LearningRate 0.0721 Epoch: 3 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:42,814-Speed 5619.42 samples/sec Loss 9.4031 LearningRate 0.0721 Epoch: 3 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:44,643-Speed 5600.51 samples/sec Loss 9.5399 LearningRate 0.0721 Epoch: 3 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:46,488-Speed 5553.26 samples/sec Loss 9.3562 LearningRate 0.0721 Epoch: 3 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:48,319-Speed 5595.86 samples/sec Loss 9.5488 LearningRate 0.0720 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:50,169-Speed 5538.14 samples/sec Loss 9.5202 LearningRate 0.0720 Epoch: 3 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:51,996-Speed 5607.13 samples/sec Loss 9.5305 LearningRate 0.0720 Epoch: 3 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:53,834-Speed 5576.39 samples/sec Loss 9.4968 LearningRate 0.0720 Epoch: 3 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:55,652-Speed 5636.90 samples/sec Loss 9.5614 LearningRate 0.0720 Epoch: 3 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:57,465-Speed 5649.17 samples/sec Loss 9.4829 LearningRate 0.0720 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:15:59,337-Speed 5476.81 samples/sec Loss 9.4750 LearningRate 0.0719 Epoch: 3 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:01,169-Speed 5590.94 samples/sec Loss 9.4762 LearningRate 0.0719 
Epoch: 3 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:03,040-Speed 5475.94 samples/sec Loss 9.5156 LearningRate 0.0719 Epoch: 3 Global Step: 15380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:04,865-Speed 5614.49 samples/sec Loss 9.5526 LearningRate 0.0719 Epoch: 3 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:06,731-Speed 5490.56 samples/sec Loss 9.7076 LearningRate 0.0719 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:08,544-Speed 5653.12 samples/sec Loss 9.4177 LearningRate 0.0719 Epoch: 3 Global Step: 15410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:10,358-Speed 5646.35 samples/sec Loss 9.5048 LearningRate 0.0718 Epoch: 3 Global Step: 15420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:12,238-Speed 5449.08 samples/sec Loss 9.4426 LearningRate 0.0718 Epoch: 3 Global Step: 15430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:14,063-Speed 5613.53 samples/sec Loss 9.5377 LearningRate 0.0718 Epoch: 3 Global Step: 15440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:15,916-Speed 5529.94 samples/sec Loss 9.6546 LearningRate 0.0718 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:17,815-Speed 5396.76 samples/sec Loss 9.6172 LearningRate 0.0718 Epoch: 3 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:19,663-Speed 5543.17 samples/sec Loss 9.6561 LearningRate 0.0718 Epoch: 3 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:21,474-Speed 5656.82 samples/sec Loss 9.7446 LearningRate 0.0717 Epoch: 3 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:23,327-Speed 5528.15 samples/sec Loss 9.6784 LearningRate 0.0717 Epoch: 3 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 11:16:25,144-Speed 5638.37 samples/sec Loss 9.7405 LearningRate 0.0717 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:26,999-Speed 5524.26 samples/sec Loss 9.7065 LearningRate 0.0717 Epoch: 3 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:28,841-Speed 5563.45 samples/sec Loss 9.6947 LearningRate 0.0717 Epoch: 3 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:30,682-Speed 5566.12 samples/sec Loss 9.6581 LearningRate 0.0717 Epoch: 3 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:32,488-Speed 5672.70 samples/sec Loss 9.6695 LearningRate 0.0716 Epoch: 3 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:34,352-Speed 5497.24 samples/sec Loss 9.7013 LearningRate 0.0716 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:36,182-Speed 5598.49 samples/sec Loss 9.6707 LearningRate 0.0716 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:37,992-Speed 5660.75 samples/sec Loss 9.6270 LearningRate 0.0716 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:39,831-Speed 5571.76 samples/sec Loss 9.5192 LearningRate 0.0716 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:41,666-Speed 5583.03 samples/sec Loss 9.6647 LearningRate 0.0716 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:43,501-Speed 5582.97 samples/sec Loss 9.7626 LearningRate 0.0715 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:45,325-Speed 5618.46 samples/sec Loss 9.8003 LearningRate 0.0715 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:47,173-Speed 5543.40 
samples/sec Loss 9.6622 LearningRate 0.0715 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:49,026-Speed 5529.38 samples/sec Loss 9.6264 LearningRate 0.0715 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:50,846-Speed 5631.05 samples/sec Loss 9.7569 LearningRate 0.0715 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:16:52,678-Speed 5596.66 samples/sec Loss 9.8125 LearningRate 0.0715 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:54,523-Speed 5553.18 samples/sec Loss 9.6842 LearningRate 0.0714 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:56,335-Speed 5653.45 samples/sec Loss 9.9041 LearningRate 0.0714 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:16:58,205-Speed 5478.80 samples/sec Loss 9.7151 LearningRate 0.0714 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:00,037-Speed 5592.53 samples/sec Loss 9.6325 LearningRate 0.0714 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:01,876-Speed 5570.94 samples/sec Loss 9.5297 LearningRate 0.0714 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:03,694-Speed 5636.40 samples/sec Loss 9.7934 LearningRate 0.0714 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:05,552-Speed 5517.16 samples/sec Loss 9.9596 LearningRate 0.0713 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:07,417-Speed 5492.03 samples/sec Loss 9.7960 LearningRate 0.0713 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:09,246-Speed 5601.55 samples/sec Loss 9.7476 LearningRate 0.0713 Epoch: 3 Global 
Step: 15740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:17:11,066-Speed 5629.64 samples/sec Loss 9.7464 LearningRate 0.0713 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:12,951-Speed 5436.13 samples/sec Loss 9.8830 LearningRate 0.0713 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:14,773-Speed 5625.09 samples/sec Loss 9.6342 LearningRate 0.0713 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:16,627-Speed 5527.72 samples/sec Loss 9.8571 LearningRate 0.0712 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:18,443-Speed 5643.52 samples/sec Loss 9.8766 LearningRate 0.0712 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:20,266-Speed 5617.05 samples/sec Loss 9.8247 LearningRate 0.0712 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:22,095-Speed 5605.60 samples/sec Loss 9.8807 LearningRate 0.0712 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:23,929-Speed 5586.66 samples/sec Loss 9.8184 LearningRate 0.0712 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:25,789-Speed 5506.78 samples/sec Loss 9.8078 LearningRate 0.0712 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:27,635-Speed 5549.56 samples/sec Loss 9.7401 LearningRate 0.0711 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:29,464-Speed 5604.12 samples/sec Loss 9.6050 LearningRate 0.0711 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:31,292-Speed 5601.87 samples/sec Loss 9.8767 LearningRate 0.0711 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:17:33,121-Speed 5604.78 samples/sec Loss 9.8067 LearningRate 0.0711 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:34,971-Speed 5536.51 samples/sec Loss 9.9310 LearningRate 0.0711 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:36,796-Speed 5615.54 samples/sec Loss 9.8389 LearningRate 0.0711 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:38,631-Speed 5582.79 samples/sec Loss 9.7977 LearningRate 0.0710 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:40,441-Speed 5660.83 samples/sec Loss 9.9632 LearningRate 0.0710 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:42,309-Speed 5483.89 samples/sec Loss 9.7426 LearningRate 0.0710 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:44,126-Speed 5639.68 samples/sec Loss 9.8097 LearningRate 0.0710 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:45,992-Speed 5491.69 samples/sec Loss 10.0355 LearningRate 0.0710 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:47,805-Speed 5651.13 samples/sec Loss 9.9540 LearningRate 0.0710 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:49,675-Speed 5477.21 samples/sec Loss 9.7964 LearningRate 0.0709 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:51,498-Speed 5622.58 samples/sec Loss 9.9411 LearningRate 0.0709 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:53,343-Speed 5551.60 samples/sec Loss 9.9845 LearningRate 0.0709 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:17:55,176-Speed 5590.40 
samples/sec Loss 9.9387 LearningRate 0.0709 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:17:57,059-Speed 5441.97 samples/sec Loss 9.8023 LearningRate 0.0709 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:18:24,249-[lfw][16000]XNorm: 21.707389
Training: 2022-04-11 11:18:24,250-[lfw][16000]Accuracy-Flip: 0.99583+-0.00227
Training: 2022-04-11 11:18:24,251-[lfw][16000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:18:55,973-[cfp_fp][16000]XNorm: 18.765134
Training: 2022-04-11 11:18:55,974-[cfp_fp][16000]Accuracy-Flip: 0.94543+-0.01141
Training: 2022-04-11 11:18:55,975-[cfp_fp][16000]Accuracy-Highest: 0.94543
Training: 2022-04-11 11:19:23,138-[agedb_30][16000]XNorm: 21.307326
Training: 2022-04-11 11:19:23,138-[agedb_30][16000]Accuracy-Flip: 0.96350+-0.01031
Training: 2022-04-11 11:19:23,139-[agedb_30][16000]Accuracy-Highest: 0.96417
Training: 2022-04-11 11:19:24,958-Speed 116.50 samples/sec Loss 9.8535 LearningRate 0.0709 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:26,815-Speed 5514.75 samples/sec Loss 9.8577 LearningRate 0.0708 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:28,669-Speed 5526.77 samples/sec Loss 10.0185 LearningRate 0.0708 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:30,489-Speed 5631.52 samples/sec Loss 9.7050 LearningRate 0.0708 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:32,287-Speed 5699.25 samples/sec Loss 9.8121 LearningRate 0.0708 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:34,157-Speed 5477.88 samples/sec Loss 9.8915 LearningRate 0.0708 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 11:19:35,976-Speed 5631.76 samples/sec Loss 9.8870 LearningRate 
0.0708 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:19:37,795-Speed 5631.57 samples/sec Loss 9.6810 LearningRate 0.0707 Epoch: 3 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:19:39,615-Speed 5630.49 samples/sec Loss 9.7815 LearningRate 0.0707 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:41,501-Speed 5431.97 samples/sec Loss 9.8909 LearningRate 0.0707 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:43,318-Speed 5637.11 samples/sec Loss 9.9432 LearningRate 0.0707 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:45,120-Speed 5689.15 samples/sec Loss 9.9369 LearningRate 0.0707 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:46,961-Speed 5564.81 samples/sec Loss 9.9648 LearningRate 0.0707 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:48,777-Speed 5640.12 samples/sec Loss 9.7658 LearningRate 0.0706 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:50,637-Speed 5508.49 samples/sec Loss 9.9512 LearningRate 0.0706 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:52,479-Speed 5563.09 samples/sec Loss 9.9464 LearningRate 0.0706 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:54,319-Speed 5567.01 samples/sec Loss 9.7215 LearningRate 0.0706 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:56,150-Speed 5595.83 samples/sec Loss 9.7173 LearningRate 0.0706 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:19:57,983-Speed 5590.45 samples/sec Loss 9.9226 LearningRate 0.0706 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 11:19:59,850-Speed 5487.66 samples/sec Loss 9.8222 LearningRate 0.0705 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:01,712-Speed 5501.72 samples/sec Loss 9.8331 LearningRate 0.0705 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:03,559-Speed 5547.41 samples/sec Loss 9.9176 LearningRate 0.0705 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:05,421-Speed 5503.28 samples/sec Loss 9.8154 LearningRate 0.0705 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:07,229-Speed 5665.21 samples/sec Loss 9.8936 LearningRate 0.0705 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:09,037-Speed 5665.44 samples/sec Loss 9.9894 LearningRate 0.0705 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:10,864-Speed 5610.09 samples/sec Loss 9.8433 LearningRate 0.0704 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:12,687-Speed 5616.95 samples/sec Loss 9.9429 LearningRate 0.0704 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:14,546-Speed 5513.71 samples/sec Loss 9.8351 LearningRate 0.0704 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:16,357-Speed 5656.26 samples/sec Loss 9.8261 LearningRate 0.0704 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:18,183-Speed 5609.75 samples/sec Loss 9.8026 LearningRate 0.0704 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:20,007-Speed 5617.97 samples/sec Loss 9.8540 LearningRate 0.0704 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:21,845-Speed 
5577.91 samples/sec Loss 9.7294 LearningRate 0.0703 Epoch: 3 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:23,658-Speed 5651.62 samples/sec Loss 9.9765 LearningRate 0.0703 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:25,533-Speed 5465.30 samples/sec Loss 9.7949 LearningRate 0.0703 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:27,374-Speed 5563.81 samples/sec Loss 9.9667 LearningRate 0.0703 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:29,186-Speed 5656.65 samples/sec Loss 9.9314 LearningRate 0.0703 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:31,056-Speed 5477.24 samples/sec Loss 10.0419 LearningRate 0.0703 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:32,873-Speed 5639.93 samples/sec Loss 9.6212 LearningRate 0.0702 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:34,714-Speed 5566.93 samples/sec Loss 9.8229 LearningRate 0.0702 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:36,563-Speed 5540.08 samples/sec Loss 9.6831 LearningRate 0.0702 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:38,376-Speed 5650.61 samples/sec Loss 9.7775 LearningRate 0.0702 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:40,232-Speed 5518.99 samples/sec Loss 9.8586 LearningRate 0.0702 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:42,053-Speed 5626.37 samples/sec Loss 9.9701 LearningRate 0.0702 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:43,867-Speed 5648.86 samples/sec Loss 9.8570 LearningRate 0.0701 Epoch: 3 
Global Step: 16440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:45,683-Speed 5641.37 samples/sec Loss 9.7352 LearningRate 0.0701 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:47,511-Speed 5606.83 samples/sec Loss 9.8402 LearningRate 0.0701 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:49,315-Speed 5677.13 samples/sec Loss 9.8453 LearningRate 0.0701 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:51,118-Speed 5681.57 samples/sec Loss 9.9320 LearningRate 0.0701 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:52,935-Speed 5638.27 samples/sec Loss 9.7371 LearningRate 0.0701 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:20:54,776-Speed 5564.71 samples/sec Loss 9.8806 LearningRate 0.0700 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:56,578-Speed 5684.92 samples/sec Loss 9.9198 LearningRate 0.0700 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:20:58,387-Speed 5664.62 samples/sec Loss 9.8522 LearningRate 0.0700 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:00,199-Speed 5654.38 samples/sec Loss 9.8525 LearningRate 0.0700 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:02,002-Speed 5680.21 samples/sec Loss 9.7355 LearningRate 0.0700 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:03,842-Speed 5568.67 samples/sec Loss 10.0875 LearningRate 0.0700 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:05,694-Speed 5532.25 samples/sec Loss 10.0250 LearningRate 0.0699 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 11:21:07,545-Speed 5533.27 samples/sec Loss 9.8762 LearningRate 0.0699 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:09,352-Speed 5669.86 samples/sec Loss 9.6994 LearningRate 0.0699 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:11,186-Speed 5585.87 samples/sec Loss 9.7660 LearningRate 0.0699 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:12,997-Speed 5659.38 samples/sec Loss 9.8195 LearningRate 0.0699 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:14,854-Speed 5517.14 samples/sec Loss 10.0186 LearningRate 0.0699 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:16,686-Speed 5591.97 samples/sec Loss 9.8305 LearningRate 0.0698 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:18,565-Speed 5450.81 samples/sec Loss 9.8191 LearningRate 0.0698 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:20,367-Speed 5686.38 samples/sec Loss 9.8928 LearningRate 0.0698 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:22,208-Speed 5569.69 samples/sec Loss 9.6248 LearningRate 0.0698 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:24,069-Speed 5506.83 samples/sec Loss 9.8706 LearningRate 0.0698 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:25,876-Speed 5669.70 samples/sec Loss 9.7623 LearningRate 0.0698 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:27,715-Speed 5570.01 samples/sec Loss 9.9027 LearningRate 0.0697 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:29,523-Speed 5668.04 samples/sec 
Loss 10.1129 LearningRate 0.0697 Epoch: 3 Global Step: 16690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:31,403-Speed 5447.27 samples/sec Loss 9.8319 LearningRate 0.0697 Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:33,218-Speed 5645.05 samples/sec Loss 9.8599 LearningRate 0.0697 Epoch: 3 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:35,059-Speed 5567.51 samples/sec Loss 9.8619 LearningRate 0.0697 Epoch: 3 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:36,878-Speed 5632.62 samples/sec Loss 9.8377 LearningRate 0.0697 Epoch: 3 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:38,790-Speed 5359.73 samples/sec Loss 10.0265 LearningRate 0.0696 Epoch: 3 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:40,599-Speed 5662.36 samples/sec Loss 9.8376 LearningRate 0.0696 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:42,426-Speed 5606.99 samples/sec Loss 9.7084 LearningRate 0.0696 Epoch: 3 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:21:44,285-Speed 5510.68 samples/sec Loss 9.7798 LearningRate 0.0696 Epoch: 3 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:46,104-Speed 5631.66 samples/sec Loss 9.8269 LearningRate 0.0696 Epoch: 3 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:47,941-Speed 5579.69 samples/sec Loss 9.8415 LearningRate 0.0696 Epoch: 3 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:49,817-Speed 5460.75 samples/sec Loss 9.8894 LearningRate 0.0695 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:51,646-Speed 5605.36 samples/sec Loss 9.6877 LearningRate 0.0695 Epoch: 3 Global Step: 16810 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:53,473-Speed 5608.15 samples/sec Loss 9.7754 LearningRate 0.0695 Epoch: 3 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:55,292-Speed 5631.32 samples/sec Loss 9.8902 LearningRate 0.0695 Epoch: 3 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:57,122-Speed 5599.14 samples/sec Loss 9.8474 LearningRate 0.0695 Epoch: 3 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:21:58,983-Speed 5509.66 samples/sec Loss 9.9744 LearningRate 0.0695 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:00,798-Speed 5643.55 samples/sec Loss 9.8971 LearningRate 0.0694 Epoch: 3 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:02,651-Speed 5528.95 samples/sec Loss 9.7240 LearningRate 0.0694 Epoch: 3 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:04,452-Speed 5690.15 samples/sec Loss 9.8572 LearningRate 0.0694 Epoch: 3 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:06,295-Speed 5560.76 samples/sec Loss 9.7008 LearningRate 0.0694 Epoch: 3 Global Step: 16890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:08,111-Speed 5638.99 samples/sec Loss 9.7231 LearningRate 0.0694 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:09,939-Speed 5606.87 samples/sec Loss 9.7146 LearningRate 0.0694 Epoch: 3 Global Step: 16910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:11,833-Speed 5412.12 samples/sec Loss 10.0129 LearningRate 0.0693 Epoch: 3 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:13,651-Speed 5635.55 samples/sec Loss 9.8516 LearningRate 0.0693 Epoch: 3 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 11:22:15,484-Speed 5591.36 samples/sec Loss 9.9879 LearningRate 0.0693 Epoch: 3 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:17,407-Speed 5325.95 samples/sec Loss 9.8099 LearningRate 0.0693 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:19,237-Speed 5599.69 samples/sec Loss 9.9537 LearningRate 0.0693 Epoch: 3 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:21,051-Speed 5651.16 samples/sec Loss 9.7895 LearningRate 0.0693 Epoch: 3 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:22,889-Speed 5573.54 samples/sec Loss 9.7996 LearningRate 0.0692 Epoch: 3 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:24,735-Speed 5552.63 samples/sec Loss 9.7444 LearningRate 0.0692 Epoch: 3 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:26,551-Speed 5643.07 samples/sec Loss 9.6555 LearningRate 0.0692 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:28,383-Speed 5591.34 samples/sec Loss 9.8103 LearningRate 0.0692 Epoch: 3 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:30,216-Speed 5591.02 samples/sec Loss 9.8546 LearningRate 0.0692 Epoch: 3 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:32,035-Speed 5632.19 samples/sec Loss 9.8823 LearningRate 0.0692 Epoch: 3 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:33,895-Speed 5508.01 samples/sec Loss 9.7795 LearningRate 0.0691 Epoch: 3 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:35,780-Speed 5435.51 samples/sec Loss 9.7552 LearningRate 0.0691 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:37,625-Speed 5552.57 samples/sec Loss 9.8518 
LearningRate 0.0691 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:39,517-Speed 5418.00 samples/sec Loss 9.8976 LearningRate 0.0691 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:41,448-Speed 5554.84 samples/sec Loss 10.0565 LearningRate 0.0691 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:22:43,259-Speed 5658.97 samples/sec Loss 9.7644 LearningRate 0.0691 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:45,090-Speed 5595.29 samples/sec Loss 9.8452 LearningRate 0.0690 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:46,935-Speed 5553.94 samples/sec Loss 9.6633 LearningRate 0.0690 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:48,759-Speed 5617.02 samples/sec Loss 9.9862 LearningRate 0.0690 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:50,584-Speed 5613.50 samples/sec Loss 9.8234 LearningRate 0.0690 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:52,437-Speed 5560.33 samples/sec Loss 9.8635 LearningRate 0.0690 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:54,275-Speed 5573.00 samples/sec Loss 9.9201 LearningRate 0.0690 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:56,134-Speed 5513.37 samples/sec Loss 9.8822 LearningRate 0.0690 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:57,958-Speed 5616.21 samples/sec Loss 9.8107 LearningRate 0.0689 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:22:59,818-Speed 5508.53 samples/sec Loss 9.8129 LearningRate 0.0689 Epoch: 3 Global Step: 17180 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:01,628-Speed 5660.12 samples/sec Loss 9.7229 LearningRate 0.0689 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:03,477-Speed 5540.82 samples/sec Loss 9.7290 LearningRate 0.0689 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:05,347-Speed 5480.20 samples/sec Loss 9.8794 LearningRate 0.0689 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:07,177-Speed 5597.64 samples/sec Loss 9.7873 LearningRate 0.0689 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:09,008-Speed 5596.02 samples/sec Loss 9.7497 LearningRate 0.0688 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:10,836-Speed 5603.41 samples/sec Loss 9.7374 LearningRate 0.0688 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:12,738-Speed 5386.35 samples/sec Loss 9.8960 LearningRate 0.0688 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:14,578-Speed 5581.02 samples/sec Loss 9.7455 LearningRate 0.0688 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:16,410-Speed 5591.94 samples/sec Loss 9.8371 LearningRate 0.0688 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:18,277-Speed 5488.09 samples/sec Loss 9.8210 LearningRate 0.0688 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:20,085-Speed 5665.55 samples/sec Loss 9.8478 LearningRate 0.0687 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:21,937-Speed 5555.50 samples/sec Loss 9.6400 LearningRate 0.0687 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:23:23,748-Speed 5657.00 samples/sec Loss 9.6963 LearningRate 0.0687 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:25,611-Speed 5499.20 samples/sec Loss 9.7106 LearningRate 0.0687 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:27,421-Speed 5659.73 samples/sec Loss 9.7105 LearningRate 0.0687 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:23:29,239-Speed 5636.97 samples/sec Loss 9.6856 LearningRate 0.0687 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:31,079-Speed 5567.11 samples/sec Loss 9.8966 LearningRate 0.0686 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:32,940-Speed 5504.99 samples/sec Loss 9.6918 LearningRate 0.0686 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:34,806-Speed 5500.99 samples/sec Loss 9.7422 LearningRate 0.0686 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:36,614-Speed 5664.59 samples/sec Loss 9.8557 LearningRate 0.0686 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:38,458-Speed 5557.69 samples/sec Loss 9.7064 LearningRate 0.0686 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:40,275-Speed 5635.83 samples/sec Loss 9.8195 LearningRate 0.0686 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:42,135-Speed 5510.58 samples/sec Loss 9.7915 LearningRate 0.0685 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:43,957-Speed 5623.39 samples/sec Loss 9.7521 LearningRate 0.0685 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:45,780-Speed 5652.27 samples/sec Loss 9.7854 LearningRate 
0.0685 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:23:47,590-Speed 5659.77 samples/sec Loss 9.8044 LearningRate 0.0685 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:49,447-Speed 5515.96 samples/sec Loss 9.9810 LearningRate 0.0685 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:51,280-Speed 5590.54 samples/sec Loss 9.7376 LearningRate 0.0685 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:53,124-Speed 5557.59 samples/sec Loss 9.7496 LearningRate 0.0684 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:54,938-Speed 5648.03 samples/sec Loss 9.8548 LearningRate 0.0684 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:56,808-Speed 5478.04 samples/sec Loss 9.8909 LearningRate 0.0684 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:23:58,622-Speed 5649.34 samples/sec Loss 9.7727 LearningRate 0.0684 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:00,440-Speed 5632.74 samples/sec Loss 9.8800 LearningRate 0.0684 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:02,321-Speed 5448.25 samples/sec Loss 9.5394 LearningRate 0.0684 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:04,143-Speed 5621.47 samples/sec Loss 9.9100 LearningRate 0.0683 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:05,993-Speed 5599.85 samples/sec Loss 9.6983 LearningRate 0.0683 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:07,834-Speed 5564.84 samples/sec Loss 9.9008 LearningRate 0.0683 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-04-11 11:24:09,640-Speed 5675.63 samples/sec Loss 9.8273 LearningRate 0.0683 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:11,467-Speed 5605.01 samples/sec Loss 9.6758 LearningRate 0.0683 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:13,272-Speed 5675.24 samples/sec Loss 9.6769 LearningRate 0.0683 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:15,134-Speed 5549.39 samples/sec Loss 9.7803 LearningRate 0.0682 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:17,030-Speed 5406.07 samples/sec Loss 9.5972 LearningRate 0.0682 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:18,862-Speed 5591.61 samples/sec Loss 9.6290 LearningRate 0.0682 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:20,676-Speed 5648.11 samples/sec Loss 9.6090 LearningRate 0.0682 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:22,520-Speed 5561.32 samples/sec Loss 9.5346 LearningRate 0.0682 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:24,338-Speed 5634.42 samples/sec Loss 9.7274 LearningRate 0.0682 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:26,166-Speed 5602.41 samples/sec Loss 9.7458 LearningRate 0.0681 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:27,997-Speed 5598.26 samples/sec Loss 9.6961 LearningRate 0.0681 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:29,859-Speed 5501.62 samples/sec Loss 9.7564 LearningRate 0.0681 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:31,666-Speed 5667.51 
samples/sec Loss 9.7235 LearningRate 0.0681 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:33,534-Speed 5486.89 samples/sec Loss 9.6522 LearningRate 0.0681 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:35,363-Speed 5601.40 samples/sec Loss 9.6061 LearningRate 0.0681 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:37,245-Speed 5443.43 samples/sec Loss 9.6914 LearningRate 0.0681 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:39,113-Speed 5484.92 samples/sec Loss 9.7055 LearningRate 0.0680 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:40,972-Speed 5511.51 samples/sec Loss 9.6264 LearningRate 0.0680 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:42,792-Speed 5627.60 samples/sec Loss 9.7498 LearningRate 0.0680 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:44,654-Speed 5621.16 samples/sec Loss 9.6676 LearningRate 0.0680 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:46,475-Speed 5627.28 samples/sec Loss 9.6469 LearningRate 0.0680 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:48,284-Speed 5664.27 samples/sec Loss 9.6412 LearningRate 0.0680 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:50,100-Speed 5639.63 samples/sec Loss 9.5711 LearningRate 0.0679 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:51,927-Speed 5645.57 samples/sec Loss 9.6351 LearningRate 0.0679 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:24:53,759-Speed 5592.00 samples/sec Loss 9.7247 LearningRate 0.0679 Epoch: 3 
Global Step: 17800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:55,595-Speed 5579.11 samples/sec Loss 9.7093 LearningRate 0.0679 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:57,402-Speed 5670.39 samples/sec Loss 9.7825 LearningRate 0.0679 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:24:59,271-Speed 5484.39 samples/sec Loss 9.8229 LearningRate 0.0679 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:01,116-Speed 5553.13 samples/sec Loss 9.7594 LearningRate 0.0678 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:02,985-Speed 5482.86 samples/sec Loss 9.7354 LearningRate 0.0678 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:04,823-Speed 5635.87 samples/sec Loss 9.6907 LearningRate 0.0678 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:06,665-Speed 5561.59 samples/sec Loss 9.6811 LearningRate 0.0678 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:08,484-Speed 5631.61 samples/sec Loss 9.8286 LearningRate 0.0678 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:10,338-Speed 5525.83 samples/sec Loss 9.7515 LearningRate 0.0678 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:25:12,162-Speed 5617.49 samples/sec Loss 9.7404 LearningRate 0.0677 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:14,032-Speed 5480.54 samples/sec Loss 9.7273 LearningRate 0.0677 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:15,963-Speed 5433.53 samples/sec Loss 9.5745 LearningRate 0.0677 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:25:17,793-Speed 5599.15 samples/sec Loss 9.7354 LearningRate 0.0677 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:19,674-Speed 5447.33 samples/sec Loss 9.7046 LearningRate 0.0677 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:21,483-Speed 5662.68 samples/sec Loss 9.7533 LearningRate 0.0677 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:23,338-Speed 5666.48 samples/sec Loss 9.7818 LearningRate 0.0676 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:25,164-Speed 5613.19 samples/sec Loss 9.7492 LearningRate 0.0676 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:27,016-Speed 5529.38 samples/sec Loss 9.6849 LearningRate 0.0676 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:28,828-Speed 5656.39 samples/sec Loss 9.7916 LearningRate 0.0676 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:30,639-Speed 5656.55 samples/sec Loss 9.8895 LearningRate 0.0676 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:25:58,172-[lfw][18000]XNorm: 21.568299 Training: 2022-04-11 11:25:58,175-[lfw][18000]Accuracy-Flip: 0.99467+-0.00323 Training: 2022-04-11 11:25:58,175-[lfw][18000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:26:30,004-[cfp_fp][18000]XNorm: 18.345091 Training: 2022-04-11 11:26:30,005-[cfp_fp][18000]Accuracy-Flip: 0.94057+-0.01056 Training: 2022-04-11 11:26:30,005-[cfp_fp][18000]Accuracy-Highest: 0.94543 Training: 2022-04-11 11:26:57,171-[agedb_30][18000]XNorm: 21.054906 Training: 2022-04-11 11:26:57,177-[agedb_30][18000]Accuracy-Flip: 0.96467+-0.00823 Training: 2022-04-11 11:26:57,178-[agedb_30][18000]Accuracy-Highest: 0.96467 Training: 2022-04-11 11:26:59,008-Speed 
115.88 samples/sec Loss 9.6451 LearningRate 0.0676 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:00,862-Speed 5524.32 samples/sec Loss 9.6848 LearningRate 0.0675 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:02,699-Speed 5576.92 samples/sec Loss 9.6439 LearningRate 0.0675 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:04,551-Speed 5546.76 samples/sec Loss 9.4978 LearningRate 0.0675 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:06,387-Speed 5578.67 samples/sec Loss 9.8556 LearningRate 0.0675 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:08,213-Speed 5610.30 samples/sec Loss 9.6893 LearningRate 0.0675 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:10,062-Speed 5541.10 samples/sec Loss 9.7069 LearningRate 0.0675 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:11,873-Speed 5659.20 samples/sec Loss 9.7192 LearningRate 0.0674 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:13,743-Speed 5479.18 samples/sec Loss 9.8718 LearningRate 0.0674 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:15,567-Speed 5616.85 samples/sec Loss 9.6326 LearningRate 0.0674 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:17,427-Speed 5509.17 samples/sec Loss 9.5998 LearningRate 0.0674 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:19,272-Speed 5550.76 samples/sec Loss 9.7258 LearningRate 0.0674 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:21,084-Speed 5656.12 samples/sec Loss 9.6384 LearningRate 0.0674 Epoch: 3 
Global Step: 18130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:22,961-Speed 5458.57 samples/sec Loss 9.7463 LearningRate 0.0674 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:24,867-Speed 5545.78 samples/sec Loss 9.6689 LearningRate 0.0673 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:26,726-Speed 5511.81 samples/sec Loss 9.7866 LearningRate 0.0673 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:28,550-Speed 5614.76 samples/sec Loss 9.6107 LearningRate 0.0673 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:30,439-Speed 5422.28 samples/sec Loss 9.6911 LearningRate 0.0673 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:32,269-Speed 5598.88 samples/sec Loss 9.6558 LearningRate 0.0673 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:34,075-Speed 5674.13 samples/sec Loss 9.7653 LearningRate 0.0673 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:35,888-Speed 5652.03 samples/sec Loss 9.7052 LearningRate 0.0672 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:37,727-Speed 5571.38 samples/sec Loss 9.8009 LearningRate 0.0672 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:39,533-Speed 5672.51 samples/sec Loss 9.7974 LearningRate 0.0672 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:41,382-Speed 5586.89 samples/sec Loss 9.6860 LearningRate 0.0672 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:43,254-Speed 5473.47 samples/sec Loss 9.4467 LearningRate 0.0672 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 11:27:45,065-Speed 5657.05 samples/sec Loss 9.6698 LearningRate 0.0672 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:46,916-Speed 5535.37 samples/sec Loss 9.5147 LearningRate 0.0671 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:48,752-Speed 5578.78 samples/sec Loss 9.5890 LearningRate 0.0671 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:50,584-Speed 5591.98 samples/sec Loss 9.4861 LearningRate 0.0671 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:52,398-Speed 5648.02 samples/sec Loss 9.6374 LearningRate 0.0671 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:54,279-Speed 5652.61 samples/sec Loss 9.6432 LearningRate 0.0671 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:56,091-Speed 5653.90 samples/sec Loss 9.4563 LearningRate 0.0671 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:27:57,894-Speed 5681.87 samples/sec Loss 9.7239 LearningRate 0.0670 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:27:59,769-Speed 5463.26 samples/sec Loss 9.5770 LearningRate 0.0670 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:01,627-Speed 5512.30 samples/sec Loss 9.4883 LearningRate 0.0670 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:03,487-Speed 5580.12 samples/sec Loss 9.5406 LearningRate 0.0670 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:05,345-Speed 5514.55 samples/sec Loss 9.4174 LearningRate 0.0670 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:07,212-Speed 5489.45 samples/sec Loss 
9.6331 LearningRate 0.0670 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:09,029-Speed 5637.75 samples/sec Loss 9.6496 LearningRate 0.0669 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:10,860-Speed 5596.53 samples/sec Loss 9.5902 LearningRate 0.0669 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:12,670-Speed 5661.53 samples/sec Loss 9.5513 LearningRate 0.0669 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:14,505-Speed 5584.47 samples/sec Loss 9.5787 LearningRate 0.0669 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:16,358-Speed 5527.36 samples/sec Loss 9.7610 LearningRate 0.0669 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:18,214-Speed 5521.03 samples/sec Loss 9.4972 LearningRate 0.0669 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:20,020-Speed 5673.76 samples/sec Loss 9.4514 LearningRate 0.0668 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:21,867-Speed 5546.40 samples/sec Loss 9.5261 LearningRate 0.0668 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:23,699-Speed 5621.18 samples/sec Loss 9.4197 LearningRate 0.0668 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:25,548-Speed 5540.92 samples/sec Loss 9.5033 LearningRate 0.0668 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:27,366-Speed 5634.83 samples/sec Loss 9.5490 LearningRate 0.0668 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:29,215-Speed 5542.77 samples/sec Loss 9.4302 LearningRate 0.0668 Epoch: 3 Global Step: 18500 Fp16 
Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:31,034-Speed 5632.35 samples/sec Loss 9.5304 LearningRate 0.0668 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:32,857-Speed 5618.50 samples/sec Loss 9.7842 LearningRate 0.0667 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:34,677-Speed 5642.40 samples/sec Loss 9.7479 LearningRate 0.0667 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:36,528-Speed 5535.19 samples/sec Loss 9.4723 LearningRate 0.0667 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:38,349-Speed 5624.23 samples/sec Loss 9.5053 LearningRate 0.0667 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:28:40,182-Speed 5588.29 samples/sec Loss 9.5242 LearningRate 0.0667 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:42,016-Speed 5676.55 samples/sec Loss 9.6632 LearningRate 0.0667 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:43,852-Speed 5580.89 samples/sec Loss 9.5933 LearningRate 0.0666 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:45,688-Speed 5579.50 samples/sec Loss 9.6054 LearningRate 0.0666 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:47,496-Speed 5667.83 samples/sec Loss 9.6974 LearningRate 0.0666 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:49,338-Speed 5560.41 samples/sec Loss 9.8450 LearningRate 0.0666 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:51,159-Speed 5627.75 samples/sec Loss 9.6836 LearningRate 0.0666 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
11:28:52,987-Speed 5601.76 samples/sec Loss 9.6801 LearningRate 0.0666 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:54,953-Speed 5460.63 samples/sec Loss 9.5873 LearningRate 0.0665 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:56,762-Speed 5668.79 samples/sec Loss 9.5545 LearningRate 0.0665 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:28:58,624-Speed 5500.95 samples/sec Loss 9.6199 LearningRate 0.0665 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:00,436-Speed 5652.35 samples/sec Loss 9.5251 LearningRate 0.0665 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:02,283-Speed 5547.31 samples/sec Loss 9.7057 LearningRate 0.0665 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:04,168-Speed 5435.82 samples/sec Loss 9.6076 LearningRate 0.0665 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:05,999-Speed 5597.08 samples/sec Loss 9.6317 LearningRate 0.0664 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:07,815-Speed 5640.49 samples/sec Loss 9.6349 LearningRate 0.0664 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:09,668-Speed 5530.73 samples/sec Loss 9.6587 LearningRate 0.0664 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:11,544-Speed 5470.93 samples/sec Loss 9.5304 LearningRate 0.0664 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:13,360-Speed 5641.01 samples/sec Loss 9.5182 LearningRate 0.0664 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:15,178-Speed 5635.29 samples/sec Loss 9.5942 LearningRate 
0.0664 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:16,997-Speed 5629.97 samples/sec Loss 9.4230 LearningRate 0.0663 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:18,813-Speed 5643.93 samples/sec Loss 9.4904 LearningRate 0.0663 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:20,619-Speed 5671.18 samples/sec Loss 9.5394 LearningRate 0.0663 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:22,432-Speed 5650.05 samples/sec Loss 9.5614 LearningRate 0.0663 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:24,346-Speed 5486.24 samples/sec Loss 9.5302 LearningRate 0.0663 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:26,165-Speed 5631.99 samples/sec Loss 9.4425 LearningRate 0.0663 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:28,058-Speed 5411.41 samples/sec Loss 9.7237 LearningRate 0.0663 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:29,874-Speed 5641.20 samples/sec Loss 9.4600 LearningRate 0.0662 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:31,702-Speed 5603.59 samples/sec Loss 9.4349 LearningRate 0.0662 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:29:33,549-Speed 5548.14 samples/sec Loss 9.4748 LearningRate 0.0662 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:35,360-Speed 5656.24 samples/sec Loss 9.4543 LearningRate 0.0662 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:37,209-Speed 5539.69 samples/sec Loss 9.7226 LearningRate 0.0662 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 11:29:39,089-Speed 5450.54 samples/sec Loss 9.5602 LearningRate 0.0662 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:40,908-Speed 5633.20 samples/sec Loss 9.4871 LearningRate 0.0661 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:42,775-Speed 5486.05 samples/sec Loss 9.5794 LearningRate 0.0661 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:44,608-Speed 5589.07 samples/sec Loss 9.5220 LearningRate 0.0661 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:29:46,450-Speed 5562.68 samples/sec Loss 9.7158 LearningRate 0.0661 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:48,619-Speed 4722.59 samples/sec Loss 9.5813 LearningRate 0.0661 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:50,446-Speed 5607.50 samples/sec Loss 9.5326 LearningRate 0.0661 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:52,262-Speed 5640.48 samples/sec Loss 9.3677 LearningRate 0.0660 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:54,100-Speed 5589.39 samples/sec Loss 9.6395 LearningRate 0.0660 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:55,924-Speed 5623.04 samples/sec Loss 9.7113 LearningRate 0.0660 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:57,750-Speed 5608.88 samples/sec Loss 9.4972 LearningRate 0.0660 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:29:59,594-Speed 5554.76 samples/sec Loss 9.2905 LearningRate 0.0660 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:01,404-Speed 5659.11 
samples/sec Loss 9.4224 LearningRate 0.0660 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:03,243-Speed 5573.09 samples/sec Loss 9.3748 LearningRate 0.0659 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:05,062-Speed 5639.31 samples/sec Loss 9.5497 LearningRate 0.0659 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:06,891-Speed 5600.46 samples/sec Loss 9.6033 LearningRate 0.0659 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:08,706-Speed 5645.73 samples/sec Loss 9.6268 LearningRate 0.0659 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:10,539-Speed 5587.17 samples/sec Loss 9.5015 LearningRate 0.0659 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:12,363-Speed 5618.08 samples/sec Loss 9.6052 LearningRate 0.0659 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:14,194-Speed 5594.64 samples/sec Loss 9.5093 LearningRate 0.0659 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:16,040-Speed 5546.45 samples/sec Loss 9.5260 LearningRate 0.0658 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:17,867-Speed 5607.86 samples/sec Loss 9.5322 LearningRate 0.0658 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:19,685-Speed 5634.27 samples/sec Loss 9.6057 LearningRate 0.0658 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:21,519-Speed 5585.56 samples/sec Loss 9.5298 LearningRate 0.0658 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:23,354-Speed 5582.63 samples/sec Loss 9.5603 LearningRate 0.0658 Epoch: 3 Global Step: 
19120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:30:25,159-Speed 5677.14 samples/sec Loss 9.3503 LearningRate 0.0658 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:26,980-Speed 5624.64 samples/sec Loss 9.5665 LearningRate 0.0657 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:28,859-Speed 5451.74 samples/sec Loss 9.3690 LearningRate 0.0657 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:30,699-Speed 5567.53 samples/sec Loss 9.3624 LearningRate 0.0657 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:32,521-Speed 5621.03 samples/sec Loss 9.4120 LearningRate 0.0657 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:34,357-Speed 5582.53 samples/sec Loss 9.5617 LearningRate 0.0657 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:36,204-Speed 5545.56 samples/sec Loss 9.6025 LearningRate 0.0657 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:38,028-Speed 5614.01 samples/sec Loss 9.6203 LearningRate 0.0656 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:39,860-Speed 5593.20 samples/sec Loss 9.4069 LearningRate 0.0656 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:41,670-Speed 5659.90 samples/sec Loss 9.5118 LearningRate 0.0656 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:30:43,482-Speed 5652.27 samples/sec Loss 9.5425 LearningRate 0.0656 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:45,318-Speed 5580.48 samples/sec Loss 9.6254 LearningRate 0.0656 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 11:30:47,134-Speed 5640.33 samples/sec Loss 9.5292 LearningRate 0.0656 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:48,946-Speed 5652.60 samples/sec Loss 9.5187 LearningRate 0.0655 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:50,781-Speed 5582.84 samples/sec Loss 9.4885 LearningRate 0.0655 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:52,607-Speed 5611.17 samples/sec Loss 9.3042 LearningRate 0.0655 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:54,421-Speed 5646.50 samples/sec Loss 9.6058 LearningRate 0.0655 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:56,299-Speed 5454.13 samples/sec Loss 9.4508 LearningRate 0.0655 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:58,127-Speed 5605.68 samples/sec Loss 9.5499 LearningRate 0.0655 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:30:59,951-Speed 5615.01 samples/sec Loss 9.5728 LearningRate 0.0655 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:01,814-Speed 5499.69 samples/sec Loss 9.6744 LearningRate 0.0654 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:03,678-Speed 5496.54 samples/sec Loss 9.6133 LearningRate 0.0654 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:05,512-Speed 5585.66 samples/sec Loss 9.4522 LearningRate 0.0654 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:07,339-Speed 5607.01 samples/sec Loss 9.6713 LearningRate 0.0654 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:09,145-Speed 5670.75 samples/sec Loss 9.4486 
LearningRate 0.0654 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:10,969-Speed 5615.85 samples/sec Loss 9.3320 LearningRate 0.0654 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:12,782-Speed 5650.24 samples/sec Loss 9.3997 LearningRate 0.0653 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:14,586-Speed 5679.45 samples/sec Loss 9.3838 LearningRate 0.0653 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:16,411-Speed 5613.04 samples/sec Loss 9.5607 LearningRate 0.0653 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:18,293-Speed 5442.15 samples/sec Loss 9.4846 LearningRate 0.0653 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:20,113-Speed 5626.97 samples/sec Loss 9.5313 LearningRate 0.0653 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:21,922-Speed 5662.95 samples/sec Loss 9.3742 LearningRate 0.0653 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:23,747-Speed 5615.32 samples/sec Loss 9.2798 LearningRate 0.0652 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:25,579-Speed 5591.91 samples/sec Loss 9.5113 LearningRate 0.0652 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:27,405-Speed 5609.87 samples/sec Loss 9.6070 LearningRate 0.0652 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:29,214-Speed 5660.92 samples/sec Loss 9.5072 LearningRate 0.0652 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:31,044-Speed 5599.21 samples/sec Loss 9.5412 LearningRate 0.0652 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 11:31:32,874-Speed 5595.30 samples/sec Loss 9.4831 LearningRate 0.0652 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:34,691-Speed 5640.79 samples/sec Loss 9.4864 LearningRate 0.0651 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:36,579-Speed 5425.48 samples/sec Loss 9.3843 LearningRate 0.0651 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:38,423-Speed 5554.41 samples/sec Loss 9.2462 LearningRate 0.0651 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:40,278-Speed 5522.14 samples/sec Loss 9.4994 LearningRate 0.0651 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:42,097-Speed 5634.54 samples/sec Loss 9.3434 LearningRate 0.0651 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:43,944-Speed 5545.55 samples/sec Loss 9.3679 LearningRate 0.0651 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:45,800-Speed 5517.07 samples/sec Loss 9.3636 LearningRate 0.0651 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:47,657-Speed 5518.51 samples/sec Loss 9.3930 LearningRate 0.0650 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:49,527-Speed 5476.71 samples/sec Loss 9.4179 LearningRate 0.0650 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:51,342-Speed 5644.36 samples/sec Loss 9.4288 LearningRate 0.0650 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:53,181-Speed 5571.47 samples/sec Loss 9.5508 LearningRate 0.0650 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:31:54,993-Speed 
5652.67 samples/sec Loss 9.4262 LearningRate 0.0650 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:56,816-Speed 5617.84 samples/sec Loss 9.5121 LearningRate 0.0650 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:31:58,652-Speed 5578.82 samples/sec Loss 9.3885 LearningRate 0.0649 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:00,503-Speed 5536.46 samples/sec Loss 9.4445 LearningRate 0.0649 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:02,312-Speed 5661.76 samples/sec Loss 9.2871 LearningRate 0.0649 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:04,149-Speed 5577.27 samples/sec Loss 9.5117 LearningRate 0.0649 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:05,976-Speed 5608.31 samples/sec Loss 9.5957 LearningRate 0.0649 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:07,796-Speed 5626.91 samples/sec Loss 9.3552 LearningRate 0.0649 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:09,613-Speed 5636.46 samples/sec Loss 9.3700 LearningRate 0.0648 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:11,425-Speed 5653.80 samples/sec Loss 9.3905 LearningRate 0.0648 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:13,245-Speed 5630.15 samples/sec Loss 9.3503 LearningRate 0.0648 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:15,062-Speed 5637.25 samples/sec Loss 9.4171 LearningRate 0.0648 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:16,961-Speed 5394.62 samples/sec Loss 9.4291 LearningRate 0.0648 
Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:18,830-Speed 5479.02 samples/sec Loss 9.3575 LearningRate 0.0648 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:20,660-Speed 5597.78 samples/sec Loss 9.3912 LearningRate 0.0647 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:22,475-Speed 5643.89 samples/sec Loss 9.4830 LearningRate 0.0647 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:24,303-Speed 5606.24 samples/sec Loss 9.4905 LearningRate 0.0647 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:26,174-Speed 5474.24 samples/sec Loss 9.4197 LearningRate 0.0647 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:28,007-Speed 5586.35 samples/sec Loss 9.2783 LearningRate 0.0647 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:29,845-Speed 5574.37 samples/sec Loss 9.3175 LearningRate 0.0647 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:31,669-Speed 5614.53 samples/sec Loss 9.6238 LearningRate 0.0647 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:33,493-Speed 5618.89 samples/sec Loss 9.3811 LearningRate 0.0646 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:35,304-Speed 5656.12 samples/sec Loss 9.3259 LearningRate 0.0646 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:37,119-Speed 5643.37 samples/sec Loss 9.2479 LearningRate 0.0646 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:38,942-Speed 5618.81 samples/sec Loss 9.3732 LearningRate 0.0646 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 11:32:40,764-Speed 5620.95 samples/sec Loss 9.2802 LearningRate 0.0646 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:42,599-Speed 5585.21 samples/sec Loss 9.3868 LearningRate 0.0646 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:44,414-Speed 5643.35 samples/sec Loss 9.4954 LearningRate 0.0645 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:46,228-Speed 5646.90 samples/sec Loss 9.2656 LearningRate 0.0645 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:48,051-Speed 5618.04 samples/sec Loss 9.3758 LearningRate 0.0645 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:49,875-Speed 5618.41 samples/sec Loss 9.4203 LearningRate 0.0645 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:51,709-Speed 5584.80 samples/sec Loss 9.4560 LearningRate 0.0645 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:32:53,535-Speed 5607.64 samples/sec Loss 9.2285 LearningRate 0.0645 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:55,372-Speed 5578.80 samples/sec Loss 9.2507 LearningRate 0.0644 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:57,204-Speed 5591.60 samples/sec Loss 9.3670 LearningRate 0.0644 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:32:59,052-Speed 5542.65 samples/sec Loss 9.2116 LearningRate 0.0644 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:33:00,877-Speed 5614.08 samples/sec Loss 9.4806 LearningRate 0.0644 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:33:02,739-Speed 5500.27 
samples/sec Loss 9.5349 LearningRate 0.0644 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:33:04,565-Speed 5610.50 samples/sec Loss 9.3772 LearningRate 0.0644 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:33:31,994-[lfw][20000]XNorm: 21.186255 Training: 2022-04-11 11:33:31,995-[lfw][20000]Accuracy-Flip: 0.99633+-0.00277 Training: 2022-04-11 11:33:31,996-[lfw][20000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:34:03,204-[cfp_fp][20000]XNorm: 18.244188 Training: 2022-04-11 11:34:03,206-[cfp_fp][20000]Accuracy-Flip: 0.95486+-0.00836 Training: 2022-04-11 11:34:03,206-[cfp_fp][20000]Accuracy-Highest: 0.95486 Training: 2022-04-11 11:34:30,163-[agedb_30][20000]XNorm: 20.944213 Training: 2022-04-11 11:34:30,164-[agedb_30][20000]Accuracy-Flip: 0.97033+-0.00894 Training: 2022-04-11 11:34:30,165-[agedb_30][20000]Accuracy-Highest: 0.97033 Training: 2022-04-11 11:34:32,041-Speed 117.06 samples/sec Loss 9.3294 LearningRate 0.0644 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:33,897-Speed 5520.48 samples/sec Loss 9.2522 LearningRate 0.0643 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:35,776-Speed 5454.02 samples/sec Loss 9.4319 LearningRate 0.0643 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:37,617-Speed 5563.96 samples/sec Loss 9.4107 LearningRate 0.0643 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:39,466-Speed 5542.40 samples/sec Loss 9.2184 LearningRate 0.0643 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:41,319-Speed 5529.57 samples/sec Loss 9.4813 LearningRate 0.0643 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:43,136-Speed 5639.95 samples/sec Loss 9.4274 LearningRate 
0.0643 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:44,961-Speed 5613.88 samples/sec Loss 9.3704 LearningRate 0.0642 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:46,799-Speed 5576.29 samples/sec Loss 9.3525 LearningRate 0.0642 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:48,631-Speed 5593.13 samples/sec Loss 9.2881 LearningRate 0.0642 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:50,535-Speed 5379.40 samples/sec Loss 9.3532 LearningRate 0.0642 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:52,363-Speed 5606.22 samples/sec Loss 9.2000 LearningRate 0.0642 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:54,221-Speed 5515.77 samples/sec Loss 9.4689 LearningRate 0.0642 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:56,065-Speed 5553.84 samples/sec Loss 9.3001 LearningRate 0.0641 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:57,894-Speed 5602.72 samples/sec Loss 9.4400 LearningRate 0.0641 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:34:59,708-Speed 5647.42 samples/sec Loss 9.3003 LearningRate 0.0641 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:01,595-Speed 5433.17 samples/sec Loss 9.4453 LearningRate 0.0641 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:03,419-Speed 5616.19 samples/sec Loss 9.5310 LearningRate 0.0641 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:05,264-Speed 5553.54 samples/sec Loss 9.3606 LearningRate 0.0641 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 11:35:07,141-Speed 5458.65 samples/sec Loss 9.3868 LearningRate 0.0641 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:09,022-Speed 5446.83 samples/sec Loss 9.2697 LearningRate 0.0640 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:10,839-Speed 5637.97 samples/sec Loss 9.4822 LearningRate 0.0640 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:12,765-Speed 5319.65 samples/sec Loss 9.2583 LearningRate 0.0640 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:24,624-Speed 863.58 samples/sec Loss 8.6482 LearningRate 0.0640 Epoch: 4 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:26,481-Speed 5519.76 samples/sec Loss 8.5766 LearningRate 0.0640 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:28,343-Speed 5503.30 samples/sec Loss 8.4664 LearningRate 0.0640 Epoch: 4 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:30,170-Speed 5608.29 samples/sec Loss 8.7219 LearningRate 0.0639 Epoch: 4 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:31,994-Speed 5615.43 samples/sec Loss 8.5181 LearningRate 0.0639 Epoch: 4 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:33,802-Speed 5665.02 samples/sec Loss 8.4967 LearningRate 0.0639 Epoch: 4 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:35,628-Speed 5612.21 samples/sec Loss 8.4135 LearningRate 0.0639 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:37,472-Speed 5555.10 samples/sec Loss 8.4607 LearningRate 0.0639 Epoch: 4 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:39,309-Speed 
5576.15 samples/sec Loss 8.6387 LearningRate 0.0639 Epoch: 4 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:41,144-Speed 5585.47 samples/sec Loss 8.7072 LearningRate 0.0638 Epoch: 4 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:42,992-Speed 5546.02 samples/sec Loss 8.5758 LearningRate 0.0638 Epoch: 4 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:44,821-Speed 5598.04 samples/sec Loss 8.6679 LearningRate 0.0638 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:46,660-Speed 5571.84 samples/sec Loss 8.7407 LearningRate 0.0638 Epoch: 4 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:48,510-Speed 5537.88 samples/sec Loss 8.7627 LearningRate 0.0638 Epoch: 4 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:50,347-Speed 5578.24 samples/sec Loss 8.6404 LearningRate 0.0638 Epoch: 4 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:52,199-Speed 5531.92 samples/sec Loss 8.7708 LearningRate 0.0638 Epoch: 4 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:35:54,040-Speed 5561.76 samples/sec Loss 8.6206 LearningRate 0.0637 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:55,872-Speed 5596.00 samples/sec Loss 8.6880 LearningRate 0.0637 Epoch: 4 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:57,697-Speed 5611.65 samples/sec Loss 8.7636 LearningRate 0.0637 Epoch: 4 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:35:59,569-Speed 5472.47 samples/sec Loss 8.8118 LearningRate 0.0637 Epoch: 4 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:01,391-Speed 5623.18 samples/sec Loss 8.7410 LearningRate 0.0637 Epoch: 4 
Global Step: 20440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:03,220-Speed 5601.35 samples/sec Loss 8.7259 LearningRate 0.0637 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:05,033-Speed 5651.05 samples/sec Loss 8.6106 LearningRate 0.0636 Epoch: 4 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:06,853-Speed 5631.32 samples/sec Loss 8.6570 LearningRate 0.0636 Epoch: 4 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:08,663-Speed 5659.37 samples/sec Loss 8.7595 LearningRate 0.0636 Epoch: 4 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:10,528-Speed 5493.99 samples/sec Loss 8.8462 LearningRate 0.0636 Epoch: 4 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:12,349-Speed 5627.11 samples/sec Loss 8.6175 LearningRate 0.0636 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:14,193-Speed 5554.39 samples/sec Loss 8.8099 LearningRate 0.0636 Epoch: 4 Global Step: 20510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:16,006-Speed 5652.08 samples/sec Loss 8.7622 LearningRate 0.0635 Epoch: 4 Global Step: 20520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:17,831-Speed 5612.49 samples/sec Loss 8.9073 LearningRate 0.0635 Epoch: 4 Global Step: 20530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:19,643-Speed 5655.80 samples/sec Loss 8.8464 LearningRate 0.0635 Epoch: 4 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:21,498-Speed 5521.85 samples/sec Loss 8.7543 LearningRate 0.0635 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:23,325-Speed 5606.66 samples/sec Loss 8.8590 LearningRate 0.0635 Epoch: 4 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:36:25,187-Speed 5500.89 samples/sec Loss 8.9266 LearningRate 0.0635 Epoch: 4 Global Step: 20570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:27,013-Speed 5612.89 samples/sec Loss 9.0738 LearningRate 0.0635 Epoch: 4 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:28,825-Speed 5654.90 samples/sec Loss 8.9077 LearningRate 0.0634 Epoch: 4 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:30,626-Speed 5688.91 samples/sec Loss 8.8913 LearningRate 0.0634 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:36:32,464-Speed 5570.81 samples/sec Loss 8.9797 LearningRate 0.0634 Epoch: 4 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:34,313-Speed 5541.03 samples/sec Loss 8.9939 LearningRate 0.0634 Epoch: 4 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:36,137-Speed 5618.97 samples/sec Loss 8.8119 LearningRate 0.0634 Epoch: 4 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:37,969-Speed 5590.64 samples/sec Loss 8.8743 LearningRate 0.0634 Epoch: 4 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:39,800-Speed 5595.50 samples/sec Loss 8.9089 LearningRate 0.0633 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:41,631-Speed 5596.09 samples/sec Loss 8.6811 LearningRate 0.0633 Epoch: 4 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:43,478-Speed 5546.61 samples/sec Loss 9.0246 LearningRate 0.0633 Epoch: 4 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:45,295-Speed 5638.39 samples/sec Loss 9.0654 LearningRate 0.0633 Epoch: 4 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:47,126-Speed 5595.24 samples/sec 
Loss 9.0391 LearningRate 0.0633 Epoch: 4 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:48,998-Speed 5473.49 samples/sec Loss 9.0327 LearningRate 0.0633 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:50,799-Speed 5686.34 samples/sec Loss 8.9276 LearningRate 0.0632 Epoch: 4 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:52,646-Speed 5546.49 samples/sec Loss 8.9945 LearningRate 0.0632 Epoch: 4 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:54,478-Speed 5594.47 samples/sec Loss 8.8051 LearningRate 0.0632 Epoch: 4 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:56,304-Speed 5612.05 samples/sec Loss 9.1188 LearningRate 0.0632 Epoch: 4 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:58,165-Speed 5504.42 samples/sec Loss 8.9152 LearningRate 0.0632 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:36:59,995-Speed 5598.47 samples/sec Loss 8.9878 LearningRate 0.0632 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:01,831-Speed 5578.13 samples/sec Loss 8.9650 LearningRate 0.0632 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:03,679-Speed 5543.85 samples/sec Loss 9.0670 LearningRate 0.0631 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:05,505-Speed 5613.09 samples/sec Loss 8.9290 LearningRate 0.0631 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:07,349-Speed 5556.94 samples/sec Loss 8.9355 LearningRate 0.0631 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:09,166-Speed 5638.61 samples/sec Loss 9.0107 LearningRate 0.0631 Epoch: 4 Global Step: 20810 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:11,027-Speed 5507.82 samples/sec Loss 8.8538 LearningRate 0.0631 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:12,853-Speed 5611.77 samples/sec Loss 8.9198 LearningRate 0.0631 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:14,702-Speed 5541.18 samples/sec Loss 8.9219 LearningRate 0.0630 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:16,522-Speed 5629.45 samples/sec Loss 8.9512 LearningRate 0.0630 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:18,387-Speed 5493.27 samples/sec Loss 9.0284 LearningRate 0.0630 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:20,231-Speed 5555.11 samples/sec Loss 9.1492 LearningRate 0.0630 Epoch: 4 Global Step: 20870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:22,083-Speed 5532.49 samples/sec Loss 9.0794 LearningRate 0.0630 Epoch: 4 Global Step: 20880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:23,929-Speed 5548.03 samples/sec Loss 9.0104 LearningRate 0.0630 Epoch: 4 Global Step: 20890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:25,751-Speed 5626.14 samples/sec Loss 9.0634 LearningRate 0.0629 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:27,582-Speed 5596.84 samples/sec Loss 8.9186 LearningRate 0.0629 Epoch: 4 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:29,426-Speed 5555.54 samples/sec Loss 8.8288 LearningRate 0.0629 Epoch: 4 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:31,256-Speed 5598.07 samples/sec Loss 9.0643 LearningRate 0.0629 Epoch: 4 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:37:33,095-Speed 5570.02 samples/sec Loss 9.1214 LearningRate 0.0629 Epoch: 4 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:34,927-Speed 5592.90 samples/sec Loss 8.9265 LearningRate 0.0629 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:36,750-Speed 5622.28 samples/sec Loss 8.9585 LearningRate 0.0629 Epoch: 4 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:38,571-Speed 5623.55 samples/sec Loss 9.2076 LearningRate 0.0628 Epoch: 4 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:37:40,374-Speed 5681.00 samples/sec Loss 8.9342 LearningRate 0.0628 Epoch: 4 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:42,215-Speed 5565.54 samples/sec Loss 9.0224 LearningRate 0.0628 Epoch: 4 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:44,042-Speed 5608.39 samples/sec Loss 8.9401 LearningRate 0.0628 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:45,866-Speed 5615.34 samples/sec Loss 9.0417 LearningRate 0.0628 Epoch: 4 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:47,707-Speed 5565.61 samples/sec Loss 9.0081 LearningRate 0.0628 Epoch: 4 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:49,543-Speed 5578.68 samples/sec Loss 9.0729 LearningRate 0.0627 Epoch: 4 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:51,363-Speed 5630.87 samples/sec Loss 9.0329 LearningRate 0.0627 Epoch: 4 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:53,189-Speed 5608.71 samples/sec Loss 9.1826 LearningRate 0.0627 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:55,006-Speed 5638.52 samples/sec Loss 9.0234 LearningRate 
0.0627 Epoch: 4 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:56,821-Speed 5643.21 samples/sec Loss 8.9388 LearningRate 0.0627 Epoch: 4 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:37:58,648-Speed 5609.83 samples/sec Loss 9.1965 LearningRate 0.0627 Epoch: 4 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:38:00,480-Speed 5592.97 samples/sec Loss 8.9683 LearningRate 0.0627 Epoch: 4 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:02,308-Speed 5602.14 samples/sec Loss 9.2176 LearningRate 0.0626 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:04,158-Speed 5540.59 samples/sec Loss 9.0106 LearningRate 0.0626 Epoch: 4 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:05,994-Speed 5579.59 samples/sec Loss 9.1290 LearningRate 0.0626 Epoch: 4 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:07,822-Speed 5606.95 samples/sec Loss 9.0068 LearningRate 0.0626 Epoch: 4 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:09,653-Speed 5594.91 samples/sec Loss 9.0173 LearningRate 0.0626 Epoch: 4 Global Step: 21140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:11,497-Speed 5555.23 samples/sec Loss 9.1656 LearningRate 0.0626 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:13,332-Speed 5584.32 samples/sec Loss 9.1176 LearningRate 0.0625 Epoch: 4 Global Step: 21160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:15,161-Speed 5600.74 samples/sec Loss 9.1196 LearningRate 0.0625 Epoch: 4 Global Step: 21170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:16,987-Speed 5610.63 samples/sec Loss 9.0190 LearningRate 0.0625 Epoch: 4 Global Step: 21180 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-11 11:38:18,829-Speed 5565.20 samples/sec Loss 9.2860 LearningRate 0.0625 Epoch: 4 Global Step: 21190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:20,670-Speed 5563.60 samples/sec Loss 9.0494 LearningRate 0.0625 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:22,490-Speed 5628.91 samples/sec Loss 9.1284 LearningRate 0.0625 Epoch: 4 Global Step: 21210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:24,317-Speed 5610.46 samples/sec Loss 9.0635 LearningRate 0.0624 Epoch: 4 Global Step: 21220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:26,155-Speed 5574.34 samples/sec Loss 9.0335 LearningRate 0.0624 Epoch: 4 Global Step: 21230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:38:28,009-Speed 5527.78 samples/sec Loss 9.2640 LearningRate 0.0624 Epoch: 4 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:29,834-Speed 5611.02 samples/sec Loss 9.0977 LearningRate 0.0624 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:31,675-Speed 5569.09 samples/sec Loss 9.1753 LearningRate 0.0624 Epoch: 4 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:33,512-Speed 5576.55 samples/sec Loss 9.0162 LearningRate 0.0624 Epoch: 4 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:35,357-Speed 5553.15 samples/sec Loss 9.0695 LearningRate 0.0624 Epoch: 4 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:37,187-Speed 5598.78 samples/sec Loss 8.9281 LearningRate 0.0623 Epoch: 4 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:39,034-Speed 5547.48 samples/sec Loss 9.0216 LearningRate 0.0623 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:40,906-Speed 5474.00 
samples/sec Loss 9.0483 LearningRate 0.0623 Epoch: 4 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:42,732-Speed 5610.72 samples/sec Loss 9.2169 LearningRate 0.0623 Epoch: 4 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:44,578-Speed 5548.08 samples/sec Loss 9.1596 LearningRate 0.0623 Epoch: 4 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:46,421-Speed 5561.06 samples/sec Loss 9.1560 LearningRate 0.0623 Epoch: 4 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:38:48,246-Speed 5613.93 samples/sec Loss 9.1570 LearningRate 0.0622 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:38:50,104-Speed 5515.70 samples/sec Loss 9.0884 LearningRate 0.0622 Epoch: 4 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:38:51,917-Speed 5651.80 samples/sec Loss 8.9653 LearningRate 0.0622 Epoch: 4 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:53,744-Speed 5608.01 samples/sec Loss 9.1094 LearningRate 0.0622 Epoch: 4 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:55,618-Speed 5465.78 samples/sec Loss 9.1228 LearningRate 0.0622 Epoch: 4 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:57,443-Speed 5615.46 samples/sec Loss 9.0272 LearningRate 0.0622 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:38:59,284-Speed 5562.48 samples/sec Loss 9.0606 LearningRate 0.0622 Epoch: 4 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:01,148-Speed 5498.86 samples/sec Loss 9.2062 LearningRate 0.0621 Epoch: 4 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:03,006-Speed 5514.17 samples/sec Loss 8.9079 LearningRate 0.0621 Epoch: 4 Global 
Step: 21430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:04,869-Speed 5500.09 samples/sec Loss 8.9460 LearningRate 0.0621 Epoch: 4 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:06,695-Speed 5611.46 samples/sec Loss 9.2022 LearningRate 0.0621 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:08,539-Speed 5558.08 samples/sec Loss 9.1706 LearningRate 0.0621 Epoch: 4 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:10,380-Speed 5563.58 samples/sec Loss 9.1807 LearningRate 0.0621 Epoch: 4 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:12,241-Speed 5507.82 samples/sec Loss 8.9765 LearningRate 0.0620 Epoch: 4 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:14,082-Speed 5563.91 samples/sec Loss 8.9541 LearningRate 0.0620 Epoch: 4 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:15,907-Speed 5615.20 samples/sec Loss 9.2012 LearningRate 0.0620 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:17,739-Speed 5589.96 samples/sec Loss 9.1648 LearningRate 0.0620 Epoch: 4 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:19,578-Speed 5573.43 samples/sec Loss 9.0589 LearningRate 0.0620 Epoch: 4 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:21,405-Speed 5606.93 samples/sec Loss 9.0309 LearningRate 0.0620 Epoch: 4 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:23,255-Speed 5538.31 samples/sec Loss 8.8048 LearningRate 0.0619 Epoch: 4 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:25,132-Speed 5459.95 samples/sec Loss 9.1259 LearningRate 0.0619 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:39:26,990-Speed 5512.94 samples/sec Loss 9.1696 LearningRate 0.0619 Epoch: 4 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:28,824-Speed 5589.50 samples/sec Loss 9.2847 LearningRate 0.0619 Epoch: 4 Global Step: 21570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:30,673-Speed 5538.91 samples/sec Loss 9.1107 LearningRate 0.0619 Epoch: 4 Global Step: 21580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:32,507-Speed 5586.95 samples/sec Loss 8.9733 LearningRate 0.0619 Epoch: 4 Global Step: 21590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:34,360-Speed 5529.33 samples/sec Loss 9.1503 LearningRate 0.0619 Epoch: 4 Global Step: 21600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:36,185-Speed 5613.44 samples/sec Loss 8.9587 LearningRate 0.0618 Epoch: 4 Global Step: 21610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:38,006-Speed 5624.33 samples/sec Loss 8.9859 LearningRate 0.0618 Epoch: 4 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:39,852-Speed 5550.49 samples/sec Loss 9.0532 LearningRate 0.0618 Epoch: 4 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:41,664-Speed 5656.83 samples/sec Loss 8.9689 LearningRate 0.0618 Epoch: 4 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:43,483-Speed 5632.10 samples/sec Loss 8.9926 LearningRate 0.0618 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:45,341-Speed 5513.05 samples/sec Loss 9.0233 LearningRate 0.0618 Epoch: 4 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:39:47,174-Speed 5588.30 samples/sec Loss 9.1168 LearningRate 0.0617 Epoch: 4 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:49,035-Speed 5507.07 samples/sec Loss 
9.1135 LearningRate 0.0617 Epoch: 4 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:50,879-Speed 5555.33 samples/sec Loss 9.0946 LearningRate 0.0617 Epoch: 4 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:52,718-Speed 5572.16 samples/sec Loss 9.0539 LearningRate 0.0617 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:54,582-Speed 5496.32 samples/sec Loss 9.1149 LearningRate 0.0617 Epoch: 4 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:56,403-Speed 5625.03 samples/sec Loss 9.0266 LearningRate 0.0617 Epoch: 4 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:39:58,225-Speed 5626.20 samples/sec Loss 9.1609 LearningRate 0.0617 Epoch: 4 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:00,077-Speed 5529.53 samples/sec Loss 9.0236 LearningRate 0.0616 Epoch: 4 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:01,945-Speed 5491.09 samples/sec Loss 8.9622 LearningRate 0.0616 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:03,800-Speed 5522.36 samples/sec Loss 9.1783 LearningRate 0.0616 Epoch: 4 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:05,663-Speed 5497.98 samples/sec Loss 9.1724 LearningRate 0.0616 Epoch: 4 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:07,523-Speed 5509.32 samples/sec Loss 9.0428 LearningRate 0.0616 Epoch: 4 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:09,362-Speed 5569.94 samples/sec Loss 8.9968 LearningRate 0.0616 Epoch: 4 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:11,191-Speed 5604.49 samples/sec Loss 9.0394 LearningRate 0.0615 Epoch: 4 Global Step: 21800 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:13,018-Speed 5610.29 samples/sec Loss 9.1294 LearningRate 0.0615 Epoch: 4 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:14,862-Speed 5556.87 samples/sec Loss 9.0006 LearningRate 0.0615 Epoch: 4 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:16,708-Speed 5550.03 samples/sec Loss 9.2050 LearningRate 0.0615 Epoch: 4 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:18,549-Speed 5566.32 samples/sec Loss 9.0071 LearningRate 0.0615 Epoch: 4 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:20,373-Speed 5615.64 samples/sec Loss 8.9448 LearningRate 0.0615 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:22,205-Speed 5593.28 samples/sec Loss 9.0794 LearningRate 0.0615 Epoch: 4 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:24,058-Speed 5529.35 samples/sec Loss 9.1400 LearningRate 0.0614 Epoch: 4 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:25,885-Speed 5606.37 samples/sec Loss 9.1576 LearningRate 0.0614 Epoch: 4 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:27,714-Speed 5600.37 samples/sec Loss 8.9935 LearningRate 0.0614 Epoch: 4 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:29,552-Speed 5574.58 samples/sec Loss 8.9498 LearningRate 0.0614 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:31,388-Speed 5580.22 samples/sec Loss 9.0718 LearningRate 0.0614 Epoch: 4 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:40:33,218-Speed 5597.92 samples/sec Loss 9.0503 LearningRate 0.0614 Epoch: 4 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
11:40:35,039-Speed 5626.53 samples/sec Loss 9.1201 LearningRate 0.0613 Epoch: 4 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:36,863-Speed 5615.75 samples/sec Loss 9.1180 LearningRate 0.0613 Epoch: 4 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:38,696-Speed 5589.91 samples/sec Loss 8.9405 LearningRate 0.0613 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:40,539-Speed 5559.04 samples/sec Loss 9.1246 LearningRate 0.0613 Epoch: 4 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:42,372-Speed 5591.28 samples/sec Loss 8.9822 LearningRate 0.0613 Epoch: 4 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:44,216-Speed 5553.34 samples/sec Loss 9.0481 LearningRate 0.0613 Epoch: 4 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:46,049-Speed 5589.31 samples/sec Loss 9.1369 LearningRate 0.0612 Epoch: 4 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:40:47,905-Speed 5519.96 samples/sec Loss 8.9961 LearningRate 0.0612 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:41:15,282-[lfw][22000]XNorm: 22.424608 Training: 2022-04-11 11:41:15,282-[lfw][22000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 11:41:15,283-[lfw][22000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:41:46,889-[cfp_fp][22000]XNorm: 19.330436 Training: 2022-04-11 11:41:46,890-[cfp_fp][22000]Accuracy-Flip: 0.94543+-0.01160 Training: 2022-04-11 11:41:46,891-[cfp_fp][22000]Accuracy-Highest: 0.95486 Training: 2022-04-11 11:42:14,177-[agedb_30][22000]XNorm: 22.090508 Training: 2022-04-11 11:42:14,177-[agedb_30][22000]Accuracy-Flip: 0.96850+-0.01029 Training: 2022-04-11 11:42:14,178-[agedb_30][22000]Accuracy-Highest: 0.97033 Training: 2022-04-11 11:42:16,034-Speed 116.19 samples/sec 
Loss 9.1488 LearningRate 0.0612 Epoch: 4 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:17,873-Speed 5570.65 samples/sec Loss 9.0137 LearningRate 0.0612 Epoch: 4 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:19,693-Speed 5629.61 samples/sec Loss 9.0522 LearningRate 0.0612 Epoch: 4 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:21,538-Speed 5555.42 samples/sec Loss 9.1663 LearningRate 0.0612 Epoch: 4 Global Step: 22040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:23,385-Speed 5546.18 samples/sec Loss 9.0663 LearningRate 0.0612 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:25,235-Speed 5539.77 samples/sec Loss 9.0132 LearningRate 0.0611 Epoch: 4 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:27,063-Speed 5603.30 samples/sec Loss 9.0077 LearningRate 0.0611 Epoch: 4 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:28,962-Speed 5394.59 samples/sec Loss 9.1119 LearningRate 0.0611 Epoch: 4 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:30,804-Speed 5561.79 samples/sec Loss 9.2437 LearningRate 0.0611 Epoch: 4 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:32,658-Speed 5527.03 samples/sec Loss 9.0369 LearningRate 0.0611 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:34,492-Speed 5585.76 samples/sec Loss 9.0595 LearningRate 0.0611 Epoch: 4 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:36,339-Speed 5547.98 samples/sec Loss 9.0965 LearningRate 0.0610 Epoch: 4 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:38,194-Speed 5524.56 samples/sec Loss 9.0976 LearningRate 0.0610 Epoch: 4 Global Step: 22130 Fp16 
Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:40,044-Speed 5535.38 samples/sec Loss 9.0409 LearningRate 0.0610 Epoch: 4 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:41,873-Speed 5602.61 samples/sec Loss 8.8690 LearningRate 0.0610 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:43,713-Speed 5570.51 samples/sec Loss 9.1478 LearningRate 0.0610 Epoch: 4 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:42:45,542-Speed 5600.46 samples/sec Loss 8.9875 LearningRate 0.0610 Epoch: 4 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:47,362-Speed 5629.67 samples/sec Loss 9.1308 LearningRate 0.0610 Epoch: 4 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:49,221-Speed 5510.21 samples/sec Loss 8.9853 LearningRate 0.0609 Epoch: 4 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:51,051-Speed 5597.69 samples/sec Loss 9.1591 LearningRate 0.0609 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:52,892-Speed 5564.33 samples/sec Loss 9.1435 LearningRate 0.0609 Epoch: 4 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:54,741-Speed 5541.49 samples/sec Loss 8.8779 LearningRate 0.0609 Epoch: 4 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:56,593-Speed 5531.83 samples/sec Loss 9.2203 LearningRate 0.0609 Epoch: 4 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:42:58,415-Speed 5622.36 samples/sec Loss 8.9296 LearningRate 0.0609 Epoch: 4 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:00,254-Speed 5571.63 samples/sec Loss 8.9042 LearningRate 0.0608 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
11:43:02,124-Speed 5481.15 samples/sec Loss 8.8208 LearningRate 0.0608 Epoch: 4 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:03,991-Speed 5485.50 samples/sec Loss 9.0956 LearningRate 0.0608 Epoch: 4 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:05,832-Speed 5564.89 samples/sec Loss 8.9704 LearningRate 0.0608 Epoch: 4 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:07,672-Speed 5567.90 samples/sec Loss 9.0060 LearningRate 0.0608 Epoch: 4 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:09,510-Speed 5574.13 samples/sec Loss 9.1643 LearningRate 0.0608 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:11,349-Speed 5571.35 samples/sec Loss 8.9860 LearningRate 0.0608 Epoch: 4 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:13,187-Speed 5575.02 samples/sec Loss 9.0114 LearningRate 0.0607 Epoch: 4 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:15,014-Speed 5605.78 samples/sec Loss 9.0746 LearningRate 0.0607 Epoch: 4 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:16,839-Speed 5614.35 samples/sec Loss 8.9978 LearningRate 0.0607 Epoch: 4 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:18,667-Speed 5608.02 samples/sec Loss 9.0279 LearningRate 0.0607 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:20,513-Speed 5547.12 samples/sec Loss 9.0785 LearningRate 0.0607 Epoch: 4 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:22,355-Speed 5562.51 samples/sec Loss 9.0155 LearningRate 0.0607 Epoch: 4 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:24,187-Speed 5592.34 samples/sec Loss 8.9030 
LearningRate 0.0606 Epoch: 4 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:26,009-Speed 5622.23 samples/sec Loss 9.1562 LearningRate 0.0606 Epoch: 4 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:27,856-Speed 5543.29 samples/sec Loss 8.9540 LearningRate 0.0606 Epoch: 4 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:29,706-Speed 5537.03 samples/sec Loss 9.0604 LearningRate 0.0606 Epoch: 4 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:31,535-Speed 5604.38 samples/sec Loss 8.9574 LearningRate 0.0606 Epoch: 4 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:33,357-Speed 5621.90 samples/sec Loss 8.9831 LearningRate 0.0606 Epoch: 4 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:35,172-Speed 5643.34 samples/sec Loss 9.1597 LearningRate 0.0606 Epoch: 4 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:37,015-Speed 5559.70 samples/sec Loss 8.8150 LearningRate 0.0605 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:38,855-Speed 5569.03 samples/sec Loss 9.1738 LearningRate 0.0605 Epoch: 4 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:40,709-Speed 5524.75 samples/sec Loss 9.1477 LearningRate 0.0605 Epoch: 4 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:42,537-Speed 5609.29 samples/sec Loss 9.2229 LearningRate 0.0605 Epoch: 4 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:44,388-Speed 5533.75 samples/sec Loss 9.1641 LearningRate 0.0605 Epoch: 4 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:46,252-Speed 5531.26 samples/sec Loss 9.0189 LearningRate 0.0605 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 11:43:48,076-Speed 5617.99 samples/sec Loss 8.9965 LearningRate 0.0604 Epoch: 4 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:49,938-Speed 5503.34 samples/sec Loss 9.0372 LearningRate 0.0604 Epoch: 4 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:52,857-Speed 3508.84 samples/sec Loss 9.0345 LearningRate 0.0604 Epoch: 4 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:43:54,692-Speed 5582.05 samples/sec Loss 8.9961 LearningRate 0.0604 Epoch: 4 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:56,523-Speed 5597.50 samples/sec Loss 8.9718 LearningRate 0.0604 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:43:58,345-Speed 5620.61 samples/sec Loss 8.9974 LearningRate 0.0604 Epoch: 4 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:00,177-Speed 5591.83 samples/sec Loss 8.9779 LearningRate 0.0604 Epoch: 4 Global Step: 22570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:02,008-Speed 5596.85 samples/sec Loss 9.0048 LearningRate 0.0603 Epoch: 4 Global Step: 22580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:03,841-Speed 5588.34 samples/sec Loss 8.9004 LearningRate 0.0603 Epoch: 4 Global Step: 22590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:05,682-Speed 5562.91 samples/sec Loss 9.0147 LearningRate 0.0603 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:07,520-Speed 5574.83 samples/sec Loss 9.0589 LearningRate 0.0603 Epoch: 4 Global Step: 22610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:09,350-Speed 5598.57 samples/sec Loss 8.8839 LearningRate 0.0603 Epoch: 4 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:11,182-Speed 
5592.24 samples/sec Loss 9.1251 LearningRate 0.0603 Epoch: 4 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:13,074-Speed 5414.59 samples/sec Loss 9.0501 LearningRate 0.0602 Epoch: 4 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:14,914-Speed 5567.94 samples/sec Loss 9.0388 LearningRate 0.0602 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:16,775-Speed 5509.84 samples/sec Loss 9.0445 LearningRate 0.0602 Epoch: 4 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:18,601-Speed 5610.30 samples/sec Loss 9.1455 LearningRate 0.0602 Epoch: 4 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:20,460-Speed 5510.47 samples/sec Loss 9.0248 LearningRate 0.0602 Epoch: 4 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:22,302-Speed 5563.14 samples/sec Loss 8.9169 LearningRate 0.0602 Epoch: 4 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:24,126-Speed 5615.96 samples/sec Loss 8.9059 LearningRate 0.0602 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:25,966-Speed 5568.93 samples/sec Loss 9.0197 LearningRate 0.0601 Epoch: 4 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:27,794-Speed 5603.80 samples/sec Loss 9.0123 LearningRate 0.0601 Epoch: 4 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:29,633-Speed 5572.26 samples/sec Loss 9.0942 LearningRate 0.0601 Epoch: 4 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:31,473-Speed 5566.77 samples/sec Loss 9.0907 LearningRate 0.0601 Epoch: 4 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:33,308-Speed 5582.91 samples/sec Loss 8.9583 LearningRate 0.0601 Epoch: 4 
Global Step: 22750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:35,160-Speed 5532.15 samples/sec Loss 8.9972 LearningRate 0.0601 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:37,012-Speed 5531.42 samples/sec Loss 8.9135 LearningRate 0.0600 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:38,841-Speed 5601.83 samples/sec Loss 9.0594 LearningRate 0.0600 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:40,680-Speed 5570.90 samples/sec Loss 9.0284 LearningRate 0.0600 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:42,535-Speed 5521.73 samples/sec Loss 9.0077 LearningRate 0.0600 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:44,370-Speed 5583.77 samples/sec Loss 8.9549 LearningRate 0.0600 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:46,204-Speed 5586.38 samples/sec Loss 8.9230 LearningRate 0.0600 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:48,096-Speed 5416.16 samples/sec Loss 8.9673 LearningRate 0.0600 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:49,960-Speed 5495.07 samples/sec Loss 8.9314 LearningRate 0.0599 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:51,797-Speed 5577.06 samples/sec Loss 8.9357 LearningRate 0.0599 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:53,702-Speed 5379.33 samples/sec Loss 8.9116 LearningRate 0.0599 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:44:55,532-Speed 5598.87 samples/sec Loss 9.0411 LearningRate 0.0599 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 11:44:57,380-Speed 5542.46 samples/sec Loss 9.1573 LearningRate 0.0599 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:44:59,217-Speed 5579.00 samples/sec Loss 8.9021 LearningRate 0.0599 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:01,078-Speed 5503.13 samples/sec Loss 9.0126 LearningRate 0.0598 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:02,943-Speed 5493.00 samples/sec Loss 8.8894 LearningRate 0.0598 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:04,797-Speed 5528.18 samples/sec Loss 8.9648 LearningRate 0.0598 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:06,668-Speed 5475.42 samples/sec Loss 8.9708 LearningRate 0.0598 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:08,504-Speed 5581.72 samples/sec Loss 8.9556 LearningRate 0.0598 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:10,355-Speed 5534.30 samples/sec Loss 9.0424 LearningRate 0.0598 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:12,218-Speed 5497.63 samples/sec Loss 9.0785 LearningRate 0.0598 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:14,048-Speed 5599.38 samples/sec Loss 8.9464 LearningRate 0.0597 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:15,922-Speed 5466.54 samples/sec Loss 8.9517 LearningRate 0.0597 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:17,791-Speed 5481.84 samples/sec Loss 8.8234 LearningRate 0.0597 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:19,636-Speed 5552.46 samples/sec Loss 
8.9967 LearningRate 0.0597 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:21,484-Speed 5545.43 samples/sec Loss 8.9078 LearningRate 0.0597 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:23,377-Speed 5411.85 samples/sec Loss 8.9144 LearningRate 0.0597 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:25,248-Speed 5476.97 samples/sec Loss 8.8859 LearningRate 0.0597 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:27,103-Speed 5522.43 samples/sec Loss 8.9320 LearningRate 0.0596 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:28,945-Speed 5559.81 samples/sec Loss 9.0526 LearningRate 0.0596 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:30,795-Speed 5537.38 samples/sec Loss 8.9455 LearningRate 0.0596 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:32,634-Speed 5571.93 samples/sec Loss 8.8616 LearningRate 0.0596 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:34,477-Speed 5560.31 samples/sec Loss 8.9456 LearningRate 0.0596 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:36,388-Speed 5359.96 samples/sec Loss 8.8161 LearningRate 0.0596 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:38,277-Speed 5423.19 samples/sec Loss 8.9216 LearningRate 0.0595 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:40,151-Speed 5466.79 samples/sec Loss 9.0122 LearningRate 0.0595 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:41,997-Speed 5550.30 samples/sec Loss 8.8167 LearningRate 0.0595 Epoch: 4 Global Step: 23120 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:43,855-Speed 5515.56 samples/sec Loss 8.9899 LearningRate 0.0595 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:45,702-Speed 5545.13 samples/sec Loss 8.9160 LearningRate 0.0595 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:47,574-Speed 5474.77 samples/sec Loss 8.7978 LearningRate 0.0595 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:49,461-Speed 5430.13 samples/sec Loss 8.9641 LearningRate 0.0595 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:51,333-Speed 5472.70 samples/sec Loss 9.0036 LearningRate 0.0594 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:45:53,186-Speed 5526.75 samples/sec Loss 8.9836 LearningRate 0.0594 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:55,034-Speed 5547.17 samples/sec Loss 8.8652 LearningRate 0.0594 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:56,891-Speed 5516.17 samples/sec Loss 9.0068 LearningRate 0.0594 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:45:58,744-Speed 5527.06 samples/sec Loss 8.9560 LearningRate 0.0594 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:00,616-Speed 5475.09 samples/sec Loss 8.8148 LearningRate 0.0594 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:02,473-Speed 5516.95 samples/sec Loss 8.8197 LearningRate 0.0593 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:04,317-Speed 5555.31 samples/sec Loss 8.8265 LearningRate 0.0593 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:46:06,153-Speed 5578.39 samples/sec Loss 8.7919 LearningRate 0.0593 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:07,992-Speed 5573.11 samples/sec Loss 8.7991 LearningRate 0.0593 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:09,847-Speed 5523.43 samples/sec Loss 9.0195 LearningRate 0.0593 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:11,673-Speed 5608.20 samples/sec Loss 8.7596 LearningRate 0.0593 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:13,564-Speed 5419.82 samples/sec Loss 8.9190 LearningRate 0.0593 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:15,439-Speed 5464.24 samples/sec Loss 8.9619 LearningRate 0.0592 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:17,274-Speed 5582.82 samples/sec Loss 8.9572 LearningRate 0.0592 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:19,106-Speed 5593.98 samples/sec Loss 8.9832 LearningRate 0.0592 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:20,942-Speed 5577.37 samples/sec Loss 8.9018 LearningRate 0.0592 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:22,796-Speed 5525.36 samples/sec Loss 8.9687 LearningRate 0.0592 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:24,630-Speed 5586.26 samples/sec Loss 8.9295 LearningRate 0.0592 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:26,498-Speed 5484.66 samples/sec Loss 9.0251 LearningRate 0.0591 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:28,348-Speed 5539.29 samples/sec Loss 9.0687 
LearningRate 0.0591 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:30,200-Speed 5531.06 samples/sec Loss 8.8480 LearningRate 0.0591 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:32,036-Speed 5581.46 samples/sec Loss 8.9868 LearningRate 0.0591 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:33,910-Speed 5466.81 samples/sec Loss 8.8625 LearningRate 0.0591 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:35,760-Speed 5537.41 samples/sec Loss 8.8740 LearningRate 0.0591 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:37,607-Speed 5548.75 samples/sec Loss 9.0180 LearningRate 0.0591 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:39,444-Speed 5578.14 samples/sec Loss 8.7877 LearningRate 0.0590 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:46:41,271-Speed 5606.35 samples/sec Loss 8.9422 LearningRate 0.0590 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:43,102-Speed 5593.63 samples/sec Loss 8.8941 LearningRate 0.0590 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:44,943-Speed 5563.63 samples/sec Loss 8.9328 LearningRate 0.0590 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:46,822-Speed 5455.30 samples/sec Loss 9.0511 LearningRate 0.0590 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:48,657-Speed 5581.52 samples/sec Loss 8.8542 LearningRate 0.0590 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:50,505-Speed 5543.66 samples/sec Loss 9.0490 LearningRate 0.0590 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 11:46:52,356-Speed 5534.78 samples/sec Loss 8.8497 LearningRate 0.0589 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:54,228-Speed 5476.83 samples/sec Loss 8.8596 LearningRate 0.0589 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:56,058-Speed 5598.73 samples/sec Loss 8.8359 LearningRate 0.0589 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:57,915-Speed 5514.99 samples/sec Loss 8.9291 LearningRate 0.0589 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:46:59,760-Speed 5553.68 samples/sec Loss 9.0132 LearningRate 0.0589 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:01,638-Speed 5454.68 samples/sec Loss 9.0085 LearningRate 0.0589 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:03,483-Speed 5552.77 samples/sec Loss 8.9291 LearningRate 0.0588 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:05,342-Speed 5511.49 samples/sec Loss 8.6456 LearningRate 0.0588 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:07,172-Speed 5598.23 samples/sec Loss 8.8007 LearningRate 0.0588 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:09,040-Speed 5486.42 samples/sec Loss 8.8486 LearningRate 0.0588 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:10,885-Speed 5553.75 samples/sec Loss 8.8026 LearningRate 0.0588 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:12,732-Speed 5546.39 samples/sec Loss 9.0247 LearningRate 0.0588 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:14,575-Speed 
5557.14 samples/sec Loss 8.9574 LearningRate 0.0588 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:16,423-Speed 5545.39 samples/sec Loss 9.0239 LearningRate 0.0587 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:18,291-Speed 5486.52 samples/sec Loss 8.8679 LearningRate 0.0587 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:20,148-Speed 5516.77 samples/sec Loss 8.7254 LearningRate 0.0587 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:21,991-Speed 5558.73 samples/sec Loss 8.8189 LearningRate 0.0587 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:23,872-Speed 5447.41 samples/sec Loss 8.7376 LearningRate 0.0587 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:25,753-Speed 5447.89 samples/sec Loss 8.8045 LearningRate 0.0587 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:47:27,609-Speed 5519.54 samples/sec Loss 8.9857 LearningRate 0.0586 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:29,474-Speed 5491.70 samples/sec Loss 8.9177 LearningRate 0.0586 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:31,315-Speed 5565.03 samples/sec Loss 8.9886 LearningRate 0.0586 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:33,180-Speed 5493.09 samples/sec Loss 8.7573 LearningRate 0.0586 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:35,082-Speed 5388.42 samples/sec Loss 8.9703 LearningRate 0.0586 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:36,959-Speed 5456.79 samples/sec Loss 8.7581 LearningRate 0.0586 Epoch: 4 Global 
Step: 23740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:38,824-Speed 5494.87 samples/sec Loss 8.7347 LearningRate 0.0586 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:40,705-Speed 5443.73 samples/sec Loss 8.7263 LearningRate 0.0585 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:42,632-Speed 5317.47 samples/sec Loss 8.9052 LearningRate 0.0585 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:44,500-Speed 5485.55 samples/sec Loss 8.8379 LearningRate 0.0585 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:47:46,352-Speed 5532.55 samples/sec Loss 8.7652 LearningRate 0.0585 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:48,189-Speed 5578.26 samples/sec Loss 8.5751 LearningRate 0.0585 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:50,032-Speed 5556.55 samples/sec Loss 8.8739 LearningRate 0.0585 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:51,900-Speed 5484.55 samples/sec Loss 8.8944 LearningRate 0.0585 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:53,745-Speed 5552.93 samples/sec Loss 9.0171 LearningRate 0.0584 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:55,590-Speed 5553.08 samples/sec Loss 8.9388 LearningRate 0.0584 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:57,425-Speed 5581.29 samples/sec Loss 8.8041 LearningRate 0.0584 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:47:59,293-Speed 5486.68 samples/sec Loss 9.0566 LearningRate 0.0584 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:48:01,157-Speed 5495.86 samples/sec Loss 8.7307 LearningRate 0.0584 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:48:03,012-Speed 5522.34 samples/sec Loss 8.7890 LearningRate 0.0584 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:48:04,837-Speed 5612.25 samples/sec Loss 8.9201 LearningRate 0.0583 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:48:06,682-Speed 5555.73 samples/sec Loss 8.8615 LearningRate 0.0583 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:48:08,509-Speed 5606.25 samples/sec Loss 8.8849 LearningRate 0.0583 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:48:10,335-Speed 5611.66 samples/sec Loss 8.8813 LearningRate 0.0583 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:12,201-Speed 5490.44 samples/sec Loss 8.9424 LearningRate 0.0583 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:14,031-Speed 5595.85 samples/sec Loss 8.8207 LearningRate 0.0583 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:15,882-Speed 5535.75 samples/sec Loss 8.7079 LearningRate 0.0583 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:17,717-Speed 5584.29 samples/sec Loss 8.8734 LearningRate 0.0582 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:19,579-Speed 5502.08 samples/sec Loss 8.9125 LearningRate 0.0582 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:21,429-Speed 5537.63 samples/sec Loss 8.8114 LearningRate 0.0582 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:23,304-Speed 5464.18 samples/sec 
Loss 8.8395 LearningRate 0.0582 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:25,153-Speed 5541.46 samples/sec Loss 8.7440 LearningRate 0.0582 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:48:52,091-[lfw][24000]XNorm: 22.663238 Training: 2022-04-11 11:48:52,091-[lfw][24000]Accuracy-Flip: 0.99633+-0.00267 Training: 2022-04-11 11:48:52,092-[lfw][24000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:49:23,434-[cfp_fp][24000]XNorm: 19.661125 Training: 2022-04-11 11:49:23,434-[cfp_fp][24000]Accuracy-Flip: 0.94657+-0.01216 Training: 2022-04-11 11:49:23,435-[cfp_fp][24000]Accuracy-Highest: 0.95486 Training: 2022-04-11 11:49:50,424-[agedb_30][24000]XNorm: 22.181026 Training: 2022-04-11 11:49:50,425-[agedb_30][24000]Accuracy-Flip: 0.97083+-0.00696 Training: 2022-04-11 11:49:50,426-[agedb_30][24000]Accuracy-Highest: 0.97083 Training: 2022-04-11 11:49:52,298-Speed 117.51 samples/sec Loss 8.8683 LearningRate 0.0582 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:49:54,146-Speed 5541.19 samples/sec Loss 8.8980 LearningRate 0.0581 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:49:55,980-Speed 5587.53 samples/sec Loss 8.7410 LearningRate 0.0581 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:49:57,823-Speed 5558.76 samples/sec Loss 8.7524 LearningRate 0.0581 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:49:59,690-Speed 5488.97 samples/sec Loss 8.9425 LearningRate 0.0581 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:01,532-Speed 5561.16 samples/sec Loss 8.6334 LearningRate 0.0581 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:03,366-Speed 5585.09 samples/sec Loss 8.8283 LearningRate 0.0581 Epoch: 4 Global 
Step: 24070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:05,267-Speed 5391.02 samples/sec Loss 8.9035 LearningRate 0.0581 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:07,097-Speed 5597.47 samples/sec Loss 8.7001 LearningRate 0.0580 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:08,944-Speed 5548.20 samples/sec Loss 8.7939 LearningRate 0.0580 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:10,793-Speed 5539.77 samples/sec Loss 8.8095 LearningRate 0.0580 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:12,648-Speed 5524.37 samples/sec Loss 8.9128 LearningRate 0.0580 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:14,513-Speed 5491.40 samples/sec Loss 8.7011 LearningRate 0.0580 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:16,394-Speed 5448.64 samples/sec Loss 8.6672 LearningRate 0.0580 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:18,234-Speed 5566.15 samples/sec Loss 8.7838 LearningRate 0.0580 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:20,087-Speed 5529.55 samples/sec Loss 8.8448 LearningRate 0.0579 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:21,956-Speed 5481.53 samples/sec Loss 8.9513 LearningRate 0.0579 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:23,790-Speed 5586.60 samples/sec Loss 8.7050 LearningRate 0.0579 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:50:25,631-Speed 5564.00 samples/sec Loss 8.7694 LearningRate 0.0579 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 11:50:27,478-Speed 5547.40 samples/sec Loss 8.6519 LearningRate 0.0579 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:29,331-Speed 5530.08 samples/sec Loss 8.9031 LearningRate 0.0579 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:31,179-Speed 5543.75 samples/sec Loss 9.0054 LearningRate 0.0578 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:33,021-Speed 5565.45 samples/sec Loss 8.7753 LearningRate 0.0578 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:34,893-Speed 5471.53 samples/sec Loss 8.7837 LearningRate 0.0578 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:36,745-Speed 5530.53 samples/sec Loss 8.9111 LearningRate 0.0578 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:38,580-Speed 5581.34 samples/sec Loss 8.8629 LearningRate 0.0578 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:40,417-Speed 5577.76 samples/sec Loss 8.7569 LearningRate 0.0578 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:42,258-Speed 5565.61 samples/sec Loss 8.7056 LearningRate 0.0578 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:44,147-Speed 5421.91 samples/sec Loss 8.8646 LearningRate 0.0577 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:50:46,003-Speed 5521.96 samples/sec Loss 8.8741 LearningRate 0.0577 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:50:47,856-Speed 5527.39 samples/sec Loss 8.9313 LearningRate 0.0577 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:50:49,700-Speed 5554.68 samples/sec Loss 8.6681 
LearningRate 0.0577 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:51,536-Speed 5579.14 samples/sec Loss 9.0343 LearningRate 0.0577 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:53,409-Speed 5470.67 samples/sec Loss 8.8306 LearningRate 0.0577 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:55,245-Speed 5578.58 samples/sec Loss 8.8342 LearningRate 0.0577 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:57,126-Speed 5447.93 samples/sec Loss 8.9402 LearningRate 0.0576 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:50:58,971-Speed 5553.82 samples/sec Loss 8.7521 LearningRate 0.0576 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:00,816-Speed 5551.28 samples/sec Loss 8.9036 LearningRate 0.0576 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:02,660-Speed 5557.93 samples/sec Loss 8.8394 LearningRate 0.0576 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:04,515-Speed 5522.09 samples/sec Loss 8.7830 LearningRate 0.0576 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:06,370-Speed 5524.88 samples/sec Loss 8.8758 LearningRate 0.0576 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:08,210-Speed 5565.48 samples/sec Loss 8.7076 LearningRate 0.0575 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:10,062-Speed 5532.83 samples/sec Loss 8.7657 LearningRate 0.0575 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:11,938-Speed 5460.05 samples/sec Loss 9.0067 LearningRate 0.0575 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 11:51:13,792-Speed 5528.15 samples/sec Loss 8.8809 LearningRate 0.0575 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:15,649-Speed 5514.94 samples/sec Loss 8.6940 LearningRate 0.0575 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:17,521-Speed 5476.29 samples/sec Loss 8.9246 LearningRate 0.0575 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:19,374-Speed 5528.81 samples/sec Loss 8.9154 LearningRate 0.0575 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:21,241-Speed 5487.17 samples/sec Loss 8.6428 LearningRate 0.0574 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:23,088-Speed 5546.83 samples/sec Loss 8.7853 LearningRate 0.0574 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:24,970-Speed 5443.48 samples/sec Loss 8.7682 LearningRate 0.0574 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:26,854-Speed 5437.85 samples/sec Loss 8.6647 LearningRate 0.0574 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:28,753-Speed 5393.17 samples/sec Loss 8.7683 LearningRate 0.0574 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:30,594-Speed 5566.10 samples/sec Loss 8.7643 LearningRate 0.0574 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:32,482-Speed 5425.77 samples/sec Loss 8.8163 LearningRate 0.0574 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:34,345-Speed 5500.31 samples/sec Loss 8.6139 LearningRate 0.0573 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:36,187-Speed 
5561.96 samples/sec Loss 8.7482 LearningRate 0.0573 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:38,025-Speed 5574.80 samples/sec Loss 8.7912 LearningRate 0.0573 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:39,879-Speed 5525.10 samples/sec Loss 8.9074 LearningRate 0.0573 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:41,738-Speed 5509.87 samples/sec Loss 8.6607 LearningRate 0.0573 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:43,627-Speed 5424.37 samples/sec Loss 8.7991 LearningRate 0.0573 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:45,498-Speed 5476.17 samples/sec Loss 8.6615 LearningRate 0.0572 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:47,358-Speed 5508.32 samples/sec Loss 8.7725 LearningRate 0.0572 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:49,247-Speed 5422.74 samples/sec Loss 8.9604 LearningRate 0.0572 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:51,115-Speed 5484.20 samples/sec Loss 8.7349 LearningRate 0.0572 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:52,958-Speed 5559.00 samples/sec Loss 8.9019 LearningRate 0.0572 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:51:54,784-Speed 5610.32 samples/sec Loss 8.7768 LearningRate 0.0572 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:56,632-Speed 5541.67 samples/sec Loss 8.7278 LearningRate 0.0572 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:51:58,470-Speed 5575.91 samples/sec Loss 8.9318 LearningRate 0.0571 Epoch: 
4 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:00,319-Speed 5539.05 samples/sec Loss 8.7123 LearningRate 0.0571 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:02,175-Speed 5518.76 samples/sec Loss 8.7453 LearningRate 0.0571 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:04,037-Speed 5503.22 samples/sec Loss 8.5444 LearningRate 0.0571 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:05,878-Speed 5564.27 samples/sec Loss 8.8387 LearningRate 0.0571 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:07,717-Speed 5570.56 samples/sec Loss 8.7729 LearningRate 0.0571 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:09,585-Speed 5483.59 samples/sec Loss 8.7107 LearningRate 0.0571 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:11,433-Speed 5542.45 samples/sec Loss 8.7806 LearningRate 0.0570 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:13,270-Speed 5576.97 samples/sec Loss 8.7295 LearningRate 0.0570 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:15,121-Speed 5532.42 samples/sec Loss 8.8027 LearningRate 0.0570 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:16,976-Speed 5526.00 samples/sec Loss 8.8311 LearningRate 0.0570 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:18,818-Speed 5561.03 samples/sec Loss 8.7296 LearningRate 0.0570 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:20,659-Speed 5563.97 samples/sec Loss 8.6709 LearningRate 0.0570 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:52:22,490-Speed 5593.49 samples/sec Loss 8.7069 LearningRate 0.0569 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:24,338-Speed 5542.44 samples/sec Loss 8.7834 LearningRate 0.0569 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:26,175-Speed 5578.84 samples/sec Loss 8.7961 LearningRate 0.0569 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:28,017-Speed 5560.49 samples/sec Loss 8.8554 LearningRate 0.0569 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:29,852-Speed 5582.40 samples/sec Loss 8.7281 LearningRate 0.0569 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:31,704-Speed 5532.18 samples/sec Loss 8.8269 LearningRate 0.0569 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:33,606-Speed 5385.88 samples/sec Loss 8.7200 LearningRate 0.0569 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:35,467-Speed 5506.37 samples/sec Loss 8.8437 LearningRate 0.0568 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:37,351-Speed 5436.14 samples/sec Loss 8.8281 LearningRate 0.0568 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:39,203-Speed 5536.80 samples/sec Loss 8.7143 LearningRate 0.0568 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:41,094-Speed 5416.21 samples/sec Loss 8.6780 LearningRate 0.0568 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:42,944-Speed 5539.21 samples/sec Loss 8.7438 LearningRate 0.0568 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:44,795-Speed 5533.92 samples/sec Loss 
8.6065 LearningRate 0.0568 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:46,664-Speed 5483.20 samples/sec Loss 8.5492 LearningRate 0.0568 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:48,501-Speed 5575.70 samples/sec Loss 8.8259 LearningRate 0.0567 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:50,337-Speed 5579.75 samples/sec Loss 8.5969 LearningRate 0.0567 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:52:52,180-Speed 5560.89 samples/sec Loss 8.6139 LearningRate 0.0567 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:54,012-Speed 5589.45 samples/sec Loss 8.7460 LearningRate 0.0567 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:55,879-Speed 5489.75 samples/sec Loss 8.7743 LearningRate 0.0567 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:57,726-Speed 5547.86 samples/sec Loss 8.9216 LearningRate 0.0567 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:52:59,612-Speed 5429.93 samples/sec Loss 8.6622 LearningRate 0.0567 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:01,518-Speed 5376.88 samples/sec Loss 8.5565 LearningRate 0.0566 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:03,388-Speed 5477.11 samples/sec Loss 8.6938 LearningRate 0.0566 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:05,227-Speed 5570.98 samples/sec Loss 8.7749 LearningRate 0.0566 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:07,099-Speed 5473.20 samples/sec Loss 8.7235 LearningRate 0.0566 Epoch: 4 Global Step: 25060 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:08,938-Speed 5569.80 samples/sec Loss 8.8347 LearningRate 0.0566 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:10,810-Speed 5473.25 samples/sec Loss 8.9363 LearningRate 0.0566 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:12,668-Speed 5513.89 samples/sec Loss 8.6411 LearningRate 0.0565 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:14,525-Speed 5517.34 samples/sec Loss 8.8658 LearningRate 0.0565 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:16,385-Speed 5509.33 samples/sec Loss 8.7654 LearningRate 0.0565 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:18,251-Speed 5490.09 samples/sec Loss 8.6989 LearningRate 0.0565 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:20,090-Speed 5569.16 samples/sec Loss 8.7203 LearningRate 0.0565 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:21,954-Speed 5498.54 samples/sec Loss 8.7994 LearningRate 0.0565 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:23,892-Speed 5284.90 samples/sec Loss 8.5850 LearningRate 0.0565 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:25,805-Speed 5355.87 samples/sec Loss 8.8399 LearningRate 0.0564 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:27,645-Speed 5566.80 samples/sec Loss 8.9422 LearningRate 0.0564 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:29,507-Speed 5501.52 samples/sec Loss 8.8035 LearningRate 0.0564 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:53:31,343-Speed 5580.55 samples/sec Loss 8.6897 LearningRate 0.0564 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:33,204-Speed 5505.97 samples/sec Loss 8.6920 LearningRate 0.0564 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:35,039-Speed 5582.92 samples/sec Loss 8.5288 LearningRate 0.0564 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:36,894-Speed 5522.26 samples/sec Loss 8.6997 LearningRate 0.0564 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:38,752-Speed 5516.43 samples/sec Loss 8.8221 LearningRate 0.0563 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:40,624-Speed 5469.87 samples/sec Loss 8.7297 LearningRate 0.0563 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:53:42,479-Speed 5524.21 samples/sec Loss 8.6393 LearningRate 0.0563 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:44,323-Speed 5554.51 samples/sec Loss 8.7365 LearningRate 0.0563 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:46,214-Speed 5418.75 samples/sec Loss 8.6418 LearningRate 0.0563 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:48,195-Speed 5173.75 samples/sec Loss 8.9139 LearningRate 0.0563 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:53:50,021-Speed 5610.39 samples/sec Loss 8.8260 LearningRate 0.0563 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:01,105-Speed 923.95 samples/sec Loss 7.8594 LearningRate 0.0562 Epoch: 5 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:03,030-Speed 5323.82 samples/sec Loss 7.9958 
LearningRate 0.0562 Epoch: 5 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:04,887-Speed 5515.19 samples/sec Loss 7.9973 LearningRate 0.0562 Epoch: 5 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:06,760-Speed 5468.71 samples/sec Loss 7.9905 LearningRate 0.0562 Epoch: 5 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:08,618-Speed 5516.14 samples/sec Loss 7.9165 LearningRate 0.0562 Epoch: 5 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:10,477-Speed 5511.29 samples/sec Loss 7.9705 LearningRate 0.0562 Epoch: 5 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:12,356-Speed 5452.57 samples/sec Loss 8.0484 LearningRate 0.0561 Epoch: 5 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:14,215-Speed 5509.85 samples/sec Loss 7.9482 LearningRate 0.0561 Epoch: 5 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:16,048-Speed 5587.11 samples/sec Loss 8.2060 LearningRate 0.0561 Epoch: 5 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:17,897-Speed 5542.10 samples/sec Loss 7.9745 LearningRate 0.0561 Epoch: 5 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:19,740-Speed 5557.33 samples/sec Loss 8.1128 LearningRate 0.0561 Epoch: 5 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:21,599-Speed 5510.56 samples/sec Loss 8.0413 LearningRate 0.0561 Epoch: 5 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:23,441-Speed 5562.65 samples/sec Loss 7.9129 LearningRate 0.0561 Epoch: 5 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:25,303-Speed 5503.40 samples/sec Loss 7.9079 LearningRate 0.0560 Epoch: 5 Global Step: 25430 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 11:54:27,152-Speed 5539.98 samples/sec Loss 7.9922 LearningRate 0.0560 Epoch: 5 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:28,991-Speed 5571.29 samples/sec Loss 7.9993 LearningRate 0.0560 Epoch: 5 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:30,838-Speed 5545.79 samples/sec Loss 8.2440 LearningRate 0.0560 Epoch: 5 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:32,685-Speed 5547.59 samples/sec Loss 8.1313 LearningRate 0.0560 Epoch: 5 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:34,532-Speed 5548.97 samples/sec Loss 8.0954 LearningRate 0.0560 Epoch: 5 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:36,420-Speed 5426.00 samples/sec Loss 8.1053 LearningRate 0.0560 Epoch: 5 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:38,306-Speed 5431.35 samples/sec Loss 8.0093 LearningRate 0.0559 Epoch: 5 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:40,150-Speed 5555.98 samples/sec Loss 7.9433 LearningRate 0.0559 Epoch: 5 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:42,018-Speed 5483.75 samples/sec Loss 8.1013 LearningRate 0.0559 Epoch: 5 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:43,853-Speed 5584.26 samples/sec Loss 8.0131 LearningRate 0.0559 Epoch: 5 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:45,708-Speed 5521.77 samples/sec Loss 8.0758 LearningRate 0.0559 Epoch: 5 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:47,588-Speed 5450.05 samples/sec Loss 7.9958 LearningRate 0.0559 Epoch: 5 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:49,459-Speed 
5476.38 samples/sec Loss 7.9068 LearningRate 0.0559 Epoch: 5 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:51,317-Speed 5516.24 samples/sec Loss 7.8864 LearningRate 0.0558 Epoch: 5 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:53,185-Speed 5481.68 samples/sec Loss 8.1356 LearningRate 0.0558 Epoch: 5 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:55,044-Speed 5513.79 samples/sec Loss 8.1547 LearningRate 0.0558 Epoch: 5 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:54:56,915-Speed 5476.76 samples/sec Loss 8.1525 LearningRate 0.0558 Epoch: 5 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:54:58,749-Speed 5586.67 samples/sec Loss 8.2793 LearningRate 0.0558 Epoch: 5 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:00,602-Speed 5528.90 samples/sec Loss 8.1673 LearningRate 0.0558 Epoch: 5 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:02,480-Speed 5453.97 samples/sec Loss 8.0868 LearningRate 0.0557 Epoch: 5 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:04,328-Speed 5543.61 samples/sec Loss 8.1301 LearningRate 0.0557 Epoch: 5 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:06,221-Speed 5412.32 samples/sec Loss 8.3341 LearningRate 0.0557 Epoch: 5 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:08,068-Speed 5550.49 samples/sec Loss 8.2235 LearningRate 0.0557 Epoch: 5 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:09,921-Speed 5526.30 samples/sec Loss 8.2831 LearningRate 0.0557 Epoch: 5 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:11,780-Speed 5513.15 samples/sec Loss 8.2076 LearningRate 0.0557 Epoch: 5 
Global Step: 25680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:13,615-Speed 5582.63 samples/sec Loss 8.2237 LearningRate 0.0557 Epoch: 5 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:15,481-Speed 5488.18 samples/sec Loss 8.3686 LearningRate 0.0556 Epoch: 5 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:17,325-Speed 5558.51 samples/sec Loss 8.3755 LearningRate 0.0556 Epoch: 5 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:19,171-Speed 5548.61 samples/sec Loss 8.0978 LearningRate 0.0556 Epoch: 5 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:21,034-Speed 5498.89 samples/sec Loss 8.2024 LearningRate 0.0556 Epoch: 5 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:22,881-Speed 5546.68 samples/sec Loss 8.2478 LearningRate 0.0556 Epoch: 5 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:24,723-Speed 5561.70 samples/sec Loss 8.1447 LearningRate 0.0556 Epoch: 5 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:26,564-Speed 5565.56 samples/sec Loss 8.3805 LearningRate 0.0556 Epoch: 5 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:28,445-Speed 5447.56 samples/sec Loss 8.0754 LearningRate 0.0555 Epoch: 5 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:30,286-Speed 5565.28 samples/sec Loss 8.2309 LearningRate 0.0555 Epoch: 5 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:32,130-Speed 5556.31 samples/sec Loss 8.3018 LearningRate 0.0555 Epoch: 5 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:33,978-Speed 5545.40 samples/sec Loss 8.2999 LearningRate 0.0555 Epoch: 5 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:55:35,819-Speed 5564.43 samples/sec Loss 8.2525 LearningRate 0.0555 Epoch: 5 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:37,671-Speed 5532.92 samples/sec Loss 8.3290 LearningRate 0.0555 Epoch: 5 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:39,526-Speed 5521.57 samples/sec Loss 8.2452 LearningRate 0.0555 Epoch: 5 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:41,383-Speed 5518.29 samples/sec Loss 8.2947 LearningRate 0.0554 Epoch: 5 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:43,231-Speed 5541.16 samples/sec Loss 8.2141 LearningRate 0.0554 Epoch: 5 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:45,081-Speed 5538.55 samples/sec Loss 8.3431 LearningRate 0.0554 Epoch: 5 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:55:46,941-Speed 5507.84 samples/sec Loss 8.1549 LearningRate 0.0554 Epoch: 5 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:48,794-Speed 5529.26 samples/sec Loss 8.4363 LearningRate 0.0554 Epoch: 5 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:50,645-Speed 5536.62 samples/sec Loss 8.5159 LearningRate 0.0554 Epoch: 5 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:52,531-Speed 5429.37 samples/sec Loss 8.2315 LearningRate 0.0553 Epoch: 5 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:54,374-Speed 5562.14 samples/sec Loss 8.3992 LearningRate 0.0553 Epoch: 5 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:56,213-Speed 5568.86 samples/sec Loss 8.4290 LearningRate 0.0553 Epoch: 5 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:58,074-Speed 5507.33 samples/sec 
Loss 8.2319 LearningRate 0.0553 Epoch: 5 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:55:59,939-Speed 5491.62 samples/sec Loss 8.2608 LearningRate 0.0553 Epoch: 5 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:01,810-Speed 5475.89 samples/sec Loss 8.3872 LearningRate 0.0553 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:03,668-Speed 5513.37 samples/sec Loss 8.3577 LearningRate 0.0553 Epoch: 5 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:05,522-Speed 5526.61 samples/sec Loss 8.3383 LearningRate 0.0552 Epoch: 5 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:56:07,388-Speed 5492.86 samples/sec Loss 8.2705 LearningRate 0.0552 Epoch: 5 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:09,222-Speed 5586.59 samples/sec Loss 8.3705 LearningRate 0.0552 Epoch: 5 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:11,066-Speed 5554.62 samples/sec Loss 8.3580 LearningRate 0.0552 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:56:38,331-[lfw][26000]XNorm: 23.686060 Training: 2022-04-11 11:56:38,332-[lfw][26000]Accuracy-Flip: 0.99617+-0.00289 Training: 2022-04-11 11:56:38,333-[lfw][26000]Accuracy-Highest: 0.99667 Training: 2022-04-11 11:57:09,832-[cfp_fp][26000]XNorm: 20.557960 Training: 2022-04-11 11:57:09,833-[cfp_fp][26000]Accuracy-Flip: 0.94686+-0.01232 Training: 2022-04-11 11:57:09,833-[cfp_fp][26000]Accuracy-Highest: 0.95486 Training: 2022-04-11 11:57:36,701-[agedb_30][26000]XNorm: 22.865481 Training: 2022-04-11 11:57:36,702-[agedb_30][26000]Accuracy-Flip: 0.96850+-0.01031 Training: 2022-04-11 11:57:36,702-[agedb_30][26000]Accuracy-Highest: 0.97083 Training: 2022-04-11 11:57:38,554-Speed 117.05 samples/sec Loss 8.3113 LearningRate 0.0552 Epoch: 5 Global 
Step: 26010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:40,402-Speed 5542.68 samples/sec Loss 8.2872 LearningRate 0.0552 Epoch: 5 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:42,241-Speed 5570.70 samples/sec Loss 8.5609 LearningRate 0.0552 Epoch: 5 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:44,071-Speed 5599.53 samples/sec Loss 8.4028 LearningRate 0.0551 Epoch: 5 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:45,937-Speed 5488.58 samples/sec Loss 8.3697 LearningRate 0.0551 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:47,768-Speed 5595.23 samples/sec Loss 8.4226 LearningRate 0.0551 Epoch: 5 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:49,609-Speed 5565.43 samples/sec Loss 8.3882 LearningRate 0.0551 Epoch: 5 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:57:51,449-Speed 5567.25 samples/sec Loss 8.3917 LearningRate 0.0551 Epoch: 5 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:57:53,294-Speed 5555.91 samples/sec Loss 8.5487 LearningRate 0.0551 Epoch: 5 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:57:55,125-Speed 5595.53 samples/sec Loss 8.3458 LearningRate 0.0551 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:57:56,981-Speed 5520.95 samples/sec Loss 8.4003 LearningRate 0.0550 Epoch: 5 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:57:58,813-Speed 5589.87 samples/sec Loss 8.4387 LearningRate 0.0550 Epoch: 5 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:58:00,668-Speed 5523.87 samples/sec Loss 8.3529 LearningRate 0.0550 Epoch: 5 Global Step: 26130 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 11:58:02,505-Speed 5576.85 samples/sec Loss 8.3904 LearningRate 0.0550 Epoch: 5 Global Step: 26140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:58:04,349-Speed 5556.99 samples/sec Loss 8.1120 LearningRate 0.0550 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:58:06,191-Speed 5562.00 samples/sec Loss 8.3362 LearningRate 0.0550 Epoch: 5 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:08,027-Speed 5579.22 samples/sec Loss 8.3389 LearningRate 0.0550 Epoch: 5 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:09,884-Speed 5516.95 samples/sec Loss 8.4420 LearningRate 0.0549 Epoch: 5 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:11,736-Speed 5531.29 samples/sec Loss 8.3360 LearningRate 0.0549 Epoch: 5 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:13,620-Speed 5438.49 samples/sec Loss 8.4144 LearningRate 0.0549 Epoch: 5 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:15,462-Speed 5561.63 samples/sec Loss 8.3333 LearningRate 0.0549 Epoch: 5 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:17,333-Speed 5478.29 samples/sec Loss 8.4300 LearningRate 0.0549 Epoch: 5 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:19,166-Speed 5587.80 samples/sec Loss 8.4749 LearningRate 0.0549 Epoch: 5 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:21,028-Speed 5503.17 samples/sec Loss 8.3703 LearningRate 0.0549 Epoch: 5 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:22,873-Speed 5554.12 samples/sec Loss 8.3710 LearningRate 0.0548 Epoch: 5 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:24,723-Speed 5536.91 samples/sec Loss 
8.4915 LearningRate 0.0548 Epoch: 5 Global Step: 26260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:26,574-Speed 5535.80 samples/sec Loss 8.4217 LearningRate 0.0548 Epoch: 5 Global Step: 26270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:28,431-Speed 5532.09 samples/sec Loss 8.4018 LearningRate 0.0548 Epoch: 5 Global Step: 26280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:30,280-Speed 5539.52 samples/sec Loss 8.4261 LearningRate 0.0548 Epoch: 5 Global Step: 26290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:32,110-Speed 5601.68 samples/sec Loss 8.4949 LearningRate 0.0548 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:33,950-Speed 5567.89 samples/sec Loss 8.3967 LearningRate 0.0547 Epoch: 5 Global Step: 26310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:35,785-Speed 5581.92 samples/sec Loss 8.4102 LearningRate 0.0547 Epoch: 5 Global Step: 26320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:37,648-Speed 5499.73 samples/sec Loss 8.4450 LearningRate 0.0547 Epoch: 5 Global Step: 26330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:39,510-Speed 5502.58 samples/sec Loss 8.4012 LearningRate 0.0547 Epoch: 5 Global Step: 26340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:41,368-Speed 5512.95 samples/sec Loss 8.3528 LearningRate 0.0547 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:58:43,232-Speed 5498.98 samples/sec Loss 8.6012 LearningRate 0.0547 Epoch: 5 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:45,097-Speed 5492.69 samples/sec Loss 8.3470 LearningRate 0.0547 Epoch: 5 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:46,948-Speed 5535.76 samples/sec Loss 8.4026 LearningRate 0.0546 Epoch: 5 Global Step: 26380 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:48,812-Speed 5496.46 samples/sec Loss 8.3121 LearningRate 0.0546 Epoch: 5 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:50,650-Speed 5572.75 samples/sec Loss 8.2763 LearningRate 0.0546 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:52,490-Speed 5568.52 samples/sec Loss 8.4432 LearningRate 0.0546 Epoch: 5 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:54,339-Speed 5541.91 samples/sec Loss 8.3492 LearningRate 0.0546 Epoch: 5 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:56,179-Speed 5567.17 samples/sec Loss 8.3169 LearningRate 0.0546 Epoch: 5 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:58,015-Speed 5579.48 samples/sec Loss 8.4155 LearningRate 0.0546 Epoch: 5 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:58:59,881-Speed 5489.49 samples/sec Loss 8.3750 LearningRate 0.0545 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:01,718-Speed 5578.01 samples/sec Loss 8.3969 LearningRate 0.0545 Epoch: 5 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:03,555-Speed 5576.73 samples/sec Loss 8.2986 LearningRate 0.0545 Epoch: 5 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:05,399-Speed 5555.97 samples/sec Loss 8.4530 LearningRate 0.0545 Epoch: 5 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:07,279-Speed 5448.89 samples/sec Loss 8.4125 LearningRate 0.0545 Epoch: 5 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:09,121-Speed 5562.46 samples/sec Loss 8.5343 LearningRate 0.0545 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
11:59:10,956-Speed 5581.70 samples/sec Loss 8.4072 LearningRate 0.0545 Epoch: 5 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:12,809-Speed 5530.74 samples/sec Loss 8.4007 LearningRate 0.0544 Epoch: 5 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:14,645-Speed 5577.97 samples/sec Loss 8.5217 LearningRate 0.0544 Epoch: 5 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:16,509-Speed 5496.93 samples/sec Loss 8.4172 LearningRate 0.0544 Epoch: 5 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:18,368-Speed 5509.87 samples/sec Loss 8.4305 LearningRate 0.0544 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:20,198-Speed 5598.74 samples/sec Loss 8.2384 LearningRate 0.0544 Epoch: 5 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:22,054-Speed 5518.95 samples/sec Loss 8.5223 LearningRate 0.0544 Epoch: 5 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:23,956-Speed 5387.15 samples/sec Loss 8.4511 LearningRate 0.0544 Epoch: 5 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:25,800-Speed 5553.95 samples/sec Loss 8.4020 LearningRate 0.0543 Epoch: 5 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 11:59:27,629-Speed 5600.05 samples/sec Loss 8.4924 LearningRate 0.0543 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:29,462-Speed 5589.09 samples/sec Loss 8.5598 LearningRate 0.0543 Epoch: 5 Global Step: 26610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:31,299-Speed 5576.71 samples/sec Loss 8.5327 LearningRate 0.0543 Epoch: 5 Global Step: 26620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:33,136-Speed 5577.56 samples/sec Loss 8.3712 
LearningRate 0.0543 Epoch: 5 Global Step: 26630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:34,975-Speed 5572.82 samples/sec Loss 8.5269 LearningRate 0.0543 Epoch: 5 Global Step: 26640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:36,821-Speed 5547.14 samples/sec Loss 8.5174 LearningRate 0.0543 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:38,666-Speed 5552.32 samples/sec Loss 8.4521 LearningRate 0.0542 Epoch: 5 Global Step: 26660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:40,537-Speed 5476.94 samples/sec Loss 8.4675 LearningRate 0.0542 Epoch: 5 Global Step: 26670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:42,369-Speed 5590.95 samples/sec Loss 8.5100 LearningRate 0.0542 Epoch: 5 Global Step: 26680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:44,237-Speed 5485.62 samples/sec Loss 8.4790 LearningRate 0.0542 Epoch: 5 Global Step: 26690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:46,104-Speed 5487.39 samples/sec Loss 8.4453 LearningRate 0.0542 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 11:59:47,971-Speed 5487.75 samples/sec Loss 8.2758 LearningRate 0.0542 Epoch: 5 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:49,838-Speed 5486.40 samples/sec Loss 8.2491 LearningRate 0.0541 Epoch: 5 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:51,744-Speed 5376.97 samples/sec Loss 8.3095 LearningRate 0.0541 Epoch: 5 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:53,609-Speed 5492.58 samples/sec Loss 8.3886 LearningRate 0.0541 Epoch: 5 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:55,462-Speed 5528.63 samples/sec Loss 8.3801 LearningRate 0.0541 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 11:59:57,307-Speed 5553.08 samples/sec Loss 8.2918 LearningRate 0.0541 Epoch: 5 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 11:59:59,194-Speed 5427.28 samples/sec Loss 8.5417 LearningRate 0.0541 Epoch: 5 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:01,051-Speed 5517.40 samples/sec Loss 8.3241 LearningRate 0.0541 Epoch: 5 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:02,897-Speed 5552.83 samples/sec Loss 8.6314 LearningRate 0.0540 Epoch: 5 Global Step: 26790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:04,734-Speed 5577.04 samples/sec Loss 8.3379 LearningRate 0.0540 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:06,581-Speed 5544.66 samples/sec Loss 8.4976 LearningRate 0.0540 Epoch: 5 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:08,407-Speed 5610.38 samples/sec Loss 8.5351 LearningRate 0.0540 Epoch: 5 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:10,290-Speed 5439.63 samples/sec Loss 8.3579 LearningRate 0.0540 Epoch: 5 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:12,153-Speed 5501.02 samples/sec Loss 8.4027 LearningRate 0.0540 Epoch: 5 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:13,992-Speed 5570.29 samples/sec Loss 8.4658 LearningRate 0.0540 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:15,861-Speed 5483.57 samples/sec Loss 8.3837 LearningRate 0.0539 Epoch: 5 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:17,705-Speed 5556.38 samples/sec Loss 8.5581 LearningRate 0.0539 Epoch: 5 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:19,567-Speed 
5501.36 samples/sec Loss 8.5174 LearningRate 0.0539 Epoch: 5 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:21,402-Speed 5583.99 samples/sec Loss 8.4283 LearningRate 0.0539 Epoch: 5 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:23,275-Speed 5467.95 samples/sec Loss 8.3949 LearningRate 0.0539 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:25,135-Speed 5546.25 samples/sec Loss 8.5020 LearningRate 0.0539 Epoch: 5 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:27,004-Speed 5478.32 samples/sec Loss 8.4746 LearningRate 0.0539 Epoch: 5 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:28,847-Speed 5559.34 samples/sec Loss 8.3257 LearningRate 0.0538 Epoch: 5 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:30,701-Speed 5526.14 samples/sec Loss 8.4291 LearningRate 0.0538 Epoch: 5 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:32,537-Speed 5581.62 samples/sec Loss 8.4835 LearningRate 0.0538 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:34,392-Speed 5522.68 samples/sec Loss 8.3105 LearningRate 0.0538 Epoch: 5 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:36,274-Speed 5444.07 samples/sec Loss 8.2898 LearningRate 0.0538 Epoch: 5 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:38,173-Speed 5394.73 samples/sec Loss 8.3155 LearningRate 0.0538 Epoch: 5 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:40,030-Speed 5519.80 samples/sec Loss 8.3564 LearningRate 0.0538 Epoch: 5 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:41,899-Speed 5483.36 samples/sec Loss 8.4053 LearningRate 0.0537 Epoch: 5 
Global Step: 27000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:43,733-Speed 5583.96 samples/sec Loss 8.3842 LearningRate 0.0537 Epoch: 5 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:45,568-Speed 5581.96 samples/sec Loss 8.3383 LearningRate 0.0537 Epoch: 5 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:00:47,436-Speed 5487.68 samples/sec Loss 8.3049 LearningRate 0.0537 Epoch: 5 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:49,318-Speed 5441.87 samples/sec Loss 8.3972 LearningRate 0.0537 Epoch: 5 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:51,196-Speed 5457.06 samples/sec Loss 8.4736 LearningRate 0.0537 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:53,032-Speed 5580.32 samples/sec Loss 8.5063 LearningRate 0.0537 Epoch: 5 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:54,879-Speed 5546.58 samples/sec Loss 8.3603 LearningRate 0.0536 Epoch: 5 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:56,717-Speed 5575.10 samples/sec Loss 8.4733 LearningRate 0.0536 Epoch: 5 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:00:58,559-Speed 5561.25 samples/sec Loss 8.3597 LearningRate 0.0536 Epoch: 5 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:00,403-Speed 5556.09 samples/sec Loss 8.3274 LearningRate 0.0536 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:02,290-Speed 5429.39 samples/sec Loss 8.3973 LearningRate 0.0536 Epoch: 5 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:04,137-Speed 5548.54 samples/sec Loss 8.5611 LearningRate 0.0536 Epoch: 5 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-11 12:01:05,978-Speed 5562.29 samples/sec Loss 8.5603 LearningRate 0.0536 Epoch: 5 Global Step: 27130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 12:01:07,842-Speed 5499.59 samples/sec Loss 8.5733 LearningRate 0.0535 Epoch: 5 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:09,676-Speed 5585.36 samples/sec Loss 8.4103 LearningRate 0.0535 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:11,510-Speed 5585.79 samples/sec Loss 8.5210 LearningRate 0.0535 Epoch: 5 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:13,358-Speed 5543.37 samples/sec Loss 8.3768 LearningRate 0.0535 Epoch: 5 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:15,215-Speed 5518.46 samples/sec Loss 8.5651 LearningRate 0.0535 Epoch: 5 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:17,052-Speed 5578.77 samples/sec Loss 8.2715 LearningRate 0.0535 Epoch: 5 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:18,884-Speed 5589.92 samples/sec Loss 8.2676 LearningRate 0.0535 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:20,736-Speed 5530.79 samples/sec Loss 8.2520 LearningRate 0.0534 Epoch: 5 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:22,598-Speed 5506.09 samples/sec Loss 8.4752 LearningRate 0.0534 Epoch: 5 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:24,495-Speed 5401.15 samples/sec Loss 8.3062 LearningRate 0.0534 Epoch: 5 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:26,373-Speed 5452.71 samples/sec Loss 8.4726 LearningRate 0.0534 Epoch: 5 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:28,233-Speed 5511.53 
samples/sec Loss 8.4793 LearningRate 0.0534 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:30,075-Speed 5561.44 samples/sec Loss 8.6362 LearningRate 0.0534 Epoch: 5 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:31,937-Speed 5499.97 samples/sec Loss 8.2608 LearningRate 0.0534 Epoch: 5 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:33,785-Speed 5545.75 samples/sec Loss 8.3662 LearningRate 0.0533 Epoch: 5 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:35,653-Speed 5483.77 samples/sec Loss 8.3536 LearningRate 0.0533 Epoch: 5 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:01:37,504-Speed 5536.35 samples/sec Loss 8.5538 LearningRate 0.0533 Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:39,350-Speed 5551.63 samples/sec Loss 8.5409 LearningRate 0.0533 Epoch: 5 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:41,198-Speed 5542.29 samples/sec Loss 8.4843 LearningRate 0.0533 Epoch: 5 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:43,044-Speed 5551.80 samples/sec Loss 8.2969 LearningRate 0.0533 Epoch: 5 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:44,908-Speed 5498.85 samples/sec Loss 8.2616 LearningRate 0.0533 Epoch: 5 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:46,759-Speed 5534.73 samples/sec Loss 8.4107 LearningRate 0.0532 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:48,639-Speed 5450.12 samples/sec Loss 8.4442 LearningRate 0.0532 Epoch: 5 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:50,500-Speed 5504.26 samples/sec Loss 8.4485 LearningRate 0.0532 Epoch: 5 Global 
Step: 27370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:52,407-Speed 5372.60 samples/sec Loss 8.3164 LearningRate 0.0532 Epoch: 5 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:54,273-Speed 5490.34 samples/sec Loss 8.2379 LearningRate 0.0532 Epoch: 5 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:56,120-Speed 5547.93 samples/sec Loss 8.4549 LearningRate 0.0532 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:57,960-Speed 5568.48 samples/sec Loss 8.2746 LearningRate 0.0532 Epoch: 5 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:01:59,828-Speed 5485.31 samples/sec Loss 8.3201 LearningRate 0.0531 Epoch: 5 Global Step: 27420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:01,691-Speed 5497.29 samples/sec Loss 8.4051 LearningRate 0.0531 Epoch: 5 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:03,548-Speed 5519.94 samples/sec Loss 8.3740 LearningRate 0.0531 Epoch: 5 Global Step: 27440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:05,401-Speed 5526.78 samples/sec Loss 8.2378 LearningRate 0.0531 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:07,248-Speed 5549.52 samples/sec Loss 8.3632 LearningRate 0.0531 Epoch: 5 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:09,094-Speed 5550.54 samples/sec Loss 8.4551 LearningRate 0.0531 Epoch: 5 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:10,959-Speed 5491.95 samples/sec Loss 8.4327 LearningRate 0.0530 Epoch: 5 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:12,828-Speed 5483.81 samples/sec Loss 8.4799 LearningRate 0.0530 Epoch: 5 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 12:02:14,690-Speed 5500.37 samples/sec Loss 8.1732 LearningRate 0.0530 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:16,558-Speed 5484.88 samples/sec Loss 8.3669 LearningRate 0.0530 Epoch: 5 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:18,408-Speed 5540.49 samples/sec Loss 8.4474 LearningRate 0.0530 Epoch: 5 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:20,256-Speed 5546.27 samples/sec Loss 8.4618 LearningRate 0.0530 Epoch: 5 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:22,097-Speed 5565.65 samples/sec Loss 8.2562 LearningRate 0.0530 Epoch: 5 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:23,961-Speed 5495.40 samples/sec Loss 8.2433 LearningRate 0.0529 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:25,822-Speed 5503.66 samples/sec Loss 8.3351 LearningRate 0.0529 Epoch: 5 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:27,686-Speed 5498.73 samples/sec Loss 8.3518 LearningRate 0.0529 Epoch: 5 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:29,546-Speed 5507.99 samples/sec Loss 8.1414 LearningRate 0.0529 Epoch: 5 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:31,372-Speed 5610.70 samples/sec Loss 8.3439 LearningRate 0.0529 Epoch: 5 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:33,220-Speed 5541.97 samples/sec Loss 8.4567 LearningRate 0.0529 Epoch: 5 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:35,064-Speed 5558.34 samples/sec Loss 8.4057 LearningRate 0.0529 Epoch: 5 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:36,921-Speed 5514.50 samples/sec 
Loss 8.3242 LearningRate 0.0528 Epoch: 5 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:38,783-Speed 5504.21 samples/sec Loss 8.3048 LearningRate 0.0528 Epoch: 5 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:40,646-Speed 5498.95 samples/sec Loss 8.5332 LearningRate 0.0528 Epoch: 5 Global Step: 27640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:42,520-Speed 5465.62 samples/sec Loss 8.2178 LearningRate 0.0528 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:44,355-Speed 5582.03 samples/sec Loss 8.3095 LearningRate 0.0528 Epoch: 5 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:46,229-Speed 5469.33 samples/sec Loss 8.1855 LearningRate 0.0528 Epoch: 5 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:48,085-Speed 5518.84 samples/sec Loss 8.4089 LearningRate 0.0528 Epoch: 5 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:49,964-Speed 5452.63 samples/sec Loss 8.3852 LearningRate 0.0527 Epoch: 5 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:51,807-Speed 5560.60 samples/sec Loss 8.5719 LearningRate 0.0527 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:53,649-Speed 5561.41 samples/sec Loss 8.3621 LearningRate 0.0527 Epoch: 5 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:02:55,502-Speed 5528.71 samples/sec Loss 8.3582 LearningRate 0.0527 Epoch: 5 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:57,340-Speed 5576.97 samples/sec Loss 8.4390 LearningRate 0.0527 Epoch: 5 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:02:59,219-Speed 5451.72 samples/sec Loss 8.4111 LearningRate 0.0527 Epoch: 5 Global Step: 27740 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:01,055-Speed 5578.94 samples/sec Loss 8.3365 LearningRate 0.0527 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:02,924-Speed 5483.47 samples/sec Loss 8.3462 LearningRate 0.0526 Epoch: 5 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:04,776-Speed 5531.38 samples/sec Loss 8.5338 LearningRate 0.0526 Epoch: 5 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:06,630-Speed 5526.03 samples/sec Loss 8.3994 LearningRate 0.0526 Epoch: 5 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:08,480-Speed 5537.75 samples/sec Loss 8.2585 LearningRate 0.0526 Epoch: 5 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:10,350-Speed 5480.71 samples/sec Loss 8.2488 LearningRate 0.0526 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:12,203-Speed 5527.56 samples/sec Loss 8.3506 LearningRate 0.0526 Epoch: 5 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:14,059-Speed 5521.49 samples/sec Loss 8.3889 LearningRate 0.0526 Epoch: 5 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:15,927-Speed 5484.40 samples/sec Loss 8.6433 LearningRate 0.0525 Epoch: 5 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:17,782-Speed 5523.68 samples/sec Loss 8.4080 LearningRate 0.0525 Epoch: 5 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:19,649-Speed 5486.51 samples/sec Loss 8.4544 LearningRate 0.0525 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:21,496-Speed 5546.43 samples/sec Loss 8.4559 LearningRate 0.0525 Epoch: 5 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
12:03:23,369-Speed 5471.92 samples/sec Loss 8.4471 LearningRate 0.0525 Epoch: 5 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:25,233-Speed 5494.24 samples/sec Loss 8.2938 LearningRate 0.0525 Epoch: 5 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:27,136-Speed 5383.97 samples/sec Loss 8.3076 LearningRate 0.0525 Epoch: 5 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:03:28,982-Speed 5552.28 samples/sec Loss 8.4537 LearningRate 0.0524 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:30,849-Speed 5485.46 samples/sec Loss 8.2950 LearningRate 0.0524 Epoch: 5 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:32,688-Speed 5572.15 samples/sec Loss 8.3352 LearningRate 0.0524 Epoch: 5 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:34,538-Speed 5539.44 samples/sec Loss 8.4057 LearningRate 0.0524 Epoch: 5 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:36,378-Speed 5566.92 samples/sec Loss 8.2723 LearningRate 0.0524 Epoch: 5 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:38,227-Speed 5541.90 samples/sec Loss 8.2334 LearningRate 0.0524 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:40,092-Speed 5493.67 samples/sec Loss 8.6583 LearningRate 0.0524 Epoch: 5 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:41,932-Speed 5568.35 samples/sec Loss 8.5122 LearningRate 0.0523 Epoch: 5 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:43,801-Speed 5481.91 samples/sec Loss 8.3704 LearningRate 0.0523 Epoch: 5 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:03:45,668-Speed 5486.49 samples/sec Loss 8.5193 LearningRate 
0.0523 Epoch: 5 Global Step: 27990 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 12:03:47,537-Speed 5482.85 samples/sec Loss 8.4258 LearningRate 0.0523 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 12:04:14,926-[lfw][28000]XNorm: 21.572095
Training: 2022-04-11 12:04:14,927-[lfw][28000]Accuracy-Flip: 0.99683+-0.00229
Training: 2022-04-11 12:04:14,927-[lfw][28000]Accuracy-Highest: 0.99683
Training: 2022-04-11 12:04:46,190-[cfp_fp][28000]XNorm: 18.669144
Training: 2022-04-11 12:04:46,191-[cfp_fp][28000]Accuracy-Flip: 0.94900+-0.00958
Training: 2022-04-11 12:04:46,191-[cfp_fp][28000]Accuracy-Highest: 0.95486
Training: 2022-04-11 12:05:13,138-[agedb_30][28000]XNorm: 21.295386
Training: 2022-04-11 12:05:13,139-[agedb_30][28000]Accuracy-Flip: 0.97200+-0.00674
Training: 2022-04-11 12:05:13,140-[agedb_30][28000]Accuracy-Highest: 0.97200
Training: 2022-04-11 12:05:15,007-Speed 117.07 samples/sec Loss 8.2531 LearningRate 0.0523 Epoch: 5 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 12:05:16,850-Speed 5560.11 samples/sec Loss 8.4365 LearningRate 0.0523 Epoch: 5 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 12:05:18,701-Speed 5534.83 samples/sec Loss 8.4490 LearningRate 0.0523 Epoch: 5 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 12:05:20,535-Speed 5586.17 samples/sec Loss 8.4145 LearningRate 0.0522 Epoch: 5 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 12:05:22,377-Speed 5560.99 samples/sec Loss 8.2561 LearningRate 0.0522 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 12:05:24,259-Speed 5445.28 samples/sec Loss 8.1994 LearningRate 0.0522 Epoch: 5 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 12:05:26,129-Speed 5476.72 samples/sec Loss 8.2507 LearningRate 0.0522 Epoch: 5 Global Step: 28070 Fp16 Grad
Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:27,977-Speed 5544.85 samples/sec Loss 8.2773 LearningRate 0.0522 Epoch: 5 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:29,806-Speed 5601.81 samples/sec Loss 8.4586 LearningRate 0.0522 Epoch: 5 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:31,654-Speed 5544.96 samples/sec Loss 8.3054 LearningRate 0.0522 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:33,507-Speed 5528.33 samples/sec Loss 8.2758 LearningRate 0.0521 Epoch: 5 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:35,366-Speed 5511.47 samples/sec Loss 8.3626 LearningRate 0.0521 Epoch: 5 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:37,198-Speed 5591.84 samples/sec Loss 8.4187 LearningRate 0.0521 Epoch: 5 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:05:39,056-Speed 5515.66 samples/sec Loss 8.3760 LearningRate 0.0521 Epoch: 5 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:40,931-Speed 5461.84 samples/sec Loss 8.3051 LearningRate 0.0521 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:42,800-Speed 5482.18 samples/sec Loss 8.2094 LearningRate 0.0521 Epoch: 5 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:44,637-Speed 5577.45 samples/sec Loss 8.2356 LearningRate 0.0521 Epoch: 5 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:46,467-Speed 5598.89 samples/sec Loss 8.3114 LearningRate 0.0520 Epoch: 5 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:48,328-Speed 5503.26 samples/sec Loss 8.3538 LearningRate 0.0520 Epoch: 5 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
12:05:50,203-Speed 5466.53 samples/sec Loss 8.4719 LearningRate 0.0520 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:52,057-Speed 5525.48 samples/sec Loss 8.3584 LearningRate 0.0520 Epoch: 5 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:53,993-Speed 5292.20 samples/sec Loss 8.2761 LearningRate 0.0520 Epoch: 5 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:55,834-Speed 5564.30 samples/sec Loss 8.3776 LearningRate 0.0520 Epoch: 5 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:05:57,684-Speed 5537.11 samples/sec Loss 8.3553 LearningRate 0.0520 Epoch: 5 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 12:05:59,528-Speed 5557.13 samples/sec Loss 8.2983 LearningRate 0.0519 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:01,400-Speed 5473.85 samples/sec Loss 8.4711 LearningRate 0.0519 Epoch: 5 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:03,275-Speed 5464.14 samples/sec Loss 8.2969 LearningRate 0.0519 Epoch: 5 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:05,112-Speed 5577.56 samples/sec Loss 8.2498 LearningRate 0.0519 Epoch: 5 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:06,969-Speed 5516.55 samples/sec Loss 8.2706 LearningRate 0.0519 Epoch: 5 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:08,819-Speed 5539.72 samples/sec Loss 8.1783 LearningRate 0.0519 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:10,659-Speed 5567.08 samples/sec Loss 8.2993 LearningRate 0.0519 Epoch: 5 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:12,509-Speed 5538.09 samples/sec Loss 8.2228 
LearningRate 0.0518 Epoch: 5 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:06:14,350-Speed 5565.33 samples/sec Loss 8.3775 LearningRate 0.0518 Epoch: 5 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:16,185-Speed 5583.16 samples/sec Loss 8.4580 LearningRate 0.0518 Epoch: 5 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:18,028-Speed 5556.96 samples/sec Loss 8.3969 LearningRate 0.0518 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:19,862-Speed 5587.94 samples/sec Loss 8.1387 LearningRate 0.0518 Epoch: 5 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:21,692-Speed 5599.16 samples/sec Loss 8.4050 LearningRate 0.0518 Epoch: 5 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:23,526-Speed 5586.45 samples/sec Loss 8.2246 LearningRate 0.0518 Epoch: 5 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:25,365-Speed 5570.61 samples/sec Loss 8.4814 LearningRate 0.0517 Epoch: 5 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:27,219-Speed 5525.24 samples/sec Loss 8.2688 LearningRate 0.0517 Epoch: 5 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:29,056-Speed 5577.82 samples/sec Loss 8.4238 LearningRate 0.0517 Epoch: 5 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:30,877-Speed 5626.91 samples/sec Loss 8.2021 LearningRate 0.0517 Epoch: 5 Global Step: 28420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:32,709-Speed 5588.68 samples/sec Loss 8.2140 LearningRate 0.0517 Epoch: 5 Global Step: 28430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:34,573-Speed 5497.67 samples/sec Loss 8.1906 LearningRate 0.0517 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 
32768 Required: 5 hours Training: 2022-04-11 12:06:36,405-Speed 5594.38 samples/sec Loss 8.2916 LearningRate 0.0517 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:38,304-Speed 5395.28 samples/sec Loss 8.4215 LearningRate 0.0516 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:40,162-Speed 5514.09 samples/sec Loss 8.4582 LearningRate 0.0516 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:42,003-Speed 5564.11 samples/sec Loss 8.1978 LearningRate 0.0516 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:43,874-Speed 5476.23 samples/sec Loss 8.4248 LearningRate 0.0516 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:45,718-Speed 5557.03 samples/sec Loss 8.2315 LearningRate 0.0516 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:47,572-Speed 5526.29 samples/sec Loss 8.2005 LearningRate 0.0516 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 12:06:49,409-Speed 5578.44 samples/sec Loss 8.3637 LearningRate 0.0516 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:51,237-Speed 5601.71 samples/sec Loss 8.2882 LearningRate 0.0515 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:53,095-Speed 5516.27 samples/sec Loss 8.2709 LearningRate 0.0515 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:54,960-Speed 5493.75 samples/sec Loss 8.1551 LearningRate 0.0515 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:56,789-Speed 5601.11 samples/sec Loss 8.2968 LearningRate 0.0515 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:06:58,629-Speed 
5569.14 samples/sec Loss 8.3752 LearningRate 0.0515 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:00,507-Speed 5455.49 samples/sec Loss 8.1867 LearningRate 0.0515 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:02,376-Speed 5481.38 samples/sec Loss 8.3019 LearningRate 0.0515 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:04,245-Speed 5480.49 samples/sec Loss 8.3514 LearningRate 0.0514 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:06,080-Speed 5582.58 samples/sec Loss 8.2931 LearningRate 0.0514 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:07,952-Speed 5473.64 samples/sec Loss 8.4357 LearningRate 0.0514 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:09,787-Speed 5586.01 samples/sec Loss 8.2628 LearningRate 0.0514 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:11,753-Speed 5211.72 samples/sec Loss 8.2155 LearningRate 0.0514 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:13,608-Speed 5521.99 samples/sec Loss 8.0900 LearningRate 0.0514 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:15,480-Speed 5472.78 samples/sec Loss 8.3342 LearningRate 0.0514 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:17,336-Speed 5518.50 samples/sec Loss 8.4144 LearningRate 0.0513 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:19,201-Speed 5498.82 samples/sec Loss 8.4299 LearningRate 0.0513 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:21,064-Speed 5497.82 samples/sec Loss 8.2923 LearningRate 0.0513 Epoch: 5 
Global Step: 28690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:22,919-Speed 5525.32 samples/sec Loss 8.1640 LearningRate 0.0513 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:24,765-Speed 5549.95 samples/sec Loss 8.3031 LearningRate 0.0513 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:26,604-Speed 5570.48 samples/sec Loss 8.2356 LearningRate 0.0513 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:28,451-Speed 5548.19 samples/sec Loss 8.2920 LearningRate 0.0513 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:30,295-Speed 5556.22 samples/sec Loss 8.1844 LearningRate 0.0513 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:32,134-Speed 5572.16 samples/sec Loss 8.2185 LearningRate 0.0512 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:33,972-Speed 5574.37 samples/sec Loss 8.3014 LearningRate 0.0512 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:35,825-Speed 5529.24 samples/sec Loss 8.1919 LearningRate 0.0512 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:37,674-Speed 5541.04 samples/sec Loss 8.3381 LearningRate 0.0512 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:39,514-Speed 5568.07 samples/sec Loss 8.1495 LearningRate 0.0512 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:41,347-Speed 5590.40 samples/sec Loss 8.2527 LearningRate 0.0512 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:43,186-Speed 5567.77 samples/sec Loss 8.2253 LearningRate 0.0512 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 12:07:45,033-Speed 5550.15 samples/sec Loss 8.2291 LearningRate 0.0511 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:46,890-Speed 5517.48 samples/sec Loss 8.2480 LearningRate 0.0511 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:48,729-Speed 5571.85 samples/sec Loss 8.3740 LearningRate 0.0511 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:50,580-Speed 5534.37 samples/sec Loss 8.1921 LearningRate 0.0511 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:52,471-Speed 5417.19 samples/sec Loss 8.4349 LearningRate 0.0511 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:07:54,310-Speed 5571.76 samples/sec Loss 8.1867 LearningRate 0.0511 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:56,179-Speed 5481.11 samples/sec Loss 8.3780 LearningRate 0.0511 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:58,021-Speed 5562.45 samples/sec Loss 8.2977 LearningRate 0.0510 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:07:59,910-Speed 5423.94 samples/sec Loss 8.2476 LearningRate 0.0510 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:08:01,740-Speed 5599.65 samples/sec Loss 8.3718 LearningRate 0.0510 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:03,609-Speed 5481.99 samples/sec Loss 8.2592 LearningRate 0.0510 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:05,446-Speed 5576.36 samples/sec Loss 8.3387 LearningRate 0.0510 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:07,299-Speed 5530.54 samples/sec 
Loss 8.1448 LearningRate 0.0510 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:09,133-Speed 5584.89 samples/sec Loss 8.3823 LearningRate 0.0510 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:10,991-Speed 5513.50 samples/sec Loss 8.2973 LearningRate 0.0509 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:12,862-Speed 5476.32 samples/sec Loss 8.4185 LearningRate 0.0509 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:14,757-Speed 5408.48 samples/sec Loss 8.1122 LearningRate 0.0509 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:16,615-Speed 5514.55 samples/sec Loss 8.2383 LearningRate 0.0509 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:18,471-Speed 5527.34 samples/sec Loss 8.1854 LearningRate 0.0509 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:20,316-Speed 5555.61 samples/sec Loss 8.2587 LearningRate 0.0509 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:08:22,178-Speed 5500.75 samples/sec Loss 8.2797 LearningRate 0.0509 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:24,049-Speed 5475.98 samples/sec Loss 8.0976 LearningRate 0.0508 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:25,954-Speed 5377.59 samples/sec Loss 8.2116 LearningRate 0.0508 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:27,842-Speed 5458.27 samples/sec Loss 8.2585 LearningRate 0.0508 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:29,690-Speed 5546.84 samples/sec Loss 8.3012 LearningRate 0.0508 Epoch: 5 Global Step: 29060 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:31,550-Speed 5507.05 samples/sec Loss 8.2457 LearningRate 0.0508 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:33,387-Speed 5578.86 samples/sec Loss 8.1583 LearningRate 0.0508 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:35,267-Speed 5449.59 samples/sec Loss 8.3500 LearningRate 0.0508 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:37,152-Speed 5434.06 samples/sec Loss 8.2778 LearningRate 0.0507 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:38,997-Speed 5555.66 samples/sec Loss 8.1743 LearningRate 0.0507 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:40,870-Speed 5468.95 samples/sec Loss 8.1253 LearningRate 0.0507 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:08:42,704-Speed 5587.70 samples/sec Loss 8.2585 LearningRate 0.0507 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:08:44,552-Speed 5545.44 samples/sec Loss 8.2064 LearningRate 0.0507 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:46,390-Speed 5572.76 samples/sec Loss 8.2100 LearningRate 0.0507 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:48,222-Speed 5593.58 samples/sec Loss 8.1507 LearningRate 0.0507 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:50,078-Speed 5518.85 samples/sec Loss 8.1677 LearningRate 0.0506 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:51,922-Speed 5557.89 samples/sec Loss 8.3150 LearningRate 0.0506 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
12:08:53,778-Speed 5520.05 samples/sec Loss 8.1007 LearningRate 0.0506 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:55,630-Speed 5531.73 samples/sec Loss 8.0966 LearningRate 0.0506 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:57,496-Speed 5490.91 samples/sec Loss 8.2018 LearningRate 0.0506 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:08:59,342-Speed 5551.45 samples/sec Loss 8.3142 LearningRate 0.0506 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:01,196-Speed 5527.37 samples/sec Loss 8.1559 LearningRate 0.0506 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:03,033-Speed 5576.33 samples/sec Loss 8.3141 LearningRate 0.0505 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:04,912-Speed 5454.32 samples/sec Loss 8.0888 LearningRate 0.0505 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:06,760-Speed 5544.27 samples/sec Loss 8.2265 LearningRate 0.0505 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:08,606-Speed 5551.05 samples/sec Loss 8.2333 LearningRate 0.0505 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:10,489-Speed 5437.94 samples/sec Loss 8.2119 LearningRate 0.0505 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:12,333-Speed 5555.91 samples/sec Loss 8.1201 LearningRate 0.0505 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:14,208-Speed 5466.93 samples/sec Loss 8.1023 LearningRate 0.0505 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:16,068-Speed 5508.44 samples/sec Loss 8.3206 
LearningRate 0.0504 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:17,910-Speed 5560.74 samples/sec Loss 8.1367 LearningRate 0.0504 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:19,782-Speed 5474.35 samples/sec Loss 8.1176 LearningRate 0.0504 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:21,608-Speed 5609.88 samples/sec Loss 8.2028 LearningRate 0.0504 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:23,456-Speed 5545.57 samples/sec Loss 8.1617 LearningRate 0.0504 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:25,322-Speed 5490.43 samples/sec Loss 8.1236 LearningRate 0.0504 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:27,195-Speed 5469.77 samples/sec Loss 8.0855 LearningRate 0.0504 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:29,052-Speed 5514.65 samples/sec Loss 8.3254 LearningRate 0.0503 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:30,906-Speed 5529.76 samples/sec Loss 8.2028 LearningRate 0.0503 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:32,766-Speed 5507.01 samples/sec Loss 8.1420 LearningRate 0.0503 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:34,611-Speed 5554.00 samples/sec Loss 8.3020 LearningRate 0.0503 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:36,453-Speed 5560.08 samples/sec Loss 8.3329 LearningRate 0.0503 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:38,307-Speed 5527.79 samples/sec Loss 8.3249 LearningRate 0.0503 Epoch: 5 Global Step: 29430 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:40,142-Speed 5582.72 samples/sec Loss 8.1347 LearningRate 0.0503 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:41,985-Speed 5559.43 samples/sec Loss 8.1604 LearningRate 0.0503 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:43,841-Speed 5519.24 samples/sec Loss 8.1399 LearningRate 0.0502 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:45,694-Speed 5530.06 samples/sec Loss 7.9869 LearningRate 0.0502 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:09:47,568-Speed 5467.26 samples/sec Loss 8.3419 LearningRate 0.0502 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:49,409-Speed 5565.76 samples/sec Loss 8.1693 LearningRate 0.0502 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:51,288-Speed 5450.09 samples/sec Loss 8.4249 LearningRate 0.0502 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:53,142-Speed 5527.92 samples/sec Loss 8.1664 LearningRate 0.0502 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:55,010-Speed 5485.08 samples/sec Loss 8.1525 LearningRate 0.0502 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:56,872-Speed 5502.07 samples/sec Loss 8.3148 LearningRate 0.0501 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:09:58,705-Speed 5589.05 samples/sec Loss 8.2683 LearningRate 0.0501 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:00,562-Speed 5515.77 samples/sec Loss 8.2660 LearningRate 0.0501 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:10:02,404-Speed 5563.43 samples/sec Loss 8.2039 LearningRate 0.0501 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:04,250-Speed 5547.58 samples/sec Loss 8.0126 LearningRate 0.0501 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:06,135-Speed 5436.45 samples/sec Loss 8.0582 LearningRate 0.0501 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:07,989-Speed 5526.46 samples/sec Loss 8.0846 LearningRate 0.0501 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:09,830-Speed 5562.64 samples/sec Loss 8.1622 LearningRate 0.0500 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:11,682-Speed 5533.71 samples/sec Loss 8.2724 LearningRate 0.0500 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:13,539-Speed 5515.43 samples/sec Loss 8.2620 LearningRate 0.0500 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:15,410-Speed 5482.86 samples/sec Loss 8.0825 LearningRate 0.0500 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:17,252-Speed 5560.78 samples/sec Loss 8.2017 LearningRate 0.0500 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:19,123-Speed 5476.31 samples/sec Loss 8.3208 LearningRate 0.0500 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:20,953-Speed 5597.76 samples/sec Loss 8.3224 LearningRate 0.0500 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:22,792-Speed 5572.25 samples/sec Loss 8.2101 LearningRate 0.0499 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:24,642-Speed 5538.45 samples/sec Loss 8.3978 LearningRate 
0.0499 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:26,503-Speed 5502.23 samples/sec Loss 8.0928 LearningRate 0.0499 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:28,340-Speed 5577.95 samples/sec Loss 8.0851 LearningRate 0.0499 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:30,227-Speed 5430.99 samples/sec Loss 8.1053 LearningRate 0.0499 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:10:32,046-Speed 5630.65 samples/sec Loss 8.2263 LearningRate 0.0499 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:33,897-Speed 5534.99 samples/sec Loss 8.1652 LearningRate 0.0499 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:35,754-Speed 5517.13 samples/sec Loss 8.1492 LearningRate 0.0498 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:37,630-Speed 5461.68 samples/sec Loss 8.2106 LearningRate 0.0498 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:39,481-Speed 5566.93 samples/sec Loss 8.0671 LearningRate 0.0498 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:41,352-Speed 5476.95 samples/sec Loss 8.2335 LearningRate 0.0498 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:43,189-Speed 5577.59 samples/sec Loss 8.3034 LearningRate 0.0498 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:45,060-Speed 5472.93 samples/sec Loss 8.1283 LearningRate 0.0498 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:46,902-Speed 5565.63 samples/sec Loss 8.3201 LearningRate 0.0498 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-11 12:10:48,779-Speed 5457.94 samples/sec Loss 8.2143 LearningRate 0.0497 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:10:50,645-Speed 5489.89 samples/sec Loss 8.0630 LearningRate 0.0497 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:52,485-Speed 5566.97 samples/sec Loss 8.1768 LearningRate 0.0497 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:54,340-Speed 5524.49 samples/sec Loss 7.9531 LearningRate 0.0497 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:56,208-Speed 5482.77 samples/sec Loss 8.2399 LearningRate 0.0497 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:58,055-Speed 5548.03 samples/sec Loss 8.1340 LearningRate 0.0497 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:10:59,915-Speed 5507.78 samples/sec Loss 8.1942 LearningRate 0.0497 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:01,763-Speed 5545.48 samples/sec Loss 8.2016 LearningRate 0.0496 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:03,609-Speed 5549.68 samples/sec Loss 8.0189 LearningRate 0.0496 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:05,467-Speed 5510.47 samples/sec Loss 8.1015 LearningRate 0.0496 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:07,322-Speed 5525.83 samples/sec Loss 8.0681 LearningRate 0.0496 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:09,187-Speed 5493.57 samples/sec Loss 8.1087 LearningRate 0.0496 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:11:11,024-Speed 5577.63 
samples/sec Loss 8.1826 LearningRate 0.0496 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:12,862-Speed 5570.65 samples/sec Loss 8.3014 LearningRate 0.0496 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:14,714-Speed 5534.52 samples/sec Loss 8.0661 LearningRate 0.0496 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:16,558-Speed 5558.70 samples/sec Loss 8.1836 LearningRate 0.0495 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:18,430-Speed 5470.95 samples/sec Loss 8.2496 LearningRate 0.0495 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:20,267-Speed 5578.38 samples/sec Loss 8.1561 LearningRate 0.0495 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:22,115-Speed 5542.91 samples/sec Loss 7.9694 LearningRate 0.0495 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:23,966-Speed 5535.17 samples/sec Loss 8.1143 LearningRate 0.0495 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:11:51,409-[lfw][30000]XNorm: 22.979852 Training: 2022-04-11 12:11:51,409-[lfw][30000]Accuracy-Flip: 0.99650+-0.00302 Training: 2022-04-11 12:11:51,410-[lfw][30000]Accuracy-Highest: 0.99683 Training: 2022-04-11 12:12:22,840-[cfp_fp][30000]XNorm: 19.932093 Training: 2022-04-11 12:12:22,841-[cfp_fp][30000]Accuracy-Flip: 0.95786+-0.00846 Training: 2022-04-11 12:12:22,842-[cfp_fp][30000]Accuracy-Highest: 0.95786 Training: 2022-04-11 12:12:50,050-[agedb_30][30000]XNorm: 22.505847 Training: 2022-04-11 12:12:50,051-[agedb_30][30000]Accuracy-Flip: 0.97417+-0.00739 Training: 2022-04-11 12:12:50,052-[agedb_30][30000]Accuracy-Highest: 0.97417 Training: 2022-04-11 12:12:51,907-Speed 116.44 samples/sec Loss 8.1568 LearningRate 0.0495 
Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:12:53,739-Speed 5589.31 samples/sec Loss 8.0571 LearningRate 0.0495 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:12:55,582-Speed 5558.58 samples/sec Loss 8.2008 LearningRate 0.0494 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:12:57,421-Speed 5569.70 samples/sec Loss 8.1877 LearningRate 0.0494 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:12:59,264-Speed 5563.79 samples/sec Loss 8.2891 LearningRate 0.0494 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:13:01,142-Speed 5454.94 samples/sec Loss 8.4213 LearningRate 0.0494 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:02,992-Speed 5538.30 samples/sec Loss 8.1191 LearningRate 0.0494 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:04,854-Speed 5504.70 samples/sec Loss 8.3017 LearningRate 0.0494 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:06,707-Speed 5526.65 samples/sec Loss 8.1812 LearningRate 0.0494 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:08,537-Speed 5599.16 samples/sec Loss 8.2713 LearningRate 0.0493 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:10,405-Speed 5483.56 samples/sec Loss 8.1517 LearningRate 0.0493 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:12,263-Speed 5515.16 samples/sec Loss 8.3442 LearningRate 0.0493 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:14,117-Speed 5526.06 samples/sec Loss 8.0572 LearningRate 0.0493 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 12:13:16,002-Speed 5435.29 samples/sec Loss 8.2055 LearningRate 0.0493 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:17,861-Speed 5510.50 samples/sec Loss 8.1478 LearningRate 0.0493 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 12:13:19,723-Speed 5503.52 samples/sec Loss 8.0420 LearningRate 0.0493 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 12:13:21,568-Speed 5552.63 samples/sec Loss 8.1616 LearningRate 0.0492 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:23,434-Speed 5490.95 samples/sec Loss 7.9543 LearningRate 0.0492 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:25,316-Speed 5444.94 samples/sec Loss 8.1337 LearningRate 0.0492 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:27,185-Speed 5481.27 samples/sec Loss 8.1655 LearningRate 0.0492 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:29,027-Speed 5561.08 samples/sec Loss 8.0569 LearningRate 0.0492 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:30,876-Speed 5541.12 samples/sec Loss 8.1606 LearningRate 0.0492 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:32,714-Speed 5572.64 samples/sec Loss 8.0982 LearningRate 0.0492 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:34,589-Speed 5465.76 samples/sec Loss 8.1161 LearningRate 0.0491 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:36,425-Speed 5579.12 samples/sec Loss 8.1440 LearningRate 0.0491 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:38,288-Speed 5500.69 
samples/sec Loss 8.2080 LearningRate 0.0491 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:40,123-Speed 5582.54 samples/sec Loss 8.0763 LearningRate 0.0491 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:41,967-Speed 5555.70 samples/sec Loss 8.1450 LearningRate 0.0491 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:43,798-Speed 5594.75 samples/sec Loss 8.0893 LearningRate 0.0491 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:45,645-Speed 5546.29 samples/sec Loss 8.0887 LearningRate 0.0491 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:13:47,482-Speed 5579.18 samples/sec Loss 8.2007 LearningRate 0.0491 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:49,317-Speed 5579.92 samples/sec Loss 8.0463 LearningRate 0.0490 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:51,195-Speed 5457.26 samples/sec Loss 8.3381 LearningRate 0.0490 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:13:53,095-Speed 5392.61 samples/sec Loss 8.3902 LearningRate 0.0490 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:04,635-Speed 887.46 samples/sec Loss 7.9798 LearningRate 0.0490 Epoch: 6 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:06,534-Speed 5395.39 samples/sec Loss 7.3059 LearningRate 0.0490 Epoch: 6 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:08,409-Speed 5465.75 samples/sec Loss 7.2502 LearningRate 0.0490 Epoch: 6 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:10,277-Speed 5485.13 samples/sec Loss 7.2784 LearningRate 0.0490 Epoch: 6 Global 
Step: 30380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:12,149-Speed 5471.43 samples/sec Loss 7.1624 LearningRate 0.0489 Epoch: 6 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:14,038-Speed 5424.16 samples/sec Loss 7.2509 LearningRate 0.0489 Epoch: 6 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:15,941-Speed 5385.45 samples/sec Loss 7.3422 LearningRate 0.0489 Epoch: 6 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:17,804-Speed 5497.04 samples/sec Loss 7.2362 LearningRate 0.0489 Epoch: 6 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:19,668-Speed 5498.08 samples/sec Loss 7.2689 LearningRate 0.0489 Epoch: 6 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:21,535-Speed 5488.12 samples/sec Loss 7.4526 LearningRate 0.0489 Epoch: 6 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:23,390-Speed 5522.39 samples/sec Loss 7.3569 LearningRate 0.0489 Epoch: 6 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:25,232-Speed 5561.91 samples/sec Loss 7.4468 LearningRate 0.0488 Epoch: 6 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:27,086-Speed 5525.02 samples/sec Loss 7.4102 LearningRate 0.0488 Epoch: 6 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:29,015-Speed 5311.25 samples/sec Loss 7.3867 LearningRate 0.0488 Epoch: 6 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:30,882-Speed 5486.36 samples/sec Loss 7.4307 LearningRate 0.0488 Epoch: 6 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:32,755-Speed 5471.97 samples/sec Loss 7.5243 LearningRate 0.0488 Epoch: 6 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 12:14:34,577-Speed 5623.13 samples/sec Loss 7.5794 LearningRate 0.0488 Epoch: 6 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:36,435-Speed 5512.22 samples/sec Loss 7.4054 LearningRate 0.0488 Epoch: 6 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:38,289-Speed 5526.75 samples/sec Loss 7.5556 LearningRate 0.0487 Epoch: 6 Global Step: 30530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:40,125-Speed 5581.61 samples/sec Loss 7.5434 LearningRate 0.0487 Epoch: 6 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:41,964-Speed 5571.42 samples/sec Loss 7.3630 LearningRate 0.0487 Epoch: 6 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:14:43,808-Speed 5553.76 samples/sec Loss 7.5939 LearningRate 0.0487 Epoch: 6 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:45,673-Speed 5497.08 samples/sec Loss 7.5506 LearningRate 0.0487 Epoch: 6 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:47,513-Speed 5568.04 samples/sec Loss 7.4743 LearningRate 0.0487 Epoch: 6 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:49,408-Speed 5407.25 samples/sec Loss 7.5177 LearningRate 0.0487 Epoch: 6 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:51,248-Speed 5567.09 samples/sec Loss 7.4754 LearningRate 0.0487 Epoch: 6 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:53,127-Speed 5450.68 samples/sec Loss 7.4088 LearningRate 0.0486 Epoch: 6 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:54,967-Speed 5569.78 samples/sec Loss 7.4078 LearningRate 0.0486 Epoch: 6 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:56,819-Speed 5533.46 samples/sec 
Loss 7.5664 LearningRate 0.0486 Epoch: 6 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:14:58,704-Speed 5434.17 samples/sec Loss 7.6147 LearningRate 0.0486 Epoch: 6 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:00,555-Speed 5534.47 samples/sec Loss 7.5083 LearningRate 0.0486 Epoch: 6 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:02,397-Speed 5566.03 samples/sec Loss 7.5391 LearningRate 0.0486 Epoch: 6 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:15:04,293-Speed 5402.21 samples/sec Loss 7.5387 LearningRate 0.0486 Epoch: 6 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:15:06,137-Speed 5557.20 samples/sec Loss 7.3934 LearningRate 0.0485 Epoch: 6 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:08,017-Speed 5450.97 samples/sec Loss 7.6460 LearningRate 0.0485 Epoch: 6 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:09,856-Speed 5571.75 samples/sec Loss 7.7221 LearningRate 0.0485 Epoch: 6 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:11,712-Speed 5519.79 samples/sec Loss 7.4344 LearningRate 0.0485 Epoch: 6 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:13,579-Speed 5488.52 samples/sec Loss 7.6509 LearningRate 0.0485 Epoch: 6 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:15,440-Speed 5505.30 samples/sec Loss 7.4932 LearningRate 0.0485 Epoch: 6 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:17,292-Speed 5531.35 samples/sec Loss 7.6282 LearningRate 0.0485 Epoch: 6 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:19,147-Speed 5523.68 samples/sec Loss 7.4510 LearningRate 0.0484 Epoch: 6 Global Step: 30750 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:21,023-Speed 5461.64 samples/sec Loss 7.5707 LearningRate 0.0484 Epoch: 6 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:22,884-Speed 5507.12 samples/sec Loss 7.7058 LearningRate 0.0484 Epoch: 6 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:24,762-Speed 5455.50 samples/sec Loss 7.7447 LearningRate 0.0484 Epoch: 6 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:15:26,640-Speed 5455.86 samples/sec Loss 7.6119 LearningRate 0.0484 Epoch: 6 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:15:28,461-Speed 5625.90 samples/sec Loss 7.6775 LearningRate 0.0484 Epoch: 6 Global Step: 30800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:30,359-Speed 5398.42 samples/sec Loss 7.7338 LearningRate 0.0484 Epoch: 6 Global Step: 30810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:32,207-Speed 5542.31 samples/sec Loss 7.6595 LearningRate 0.0483 Epoch: 6 Global Step: 30820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:34,070-Speed 5501.78 samples/sec Loss 7.6825 LearningRate 0.0483 Epoch: 6 Global Step: 30830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:35,902-Speed 5591.39 samples/sec Loss 7.7567 LearningRate 0.0483 Epoch: 6 Global Step: 30840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:37,775-Speed 5469.68 samples/sec Loss 7.7843 LearningRate 0.0483 Epoch: 6 Global Step: 30850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:39,651-Speed 5462.74 samples/sec Loss 7.6855 LearningRate 0.0483 Epoch: 6 Global Step: 30860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:41,507-Speed 5518.43 samples/sec Loss 7.6283 LearningRate 0.0483 Epoch: 6 Global Step: 30870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
12:15:43,390-Speed 5442.15 samples/sec Loss 7.8191 LearningRate 0.0483 Epoch: 6 Global Step: 30880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:45,224-Speed 5585.74 samples/sec Loss 7.8524 LearningRate 0.0483 Epoch: 6 Global Step: 30890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:15:47,104-Speed 5448.25 samples/sec Loss 7.5946 LearningRate 0.0482 Epoch: 6 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:48,991-Speed 5430.79 samples/sec Loss 7.6516 LearningRate 0.0482 Epoch: 6 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:50,837-Speed 5549.49 samples/sec Loss 7.8220 LearningRate 0.0482 Epoch: 6 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:52,675-Speed 5574.53 samples/sec Loss 7.5784 LearningRate 0.0482 Epoch: 6 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:54,518-Speed 5557.98 samples/sec Loss 7.7981 LearningRate 0.0482 Epoch: 6 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:56,356-Speed 5571.77 samples/sec Loss 7.7871 LearningRate 0.0482 Epoch: 6 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:15:58,208-Speed 5531.96 samples/sec Loss 7.7279 LearningRate 0.0482 Epoch: 6 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:00,064-Speed 5523.75 samples/sec Loss 7.6603 LearningRate 0.0481 Epoch: 6 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:01,901-Speed 5574.20 samples/sec Loss 7.7035 LearningRate 0.0481 Epoch: 6 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:03,746-Speed 5553.63 samples/sec Loss 7.6348 LearningRate 0.0481 Epoch: 6 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:05,581-Speed 5584.64 samples/sec Loss 7.7047 LearningRate 
0.0481 Epoch: 6 Global Step: 31000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:07,410-Speed 5599.60 samples/sec Loss 7.7531 LearningRate 0.0481 Epoch: 6 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:09,266-Speed 5521.13 samples/sec Loss 7.8909 LearningRate 0.0481 Epoch: 6 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:11,162-Speed 5404.07 samples/sec Loss 7.7124 LearningRate 0.0481 Epoch: 6 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:13,040-Speed 5452.64 samples/sec Loss 7.7422 LearningRate 0.0480 Epoch: 6 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:14,894-Speed 5526.08 samples/sec Loss 7.6490 LearningRate 0.0480 Epoch: 6 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:16,767-Speed 5472.46 samples/sec Loss 7.7469 LearningRate 0.0480 Epoch: 6 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:18,614-Speed 5546.01 samples/sec Loss 7.7832 LearningRate 0.0480 Epoch: 6 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:20,458-Speed 5557.59 samples/sec Loss 7.7223 LearningRate 0.0480 Epoch: 6 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:22,333-Speed 5461.07 samples/sec Loss 7.7036 LearningRate 0.0480 Epoch: 6 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:24,181-Speed 5545.70 samples/sec Loss 7.7510 LearningRate 0.0480 Epoch: 6 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:26,065-Speed 5439.15 samples/sec Loss 7.7271 LearningRate 0.0480 Epoch: 6 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:27,902-Speed 5576.91 samples/sec Loss 7.7418 LearningRate 0.0479 Epoch: 6 Global Step: 31120 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 12:16:29,739-Speed 5577.67 samples/sec Loss 7.8867 LearningRate 0.0479 Epoch: 6 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:31,590-Speed 5535.45 samples/sec Loss 7.7370 LearningRate 0.0479 Epoch: 6 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:33,425-Speed 5581.28 samples/sec Loss 7.7120 LearningRate 0.0479 Epoch: 6 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:35,290-Speed 5496.51 samples/sec Loss 7.8885 LearningRate 0.0479 Epoch: 6 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:37,160-Speed 5484.38 samples/sec Loss 7.9500 LearningRate 0.0479 Epoch: 6 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:39,053-Speed 5412.63 samples/sec Loss 7.8214 LearningRate 0.0479 Epoch: 6 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:40,927-Speed 5467.50 samples/sec Loss 7.8986 LearningRate 0.0478 Epoch: 6 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:42,788-Speed 5506.52 samples/sec Loss 7.8139 LearningRate 0.0478 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:44,624-Speed 5581.25 samples/sec Loss 7.7602 LearningRate 0.0478 Epoch: 6 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:46,496-Speed 5472.69 samples/sec Loss 7.8710 LearningRate 0.0478 Epoch: 6 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:48,361-Speed 5492.03 samples/sec Loss 7.7914 LearningRate 0.0478 Epoch: 6 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:50,222-Speed 5507.70 samples/sec Loss 7.8395 LearningRate 0.0478 Epoch: 6 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:52,072-Speed 
5535.56 samples/sec Loss 7.8436 LearningRate 0.0478 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:16:53,928-Speed 5520.75 samples/sec Loss 7.9222 LearningRate 0.0477 Epoch: 6 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:55,786-Speed 5515.58 samples/sec Loss 7.8208 LearningRate 0.0477 Epoch: 6 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:57,621-Speed 5584.03 samples/sec Loss 7.6607 LearningRate 0.0477 Epoch: 6 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:16:59,454-Speed 5588.08 samples/sec Loss 7.7613 LearningRate 0.0477 Epoch: 6 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:01,338-Speed 5437.42 samples/sec Loss 7.8085 LearningRate 0.0477 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:03,188-Speed 5540.14 samples/sec Loss 7.7764 LearningRate 0.0477 Epoch: 6 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:05,057-Speed 5480.36 samples/sec Loss 7.7524 LearningRate 0.0477 Epoch: 6 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:06,896-Speed 5570.59 samples/sec Loss 7.9987 LearningRate 0.0477 Epoch: 6 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:08,734-Speed 5574.04 samples/sec Loss 7.8001 LearningRate 0.0476 Epoch: 6 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:10,581-Speed 5548.94 samples/sec Loss 7.9100 LearningRate 0.0476 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:12,453-Speed 5473.00 samples/sec Loss 7.8911 LearningRate 0.0476 Epoch: 6 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 12:17:14,308-Speed 5524.37 samples/sec Loss 7.7634 LearningRate 0.0476 
Epoch: 6 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:16,181-Speed 5471.32 samples/sec Loss 7.8077 LearningRate 0.0476 Epoch: 6 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:18,030-Speed 5538.83 samples/sec Loss 7.8482 LearningRate 0.0476 Epoch: 6 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:19,876-Speed 5551.02 samples/sec Loss 7.7472 LearningRate 0.0476 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:21,736-Speed 5508.57 samples/sec Loss 7.6672 LearningRate 0.0475 Epoch: 6 Global Step: 31410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:23,572-Speed 5581.90 samples/sec Loss 7.9211 LearningRate 0.0475 Epoch: 6 Global Step: 31420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:25,409-Speed 5577.54 samples/sec Loss 7.7372 LearningRate 0.0475 Epoch: 6 Global Step: 31430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:27,271-Speed 5500.23 samples/sec Loss 7.7068 LearningRate 0.0475 Epoch: 6 Global Step: 31440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:29,157-Speed 5431.71 samples/sec Loss 7.7355 LearningRate 0.0475 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:31,000-Speed 5561.53 samples/sec Loss 7.8547 LearningRate 0.0475 Epoch: 6 Global Step: 31460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:32,877-Speed 5457.81 samples/sec Loss 7.8643 LearningRate 0.0475 Epoch: 6 Global Step: 31470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:34,713-Speed 5582.05 samples/sec Loss 7.7408 LearningRate 0.0474 Epoch: 6 Global Step: 31480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:17:36,555-Speed 5560.98 samples/sec Loss 7.8939 LearningRate 0.0474 Epoch: 6 Global Step: 31490 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 12:17:38,406-Speed 5534.62 samples/sec Loss 7.5938 LearningRate 0.0474 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:40,273-Speed 5489.35 samples/sec Loss 8.0806 LearningRate 0.0474 Epoch: 6 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:42,155-Speed 5444.82 samples/sec Loss 7.8101 LearningRate 0.0474 Epoch: 6 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:44,002-Speed 5544.56 samples/sec Loss 7.8521 LearningRate 0.0474 Epoch: 6 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:45,865-Speed 5502.78 samples/sec Loss 7.7985 LearningRate 0.0474 Epoch: 6 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:47,712-Speed 5544.55 samples/sec Loss 7.8112 LearningRate 0.0474 Epoch: 6 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:49,578-Speed 5491.24 samples/sec Loss 7.8883 LearningRate 0.0473 Epoch: 6 Global Step: 31560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:51,474-Speed 5404.42 samples/sec Loss 7.9587 LearningRate 0.0473 Epoch: 6 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:53,328-Speed 5524.30 samples/sec Loss 7.9567 LearningRate 0.0473 Epoch: 6 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:55,175-Speed 5548.09 samples/sec Loss 7.8119 LearningRate 0.0473 Epoch: 6 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:17:57,011-Speed 5581.75 samples/sec Loss 7.6257 LearningRate 0.0473 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:17:58,891-Speed 5447.56 samples/sec Loss 7.7109 LearningRate 0.0473 Epoch: 6 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:00,748-Speed 5519.20 samples/sec 
Loss 7.7536 LearningRate 0.0473 Epoch: 6 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:02,617-Speed 5480.92 samples/sec Loss 7.7975 LearningRate 0.0472 Epoch: 6 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:04,484-Speed 5488.18 samples/sec Loss 7.8087 LearningRate 0.0472 Epoch: 6 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:06,352-Speed 5484.70 samples/sec Loss 7.7747 LearningRate 0.0472 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:08,201-Speed 5539.67 samples/sec Loss 7.7461 LearningRate 0.0472 Epoch: 6 Global Step: 31660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:10,068-Speed 5491.18 samples/sec Loss 7.8200 LearningRate 0.0472 Epoch: 6 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:11,953-Speed 5434.58 samples/sec Loss 7.7499 LearningRate 0.0472 Epoch: 6 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:13,843-Speed 5419.11 samples/sec Loss 7.7222 LearningRate 0.0472 Epoch: 6 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:15,727-Speed 5441.98 samples/sec Loss 7.6993 LearningRate 0.0471 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:17,589-Speed 5502.69 samples/sec Loss 7.6928 LearningRate 0.0471 Epoch: 6 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:19,483-Speed 5409.54 samples/sec Loss 7.8549 LearningRate 0.0471 Epoch: 6 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:21,331-Speed 5543.12 samples/sec Loss 7.7535 LearningRate 0.0471 Epoch: 6 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:23,182-Speed 5533.19 samples/sec Loss 7.8575 LearningRate 0.0471 Epoch: 6 Global Step: 
31740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:25,061-Speed 5454.44 samples/sec Loss 7.7156 LearningRate 0.0471 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:26,911-Speed 5539.30 samples/sec Loss 7.9385 LearningRate 0.0471 Epoch: 6 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:28,764-Speed 5528.04 samples/sec Loss 7.8382 LearningRate 0.0471 Epoch: 6 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:30,622-Speed 5514.01 samples/sec Loss 7.8611 LearningRate 0.0470 Epoch: 6 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:32,481-Speed 5513.17 samples/sec Loss 7.8941 LearningRate 0.0470 Epoch: 6 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:34,329-Speed 5544.35 samples/sec Loss 7.7079 LearningRate 0.0470 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:36,201-Speed 5471.23 samples/sec Loss 7.9679 LearningRate 0.0470 Epoch: 6 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:38,051-Speed 5540.37 samples/sec Loss 7.8563 LearningRate 0.0470 Epoch: 6 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:39,920-Speed 5479.92 samples/sec Loss 7.7982 LearningRate 0.0470 Epoch: 6 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:41,775-Speed 5522.83 samples/sec Loss 7.8532 LearningRate 0.0470 Epoch: 6 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:43,648-Speed 5471.10 samples/sec Loss 7.7607 LearningRate 0.0469 Epoch: 6 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:45,510-Speed 5503.80 samples/sec Loss 7.8235 LearningRate 0.0469 Epoch: 6 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 12:18:47,364-Speed 5527.31 samples/sec Loss 7.9518 LearningRate 0.0469 Epoch: 6 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:49,229-Speed 5492.67 samples/sec Loss 7.9817 LearningRate 0.0469 Epoch: 6 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:51,112-Speed 5441.20 samples/sec Loss 7.7706 LearningRate 0.0469 Epoch: 6 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:52,975-Speed 5500.55 samples/sec Loss 7.7481 LearningRate 0.0469 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:54,843-Speed 5485.05 samples/sec Loss 7.8878 LearningRate 0.0469 Epoch: 6 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:18:56,691-Speed 5544.09 samples/sec Loss 8.0042 LearningRate 0.0468 Epoch: 6 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:18:58,550-Speed 5512.99 samples/sec Loss 7.8521 LearningRate 0.0468 Epoch: 6 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:00,432-Speed 5444.77 samples/sec Loss 7.8309 LearningRate 0.0468 Epoch: 6 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:02,284-Speed 5530.69 samples/sec Loss 7.9511 LearningRate 0.0468 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:04,122-Speed 5573.70 samples/sec Loss 7.6924 LearningRate 0.0468 Epoch: 6 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:05,966-Speed 5554.90 samples/sec Loss 7.8935 LearningRate 0.0468 Epoch: 6 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:07,845-Speed 5452.55 samples/sec Loss 7.8329 LearningRate 0.0468 Epoch: 6 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:09,698-Speed 5531.81 samples/sec Loss 
7.8312 LearningRate 0.0468 Epoch: 6 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:11,584-Speed 5431.41 samples/sec Loss 7.7212 LearningRate 0.0467 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:19:38,844-[lfw][32000]XNorm: 23.115894 Training: 2022-04-11 12:19:38,845-[lfw][32000]Accuracy-Flip: 0.99733+-0.00213 Training: 2022-04-11 12:19:38,845-[lfw][32000]Accuracy-Highest: 0.99733 Training: 2022-04-11 12:20:10,055-[cfp_fp][32000]XNorm: 20.174163 Training: 2022-04-11 12:20:10,056-[cfp_fp][32000]Accuracy-Flip: 0.96000+-0.00748 Training: 2022-04-11 12:20:10,056-[cfp_fp][32000]Accuracy-Highest: 0.96000 Training: 2022-04-11 12:20:36,906-[agedb_30][32000]XNorm: 22.824152 Training: 2022-04-11 12:20:36,907-[agedb_30][32000]Accuracy-Flip: 0.97467+-0.00767 Training: 2022-04-11 12:20:36,907-[agedb_30][32000]Accuracy-Highest: 0.97467 Training: 2022-04-11 12:20:38,766-Speed 117.46 samples/sec Loss 7.8172 LearningRate 0.0467 Epoch: 6 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:40,590-Speed 5616.29 samples/sec Loss 7.8672 LearningRate 0.0467 Epoch: 6 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:42,438-Speed 5545.64 samples/sec Loss 7.8104 LearningRate 0.0467 Epoch: 6 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:44,261-Speed 5617.32 samples/sec Loss 7.7444 LearningRate 0.0467 Epoch: 6 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:46,087-Speed 5610.02 samples/sec Loss 7.8451 LearningRate 0.0467 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:47,917-Speed 5599.66 samples/sec Loss 7.8648 LearningRate 0.0467 Epoch: 6 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:49,753-Speed 5577.59 samples/sec Loss 8.1877 LearningRate 0.0466 Epoch: 6 Global 
Step: 32070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:51,581-Speed 5605.68 samples/sec Loss 7.7958 LearningRate 0.0466 Epoch: 6 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:53,447-Speed 5489.26 samples/sec Loss 7.7898 LearningRate 0.0466 Epoch: 6 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:55,298-Speed 5536.20 samples/sec Loss 7.7875 LearningRate 0.0466 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:20:57,150-Speed 5531.23 samples/sec Loss 7.8510 LearningRate 0.0466 Epoch: 6 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:20:58,998-Speed 5545.10 samples/sec Loss 7.7769 LearningRate 0.0466 Epoch: 6 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:00,922-Speed 5324.59 samples/sec Loss 7.7723 LearningRate 0.0466 Epoch: 6 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:02,751-Speed 5602.38 samples/sec Loss 7.9388 LearningRate 0.0466 Epoch: 6 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:04,602-Speed 5533.72 samples/sec Loss 7.8654 LearningRate 0.0465 Epoch: 6 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:06,501-Speed 5394.22 samples/sec Loss 7.7083 LearningRate 0.0465 Epoch: 6 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:08,327-Speed 5610.57 samples/sec Loss 7.8744 LearningRate 0.0465 Epoch: 6 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:10,187-Speed 5509.80 samples/sec Loss 7.7989 LearningRate 0.0465 Epoch: 6 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:12,025-Speed 5573.63 samples/sec Loss 7.9759 LearningRate 0.0465 Epoch: 6 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 12:21:13,857-Speed 5591.76 samples/sec Loss 7.9297 LearningRate 0.0465 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:15,718-Speed 5504.45 samples/sec Loss 7.8319 LearningRate 0.0465 Epoch: 6 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:17,582-Speed 5496.39 samples/sec Loss 7.7899 LearningRate 0.0464 Epoch: 6 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:19,434-Speed 5534.85 samples/sec Loss 7.8900 LearningRate 0.0464 Epoch: 6 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:21,279-Speed 5553.70 samples/sec Loss 7.9016 LearningRate 0.0464 Epoch: 6 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:23,155-Speed 5460.08 samples/sec Loss 7.9122 LearningRate 0.0464 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:25,019-Speed 5499.39 samples/sec Loss 7.8563 LearningRate 0.0464 Epoch: 6 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:26,892-Speed 5469.59 samples/sec Loss 7.6342 LearningRate 0.0464 Epoch: 6 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:28,730-Speed 5572.17 samples/sec Loss 7.8511 LearningRate 0.0464 Epoch: 6 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:30,569-Speed 5572.83 samples/sec Loss 7.8383 LearningRate 0.0463 Epoch: 6 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:32,426-Speed 5515.41 samples/sec Loss 7.7036 LearningRate 0.0463 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:34,321-Speed 5407.98 samples/sec Loss 7.8943 LearningRate 0.0463 Epoch: 6 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:36,170-Speed 5540.82 samples/sec Loss 
7.8889 LearningRate 0.0463 Epoch: 6 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:38,026-Speed 5521.11 samples/sec Loss 7.8097 LearningRate 0.0463 Epoch: 6 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:39,897-Speed 5476.46 samples/sec Loss 7.7858 LearningRate 0.0463 Epoch: 6 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:41,746-Speed 5539.56 samples/sec Loss 7.8922 LearningRate 0.0463 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:43,584-Speed 5573.82 samples/sec Loss 7.7658 LearningRate 0.0463 Epoch: 6 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:21:45,452-Speed 5485.10 samples/sec Loss 7.8927 LearningRate 0.0462 Epoch: 6 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:47,286-Speed 5586.26 samples/sec Loss 7.8878 LearningRate 0.0462 Epoch: 6 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:49,163-Speed 5455.89 samples/sec Loss 7.7578 LearningRate 0.0462 Epoch: 6 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:51,007-Speed 5557.75 samples/sec Loss 7.9819 LearningRate 0.0462 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:52,864-Speed 5516.70 samples/sec Loss 7.7699 LearningRate 0.0462 Epoch: 6 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:54,707-Speed 5559.95 samples/sec Loss 7.7128 LearningRate 0.0462 Epoch: 6 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:56,560-Speed 5529.83 samples/sec Loss 8.0545 LearningRate 0.0462 Epoch: 6 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:21:58,419-Speed 5510.31 samples/sec Loss 7.7619 LearningRate 0.0461 Epoch: 6 Global Step: 32440 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:00,260-Speed 5564.91 samples/sec Loss 7.8742 LearningRate 0.0461 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:02,158-Speed 5396.76 samples/sec Loss 7.9284 LearningRate 0.0461 Epoch: 6 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:04,004-Speed 5550.62 samples/sec Loss 7.9005 LearningRate 0.0461 Epoch: 6 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:22:05,934-Speed 5307.16 samples/sec Loss 7.7764 LearningRate 0.0461 Epoch: 6 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:22:07,785-Speed 5535.93 samples/sec Loss 7.9627 LearningRate 0.0461 Epoch: 6 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:09,634-Speed 5540.72 samples/sec Loss 7.8242 LearningRate 0.0461 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:11,469-Speed 5581.95 samples/sec Loss 7.8342 LearningRate 0.0461 Epoch: 6 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:13,362-Speed 5414.17 samples/sec Loss 7.7436 LearningRate 0.0460 Epoch: 6 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:15,228-Speed 5489.23 samples/sec Loss 7.8853 LearningRate 0.0460 Epoch: 6 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:17,121-Speed 5413.83 samples/sec Loss 7.9151 LearningRate 0.0460 Epoch: 6 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:18,996-Speed 5461.52 samples/sec Loss 7.7815 LearningRate 0.0460 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:20,827-Speed 5596.79 samples/sec Loss 7.7933 LearningRate 0.0460 Epoch: 6 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:22:22,687-Speed 5508.35 samples/sec Loss 7.7358 LearningRate 0.0460 Epoch: 6 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:24,538-Speed 5536.80 samples/sec Loss 7.8878 LearningRate 0.0460 Epoch: 6 Global Step: 32580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:26,385-Speed 5545.07 samples/sec Loss 7.7267 LearningRate 0.0459 Epoch: 6 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:22:28,208-Speed 5619.88 samples/sec Loss 7.7426 LearningRate 0.0459 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:30,062-Speed 5527.17 samples/sec Loss 7.8468 LearningRate 0.0459 Epoch: 6 Global Step: 32610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:31,913-Speed 5532.11 samples/sec Loss 7.7741 LearningRate 0.0459 Epoch: 6 Global Step: 32620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:33,771-Speed 5514.57 samples/sec Loss 7.6163 LearningRate 0.0459 Epoch: 6 Global Step: 32630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:35,626-Speed 5524.77 samples/sec Loss 7.7564 LearningRate 0.0459 Epoch: 6 Global Step: 32640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:37,478-Speed 5533.53 samples/sec Loss 7.8408 LearningRate 0.0459 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:39,315-Speed 5576.99 samples/sec Loss 7.8771 LearningRate 0.0459 Epoch: 6 Global Step: 32660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:41,218-Speed 5382.94 samples/sec Loss 7.8209 LearningRate 0.0458 Epoch: 6 Global Step: 32670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:43,070-Speed 5537.62 samples/sec Loss 7.6171 LearningRate 0.0458 Epoch: 6 Global Step: 32680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:44,913-Speed 5558.62 samples/sec Loss 7.9518 LearningRate 
0.0458 Epoch: 6 Global Step: 32690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:46,745-Speed 5590.95 samples/sec Loss 7.8468 LearningRate 0.0458 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:48,623-Speed 5457.15 samples/sec Loss 7.9088 LearningRate 0.0458 Epoch: 6 Global Step: 32710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:50,475-Speed 5532.26 samples/sec Loss 7.8805 LearningRate 0.0458 Epoch: 6 Global Step: 32720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:52,353-Speed 5452.55 samples/sec Loss 7.8380 LearningRate 0.0458 Epoch: 6 Global Step: 32730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:54,188-Speed 5583.82 samples/sec Loss 7.7316 LearningRate 0.0457 Epoch: 6 Global Step: 32740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:22:56,023-Speed 5585.53 samples/sec Loss 7.8079 LearningRate 0.0457 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:57,875-Speed 5532.54 samples/sec Loss 7.7419 LearningRate 0.0457 Epoch: 6 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:22:59,759-Speed 5436.23 samples/sec Loss 7.7384 LearningRate 0.0457 Epoch: 6 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:01,591-Speed 5592.07 samples/sec Loss 8.0306 LearningRate 0.0457 Epoch: 6 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:03,428-Speed 5576.03 samples/sec Loss 7.7276 LearningRate 0.0457 Epoch: 6 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:05,279-Speed 5534.99 samples/sec Loss 7.8621 LearningRate 0.0457 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:07,110-Speed 5594.73 samples/sec Loss 7.7992 LearningRate 0.0457 Epoch: 6 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 12:23:08,959-Speed 5543.43 samples/sec Loss 7.8001 LearningRate 0.0456 Epoch: 6 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:10,815-Speed 5520.31 samples/sec Loss 7.9027 LearningRate 0.0456 Epoch: 6 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:12,654-Speed 5571.29 samples/sec Loss 7.7174 LearningRate 0.0456 Epoch: 6 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:14,486-Speed 5590.03 samples/sec Loss 7.8056 LearningRate 0.0456 Epoch: 6 Global Step: 32850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:16,336-Speed 5539.09 samples/sec Loss 7.8510 LearningRate 0.0456 Epoch: 6 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:18,197-Speed 5505.81 samples/sec Loss 7.8914 LearningRate 0.0456 Epoch: 6 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:20,051-Speed 5527.84 samples/sec Loss 7.9917 LearningRate 0.0456 Epoch: 6 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:21,905-Speed 5525.28 samples/sec Loss 7.8986 LearningRate 0.0455 Epoch: 6 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:23,741-Speed 5579.94 samples/sec Loss 7.9427 LearningRate 0.0455 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:25,589-Speed 5543.15 samples/sec Loss 7.6428 LearningRate 0.0455 Epoch: 6 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:27,463-Speed 5467.31 samples/sec Loss 7.8626 LearningRate 0.0455 Epoch: 6 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:29,294-Speed 5595.82 samples/sec Loss 7.7932 LearningRate 0.0455 Epoch: 6 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:31,150-Speed 5517.72 
samples/sec Loss 7.6934 LearningRate 0.0455 Epoch: 6 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:33,007-Speed 5517.46 samples/sec Loss 7.7629 LearningRate 0.0455 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:34,860-Speed 5528.31 samples/sec Loss 7.9930 LearningRate 0.0455 Epoch: 6 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:36,716-Speed 5522.43 samples/sec Loss 7.7696 LearningRate 0.0454 Epoch: 6 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:38,592-Speed 5464.95 samples/sec Loss 7.7735 LearningRate 0.0454 Epoch: 6 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:40,423-Speed 5593.39 samples/sec Loss 7.8090 LearningRate 0.0454 Epoch: 6 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:42,266-Speed 5560.00 samples/sec Loss 7.8465 LearningRate 0.0454 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:44,121-Speed 5523.50 samples/sec Loss 7.6945 LearningRate 0.0454 Epoch: 6 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:45,957-Speed 5580.44 samples/sec Loss 7.7841 LearningRate 0.0454 Epoch: 6 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:23:47,810-Speed 5530.21 samples/sec Loss 7.5762 LearningRate 0.0454 Epoch: 6 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:49,673-Speed 5500.19 samples/sec Loss 7.8742 LearningRate 0.0453 Epoch: 6 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:51,529-Speed 5520.80 samples/sec Loss 7.7191 LearningRate 0.0453 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:53,370-Speed 5565.40 samples/sec Loss 7.7933 LearningRate 0.0453 Epoch: 6 Global 
Step: 33060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:55,244-Speed 5466.40 samples/sec Loss 7.7936 LearningRate 0.0453 Epoch: 6 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:57,077-Speed 5590.98 samples/sec Loss 7.6494 LearningRate 0.0453 Epoch: 6 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:23:58,911-Speed 5586.45 samples/sec Loss 7.8231 LearningRate 0.0453 Epoch: 6 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:00,783-Speed 5471.97 samples/sec Loss 7.8999 LearningRate 0.0453 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:02,630-Speed 5547.04 samples/sec Loss 7.8166 LearningRate 0.0453 Epoch: 6 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:04,502-Speed 5473.25 samples/sec Loss 7.8342 LearningRate 0.0452 Epoch: 6 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:06,401-Speed 5393.43 samples/sec Loss 7.8136 LearningRate 0.0452 Epoch: 6 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:08,269-Speed 5487.42 samples/sec Loss 7.8229 LearningRate 0.0452 Epoch: 6 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:10,114-Speed 5551.44 samples/sec Loss 7.8298 LearningRate 0.0452 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:12,004-Speed 5420.58 samples/sec Loss 7.8278 LearningRate 0.0452 Epoch: 6 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:13,854-Speed 5538.72 samples/sec Loss 7.9875 LearningRate 0.0452 Epoch: 6 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:15,712-Speed 5513.84 samples/sec Loss 7.8468 LearningRate 0.0452 Epoch: 6 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 12:24:17,542-Speed 5598.73 samples/sec Loss 7.8166 LearningRate 0.0451 Epoch: 6 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:19,376-Speed 5588.20 samples/sec Loss 7.7554 LearningRate 0.0451 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:21,231-Speed 5521.98 samples/sec Loss 7.9387 LearningRate 0.0451 Epoch: 6 Global Step: 33210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:23,066-Speed 5582.21 samples/sec Loss 7.9264 LearningRate 0.0451 Epoch: 6 Global Step: 33220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:24,918-Speed 5533.46 samples/sec Loss 7.8113 LearningRate 0.0451 Epoch: 6 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:26,785-Speed 5487.85 samples/sec Loss 7.6535 LearningRate 0.0451 Epoch: 6 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:28,651-Speed 5489.59 samples/sec Loss 7.7778 LearningRate 0.0451 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:30,493-Speed 5559.80 samples/sec Loss 7.7544 LearningRate 0.0451 Epoch: 6 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:32,354-Speed 5505.60 samples/sec Loss 7.9692 LearningRate 0.0450 Epoch: 6 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:24:34,240-Speed 5433.15 samples/sec Loss 7.6382 LearningRate 0.0450 Epoch: 6 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:36,106-Speed 5491.23 samples/sec Loss 7.8605 LearningRate 0.0450 Epoch: 6 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:37,960-Speed 5527.70 samples/sec Loss 8.0382 LearningRate 0.0450 Epoch: 6 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:39,858-Speed 5394.82 samples/sec Loss 7.8679 
LearningRate 0.0450 Epoch: 6 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:41,708-Speed 5538.99 samples/sec Loss 7.6330 LearningRate 0.0450 Epoch: 6 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:43,546-Speed 5571.72 samples/sec Loss 7.8068 LearningRate 0.0450 Epoch: 6 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:45,382-Speed 5580.95 samples/sec Loss 7.8760 LearningRate 0.0449 Epoch: 6 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:47,214-Speed 5592.05 samples/sec Loss 7.9399 LearningRate 0.0449 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:49,056-Speed 5563.16 samples/sec Loss 7.8021 LearningRate 0.0449 Epoch: 6 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:50,928-Speed 5472.05 samples/sec Loss 7.9132 LearningRate 0.0449 Epoch: 6 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:52,757-Speed 5601.91 samples/sec Loss 7.7187 LearningRate 0.0449 Epoch: 6 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:54,651-Speed 5409.44 samples/sec Loss 7.6275 LearningRate 0.0449 Epoch: 6 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:56,498-Speed 5545.98 samples/sec Loss 7.8699 LearningRate 0.0449 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:24:58,332-Speed 5586.87 samples/sec Loss 7.7544 LearningRate 0.0449 Epoch: 6 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:00,175-Speed 5556.98 samples/sec Loss 7.6054 LearningRate 0.0448 Epoch: 6 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:02,006-Speed 5595.59 samples/sec Loss 7.7493 LearningRate 0.0448 Epoch: 6 Global Step: 33430 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:03,894-Speed 5425.97 samples/sec Loss 7.8699 LearningRate 0.0448 Epoch: 6 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:05,756-Speed 5503.81 samples/sec Loss 7.9099 LearningRate 0.0448 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:07,622-Speed 5490.51 samples/sec Loss 7.6693 LearningRate 0.0448 Epoch: 6 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:09,501-Speed 5453.59 samples/sec Loss 7.7900 LearningRate 0.0448 Epoch: 6 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:11,333-Speed 5591.53 samples/sec Loss 7.6920 LearningRate 0.0448 Epoch: 6 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:13,185-Speed 5530.27 samples/sec Loss 7.8135 LearningRate 0.0447 Epoch: 6 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:15,031-Speed 5551.57 samples/sec Loss 7.7283 LearningRate 0.0447 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:16,903-Speed 5470.75 samples/sec Loss 7.4527 LearningRate 0.0447 Epoch: 6 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:18,743-Speed 5570.00 samples/sec Loss 7.7123 LearningRate 0.0447 Epoch: 6 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:20,574-Speed 5596.06 samples/sec Loss 7.6839 LearningRate 0.0447 Epoch: 6 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:22,414-Speed 5567.48 samples/sec Loss 7.6986 LearningRate 0.0447 Epoch: 6 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:24,274-Speed 5506.61 samples/sec Loss 7.6339 LearningRate 0.0447 Epoch: 6 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
12:25:26,112-Speed 5575.30 samples/sec Loss 7.6666 LearningRate 0.0447 Epoch: 6 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:27,943-Speed 5594.47 samples/sec Loss 7.7588 LearningRate 0.0446 Epoch: 6 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:25:29,801-Speed 5514.64 samples/sec Loss 7.8584 LearningRate 0.0446 Epoch: 6 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:31,655-Speed 5525.87 samples/sec Loss 7.9058 LearningRate 0.0446 Epoch: 6 Global Step: 33590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:33,512-Speed 5517.52 samples/sec Loss 7.8030 LearningRate 0.0446 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:35,354-Speed 5560.50 samples/sec Loss 7.8142 LearningRate 0.0446 Epoch: 6 Global Step: 33610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:37,234-Speed 5448.61 samples/sec Loss 7.6599 LearningRate 0.0446 Epoch: 6 Global Step: 33620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:39,111-Speed 5459.62 samples/sec Loss 7.7660 LearningRate 0.0446 Epoch: 6 Global Step: 33630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:40,957-Speed 5550.91 samples/sec Loss 7.6099 LearningRate 0.0445 Epoch: 6 Global Step: 33640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:42,802-Speed 5551.57 samples/sec Loss 7.7876 LearningRate 0.0445 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:44,645-Speed 5558.29 samples/sec Loss 8.0403 LearningRate 0.0445 Epoch: 6 Global Step: 33660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:46,493-Speed 5542.49 samples/sec Loss 7.7355 LearningRate 0.0445 Epoch: 6 Global Step: 33670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:48,328-Speed 5584.84 samples/sec Loss 7.7568 LearningRate 
0.0445 Epoch: 6 Global Step: 33680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:50,177-Speed 5541.14 samples/sec Loss 7.7582 LearningRate 0.0445 Epoch: 6 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:52,030-Speed 5528.86 samples/sec Loss 7.7506 LearningRate 0.0445 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:53,872-Speed 5562.79 samples/sec Loss 7.8130 LearningRate 0.0445 Epoch: 6 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:25:55,701-Speed 5603.06 samples/sec Loss 7.5727 LearningRate 0.0444 Epoch: 6 Global Step: 33720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:57,538-Speed 5575.33 samples/sec Loss 7.8591 LearningRate 0.0444 Epoch: 6 Global Step: 33730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:25:59,422-Speed 5437.27 samples/sec Loss 7.7324 LearningRate 0.0444 Epoch: 6 Global Step: 33740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:01,284-Speed 5501.01 samples/sec Loss 7.7706 LearningRate 0.0444 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:03,131-Speed 5550.17 samples/sec Loss 7.6952 LearningRate 0.0444 Epoch: 6 Global Step: 33760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:04,983-Speed 5531.38 samples/sec Loss 7.7628 LearningRate 0.0444 Epoch: 6 Global Step: 33770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:06,852-Speed 5480.45 samples/sec Loss 7.9505 LearningRate 0.0444 Epoch: 6 Global Step: 33780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:08,696-Speed 5557.42 samples/sec Loss 7.6943 LearningRate 0.0444 Epoch: 6 Global Step: 33790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:10,568-Speed 5472.03 samples/sec Loss 7.7991 LearningRate 0.0443 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 32768 Required: 
4 hours Training: 2022-04-11 12:26:12,482-Speed 5352.72 samples/sec Loss 7.7704 LearningRate 0.0443 Epoch: 6 Global Step: 33810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:26:14,324-Speed 5562.08 samples/sec Loss 7.8729 LearningRate 0.0443 Epoch: 6 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:16,179-Speed 5523.23 samples/sec Loss 7.5614 LearningRate 0.0443 Epoch: 6 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:18,042-Speed 5499.92 samples/sec Loss 7.6727 LearningRate 0.0443 Epoch: 6 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:19,936-Speed 5408.92 samples/sec Loss 7.5671 LearningRate 0.0443 Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:21,808-Speed 5474.69 samples/sec Loss 7.7087 LearningRate 0.0443 Epoch: 6 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:23,658-Speed 5536.83 samples/sec Loss 7.7338 LearningRate 0.0442 Epoch: 6 Global Step: 33870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:25,507-Speed 5541.15 samples/sec Loss 7.7294 LearningRate 0.0442 Epoch: 6 Global Step: 33880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:27,356-Speed 5542.81 samples/sec Loss 7.7658 LearningRate 0.0442 Epoch: 6 Global Step: 33890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:29,206-Speed 5536.38 samples/sec Loss 7.7305 LearningRate 0.0442 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:31,048-Speed 5563.15 samples/sec Loss 7.7094 LearningRate 0.0442 Epoch: 6 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:32,909-Speed 5504.26 samples/sec Loss 7.6499 LearningRate 0.0442 Epoch: 6 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:34,776-Speed 5489.66 samples/sec 
Loss 7.7560 LearningRate 0.0442 Epoch: 6 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:36,621-Speed 5551.57 samples/sec Loss 7.6173 LearningRate 0.0442 Epoch: 6 Global Step: 33940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:38,543-Speed 5331.63 samples/sec Loss 7.8100 LearningRate 0.0441 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:40,407-Speed 5496.86 samples/sec Loss 7.5573 LearningRate 0.0441 Epoch: 6 Global Step: 33960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:42,245-Speed 5573.89 samples/sec Loss 7.7032 LearningRate 0.0441 Epoch: 6 Global Step: 33970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:44,124-Speed 5452.04 samples/sec Loss 7.6881 LearningRate 0.0441 Epoch: 6 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:45,971-Speed 5547.62 samples/sec Loss 7.8292 LearningRate 0.0441 Epoch: 6 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:26:47,843-Speed 5472.74 samples/sec Loss 7.7838 LearningRate 0.0441 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:27:15,041-[lfw][34000]XNorm: 21.252891 Training: 2022-04-11 12:27:15,042-[lfw][34000]Accuracy-Flip: 0.99583+-0.00344 Training: 2022-04-11 12:27:15,042-[lfw][34000]Accuracy-Highest: 0.99733 Training: 2022-04-11 12:27:46,370-[cfp_fp][34000]XNorm: 18.514315 Training: 2022-04-11 12:27:46,370-[cfp_fp][34000]Accuracy-Flip: 0.95843+-0.00794 Training: 2022-04-11 12:27:46,371-[cfp_fp][34000]Accuracy-Highest: 0.96000 Training: 2022-04-11 12:28:13,469-[agedb_30][34000]XNorm: 21.146796 Training: 2022-04-11 12:28:13,470-[agedb_30][34000]Accuracy-Flip: 0.97350+-0.00728 Training: 2022-04-11 12:28:13,470-[agedb_30][34000]Accuracy-Highest: 0.97467 Training: 2022-04-11 12:28:15,350-Speed 117.02 samples/sec Loss 7.6448 LearningRate 0.0441 Epoch: 6 Global 
Step: 34010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:17,213-Speed 5499.13 samples/sec Loss 7.6987 LearningRate 0.0440 Epoch: 6 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:19,039-Speed 5612.42 samples/sec Loss 7.7863 LearningRate 0.0440 Epoch: 6 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:20,888-Speed 5541.36 samples/sec Loss 7.8192 LearningRate 0.0440 Epoch: 6 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:22,730-Speed 5560.67 samples/sec Loss 7.8106 LearningRate 0.0440 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:24,577-Speed 5548.74 samples/sec Loss 7.7196 LearningRate 0.0440 Epoch: 6 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:26,428-Speed 5534.42 samples/sec Loss 7.8004 LearningRate 0.0440 Epoch: 6 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:28,301-Speed 5468.78 samples/sec Loss 7.6943 LearningRate 0.0440 Epoch: 6 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:30,181-Speed 5449.58 samples/sec Loss 7.8512 LearningRate 0.0440 Epoch: 6 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:32,034-Speed 5530.85 samples/sec Loss 7.7099 LearningRate 0.0439 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:33,900-Speed 5490.33 samples/sec Loss 7.7482 LearningRate 0.0439 Epoch: 6 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:35,776-Speed 5460.46 samples/sec Loss 7.7974 LearningRate 0.0439 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:37,624-Speed 5545.41 samples/sec Loss 7.6904 LearningRate 0.0439 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 12:28:39,500-Speed 5460.33 samples/sec Loss 7.7530 LearningRate 0.0439 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:28:41,334-Speed 5588.34 samples/sec Loss 7.6852 LearningRate 0.0439 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:43,186-Speed 5533.01 samples/sec Loss 7.8457 LearningRate 0.0439 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:45,044-Speed 5512.43 samples/sec Loss 7.7451 LearningRate 0.0439 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:46,900-Speed 5521.99 samples/sec Loss 7.6835 LearningRate 0.0438 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:48,793-Speed 5411.14 samples/sec Loss 7.7416 LearningRate 0.0438 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:50,630-Speed 5576.97 samples/sec Loss 7.5113 LearningRate 0.0438 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:52,488-Speed 5513.84 samples/sec Loss 7.6807 LearningRate 0.0438 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:54,356-Speed 5486.53 samples/sec Loss 7.6283 LearningRate 0.0438 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:56,194-Speed 5574.13 samples/sec Loss 7.7186 LearningRate 0.0438 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:58,058-Speed 5498.84 samples/sec Loss 7.5391 LearningRate 0.0438 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:28:59,889-Speed 5595.20 samples/sec Loss 7.6697 LearningRate 0.0437 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:29:01,747-Speed 5511.14 samples/sec Loss 
7.5912 LearningRate 0.0437 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:29:03,607-Speed 5509.07 samples/sec Loss 7.7380 LearningRate 0.0437 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:05,470-Speed 5499.53 samples/sec Loss 7.6005 LearningRate 0.0437 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:07,319-Speed 5540.80 samples/sec Loss 7.7721 LearningRate 0.0437 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:09,160-Speed 5565.53 samples/sec Loss 7.5232 LearningRate 0.0437 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:11,010-Speed 5539.37 samples/sec Loss 7.7081 LearningRate 0.0437 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:12,894-Speed 5436.90 samples/sec Loss 7.7224 LearningRate 0.0437 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:14,780-Speed 5431.85 samples/sec Loss 7.7763 LearningRate 0.0436 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:16,629-Speed 5544.05 samples/sec Loss 7.7788 LearningRate 0.0436 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:18,488-Speed 5513.14 samples/sec Loss 7.6784 LearningRate 0.0436 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:20,345-Speed 5514.17 samples/sec Loss 7.7453 LearningRate 0.0436 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:22,217-Speed 5473.04 samples/sec Loss 7.6172 LearningRate 0.0436 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:24,073-Speed 5522.66 samples/sec Loss 7.9378 LearningRate 0.0436 Epoch: 6 Global Step: 34380 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:25,968-Speed 5405.58 samples/sec Loss 7.7842 LearningRate 0.0436 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:27,812-Speed 5557.89 samples/sec Loss 7.7703 LearningRate 0.0436 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:29,641-Speed 5599.97 samples/sec Loss 7.8722 LearningRate 0.0435 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:31,492-Speed 5534.07 samples/sec Loss 7.6951 LearningRate 0.0435 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:33,332-Speed 5568.51 samples/sec Loss 7.7970 LearningRate 0.0435 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:35,203-Speed 5477.77 samples/sec Loss 7.7554 LearningRate 0.0435 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:37,035-Speed 5590.71 samples/sec Loss 7.6711 LearningRate 0.0435 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:38,930-Speed 5407.05 samples/sec Loss 7.7067 LearningRate 0.0435 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:40,781-Speed 5535.99 samples/sec Loss 7.6524 LearningRate 0.0435 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:42,630-Speed 5542.23 samples/sec Loss 7.8006 LearningRate 0.0434 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:44,488-Speed 5512.31 samples/sec Loss 7.5277 LearningRate 0.0434 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:46,346-Speed 5515.97 samples/sec Loss 7.6108 LearningRate 0.0434 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:29:48,195-Speed 5541.24 samples/sec Loss 7.5848 LearningRate 0.0434 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:50,034-Speed 5569.25 samples/sec Loss 7.6625 LearningRate 0.0434 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:51,864-Speed 5598.51 samples/sec Loss 7.6741 LearningRate 0.0434 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:53,701-Speed 5578.82 samples/sec Loss 7.7112 LearningRate 0.0434 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:55,542-Speed 5561.71 samples/sec Loss 7.7139 LearningRate 0.0434 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:57,374-Speed 5593.38 samples/sec Loss 7.6870 LearningRate 0.0433 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:29:59,299-Speed 5321.48 samples/sec Loss 7.6357 LearningRate 0.0433 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:01,136-Speed 5578.05 samples/sec Loss 7.6239 LearningRate 0.0433 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:03,015-Speed 5453.11 samples/sec Loss 7.7003 LearningRate 0.0433 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:04,873-Speed 5513.26 samples/sec Loss 7.6567 LearningRate 0.0433 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:06,705-Speed 5589.79 samples/sec Loss 7.6198 LearningRate 0.0433 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:08,562-Speed 5517.29 samples/sec Loss 7.6069 LearningRate 0.0433 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:10,402-Speed 5567.39 samples/sec Loss 7.6405 
LearningRate 0.0433 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:12,255-Speed 5529.88 samples/sec Loss 7.6756 LearningRate 0.0432 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:14,114-Speed 5512.89 samples/sec Loss 7.6816 LearningRate 0.0432 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:15,950-Speed 5585.27 samples/sec Loss 7.7128 LearningRate 0.0432 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:17,787-Speed 5576.75 samples/sec Loss 7.7774 LearningRate 0.0432 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:19,632-Speed 5553.92 samples/sec Loss 7.6784 LearningRate 0.0432 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:21,481-Speed 5539.46 samples/sec Loss 7.7542 LearningRate 0.0432 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:23,350-Speed 5482.72 samples/sec Loss 7.7003 LearningRate 0.0432 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:25,184-Speed 5586.19 samples/sec Loss 7.5536 LearningRate 0.0431 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:27,023-Speed 5567.73 samples/sec Loss 7.6887 LearningRate 0.0431 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:28,870-Speed 5548.05 samples/sec Loss 7.7731 LearningRate 0.0431 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:30,705-Speed 5582.35 samples/sec Loss 7.5781 LearningRate 0.0431 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:32,539-Speed 5586.40 samples/sec Loss 7.6782 LearningRate 0.0431 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 12:30:34,372-Speed 5590.21 samples/sec Loss 7.5721 LearningRate 0.0431 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:36,249-Speed 5458.24 samples/sec Loss 7.5718 LearningRate 0.0431 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:38,097-Speed 5543.60 samples/sec Loss 7.5025 LearningRate 0.0431 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:39,932-Speed 5581.70 samples/sec Loss 7.7744 LearningRate 0.0430 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:30:41,763-Speed 5594.43 samples/sec Loss 7.5884 LearningRate 0.0430 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:43,597-Speed 5586.81 samples/sec Loss 7.7121 LearningRate 0.0430 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:45,431-Speed 5585.99 samples/sec Loss 7.6354 LearningRate 0.0430 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:47,267-Speed 5579.03 samples/sec Loss 7.6438 LearningRate 0.0430 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:49,100-Speed 5588.57 samples/sec Loss 7.5641 LearningRate 0.0430 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:50,958-Speed 5515.35 samples/sec Loss 7.6354 LearningRate 0.0430 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:52,833-Speed 5464.17 samples/sec Loss 7.7144 LearningRate 0.0430 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:54,667-Speed 5585.04 samples/sec Loss 7.6060 LearningRate 0.0429 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:30:56,522-Speed 5521.24 samples/sec Loss 7.6698 LearningRate 0.0429 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:30:58,378-Speed 5522.42 samples/sec Loss 7.5418 LearningRate 0.0429 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:00,234-Speed 5521.57 samples/sec Loss 7.6208 LearningRate 0.0429 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:02,078-Speed 5555.50 samples/sec Loss 7.6112 LearningRate 0.0429 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:03,929-Speed 5535.66 samples/sec Loss 7.6631 LearningRate 0.0429 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:05,800-Speed 5476.40 samples/sec Loss 7.7069 LearningRate 0.0429 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:07,640-Speed 5568.43 samples/sec Loss 7.7586 LearningRate 0.0429 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:09,499-Speed 5509.88 samples/sec Loss 7.7397 LearningRate 0.0428 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:11,356-Speed 5519.19 samples/sec Loss 7.5131 LearningRate 0.0428 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:13,261-Speed 5378.21 samples/sec Loss 7.7479 LearningRate 0.0428 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:15,098-Speed 5576.59 samples/sec Loss 7.7056 LearningRate 0.0428 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:17,067-Speed 5204.13 samples/sec Loss 7.5985 LearningRate 0.0428 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:18,949-Speed 5446.10 samples/sec Loss 7.7094 LearningRate 
0.0428 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:20,781-Speed 5590.49 samples/sec Loss 7.6983 LearningRate 0.0428 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:22,638-Speed 5518.56 samples/sec Loss 7.4808 LearningRate 0.0427 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:24,482-Speed 5556.00 samples/sec Loss 7.4950 LearningRate 0.0427 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:26,374-Speed 5415.19 samples/sec Loss 7.4478 LearningRate 0.0427 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:28,214-Speed 5569.33 samples/sec Loss 7.7352 LearningRate 0.0427 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:30,072-Speed 5513.29 samples/sec Loss 7.6860 LearningRate 0.0427 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:31,905-Speed 5588.50 samples/sec Loss 7.6513 LearningRate 0.0427 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:33,769-Speed 5499.07 samples/sec Loss 7.6704 LearningRate 0.0427 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:35,624-Speed 5523.81 samples/sec Loss 7.5950 LearningRate 0.0427 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:37,461-Speed 5577.19 samples/sec Loss 7.6603 LearningRate 0.0426 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:39,301-Speed 5568.03 samples/sec Loss 7.6834 LearningRate 0.0426 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:41,150-Speed 5539.59 samples/sec Loss 7.5959 LearningRate 0.0426 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 12:31:42,993-Speed 5559.29 samples/sec Loss 7.5897 LearningRate 0.0426 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:44,858-Speed 5495.75 samples/sec Loss 7.6376 LearningRate 0.0426 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:46,693-Speed 5580.40 samples/sec Loss 7.6057 LearningRate 0.0426 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:48,576-Speed 5445.51 samples/sec Loss 7.7197 LearningRate 0.0426 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:50,410-Speed 5585.81 samples/sec Loss 7.5069 LearningRate 0.0426 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:52,275-Speed 5492.85 samples/sec Loss 7.6951 LearningRate 0.0425 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:54,115-Speed 5571.52 samples/sec Loss 7.7805 LearningRate 0.0425 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:55,982-Speed 5485.94 samples/sec Loss 7.5768 LearningRate 0.0425 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:31:57,850-Speed 5485.84 samples/sec Loss 7.6065 LearningRate 0.0425 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:31:59,696-Speed 5550.16 samples/sec Loss 7.6051 LearningRate 0.0425 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:01,549-Speed 5528.33 samples/sec Loss 7.7132 LearningRate 0.0425 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:03,393-Speed 5556.56 samples/sec Loss 7.7606 LearningRate 0.0425 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:05,257-Speed 5496.13 
samples/sec Loss 7.5316 LearningRate 0.0425 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:07,115-Speed 5517.02 samples/sec Loss 7.5908 LearningRate 0.0424 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:08,984-Speed 5483.94 samples/sec Loss 7.7293 LearningRate 0.0424 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:10,829-Speed 5551.73 samples/sec Loss 7.6939 LearningRate 0.0424 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:12,704-Speed 5463.00 samples/sec Loss 7.7711 LearningRate 0.0424 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:14,548-Speed 5558.93 samples/sec Loss 7.7120 LearningRate 0.0424 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:16,384-Speed 5577.87 samples/sec Loss 7.5016 LearningRate 0.0424 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:18,275-Speed 5419.15 samples/sec Loss 7.4841 LearningRate 0.0424 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:20,123-Speed 5542.81 samples/sec Loss 7.5541 LearningRate 0.0423 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:21,970-Speed 5547.44 samples/sec Loss 7.5653 LearningRate 0.0423 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:23,809-Speed 5571.84 samples/sec Loss 7.6912 LearningRate 0.0423 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:25,703-Speed 5409.51 samples/sec Loss 7.4992 LearningRate 0.0423 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:27,540-Speed 5577.08 samples/sec Loss 7.4513 LearningRate 0.0423 Epoch: 6 Global 
Step: 35370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:29,425-Speed 5436.64 samples/sec Loss 7.5123 LearningRate 0.0423 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:31,271-Speed 5548.38 samples/sec Loss 7.5689 LearningRate 0.0423 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:33,187-Speed 5348.87 samples/sec Loss 7.5958 LearningRate 0.0423 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:46,416-Speed 774.15 samples/sec Loss 7.3153 LearningRate 0.0422 Epoch: 7 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:48,370-Speed 5245.21 samples/sec Loss 6.8312 LearningRate 0.0422 Epoch: 7 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:50,350-Speed 5172.86 samples/sec Loss 6.7185 LearningRate 0.0422 Epoch: 7 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:52,198-Speed 5543.05 samples/sec Loss 6.8867 LearningRate 0.0422 Epoch: 7 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:32:54,062-Speed 5498.23 samples/sec Loss 6.8067 LearningRate 0.0422 Epoch: 7 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:55,901-Speed 5570.74 samples/sec Loss 6.8127 LearningRate 0.0422 Epoch: 7 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:57,739-Speed 5573.98 samples/sec Loss 6.7433 LearningRate 0.0422 Epoch: 7 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:32:59,633-Speed 5407.55 samples/sec Loss 6.8166 LearningRate 0.0422 Epoch: 7 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:01,502-Speed 5481.52 samples/sec Loss 6.7084 LearningRate 0.0421 Epoch: 7 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 12:33:03,357-Speed 5522.48 samples/sec Loss 6.8278 LearningRate 0.0421 Epoch: 7 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:05,214-Speed 5516.84 samples/sec Loss 6.7407 LearningRate 0.0421 Epoch: 7 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:07,096-Speed 5445.72 samples/sec Loss 6.8429 LearningRate 0.0421 Epoch: 7 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:08,945-Speed 5540.70 samples/sec Loss 6.9025 LearningRate 0.0421 Epoch: 7 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:10,880-Speed 5292.98 samples/sec Loss 6.8793 LearningRate 0.0421 Epoch: 7 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:12,736-Speed 5519.94 samples/sec Loss 7.0037 LearningRate 0.0421 Epoch: 7 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:14,611-Speed 5468.46 samples/sec Loss 6.9260 LearningRate 0.0421 Epoch: 7 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:16,444-Speed 5590.52 samples/sec Loss 6.8980 LearningRate 0.0420 Epoch: 7 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:18,290-Speed 5551.33 samples/sec Loss 6.7627 LearningRate 0.0420 Epoch: 7 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:20,155-Speed 5490.97 samples/sec Loss 6.9195 LearningRate 0.0420 Epoch: 7 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:22,023-Speed 5485.82 samples/sec Loss 6.8972 LearningRate 0.0420 Epoch: 7 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:23,884-Speed 5506.04 samples/sec Loss 6.9000 LearningRate 0.0420 Epoch: 7 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:25,734-Speed 5537.16 samples/sec 
Loss 6.8121 LearningRate 0.0420 Epoch: 7 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:27,581-Speed 5547.74 samples/sec Loss 6.7918 LearningRate 0.0420 Epoch: 7 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:29,424-Speed 5557.80 samples/sec Loss 6.9331 LearningRate 0.0419 Epoch: 7 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:31,295-Speed 5477.26 samples/sec Loss 6.9363 LearningRate 0.0419 Epoch: 7 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:33,139-Speed 5557.83 samples/sec Loss 6.9875 LearningRate 0.0419 Epoch: 7 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:34,986-Speed 5545.16 samples/sec Loss 6.9152 LearningRate 0.0419 Epoch: 7 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:36,871-Speed 5435.13 samples/sec Loss 6.9711 LearningRate 0.0419 Epoch: 7 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:38,739-Speed 5483.99 samples/sec Loss 7.1447 LearningRate 0.0419 Epoch: 7 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:40,663-Speed 5325.16 samples/sec Loss 7.0813 LearningRate 0.0419 Epoch: 7 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:42,538-Speed 5462.92 samples/sec Loss 7.0596 LearningRate 0.0419 Epoch: 7 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:44,409-Speed 5478.58 samples/sec Loss 6.9415 LearningRate 0.0418 Epoch: 7 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:46,272-Speed 5498.31 samples/sec Loss 7.0631 LearningRate 0.0418 Epoch: 7 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:48,144-Speed 5472.90 samples/sec Loss 7.0419 LearningRate 0.0418 Epoch: 7 Global Step: 35740 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:50,019-Speed 5465.68 samples/sec Loss 7.0307 LearningRate 0.0418 Epoch: 7 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:51,921-Speed 5384.06 samples/sec Loss 6.9262 LearningRate 0.0418 Epoch: 7 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:33:53,783-Speed 5503.36 samples/sec Loss 6.9669 LearningRate 0.0418 Epoch: 7 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:55,629-Speed 5549.99 samples/sec Loss 7.1253 LearningRate 0.0418 Epoch: 7 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:57,495-Speed 5489.57 samples/sec Loss 6.9343 LearningRate 0.0418 Epoch: 7 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:33:59,373-Speed 5457.24 samples/sec Loss 7.0837 LearningRate 0.0417 Epoch: 7 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:01,223-Speed 5537.75 samples/sec Loss 7.0552 LearningRate 0.0417 Epoch: 7 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:03,082-Speed 5520.64 samples/sec Loss 6.9479 LearningRate 0.0417 Epoch: 7 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:04,958-Speed 5461.56 samples/sec Loss 7.0976 LearningRate 0.0417 Epoch: 7 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:06,852-Speed 5406.45 samples/sec Loss 7.1185 LearningRate 0.0417 Epoch: 7 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:08,691-Speed 5573.12 samples/sec Loss 7.1022 LearningRate 0.0417 Epoch: 7 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:10,528-Speed 5576.79 samples/sec Loss 7.1606 LearningRate 0.0417 Epoch: 7 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:34:12,389-Speed 5504.35 samples/sec Loss 7.1960 LearningRate 0.0417 Epoch: 7 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:34:14,253-Speed 5496.98 samples/sec Loss 7.1658 LearningRate 0.0416 Epoch: 7 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:34:16,098-Speed 5553.90 samples/sec Loss 7.0486 LearningRate 0.0416 Epoch: 7 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:17,950-Speed 5530.94 samples/sec Loss 7.1115 LearningRate 0.0416 Epoch: 7 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:19,799-Speed 5542.55 samples/sec Loss 7.0427 LearningRate 0.0416 Epoch: 7 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:21,667-Speed 5484.57 samples/sec Loss 7.1792 LearningRate 0.0416 Epoch: 7 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:23,515-Speed 5545.53 samples/sec Loss 7.1646 LearningRate 0.0416 Epoch: 7 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:25,398-Speed 5441.55 samples/sec Loss 7.1205 LearningRate 0.0416 Epoch: 7 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:27,254-Speed 5519.75 samples/sec Loss 7.0552 LearningRate 0.0416 Epoch: 7 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:29,137-Speed 5439.62 samples/sec Loss 7.0466 LearningRate 0.0415 Epoch: 7 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:31,049-Speed 5360.51 samples/sec Loss 7.1857 LearningRate 0.0415 Epoch: 7 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:32,886-Speed 5577.59 samples/sec Loss 7.1394 LearningRate 0.0415 Epoch: 7 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:34:34,733-Speed 5547.14 samples/sec Loss 7.0655 LearningRate 
0.0415 Epoch: 7 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:34:36,614-Speed 5443.94 samples/sec Loss 7.1522 LearningRate 0.0415 Epoch: 7 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:35:03,783-[lfw][36000]XNorm: 22.880578 Training: 2022-04-11 12:35:03,784-[lfw][36000]Accuracy-Flip: 0.99617+-0.00289 Training: 2022-04-11 12:35:03,785-[lfw][36000]Accuracy-Highest: 0.99733 Training: 2022-04-11 12:35:35,185-[cfp_fp][36000]XNorm: 20.194693 Training: 2022-04-11 12:35:35,186-[cfp_fp][36000]Accuracy-Flip: 0.95814+-0.00994 Training: 2022-04-11 12:35:35,187-[cfp_fp][36000]Accuracy-Highest: 0.96000 Training: 2022-04-11 12:36:02,208-[agedb_30][36000]XNorm: 22.727170 Training: 2022-04-11 12:36:02,209-[agedb_30][36000]Accuracy-Flip: 0.97200+-0.00733 Training: 2022-04-11 12:36:02,210-[agedb_30][36000]Accuracy-Highest: 0.97467 Training: 2022-04-11 12:36:04,062-Speed 117.10 samples/sec Loss 7.1927 LearningRate 0.0415 Epoch: 7 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:05,912-Speed 5540.40 samples/sec Loss 7.0326 LearningRate 0.0415 Epoch: 7 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:07,761-Speed 5540.64 samples/sec Loss 7.1624 LearningRate 0.0415 Epoch: 7 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:09,603-Speed 5561.29 samples/sec Loss 7.1124 LearningRate 0.0414 Epoch: 7 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:11,460-Speed 5528.85 samples/sec Loss 7.0952 LearningRate 0.0414 Epoch: 7 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:13,328-Speed 5483.41 samples/sec Loss 7.1266 LearningRate 0.0414 Epoch: 7 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:15,187-Speed 5511.34 samples/sec Loss 7.2736 LearningRate 0.0414 Epoch: 7 Global Step: 36070 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:17,042-Speed 5522.10 samples/sec Loss 7.1773 LearningRate 0.0414 Epoch: 7 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:18,865-Speed 5619.47 samples/sec Loss 7.2986 LearningRate 0.0414 Epoch: 7 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:20,740-Speed 5465.86 samples/sec Loss 7.0618 LearningRate 0.0414 Epoch: 7 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:22,581-Speed 5564.35 samples/sec Loss 7.3058 LearningRate 0.0414 Epoch: 7 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:24,434-Speed 5528.75 samples/sec Loss 7.1966 LearningRate 0.0413 Epoch: 7 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:26,275-Speed 5567.63 samples/sec Loss 7.3689 LearningRate 0.0413 Epoch: 7 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:28,126-Speed 5535.56 samples/sec Loss 7.2175 LearningRate 0.0413 Epoch: 7 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:29,967-Speed 5565.59 samples/sec Loss 7.1806 LearningRate 0.0413 Epoch: 7 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:31,824-Speed 5514.48 samples/sec Loss 7.1026 LearningRate 0.0413 Epoch: 7 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:33,672-Speed 5546.14 samples/sec Loss 7.0559 LearningRate 0.0413 Epoch: 7 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:35,520-Speed 5544.28 samples/sec Loss 7.4083 LearningRate 0.0413 Epoch: 7 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:37,387-Speed 5489.44 samples/sec Loss 7.2900 LearningRate 0.0412 Epoch: 7 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:36:39,249-Speed 5500.89 samples/sec Loss 7.1745 LearningRate 0.0412 Epoch: 7 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:41,087-Speed 5575.62 samples/sec Loss 7.2617 LearningRate 0.0412 Epoch: 7 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:42,937-Speed 5538.09 samples/sec Loss 7.1818 LearningRate 0.0412 Epoch: 7 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:44,803-Speed 5490.52 samples/sec Loss 7.2438 LearningRate 0.0412 Epoch: 7 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:46,655-Speed 5529.56 samples/sec Loss 7.2563 LearningRate 0.0412 Epoch: 7 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:48,526-Speed 5515.63 samples/sec Loss 7.2927 LearningRate 0.0412 Epoch: 7 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:50,373-Speed 5548.17 samples/sec Loss 7.3555 LearningRate 0.0412 Epoch: 7 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:52,208-Speed 5584.46 samples/sec Loss 7.2582 LearningRate 0.0411 Epoch: 7 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:36:54,042-Speed 5583.72 samples/sec Loss 7.2452 LearningRate 0.0411 Epoch: 7 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:55,904-Speed 5503.77 samples/sec Loss 7.3181 LearningRate 0.0411 Epoch: 7 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:57,738-Speed 5586.87 samples/sec Loss 7.1760 LearningRate 0.0411 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:36:59,620-Speed 5443.91 samples/sec Loss 7.1506 LearningRate 0.0411 Epoch: 7 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:01,454-Speed 5585.54 samples/sec Loss 7.2046 LearningRate 
0.0411 Epoch: 7 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:03,351-Speed 5402.06 samples/sec Loss 7.3357 LearningRate 0.0411 Epoch: 7 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:05,203-Speed 5532.19 samples/sec Loss 7.1469 LearningRate 0.0411 Epoch: 7 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:07,062-Speed 5510.40 samples/sec Loss 7.0477 LearningRate 0.0410 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:08,899-Speed 5578.45 samples/sec Loss 7.4247 LearningRate 0.0410 Epoch: 7 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:10,790-Speed 5418.41 samples/sec Loss 7.2905 LearningRate 0.0410 Epoch: 7 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:12,640-Speed 5542.60 samples/sec Loss 7.2808 LearningRate 0.0410 Epoch: 7 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:14,488-Speed 5542.55 samples/sec Loss 7.3088 LearningRate 0.0410 Epoch: 7 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:16,347-Speed 5510.56 samples/sec Loss 7.1697 LearningRate 0.0410 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:18,206-Speed 5512.21 samples/sec Loss 7.2811 LearningRate 0.0410 Epoch: 7 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:20,113-Speed 5371.34 samples/sec Loss 7.2202 LearningRate 0.0410 Epoch: 7 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:21,960-Speed 5548.01 samples/sec Loss 7.1497 LearningRate 0.0409 Epoch: 7 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:23,809-Speed 5542.13 samples/sec Loss 7.2821 LearningRate 0.0409 Epoch: 7 Global Step: 36440 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 12:37:25,653-Speed 5553.99 samples/sec Loss 7.2648 LearningRate 0.0409 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:27,522-Speed 5483.85 samples/sec Loss 7.0901 LearningRate 0.0409 Epoch: 7 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:29,379-Speed 5517.34 samples/sec Loss 7.2875 LearningRate 0.0409 Epoch: 7 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:31,241-Speed 5502.46 samples/sec Loss 7.1745 LearningRate 0.0409 Epoch: 7 Global Step: 36480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:33,084-Speed 5561.01 samples/sec Loss 7.1247 LearningRate 0.0409 Epoch: 7 Global Step: 36490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:34,946-Speed 5500.21 samples/sec Loss 7.1096 LearningRate 0.0409 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:36,783-Speed 5576.92 samples/sec Loss 7.2799 LearningRate 0.0408 Epoch: 7 Global Step: 36510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:38,630-Speed 5548.51 samples/sec Loss 7.3210 LearningRate 0.0408 Epoch: 7 Global Step: 36520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:40,487-Speed 5515.44 samples/sec Loss 7.2506 LearningRate 0.0408 Epoch: 7 Global Step: 36530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:42,327-Speed 5572.74 samples/sec Loss 7.3599 LearningRate 0.0408 Epoch: 7 Global Step: 36540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:44,164-Speed 5574.30 samples/sec Loss 7.3970 LearningRate 0.0408 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:46,001-Speed 5578.33 samples/sec Loss 7.2064 LearningRate 0.0408 Epoch: 7 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:47,907-Speed 5376.15 
samples/sec Loss 7.3339 LearningRate 0.0408 Epoch: 7 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:49,751-Speed 5556.95 samples/sec Loss 7.3524 LearningRate 0.0408 Epoch: 7 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:37:51,682-Speed 5304.54 samples/sec Loss 7.3653 LearningRate 0.0407 Epoch: 7 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:53,525-Speed 5562.27 samples/sec Loss 7.3132 LearningRate 0.0407 Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:55,363-Speed 5572.73 samples/sec Loss 7.3768 LearningRate 0.0407 Epoch: 7 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:57,190-Speed 5608.21 samples/sec Loss 7.2834 LearningRate 0.0407 Epoch: 7 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:37:59,020-Speed 5600.69 samples/sec Loss 7.2792 LearningRate 0.0407 Epoch: 7 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:00,866-Speed 5548.93 samples/sec Loss 7.3254 LearningRate 0.0407 Epoch: 7 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:02,710-Speed 5555.80 samples/sec Loss 7.2431 LearningRate 0.0407 Epoch: 7 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:04,612-Speed 5386.25 samples/sec Loss 7.1577 LearningRate 0.0407 Epoch: 7 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:06,473-Speed 5505.61 samples/sec Loss 7.2077 LearningRate 0.0406 Epoch: 7 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:08,309-Speed 5581.48 samples/sec Loss 7.3839 LearningRate 0.0406 Epoch: 7 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:10,140-Speed 5592.31 samples/sec Loss 7.3221 LearningRate 0.0406 Epoch: 7 Global Step: 
36690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:38:11,957-Speed 5638.47 samples/sec Loss 7.3667 LearningRate 0.0406 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:13,826-Speed 5483.77 samples/sec Loss 7.1651 LearningRate 0.0406 Epoch: 7 Global Step: 36710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:15,689-Speed 5497.99 samples/sec Loss 7.2407 LearningRate 0.0406 Epoch: 7 Global Step: 36720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:17,527-Speed 5574.23 samples/sec Loss 7.3079 LearningRate 0.0406 Epoch: 7 Global Step: 36730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:19,368-Speed 5563.47 samples/sec Loss 7.3666 LearningRate 0.0406 Epoch: 7 Global Step: 36740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:21,207-Speed 5572.05 samples/sec Loss 7.3504 LearningRate 0.0405 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:23,090-Speed 5439.87 samples/sec Loss 7.3259 LearningRate 0.0405 Epoch: 7 Global Step: 36760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:24,939-Speed 5538.72 samples/sec Loss 7.1635 LearningRate 0.0405 Epoch: 7 Global Step: 36770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:26,815-Speed 5461.36 samples/sec Loss 7.2268 LearningRate 0.0405 Epoch: 7 Global Step: 36780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:28,671-Speed 5520.43 samples/sec Loss 7.2803 LearningRate 0.0405 Epoch: 7 Global Step: 36790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:38:30,558-Speed 5428.58 samples/sec Loss 7.2692 LearningRate 0.0405 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:32,388-Speed 5601.39 samples/sec Loss 7.2966 LearningRate 0.0405 Epoch: 7 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 12:38:34,237-Speed 5540.03 samples/sec Loss 7.3694 LearningRate 0.0405 Epoch: 7 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:36,107-Speed 5478.83 samples/sec Loss 7.2321 LearningRate 0.0404 Epoch: 7 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:37,967-Speed 5510.18 samples/sec Loss 7.2100 LearningRate 0.0404 Epoch: 7 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:39,845-Speed 5454.21 samples/sec Loss 7.2006 LearningRate 0.0404 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:41,715-Speed 5478.87 samples/sec Loss 7.3006 LearningRate 0.0404 Epoch: 7 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:43,560-Speed 5552.40 samples/sec Loss 7.3151 LearningRate 0.0404 Epoch: 7 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:45,386-Speed 5610.34 samples/sec Loss 7.3550 LearningRate 0.0404 Epoch: 7 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:47,282-Speed 5405.02 samples/sec Loss 7.2414 LearningRate 0.0404 Epoch: 7 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:49,155-Speed 5469.11 samples/sec Loss 7.2934 LearningRate 0.0404 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:38:50,996-Speed 5566.30 samples/sec Loss 7.3663 LearningRate 0.0403 Epoch: 7 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:38:52,845-Speed 5541.86 samples/sec Loss 7.3236 LearningRate 0.0403 Epoch: 7 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:54,690-Speed 5553.03 samples/sec Loss 7.3877 LearningRate 0.0403 Epoch: 7 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:56,552-Speed 5502.99 samples/sec Loss 7.3796 
LearningRate 0.0403 Epoch: 7 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:38:58,413-Speed 5504.97 samples/sec Loss 7.4457 LearningRate 0.0403 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:00,265-Speed 5533.46 samples/sec Loss 7.2736 LearningRate 0.0403 Epoch: 7 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:02,156-Speed 5416.33 samples/sec Loss 7.3485 LearningRate 0.0403 Epoch: 7 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:04,010-Speed 5528.15 samples/sec Loss 7.3046 LearningRate 0.0403 Epoch: 7 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:05,858-Speed 5544.44 samples/sec Loss 7.1818 LearningRate 0.0402 Epoch: 7 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:07,691-Speed 5589.81 samples/sec Loss 7.2420 LearningRate 0.0402 Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:09,524-Speed 5588.66 samples/sec Loss 7.2176 LearningRate 0.0402 Epoch: 7 Global Step: 37010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:11,390-Speed 5492.49 samples/sec Loss 7.4222 LearningRate 0.0402 Epoch: 7 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:39:13,234-Speed 5555.37 samples/sec Loss 7.3547 LearningRate 0.0402 Epoch: 7 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:39:15,088-Speed 5527.42 samples/sec Loss 7.2173 LearningRate 0.0402 Epoch: 7 Global Step: 37040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:16,971-Speed 5440.77 samples/sec Loss 7.3726 LearningRate 0.0402 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:18,832-Speed 5504.87 samples/sec Loss 7.3876 LearningRate 0.0402 Epoch: 7 Global Step: 37060 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 12:39:20,681-Speed 5540.86 samples/sec Loss 7.3740 LearningRate 0.0401 Epoch: 7 Global Step: 37070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:22,529-Speed 5546.46 samples/sec Loss 7.3948 LearningRate 0.0401 Epoch: 7 Global Step: 37080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:24,428-Speed 5393.30 samples/sec Loss 7.3049 LearningRate 0.0401 Epoch: 7 Global Step: 37090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:26,267-Speed 5572.09 samples/sec Loss 7.2828 LearningRate 0.0401 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:28,136-Speed 5482.51 samples/sec Loss 7.4496 LearningRate 0.0401 Epoch: 7 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:30,046-Speed 5364.72 samples/sec Loss 7.4069 LearningRate 0.0401 Epoch: 7 Global Step: 37120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:31,932-Speed 5430.10 samples/sec Loss 7.1488 LearningRate 0.0401 Epoch: 7 Global Step: 37130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:33,776-Speed 5558.88 samples/sec Loss 7.3612 LearningRate 0.0401 Epoch: 7 Global Step: 37140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:35,637-Speed 5505.18 samples/sec Loss 7.2890 LearningRate 0.0400 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:37,491-Speed 5527.95 samples/sec Loss 7.3202 LearningRate 0.0400 Epoch: 7 Global Step: 37160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:39,373-Speed 5442.20 samples/sec Loss 7.3220 LearningRate 0.0400 Epoch: 7 Global Step: 37170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:41,216-Speed 5558.11 samples/sec Loss 7.2697 LearningRate 0.0400 Epoch: 7 Global Step: 37180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:43,061-Speed 
5555.65 samples/sec Loss 7.2139 LearningRate 0.0400 Epoch: 7 Global Step: 37190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:44,898-Speed 5575.03 samples/sec Loss 7.3846 LearningRate 0.0400 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:46,734-Speed 5580.74 samples/sec Loss 7.3605 LearningRate 0.0400 Epoch: 7 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:39:48,573-Speed 5570.72 samples/sec Loss 7.2917 LearningRate 0.0400 Epoch: 7 Global Step: 37220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:50,455-Speed 5444.88 samples/sec Loss 7.3886 LearningRate 0.0399 Epoch: 7 Global Step: 37230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:52,284-Speed 5601.96 samples/sec Loss 7.3300 LearningRate 0.0399 Epoch: 7 Global Step: 37240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:54,131-Speed 5547.07 samples/sec Loss 7.1480 LearningRate 0.0399 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:55,962-Speed 5594.16 samples/sec Loss 7.1673 LearningRate 0.0399 Epoch: 7 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:57,842-Speed 5450.98 samples/sec Loss 7.2497 LearningRate 0.0399 Epoch: 7 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:39:59,677-Speed 5583.47 samples/sec Loss 7.1974 LearningRate 0.0399 Epoch: 7 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:01,527-Speed 5537.42 samples/sec Loss 7.4213 LearningRate 0.0399 Epoch: 7 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:03,393-Speed 5493.70 samples/sec Loss 7.1763 LearningRate 0.0399 Epoch: 7 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:05,260-Speed 5485.98 samples/sec Loss 7.1280 LearningRate 0.0398 Epoch: 7 Global 
Step: 37310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:07,087-Speed 5606.80 samples/sec Loss 7.2806 LearningRate 0.0398 Epoch: 7 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:08,958-Speed 5475.86 samples/sec Loss 7.2546 LearningRate 0.0398 Epoch: 7 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:10,798-Speed 5569.19 samples/sec Loss 7.3808 LearningRate 0.0398 Epoch: 7 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:12,636-Speed 5574.27 samples/sec Loss 7.3491 LearningRate 0.0398 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:14,526-Speed 5421.30 samples/sec Loss 7.3019 LearningRate 0.0398 Epoch: 7 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:16,365-Speed 5571.09 samples/sec Loss 7.1205 LearningRate 0.0398 Epoch: 7 Global Step: 37370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:18,219-Speed 5524.89 samples/sec Loss 7.1771 LearningRate 0.0398 Epoch: 7 Global Step: 37380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:20,119-Speed 5393.11 samples/sec Loss 7.3028 LearningRate 0.0397 Epoch: 7 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:21,956-Speed 5575.43 samples/sec Loss 7.3604 LearningRate 0.0397 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:23,832-Speed 5462.43 samples/sec Loss 7.3126 LearningRate 0.0397 Epoch: 7 Global Step: 37410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:25,694-Speed 5501.71 samples/sec Loss 7.1579 LearningRate 0.0397 Epoch: 7 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:27,561-Speed 5490.76 samples/sec Loss 7.3156 LearningRate 0.0397 Epoch: 7 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 12:40:29,404-Speed 5556.98 samples/sec Loss 7.3682 LearningRate 0.0397 Epoch: 7 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:31,280-Speed 5462.53 samples/sec Loss 7.3856 LearningRate 0.0397 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:33,127-Speed 5547.40 samples/sec Loss 7.1886 LearningRate 0.0397 Epoch: 7 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:34,976-Speed 5540.78 samples/sec Loss 7.3322 LearningRate 0.0396 Epoch: 7 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:36,847-Speed 5476.31 samples/sec Loss 7.4443 LearningRate 0.0396 Epoch: 7 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:38,708-Speed 5506.45 samples/sec Loss 7.2141 LearningRate 0.0396 Epoch: 7 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:40:40,585-Speed 5457.18 samples/sec Loss 7.4424 LearningRate 0.0396 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:42,458-Speed 5469.68 samples/sec Loss 7.2041 LearningRate 0.0396 Epoch: 7 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:44,312-Speed 5526.00 samples/sec Loss 7.2112 LearningRate 0.0396 Epoch: 7 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:46,147-Speed 5583.30 samples/sec Loss 7.2789 LearningRate 0.0396 Epoch: 7 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:47,992-Speed 5554.10 samples/sec Loss 7.3838 LearningRate 0.0396 Epoch: 7 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:49,888-Speed 5403.42 samples/sec Loss 7.3129 LearningRate 0.0395 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:51,730-Speed 5562.15 samples/sec Loss 7.3773 
LearningRate 0.0395 Epoch: 7 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:53,607-Speed 5459.75 samples/sec Loss 7.4062 LearningRate 0.0395 Epoch: 7 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:55,453-Speed 5551.21 samples/sec Loss 7.3602 LearningRate 0.0395 Epoch: 7 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:57,331-Speed 5455.37 samples/sec Loss 7.3097 LearningRate 0.0395 Epoch: 7 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:40:59,175-Speed 5559.69 samples/sec Loss 7.2588 LearningRate 0.0395 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:41:01,035-Speed 5506.66 samples/sec Loss 7.3184 LearningRate 0.0395 Epoch: 7 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:02,904-Speed 5483.29 samples/sec Loss 7.3348 LearningRate 0.0395 Epoch: 7 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:04,812-Speed 5369.38 samples/sec Loss 7.3086 LearningRate 0.0394 Epoch: 7 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:06,680-Speed 5484.23 samples/sec Loss 7.3101 LearningRate 0.0394 Epoch: 7 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:08,514-Speed 5586.06 samples/sec Loss 7.4015 LearningRate 0.0394 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:10,350-Speed 5580.62 samples/sec Loss 7.3300 LearningRate 0.0394 Epoch: 7 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:12,221-Speed 5472.68 samples/sec Loss 7.2708 LearningRate 0.0394 Epoch: 7 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:14,074-Speed 5530.08 samples/sec Loss 7.2461 LearningRate 0.0394 Epoch: 7 Global Step: 37680 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 12:41:15,908-Speed 5587.76 samples/sec Loss 7.1829 LearningRate 0.0394 Epoch: 7 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:17,747-Speed 5571.98 samples/sec Loss 7.3221 LearningRate 0.0394 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:19,624-Speed 5457.51 samples/sec Loss 7.2611 LearningRate 0.0393 Epoch: 7 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:41:21,446-Speed 5621.73 samples/sec Loss 7.2379 LearningRate 0.0393 Epoch: 7 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:23,339-Speed 5411.36 samples/sec Loss 7.2317 LearningRate 0.0393 Epoch: 7 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:25,188-Speed 5540.64 samples/sec Loss 7.1272 LearningRate 0.0393 Epoch: 7 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:27,048-Speed 5508.59 samples/sec Loss 7.3814 LearningRate 0.0393 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:28,915-Speed 5487.90 samples/sec Loss 7.2466 LearningRate 0.0393 Epoch: 7 Global Step: 37760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:30,751-Speed 5579.96 samples/sec Loss 7.2179 LearningRate 0.0393 Epoch: 7 Global Step: 37770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:32,628-Speed 5458.64 samples/sec Loss 7.2383 LearningRate 0.0393 Epoch: 7 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:34,488-Speed 5510.04 samples/sec Loss 7.4611 LearningRate 0.0392 Epoch: 7 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:36,342-Speed 5526.53 samples/sec Loss 7.2715 LearningRate 0.0392 Epoch: 7 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:38,197-Speed 
5523.79 samples/sec Loss 7.3793 LearningRate 0.0392 Epoch: 7 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:40,069-Speed 5470.36 samples/sec Loss 7.3702 LearningRate 0.0392 Epoch: 7 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:41:41,934-Speed 5492.98 samples/sec Loss 7.2340 LearningRate 0.0392 Epoch: 7 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:43,801-Speed 5488.71 samples/sec Loss 7.2982 LearningRate 0.0392 Epoch: 7 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:45,696-Speed 5407.24 samples/sec Loss 7.3159 LearningRate 0.0392 Epoch: 7 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:47,579-Speed 5442.12 samples/sec Loss 7.2849 LearningRate 0.0392 Epoch: 7 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:49,431-Speed 5529.63 samples/sec Loss 7.2968 LearningRate 0.0391 Epoch: 7 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:51,286-Speed 5523.99 samples/sec Loss 7.2466 LearningRate 0.0391 Epoch: 7 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:53,155-Speed 5483.30 samples/sec Loss 7.3452 LearningRate 0.0391 Epoch: 7 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:55,020-Speed 5493.56 samples/sec Loss 7.1239 LearningRate 0.0391 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:56,891-Speed 5475.54 samples/sec Loss 7.3958 LearningRate 0.0391 Epoch: 7 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:41:58,743-Speed 5531.53 samples/sec Loss 7.3397 LearningRate 0.0391 Epoch: 7 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:42:00,600-Speed 5518.34 samples/sec Loss 7.4026 LearningRate 0.0391 Epoch: 7 
Global Step: 37930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:02,436-Speed 5579.92 samples/sec Loss 7.4039 LearningRate 0.0391 Epoch: 7 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:04,294-Speed 5515.37 samples/sec Loss 7.1393 LearningRate 0.0390 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:06,132-Speed 5571.85 samples/sec Loss 7.2668 LearningRate 0.0390 Epoch: 7 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:07,972-Speed 5570.74 samples/sec Loss 7.2622 LearningRate 0.0390 Epoch: 7 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:09,833-Speed 5505.77 samples/sec Loss 7.3278 LearningRate 0.0390 Epoch: 7 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:11,683-Speed 5537.73 samples/sec Loss 7.2178 LearningRate 0.0390 Epoch: 7 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:13,551-Speed 5484.89 samples/sec Loss 7.1594 LearningRate 0.0390 Epoch: 7 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:42:40,928-[lfw][38000]XNorm: 23.302278 Training: 2022-04-11 12:42:40,929-[lfw][38000]Accuracy-Flip: 0.99750+-0.00227 Training: 2022-04-11 12:42:40,929-[lfw][38000]Accuracy-Highest: 0.99750 Training: 2022-04-11 12:43:12,481-[cfp_fp][38000]XNorm: 20.459902 Training: 2022-04-11 12:43:12,481-[cfp_fp][38000]Accuracy-Flip: 0.96057+-0.01044 Training: 2022-04-11 12:43:12,482-[cfp_fp][38000]Accuracy-Highest: 0.96057 Training: 2022-04-11 12:43:39,444-[agedb_30][38000]XNorm: 22.929287 Training: 2022-04-11 12:43:39,445-[agedb_30][38000]Accuracy-Flip: 0.97383+-0.00775 Training: 2022-04-11 12:43:39,446-[agedb_30][38000]Accuracy-Highest: 0.97467 Training: 2022-04-11 12:43:41,298-Speed 116.70 samples/sec Loss 7.2951 LearningRate 0.0390 Epoch: 7 Global Step: 38010 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 12:43:43,114-Speed 5638.21 samples/sec Loss 7.2955 LearningRate 0.0390 Epoch: 7 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:44,954-Speed 5568.06 samples/sec Loss 7.3198 LearningRate 0.0389 Epoch: 7 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:46,828-Speed 5466.09 samples/sec Loss 7.3842 LearningRate 0.0389 Epoch: 7 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:48,670-Speed 5561.46 samples/sec Loss 7.5181 LearningRate 0.0389 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:50,543-Speed 5473.52 samples/sec Loss 7.2371 LearningRate 0.0389 Epoch: 7 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:52,476-Speed 5298.81 samples/sec Loss 7.4007 LearningRate 0.0389 Epoch: 7 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:54,316-Speed 5566.77 samples/sec Loss 7.0927 LearningRate 0.0389 Epoch: 7 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:56,180-Speed 5498.15 samples/sec Loss 7.1846 LearningRate 0.0389 Epoch: 7 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:58,015-Speed 5583.41 samples/sec Loss 7.2995 LearningRate 0.0389 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:43:59,897-Speed 5443.47 samples/sec Loss 7.3514 LearningRate 0.0388 Epoch: 7 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:01,754-Speed 5518.83 samples/sec Loss 7.2803 LearningRate 0.0388 Epoch: 7 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:03,592-Speed 5571.92 samples/sec Loss 7.2229 LearningRate 0.0388 Epoch: 7 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:05,429-Speed 
5579.01 samples/sec Loss 7.2823 LearningRate 0.0388 Epoch: 7 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:07,306-Speed 5457.72 samples/sec Loss 7.5271 LearningRate 0.0388 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:09,152-Speed 5550.01 samples/sec Loss 7.3261 LearningRate 0.0388 Epoch: 7 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:11,018-Speed 5491.85 samples/sec Loss 7.0790 LearningRate 0.0388 Epoch: 7 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:12,874-Speed 5518.74 samples/sec Loss 7.2296 LearningRate 0.0388 Epoch: 7 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:14,730-Speed 5521.06 samples/sec Loss 7.1841 LearningRate 0.0387 Epoch: 7 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:16,606-Speed 5462.94 samples/sec Loss 7.3462 LearningRate 0.0387 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:18,455-Speed 5540.61 samples/sec Loss 7.2598 LearningRate 0.0387 Epoch: 7 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:20,317-Speed 5502.77 samples/sec Loss 7.1967 LearningRate 0.0387 Epoch: 7 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:22,179-Speed 5500.56 samples/sec Loss 7.2344 LearningRate 0.0387 Epoch: 7 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:24,033-Speed 5527.56 samples/sec Loss 7.1425 LearningRate 0.0387 Epoch: 7 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:25,886-Speed 5530.02 samples/sec Loss 7.2873 LearningRate 0.0387 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:27,727-Speed 5564.35 samples/sec Loss 7.2885 LearningRate 0.0387 Epoch: 7 
Global Step: 38260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:29,580-Speed 5529.59 samples/sec Loss 7.3564 LearningRate 0.0386 Epoch: 7 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:31,444-Speed 5496.72 samples/sec Loss 7.3050 LearningRate 0.0386 Epoch: 7 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:33,268-Speed 5619.00 samples/sec Loss 7.2733 LearningRate 0.0386 Epoch: 7 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:35,125-Speed 5516.99 samples/sec Loss 7.1728 LearningRate 0.0386 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:36,971-Speed 5549.30 samples/sec Loss 7.2631 LearningRate 0.0386 Epoch: 7 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:38,833-Speed 5503.28 samples/sec Loss 7.3069 LearningRate 0.0386 Epoch: 7 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:40,728-Speed 5408.33 samples/sec Loss 7.4530 LearningRate 0.0386 Epoch: 7 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:42,570-Speed 5559.75 samples/sec Loss 7.2695 LearningRate 0.0386 Epoch: 7 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:44,441-Speed 5477.82 samples/sec Loss 7.3927 LearningRate 0.0386 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:46,273-Speed 5592.55 samples/sec Loss 7.1840 LearningRate 0.0385 Epoch: 7 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:48,133-Speed 5510.15 samples/sec Loss 7.4239 LearningRate 0.0385 Epoch: 7 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:49,980-Speed 5545.92 samples/sec Loss 7.3684 LearningRate 0.0385 Epoch: 7 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 12:44:51,850-Speed 5480.21 samples/sec Loss 7.1400 LearningRate 0.0385 Epoch: 7 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:44:53,694-Speed 5553.59 samples/sec Loss 7.2392 LearningRate 0.0385 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:44:55,547-Speed 5530.34 samples/sec Loss 7.2331 LearningRate 0.0385 Epoch: 7 Global Step: 38410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:44:57,376-Speed 5601.13 samples/sec Loss 7.2665 LearningRate 0.0385 Epoch: 7 Global Step: 38420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:44:59,263-Speed 5429.11 samples/sec Loss 7.2239 LearningRate 0.0385 Epoch: 7 Global Step: 38430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:01,100-Speed 5578.85 samples/sec Loss 7.3788 LearningRate 0.0384 Epoch: 7 Global Step: 38440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:02,955-Speed 5523.23 samples/sec Loss 7.1639 LearningRate 0.0384 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:04,849-Speed 5412.08 samples/sec Loss 7.0478 LearningRate 0.0384 Epoch: 7 Global Step: 38460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:06,696-Speed 5547.20 samples/sec Loss 7.1936 LearningRate 0.0384 Epoch: 7 Global Step: 38470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:08,553-Speed 5516.08 samples/sec Loss 7.3348 LearningRate 0.0384 Epoch: 7 Global Step: 38480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:10,393-Speed 5570.54 samples/sec Loss 7.1867 LearningRate 0.0384 Epoch: 7 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:12,264-Speed 5474.41 samples/sec Loss 7.2439 LearningRate 0.0384 Epoch: 7 Global Step: 38500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:45:14,167-Speed 5384.47 samples/sec Loss 
7.3309 LearningRate 0.0384 Epoch: 7 Global Step: 38510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:16,057-Speed 5420.26 samples/sec Loss 7.3102 LearningRate 0.0383 Epoch: 7 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:17,895-Speed 5576.15 samples/sec Loss 7.2619 LearningRate 0.0383 Epoch: 7 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:19,760-Speed 5493.56 samples/sec Loss 7.2023 LearningRate 0.0383 Epoch: 7 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:21,628-Speed 5486.07 samples/sec Loss 7.2656 LearningRate 0.0383 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:23,476-Speed 5543.47 samples/sec Loss 7.1761 LearningRate 0.0383 Epoch: 7 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:25,342-Speed 5489.79 samples/sec Loss 7.3147 LearningRate 0.0383 Epoch: 7 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:27,217-Speed 5463.05 samples/sec Loss 7.2121 LearningRate 0.0383 Epoch: 7 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:29,081-Speed 5496.54 samples/sec Loss 7.2783 LearningRate 0.0383 Epoch: 7 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:30,926-Speed 5553.30 samples/sec Loss 7.2454 LearningRate 0.0382 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:32,792-Speed 5491.22 samples/sec Loss 7.2765 LearningRate 0.0382 Epoch: 7 Global Step: 38610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:45:34,642-Speed 5537.88 samples/sec Loss 7.3737 LearningRate 0.0382 Epoch: 7 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:45:36,507-Speed 5494.21 samples/sec Loss 7.3236 LearningRate 0.0382 Epoch: 7 Global Step: 38630 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:38,359-Speed 5532.67 samples/sec Loss 7.3940 LearningRate 0.0382 Epoch: 7 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:40,232-Speed 5469.55 samples/sec Loss 7.2158 LearningRate 0.0382 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:42,073-Speed 5565.75 samples/sec Loss 7.1282 LearningRate 0.0382 Epoch: 7 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:43,963-Speed 5420.76 samples/sec Loss 7.1754 LearningRate 0.0382 Epoch: 7 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:45,805-Speed 5563.52 samples/sec Loss 7.4189 LearningRate 0.0381 Epoch: 7 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:47,653-Speed 5542.84 samples/sec Loss 7.1285 LearningRate 0.0381 Epoch: 7 Global Step: 38690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:49,495-Speed 5561.72 samples/sec Loss 7.1391 LearningRate 0.0381 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:51,341-Speed 5549.68 samples/sec Loss 7.2488 LearningRate 0.0381 Epoch: 7 Global Step: 38710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:53,247-Speed 5376.78 samples/sec Loss 7.3477 LearningRate 0.0381 Epoch: 7 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:55,092-Speed 5553.70 samples/sec Loss 7.4401 LearningRate 0.0381 Epoch: 7 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:45:56,918-Speed 5609.80 samples/sec Loss 7.2984 LearningRate 0.0381 Epoch: 7 Global Step: 38740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:45:58,770-Speed 5531.66 samples/sec Loss 7.2323 LearningRate 0.0381 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:46:00,621-Speed 5534.81 samples/sec Loss 7.2751 LearningRate 0.0380 Epoch: 7 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:02,463-Speed 5562.10 samples/sec Loss 7.2455 LearningRate 0.0380 Epoch: 7 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:04,300-Speed 5574.28 samples/sec Loss 7.2960 LearningRate 0.0380 Epoch: 7 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:06,163-Speed 5499.64 samples/sec Loss 7.1548 LearningRate 0.0380 Epoch: 7 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:08,044-Speed 5448.12 samples/sec Loss 7.2430 LearningRate 0.0380 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:09,887-Speed 5559.68 samples/sec Loss 7.2703 LearningRate 0.0380 Epoch: 7 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:11,730-Speed 5558.64 samples/sec Loss 7.0479 LearningRate 0.0380 Epoch: 7 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:13,639-Speed 5367.14 samples/sec Loss 7.4424 LearningRate 0.0380 Epoch: 7 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:15,508-Speed 5482.72 samples/sec Loss 7.1784 LearningRate 0.0380 Epoch: 7 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:46:17,356-Speed 5544.67 samples/sec Loss 7.1335 LearningRate 0.0379 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:19,197-Speed 5564.81 samples/sec Loss 7.1020 LearningRate 0.0379 Epoch: 7 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:21,080-Speed 5440.02 samples/sec Loss 7.1936 LearningRate 0.0379 Epoch: 7 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:22,917-Speed 5579.71 samples/sec Loss 7.3675 LearningRate 
0.0379 Epoch: 7 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:24,790-Speed 5470.54 samples/sec Loss 7.3256 LearningRate 0.0379 Epoch: 7 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:26,631-Speed 5565.33 samples/sec Loss 7.1014 LearningRate 0.0379 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:28,488-Speed 5517.76 samples/sec Loss 7.1317 LearningRate 0.0379 Epoch: 7 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:30,368-Speed 5450.05 samples/sec Loss 7.0521 LearningRate 0.0379 Epoch: 7 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:32,213-Speed 5553.31 samples/sec Loss 7.2641 LearningRate 0.0378 Epoch: 7 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:34,066-Speed 5527.30 samples/sec Loss 7.1140 LearningRate 0.0378 Epoch: 7 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:35,962-Speed 5405.98 samples/sec Loss 7.5098 LearningRate 0.0378 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:46:37,821-Speed 5511.33 samples/sec Loss 7.3043 LearningRate 0.0378 Epoch: 7 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:39,700-Speed 5453.61 samples/sec Loss 7.1402 LearningRate 0.0378 Epoch: 7 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:41,558-Speed 5512.39 samples/sec Loss 7.3424 LearningRate 0.0378 Epoch: 7 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:43,427-Speed 5483.77 samples/sec Loss 7.2992 LearningRate 0.0378 Epoch: 7 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:45,268-Speed 5566.26 samples/sec Loss 7.2855 LearningRate 0.0378 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 12:46:47,147-Speed 5450.33 samples/sec Loss 7.3365 LearningRate 0.0377 Epoch: 7 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:49,024-Speed 5459.74 samples/sec Loss 7.2385 LearningRate 0.0377 Epoch: 7 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:50,876-Speed 5533.64 samples/sec Loss 7.2063 LearningRate 0.0377 Epoch: 7 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:52,778-Speed 5386.95 samples/sec Loss 7.3576 LearningRate 0.0377 Epoch: 7 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:54,636-Speed 5514.17 samples/sec Loss 7.2324 LearningRate 0.0377 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:46:56,512-Speed 5460.10 samples/sec Loss 7.1616 LearningRate 0.0377 Epoch: 7 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:46:58,364-Speed 5531.65 samples/sec Loss 7.1052 LearningRate 0.0377 Epoch: 7 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:47:00,211-Speed 5549.36 samples/sec Loss 7.1308 LearningRate 0.0377 Epoch: 7 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:02,067-Speed 5519.18 samples/sec Loss 6.9994 LearningRate 0.0376 Epoch: 7 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:03,929-Speed 5502.63 samples/sec Loss 7.2671 LearningRate 0.0376 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:05,783-Speed 5528.43 samples/sec Loss 7.3306 LearningRate 0.0376 Epoch: 7 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:07,649-Speed 5490.32 samples/sec Loss 7.2233 LearningRate 0.0376 Epoch: 7 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:09,493-Speed 5556.61 
samples/sec Loss 7.0392 LearningRate 0.0376 Epoch: 7 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:11,362-Speed 5481.59 samples/sec Loss 7.1560 LearningRate 0.0376 Epoch: 7 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:13,228-Speed 5491.27 samples/sec Loss 7.1962 LearningRate 0.0376 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:15,066-Speed 5580.01 samples/sec Loss 7.1829 LearningRate 0.0376 Epoch: 7 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:16,899-Speed 5586.50 samples/sec Loss 7.1415 LearningRate 0.0376 Epoch: 7 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:18,740-Speed 5563.76 samples/sec Loss 7.1927 LearningRate 0.0375 Epoch: 7 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:20,577-Speed 5578.76 samples/sec Loss 7.3200 LearningRate 0.0375 Epoch: 7 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:22,415-Speed 5574.93 samples/sec Loss 7.1317 LearningRate 0.0375 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:24,262-Speed 5546.15 samples/sec Loss 7.2727 LearningRate 0.0375 Epoch: 7 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:26,120-Speed 5514.52 samples/sec Loss 7.0880 LearningRate 0.0375 Epoch: 7 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:28,000-Speed 5447.70 samples/sec Loss 7.3559 LearningRate 0.0375 Epoch: 7 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:29,848-Speed 5545.45 samples/sec Loss 7.2037 LearningRate 0.0375 Epoch: 7 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:31,708-Speed 5506.40 samples/sec Loss 7.2257 LearningRate 0.0375 Epoch: 7 Global Step: 
39250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:33,545-Speed 5579.00 samples/sec Loss 7.3363 LearningRate 0.0374 Epoch: 7 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:35,405-Speed 5506.63 samples/sec Loss 7.1088 LearningRate 0.0374 Epoch: 7 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:37,266-Speed 5505.49 samples/sec Loss 7.1435 LearningRate 0.0374 Epoch: 7 Global Step: 39280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:39,123-Speed 5518.70 samples/sec Loss 7.1530 LearningRate 0.0374 Epoch: 7 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:40,981-Speed 5515.50 samples/sec Loss 7.1660 LearningRate 0.0374 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:42,853-Speed 5472.11 samples/sec Loss 7.1749 LearningRate 0.0374 Epoch: 7 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:44,690-Speed 5579.06 samples/sec Loss 7.1915 LearningRate 0.0374 Epoch: 7 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:46,599-Speed 5366.01 samples/sec Loss 7.2188 LearningRate 0.0374 Epoch: 7 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:48,452-Speed 5526.65 samples/sec Loss 7.2466 LearningRate 0.0373 Epoch: 7 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:50,298-Speed 5552.77 samples/sec Loss 7.0846 LearningRate 0.0373 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:52,170-Speed 5471.25 samples/sec Loss 7.1408 LearningRate 0.0373 Epoch: 7 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:47:54,014-Speed 5557.91 samples/sec Loss 7.1505 LearningRate 0.0373 Epoch: 7 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:47:55,863-Speed 5539.92 samples/sec Loss 7.1338 LearningRate 0.0373 Epoch: 7 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:47:57,698-Speed 5584.44 samples/sec Loss 7.0347 LearningRate 0.0373 Epoch: 7 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:47:59,555-Speed 5517.02 samples/sec Loss 7.0921 LearningRate 0.0373 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:01,402-Speed 5547.68 samples/sec Loss 7.3151 LearningRate 0.0373 Epoch: 7 Global Step: 39410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:03,276-Speed 5466.46 samples/sec Loss 7.2297 LearningRate 0.0372 Epoch: 7 Global Step: 39420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:05,144-Speed 5484.91 samples/sec Loss 7.1553 LearningRate 0.0372 Epoch: 7 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:07,000-Speed 5520.31 samples/sec Loss 7.1540 LearningRate 0.0372 Epoch: 7 Global Step: 39440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:08,870-Speed 5477.51 samples/sec Loss 7.2544 LearningRate 0.0372 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:10,720-Speed 5541.27 samples/sec Loss 7.1635 LearningRate 0.0372 Epoch: 7 Global Step: 39460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:12,555-Speed 5582.95 samples/sec Loss 7.2813 LearningRate 0.0372 Epoch: 7 Global Step: 39470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:14,409-Speed 5525.95 samples/sec Loss 7.1896 LearningRate 0.0372 Epoch: 7 Global Step: 39480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:16,248-Speed 5571.63 samples/sec Loss 7.1857 LearningRate 0.0372 Epoch: 7 Global Step: 39490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:18,086-Speed 5574.44 samples/sec Loss 7.3030 LearningRate 
0.0372 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:19,939-Speed 5529.27 samples/sec Loss 7.3020 LearningRate 0.0371 Epoch: 7 Global Step: 39510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:21,794-Speed 5523.70 samples/sec Loss 7.0638 LearningRate 0.0371 Epoch: 7 Global Step: 39520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:23,636-Speed 5560.26 samples/sec Loss 7.1983 LearningRate 0.0371 Epoch: 7 Global Step: 39530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:48:25,490-Speed 5526.33 samples/sec Loss 7.0780 LearningRate 0.0371 Epoch: 7 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:27,330-Speed 5569.68 samples/sec Loss 7.2688 LearningRate 0.0371 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:29,204-Speed 5465.70 samples/sec Loss 7.3026 LearningRate 0.0371 Epoch: 7 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:31,038-Speed 5587.99 samples/sec Loss 7.2399 LearningRate 0.0371 Epoch: 7 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:32,873-Speed 5582.16 samples/sec Loss 7.1582 LearningRate 0.0371 Epoch: 7 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:34,708-Speed 5582.17 samples/sec Loss 7.1583 LearningRate 0.0370 Epoch: 7 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:36,547-Speed 5572.34 samples/sec Loss 7.1304 LearningRate 0.0370 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:38,426-Speed 5450.26 samples/sec Loss 7.1254 LearningRate 0.0370 Epoch: 7 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:40,290-Speed 5496.66 samples/sec Loss 7.2501 LearningRate 0.0370 Epoch: 7 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 12:48:42,171-Speed 5448.42 samples/sec Loss 7.1059 LearningRate 0.0370 Epoch: 7 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:44,016-Speed 5550.39 samples/sec Loss 7.1074 LearningRate 0.0370 Epoch: 7 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:45,886-Speed 5478.74 samples/sec Loss 7.1305 LearningRate 0.0370 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:47,746-Speed 5509.24 samples/sec Loss 7.1140 LearningRate 0.0370 Epoch: 7 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:49,614-Speed 5483.24 samples/sec Loss 7.2545 LearningRate 0.0369 Epoch: 7 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:51,454-Speed 5572.14 samples/sec Loss 7.2272 LearningRate 0.0369 Epoch: 7 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:53,312-Speed 5511.41 samples/sec Loss 7.0514 LearningRate 0.0369 Epoch: 7 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:48:55,148-Speed 5582.82 samples/sec Loss 7.2392 LearningRate 0.0369 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:56,988-Speed 5568.91 samples/sec Loss 7.0384 LearningRate 0.0369 Epoch: 7 Global Step: 39710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:48:58,851-Speed 5500.09 samples/sec Loss 7.1776 LearningRate 0.0369 Epoch: 7 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:00,704-Speed 5527.57 samples/sec Loss 7.1014 LearningRate 0.0369 Epoch: 7 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:02,578-Speed 5469.42 samples/sec Loss 7.1293 LearningRate 0.0369 Epoch: 7 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:04,419-Speed 5563.74 
samples/sec Loss 7.1175 LearningRate 0.0369 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:06,309-Speed 5422.01 samples/sec Loss 7.2100 LearningRate 0.0368 Epoch: 7 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:08,155-Speed 5548.80 samples/sec Loss 7.2767 LearningRate 0.0368 Epoch: 7 Global Step: 39770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:10,028-Speed 5472.64 samples/sec Loss 7.0829 LearningRate 0.0368 Epoch: 7 Global Step: 39780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:11,882-Speed 5525.42 samples/sec Loss 7.1318 LearningRate 0.0368 Epoch: 7 Global Step: 39790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:13,750-Speed 5483.77 samples/sec Loss 7.2383 LearningRate 0.0368 Epoch: 7 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:15,597-Speed 5547.82 samples/sec Loss 7.1660 LearningRate 0.0368 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:17,492-Speed 5407.52 samples/sec Loss 7.2523 LearningRate 0.0368 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:19,328-Speed 5578.41 samples/sec Loss 7.0912 LearningRate 0.0368 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:21,186-Speed 5513.95 samples/sec Loss 7.1846 LearningRate 0.0367 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:23,032-Speed 5551.27 samples/sec Loss 6.9945 LearningRate 0.0367 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:24,921-Speed 5426.07 samples/sec Loss 7.1986 LearningRate 0.0367 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:26,770-Speed 5538.37 samples/sec Loss 7.0925 LearningRate 0.0367 Epoch: 7 Global 
Step: 39870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:28,637-Speed 5486.21 samples/sec Loss 7.0917 LearningRate 0.0367 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:30,503-Speed 5494.07 samples/sec Loss 7.2112 LearningRate 0.0367 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:32,383-Speed 5448.19 samples/sec Loss 7.1259 LearningRate 0.0367 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:34,231-Speed 5543.18 samples/sec Loss 6.9011 LearningRate 0.0367 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:36,109-Speed 5457.01 samples/sec Loss 6.9648 LearningRate 0.0366 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:37,969-Speed 5508.05 samples/sec Loss 7.2155 LearningRate 0.0366 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:49:39,858-Speed 5424.47 samples/sec Loss 7.0926 LearningRate 0.0366 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:41,700-Speed 5561.39 samples/sec Loss 7.2699 LearningRate 0.0366 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:43,568-Speed 5485.79 samples/sec Loss 7.1662 LearningRate 0.0366 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:45,427-Speed 5509.95 samples/sec Loss 6.9965 LearningRate 0.0366 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:47,293-Speed 5492.36 samples/sec Loss 7.1114 LearningRate 0.0366 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:49:49,133-Speed 5565.34 samples/sec Loss 7.2556 LearningRate 0.0366 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 12:49:51,012-Speed 5458.44 samples/sec Loss 7.1388 LearningRate 0.0366 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:50:18,457-[lfw][40000]XNorm: 21.711813
Training: 2022-04-11 12:50:18,458-[lfw][40000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-11 12:50:18,458-[lfw][40000]Accuracy-Highest: 0.99767
Training: 2022-04-11 12:50:49,794-[cfp_fp][40000]XNorm: 19.317273
Training: 2022-04-11 12:50:49,794-[cfp_fp][40000]Accuracy-Flip: 0.96414+-0.01160
Training: 2022-04-11 12:50:49,795-[cfp_fp][40000]Accuracy-Highest: 0.96414
Training: 2022-04-11 12:51:16,982-[agedb_30][40000]XNorm: 21.591740
Training: 2022-04-11 12:51:16,983-[agedb_30][40000]Accuracy-Flip: 0.97567+-0.00602
Training: 2022-04-11 12:51:16,983-[agedb_30][40000]Accuracy-Highest: 0.97567
Training: 2022-04-11 12:51:18,854-Speed 116.57 samples/sec Loss 7.1806 LearningRate 0.0365 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:20,683-Speed 5600.81 samples/sec Loss 7.0874 LearningRate 0.0365 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:22,540-Speed 5517.48 samples/sec Loss 7.1300 LearningRate 0.0365 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:24,372-Speed 5593.32 samples/sec Loss 7.1171 LearningRate 0.0365 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:26,223-Speed 5533.34 samples/sec Loss 7.1693 LearningRate 0.0365 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:28,062-Speed 5571.03 samples/sec Loss 7.1524 LearningRate 0.0365 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:29,961-Speed 5397.03 samples/sec Loss 7.0678 LearningRate 0.0365 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 12:51:31,787-Speed 5608.45 samples/sec
Loss 7.0991 LearningRate 0.0365 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:51:33,671-Speed 5440.00 samples/sec Loss 7.1324 LearningRate 0.0364 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:51:35,519-Speed 5543.75 samples/sec Loss 7.0850 LearningRate 0.0364 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:51:37,342-Speed 5619.68 samples/sec Loss 7.0926 LearningRate 0.0364 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:51:39,184-Speed 5562.31 samples/sec Loss 7.2416 LearningRate 0.0364 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:41,031-Speed 5545.47 samples/sec Loss 7.2158 LearningRate 0.0364 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:42,936-Speed 5381.47 samples/sec Loss 7.1444 LearningRate 0.0364 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:44,774-Speed 5573.32 samples/sec Loss 7.0095 LearningRate 0.0364 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:46,630-Speed 5520.02 samples/sec Loss 7.1711 LearningRate 0.0364 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:48,479-Speed 5541.40 samples/sec Loss 7.1747 LearningRate 0.0363 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:50,338-Speed 5509.39 samples/sec Loss 7.1285 LearningRate 0.0363 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:52,191-Speed 5531.20 samples/sec Loss 7.0384 LearningRate 0.0363 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:54,057-Speed 5489.60 samples/sec Loss 7.0183 LearningRate 0.0363 Epoch: 7 Global Step: 40200 Fp16 
Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:55,898-Speed 5564.98 samples/sec Loss 7.0969 LearningRate 0.0363 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:51:57,779-Speed 5449.46 samples/sec Loss 7.0555 LearningRate 0.0363 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:51:59,617-Speed 5572.47 samples/sec Loss 6.9275 LearningRate 0.0363 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:01,509-Speed 5414.25 samples/sec Loss 7.1170 LearningRate 0.0363 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:03,383-Speed 5468.58 samples/sec Loss 7.0848 LearningRate 0.0363 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:05,213-Speed 5599.34 samples/sec Loss 7.0953 LearningRate 0.0362 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:07,058-Speed 5552.00 samples/sec Loss 7.0553 LearningRate 0.0362 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:08,891-Speed 5591.34 samples/sec Loss 7.1141 LearningRate 0.0362 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:10,725-Speed 5582.88 samples/sec Loss 7.0501 LearningRate 0.0362 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:12,558-Speed 5589.27 samples/sec Loss 7.0091 LearningRate 0.0362 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:14,410-Speed 5532.96 samples/sec Loss 7.0876 LearningRate 0.0362 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:16,292-Speed 5443.82 samples/sec Loss 7.0404 LearningRate 0.0362 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
12:52:18,147-Speed 5521.63 samples/sec Loss 7.0027 LearningRate 0.0362 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:20,015-Speed 5485.81 samples/sec Loss 7.1544 LearningRate 0.0361 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:21,849-Speed 5585.27 samples/sec Loss 7.1984 LearningRate 0.0361 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:23,731-Speed 5444.93 samples/sec Loss 6.8776 LearningRate 0.0361 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:25,569-Speed 5574.73 samples/sec Loss 7.0660 LearningRate 0.0361 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:27,425-Speed 5520.74 samples/sec Loss 7.0552 LearningRate 0.0361 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:29,259-Speed 5587.87 samples/sec Loss 7.1136 LearningRate 0.0361 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:31,089-Speed 5594.85 samples/sec Loss 6.8979 LearningRate 0.0361 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:32,969-Speed 5452.07 samples/sec Loss 7.3568 LearningRate 0.0361 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:34,829-Speed 5506.32 samples/sec Loss 7.0255 LearningRate 0.0361 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:36,680-Speed 5537.05 samples/sec Loss 7.1005 LearningRate 0.0360 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:52:38,535-Speed 5522.79 samples/sec Loss 7.0519 LearningRate 0.0360 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:40,412-Speed 5458.85 samples/sec Loss 6.9119 LearningRate 
0.0360 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:42,307-Speed 5406.53 samples/sec Loss 7.1936 LearningRate 0.0360 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:53,606-Speed 906.37 samples/sec Loss 6.4994 LearningRate 0.0360 Epoch: 8 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:55,490-Speed 5440.68 samples/sec Loss 6.1681 LearningRate 0.0360 Epoch: 8 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:57,368-Speed 5455.68 samples/sec Loss 6.1962 LearningRate 0.0360 Epoch: 8 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:52:59,237-Speed 5481.64 samples/sec Loss 6.2690 LearningRate 0.0360 Epoch: 8 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:01,367-Speed 4809.46 samples/sec Loss 6.1490 LearningRate 0.0359 Epoch: 8 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:03,230-Speed 5500.03 samples/sec Loss 6.0908 LearningRate 0.0359 Epoch: 8 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:05,065-Speed 5582.34 samples/sec Loss 6.3503 LearningRate 0.0359 Epoch: 8 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:06,928-Speed 5499.92 samples/sec Loss 6.2989 LearningRate 0.0359 Epoch: 8 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:53:08,766-Speed 5573.04 samples/sec Loss 6.2958 LearningRate 0.0359 Epoch: 8 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:10,620-Speed 5526.42 samples/sec Loss 6.2625 LearningRate 0.0359 Epoch: 8 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:12,470-Speed 5536.46 samples/sec Loss 6.3425 LearningRate 0.0359 Epoch: 8 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 12:53:14,317-Speed 5547.55 samples/sec Loss 6.3070 LearningRate 0.0359 Epoch: 8 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:16,173-Speed 5520.66 samples/sec Loss 6.3457 LearningRate 0.0359 Epoch: 8 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:18,009-Speed 5581.80 samples/sec Loss 6.4110 LearningRate 0.0358 Epoch: 8 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:19,843-Speed 5586.52 samples/sec Loss 6.2888 LearningRate 0.0358 Epoch: 8 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:21,714-Speed 5476.03 samples/sec Loss 6.5423 LearningRate 0.0358 Epoch: 8 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:23,557-Speed 5558.14 samples/sec Loss 6.3904 LearningRate 0.0358 Epoch: 8 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:25,405-Speed 5544.67 samples/sec Loss 6.4062 LearningRate 0.0358 Epoch: 8 Global Step: 40640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:27,295-Speed 5421.95 samples/sec Loss 6.4455 LearningRate 0.0358 Epoch: 8 Global Step: 40650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:29,148-Speed 5527.60 samples/sec Loss 6.4658 LearningRate 0.0358 Epoch: 8 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:31,000-Speed 5532.46 samples/sec Loss 6.5028 LearningRate 0.0358 Epoch: 8 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:32,840-Speed 5569.10 samples/sec Loss 6.4932 LearningRate 0.0357 Epoch: 8 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:34,697-Speed 5515.51 samples/sec Loss 6.2801 LearningRate 0.0357 Epoch: 8 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:36,524-Speed 5609.03 samples/sec 
Loss 6.5887 LearningRate 0.0357 Epoch: 8 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:38,380-Speed 5520.91 samples/sec Loss 6.6162 LearningRate 0.0357 Epoch: 8 Global Step: 40710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:40,218-Speed 5574.10 samples/sec Loss 6.4475 LearningRate 0.0357 Epoch: 8 Global Step: 40720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:42,071-Speed 5528.77 samples/sec Loss 6.3871 LearningRate 0.0357 Epoch: 8 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:43,911-Speed 5568.04 samples/sec Loss 6.4052 LearningRate 0.0357 Epoch: 8 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:45,740-Speed 5602.40 samples/sec Loss 6.4139 LearningRate 0.0357 Epoch: 8 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:53:47,581-Speed 5563.66 samples/sec Loss 6.5116 LearningRate 0.0356 Epoch: 8 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:53:49,426-Speed 5554.91 samples/sec Loss 6.6111 LearningRate 0.0356 Epoch: 8 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:51,326-Speed 5389.06 samples/sec Loss 6.3383 LearningRate 0.0356 Epoch: 8 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:53,161-Speed 5584.60 samples/sec Loss 6.5111 LearningRate 0.0356 Epoch: 8 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:54,992-Speed 5595.58 samples/sec Loss 6.4119 LearningRate 0.0356 Epoch: 8 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:56,831-Speed 5571.43 samples/sec Loss 6.3418 LearningRate 0.0356 Epoch: 8 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:53:58,669-Speed 5572.28 samples/sec Loss 6.5225 LearningRate 0.0356 Epoch: 8 Global Step: 40820 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:00,514-Speed 5555.04 samples/sec Loss 6.4528 LearningRate 0.0356 Epoch: 8 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:02,388-Speed 5467.21 samples/sec Loss 6.5992 LearningRate 0.0356 Epoch: 8 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:04,250-Speed 5503.54 samples/sec Loss 6.5112 LearningRate 0.0355 Epoch: 8 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:06,133-Speed 5438.53 samples/sec Loss 6.4350 LearningRate 0.0355 Epoch: 8 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:07,965-Speed 5595.23 samples/sec Loss 6.5268 LearningRate 0.0355 Epoch: 8 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:09,801-Speed 5581.42 samples/sec Loss 6.6316 LearningRate 0.0355 Epoch: 8 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:11,668-Speed 5486.77 samples/sec Loss 6.4848 LearningRate 0.0355 Epoch: 8 Global Step: 40890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:13,511-Speed 5559.98 samples/sec Loss 6.5970 LearningRate 0.0355 Epoch: 8 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:15,360-Speed 5542.83 samples/sec Loss 6.6499 LearningRate 0.0355 Epoch: 8 Global Step: 40910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:17,214-Speed 5525.76 samples/sec Loss 6.5167 LearningRate 0.0355 Epoch: 8 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:19,049-Speed 5585.02 samples/sec Loss 6.5807 LearningRate 0.0354 Epoch: 8 Global Step: 40930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:20,883-Speed 5585.06 samples/sec Loss 6.5908 LearningRate 0.0354 Epoch: 8 Global Step: 40940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:54:22,858-Speed 5186.84 samples/sec Loss 6.5316 LearningRate 0.0354 Epoch: 8 Global Step: 40950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:24,716-Speed 5517.26 samples/sec Loss 6.6614 LearningRate 0.0354 Epoch: 8 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:26,590-Speed 5465.86 samples/sec Loss 6.5127 LearningRate 0.0354 Epoch: 8 Global Step: 40970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:28,446-Speed 5521.11 samples/sec Loss 6.6580 LearningRate 0.0354 Epoch: 8 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:30,317-Speed 5473.13 samples/sec Loss 6.6972 LearningRate 0.0354 Epoch: 8 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:32,153-Speed 5581.67 samples/sec Loss 6.3254 LearningRate 0.0354 Epoch: 8 Global Step: 41000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:34,039-Speed 5433.17 samples/sec Loss 6.7181 LearningRate 0.0354 Epoch: 8 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:35,910-Speed 5474.67 samples/sec Loss 6.5773 LearningRate 0.0353 Epoch: 8 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:37,756-Speed 5551.41 samples/sec Loss 6.4268 LearningRate 0.0353 Epoch: 8 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:39,604-Speed 5544.35 samples/sec Loss 6.5583 LearningRate 0.0353 Epoch: 8 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:41,444-Speed 5570.56 samples/sec Loss 6.4667 LearningRate 0.0353 Epoch: 8 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:43,287-Speed 5561.12 samples/sec Loss 6.6930 LearningRate 0.0353 Epoch: 8 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:45,143-Speed 5518.84 samples/sec Loss 6.5914 LearningRate 
0.0353 Epoch: 8 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:54:47,009-Speed 5492.08 samples/sec Loss 6.5599 LearningRate 0.0353 Epoch: 8 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:48,844-Speed 5585.23 samples/sec Loss 6.7914 LearningRate 0.0353 Epoch: 8 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:50,683-Speed 5570.15 samples/sec Loss 6.6186 LearningRate 0.0352 Epoch: 8 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:52,535-Speed 5533.72 samples/sec Loss 6.7431 LearningRate 0.0352 Epoch: 8 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:54,398-Speed 5499.60 samples/sec Loss 6.5158 LearningRate 0.0352 Epoch: 8 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:56,243-Speed 5552.44 samples/sec Loss 6.6296 LearningRate 0.0352 Epoch: 8 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:58,077-Speed 5584.63 samples/sec Loss 6.7111 LearningRate 0.0352 Epoch: 8 Global Step: 41140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:54:59,929-Speed 5530.54 samples/sec Loss 6.5551 LearningRate 0.0352 Epoch: 8 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:01,778-Speed 5541.58 samples/sec Loss 6.5893 LearningRate 0.0352 Epoch: 8 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:03,678-Speed 5392.22 samples/sec Loss 6.5234 LearningRate 0.0352 Epoch: 8 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:05,519-Speed 5564.83 samples/sec Loss 6.7267 LearningRate 0.0352 Epoch: 8 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:55:07,350-Speed 5596.19 samples/sec Loss 6.7300 LearningRate 0.0351 Epoch: 8 Global Step: 41190 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 12:55:09,195-Speed 5554.20 samples/sec Loss 6.7771 LearningRate 0.0351 Epoch: 8 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:11,097-Speed 5385.99 samples/sec Loss 6.5884 LearningRate 0.0351 Epoch: 8 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:12,966-Speed 5482.71 samples/sec Loss 6.6635 LearningRate 0.0351 Epoch: 8 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:14,846-Speed 5449.48 samples/sec Loss 6.6365 LearningRate 0.0351 Epoch: 8 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:16,680-Speed 5586.45 samples/sec Loss 6.6811 LearningRate 0.0351 Epoch: 8 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:18,583-Speed 5384.28 samples/sec Loss 6.6635 LearningRate 0.0351 Epoch: 8 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:20,445-Speed 5501.99 samples/sec Loss 6.8160 LearningRate 0.0351 Epoch: 8 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:22,278-Speed 5587.87 samples/sec Loss 6.6042 LearningRate 0.0351 Epoch: 8 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:24,154-Speed 5462.08 samples/sec Loss 6.6654 LearningRate 0.0350 Epoch: 8 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:25,998-Speed 5557.55 samples/sec Loss 6.7233 LearningRate 0.0350 Epoch: 8 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:55:27,857-Speed 5510.70 samples/sec Loss 6.8340 LearningRate 0.0350 Epoch: 8 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:55:29,723-Speed 5489.75 samples/sec Loss 6.7550 LearningRate 0.0350 Epoch: 8 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:55:31,566-Speed 
5558.42 samples/sec Loss 6.8216 LearningRate 0.0350 Epoch: 8 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:55:33,398-Speed 5594.26 samples/sec Loss 6.7038 LearningRate 0.0350 Epoch: 8 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:35,254-Speed 5519.07 samples/sec Loss 6.7546 LearningRate 0.0350 Epoch: 8 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:37,115-Speed 5506.11 samples/sec Loss 6.7979 LearningRate 0.0350 Epoch: 8 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:38,956-Speed 5565.87 samples/sec Loss 6.7172 LearningRate 0.0349 Epoch: 8 Global Step: 41360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:40,803-Speed 5545.22 samples/sec Loss 6.7757 LearningRate 0.0349 Epoch: 8 Global Step: 41370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:42,670-Speed 5489.52 samples/sec Loss 6.6520 LearningRate 0.0349 Epoch: 8 Global Step: 41380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:44,512-Speed 5563.78 samples/sec Loss 6.7837 LearningRate 0.0349 Epoch: 8 Global Step: 41390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:46,355-Speed 5558.30 samples/sec Loss 6.7397 LearningRate 0.0349 Epoch: 8 Global Step: 41400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:48,224-Speed 5484.70 samples/sec Loss 6.8185 LearningRate 0.0349 Epoch: 8 Global Step: 41410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:50,068-Speed 5554.26 samples/sec Loss 6.7104 LearningRate 0.0349 Epoch: 8 Global Step: 41420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:51,908-Speed 5567.32 samples/sec Loss 6.7141 LearningRate 0.0349 Epoch: 8 Global Step: 41430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:53,751-Speed 5561.57 samples/sec Loss 6.8510 LearningRate 0.0349 Epoch: 8 
Global Step: 41440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:55,597-Speed 5551.49 samples/sec Loss 6.8126 LearningRate 0.0348 Epoch: 8 Global Step: 41450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:55:57,433-Speed 5580.41 samples/sec Loss 6.6420 LearningRate 0.0348 Epoch: 8 Global Step: 41460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:55:59,295-Speed 5503.28 samples/sec Loss 6.6847 LearningRate 0.0348 Epoch: 8 Global Step: 41470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:01,140-Speed 5553.21 samples/sec Loss 6.7447 LearningRate 0.0348 Epoch: 8 Global Step: 41480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:03,031-Speed 5418.75 samples/sec Loss 6.5563 LearningRate 0.0348 Epoch: 8 Global Step: 41490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:04,887-Speed 5519.98 samples/sec Loss 6.8907 LearningRate 0.0348 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:06,755-Speed 5485.15 samples/sec Loss 6.7297 LearningRate 0.0348 Epoch: 8 Global Step: 41510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:08,588-Speed 5591.66 samples/sec Loss 6.7965 LearningRate 0.0348 Epoch: 8 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:10,453-Speed 5491.72 samples/sec Loss 6.7015 LearningRate 0.0347 Epoch: 8 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:12,298-Speed 5554.34 samples/sec Loss 6.7559 LearningRate 0.0347 Epoch: 8 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:14,139-Speed 5563.92 samples/sec Loss 6.7466 LearningRate 0.0347 Epoch: 8 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:15,971-Speed 5594.55 samples/sec Loss 6.8363 LearningRate 0.0347 Epoch: 8 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 12:56:17,824-Speed 5527.58 samples/sec Loss 6.5704 LearningRate 0.0347 Epoch: 8 Global Step: 41570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:19,676-Speed 5532.05 samples/sec Loss 6.6520 LearningRate 0.0347 Epoch: 8 Global Step: 41580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:21,549-Speed 5471.88 samples/sec Loss 6.6998 LearningRate 0.0347 Epoch: 8 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:23,393-Speed 5556.04 samples/sec Loss 6.6978 LearningRate 0.0347 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:25,262-Speed 5481.10 samples/sec Loss 6.6541 LearningRate 0.0347 Epoch: 8 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:27,128-Speed 5491.85 samples/sec Loss 6.9014 LearningRate 0.0346 Epoch: 8 Global Step: 41620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:29,011-Speed 5442.64 samples/sec Loss 6.8556 LearningRate 0.0346 Epoch: 8 Global Step: 41630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:30,866-Speed 5522.85 samples/sec Loss 6.7471 LearningRate 0.0346 Epoch: 8 Global Step: 41640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:32,726-Speed 5507.39 samples/sec Loss 6.7684 LearningRate 0.0346 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:34,562-Speed 5579.89 samples/sec Loss 6.8252 LearningRate 0.0346 Epoch: 8 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:56:36,422-Speed 5508.02 samples/sec Loss 6.7134 LearningRate 0.0346 Epoch: 8 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:56:38,270-Speed 5544.40 samples/sec Loss 6.8432 LearningRate 0.0346 Epoch: 8 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:56:40,107-Speed 5577.11 samples/sec Loss 
6.6112 LearningRate 0.0346 Epoch: 8 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:41,974-Speed 5488.52 samples/sec Loss 6.6571 LearningRate 0.0345 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:43,808-Speed 5585.54 samples/sec Loss 6.7959 LearningRate 0.0345 Epoch: 8 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:45,657-Speed 5540.82 samples/sec Loss 6.9104 LearningRate 0.0345 Epoch: 8 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:47,491-Speed 5587.60 samples/sec Loss 6.7875 LearningRate 0.0345 Epoch: 8 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:49,328-Speed 5576.29 samples/sec Loss 6.8621 LearningRate 0.0345 Epoch: 8 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:51,162-Speed 5587.83 samples/sec Loss 6.8637 LearningRate 0.0345 Epoch: 8 Global Step: 41750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:52,999-Speed 5577.41 samples/sec Loss 6.6582 LearningRate 0.0345 Epoch: 8 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:54,851-Speed 5529.23 samples/sec Loss 7.0618 LearningRate 0.0345 Epoch: 8 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:56,695-Speed 5555.55 samples/sec Loss 6.7641 LearningRate 0.0345 Epoch: 8 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:56:58,544-Speed 5543.77 samples/sec Loss 6.8819 LearningRate 0.0344 Epoch: 8 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:57:00,378-Speed 5586.39 samples/sec Loss 6.7519 LearningRate 0.0344 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:02,231-Speed 5527.39 samples/sec Loss 6.8178 LearningRate 0.0344 Epoch: 8 Global Step: 41810 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:04,086-Speed 5521.78 samples/sec Loss 6.7201 LearningRate 0.0344 Epoch: 8 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:05,930-Speed 5557.04 samples/sec Loss 6.7628 LearningRate 0.0344 Epoch: 8 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:07,761-Speed 5593.22 samples/sec Loss 6.8591 LearningRate 0.0344 Epoch: 8 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:09,591-Speed 5597.89 samples/sec Loss 6.6987 LearningRate 0.0344 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:11,418-Speed 5606.05 samples/sec Loss 6.6992 LearningRate 0.0344 Epoch: 8 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:13,262-Speed 5556.72 samples/sec Loss 6.8794 LearningRate 0.0344 Epoch: 8 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:15,099-Speed 5576.52 samples/sec Loss 6.6153 LearningRate 0.0343 Epoch: 8 Global Step: 41880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:16,935-Speed 5581.78 samples/sec Loss 6.6003 LearningRate 0.0343 Epoch: 8 Global Step: 41890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:18,770-Speed 5582.96 samples/sec Loss 6.7566 LearningRate 0.0343 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:20,605-Speed 5582.69 samples/sec Loss 6.8044 LearningRate 0.0343 Epoch: 8 Global Step: 41910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:22,440-Speed 5579.62 samples/sec Loss 6.6694 LearningRate 0.0343 Epoch: 8 Global Step: 41920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:24,273-Speed 5591.30 samples/sec Loss 6.7956 LearningRate 0.0343 Epoch: 8 Global Step: 41930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
12:57:26,108-Speed 5580.79 samples/sec Loss 6.8718 LearningRate 0.0343 Epoch: 8 Global Step: 41940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:27,947-Speed 5571.49 samples/sec Loss 6.7435 LearningRate 0.0343 Epoch: 8 Global Step: 41950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:29,793-Speed 5550.60 samples/sec Loss 6.8566 LearningRate 0.0342 Epoch: 8 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:57:31,627-Speed 5582.45 samples/sec Loss 6.8662 LearningRate 0.0342 Epoch: 8 Global Step: 41970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:33,478-Speed 5536.45 samples/sec Loss 6.9062 LearningRate 0.0342 Epoch: 8 Global Step: 41980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:35,313-Speed 5582.95 samples/sec Loss 6.9330 LearningRate 0.0342 Epoch: 8 Global Step: 41990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:57:37,153-Speed 5568.89 samples/sec Loss 6.8319 LearningRate 0.0342 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:58:04,422-[lfw][42000]XNorm: 23.336758 Training: 2022-04-11 12:58:04,422-[lfw][42000]Accuracy-Flip: 0.99750+-0.00227 Training: 2022-04-11 12:58:04,423-[lfw][42000]Accuracy-Highest: 0.99767 Training: 2022-04-11 12:58:36,034-[cfp_fp][42000]XNorm: 20.295787 Training: 2022-04-11 12:58:36,035-[cfp_fp][42000]Accuracy-Flip: 0.95957+-0.00998 Training: 2022-04-11 12:58:36,036-[cfp_fp][42000]Accuracy-Highest: 0.96414 Training: 2022-04-11 12:59:03,276-[agedb_30][42000]XNorm: 22.960819 Training: 2022-04-11 12:59:03,277-[agedb_30][42000]Accuracy-Flip: 0.97533+-0.00802 Training: 2022-04-11 12:59:03,277-[agedb_30][42000]Accuracy-Highest: 0.97567 Training: 2022-04-11 12:59:05,140-Speed 116.38 samples/sec Loss 6.7342 LearningRate 0.0342 Epoch: 8 Global Step: 42010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:06,991-Speed 5535.02 samples/sec Loss 
6.8189 LearningRate 0.0342 Epoch: 8 Global Step: 42020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:08,817-Speed 5611.18 samples/sec Loss 6.7574 LearningRate 0.0342 Epoch: 8 Global Step: 42030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:10,654-Speed 5576.33 samples/sec Loss 6.7431 LearningRate 0.0342 Epoch: 8 Global Step: 42040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:12,491-Speed 5577.92 samples/sec Loss 6.8638 LearningRate 0.0341 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:14,336-Speed 5555.76 samples/sec Loss 6.7581 LearningRate 0.0341 Epoch: 8 Global Step: 42060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 12:59:16,172-Speed 5579.32 samples/sec Loss 6.8437 LearningRate 0.0341 Epoch: 8 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:18,010-Speed 5572.82 samples/sec Loss 6.8691 LearningRate 0.0341 Epoch: 8 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:19,878-Speed 5488.60 samples/sec Loss 6.8858 LearningRate 0.0341 Epoch: 8 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:21,729-Speed 5533.76 samples/sec Loss 6.6954 LearningRate 0.0341 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:23,576-Speed 5546.01 samples/sec Loss 6.8480 LearningRate 0.0341 Epoch: 8 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:25,428-Speed 5532.95 samples/sec Loss 6.7276 LearningRate 0.0341 Epoch: 8 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:27,316-Speed 5425.98 samples/sec Loss 6.8074 LearningRate 0.0341 Epoch: 8 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:29,150-Speed 5586.97 samples/sec Loss 6.6836 LearningRate 0.0340 Epoch: 8 Global Step: 42140 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:30,992-Speed 5565.08 samples/sec Loss 6.7984 LearningRate 0.0340 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:32,822-Speed 5595.68 samples/sec Loss 6.7559 LearningRate 0.0340 Epoch: 8 Global Step: 42160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:34,668-Speed 5553.50 samples/sec Loss 6.6072 LearningRate 0.0340 Epoch: 8 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:59:36,506-Speed 5572.54 samples/sec Loss 6.6640 LearningRate 0.0340 Epoch: 8 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:59:38,367-Speed 5505.94 samples/sec Loss 6.7541 LearningRate 0.0340 Epoch: 8 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:59:40,201-Speed 5588.06 samples/sec Loss 6.7303 LearningRate 0.0340 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 12:59:42,039-Speed 5573.35 samples/sec Loss 6.8249 LearningRate 0.0340 Epoch: 8 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:43,881-Speed 5563.18 samples/sec Loss 6.6799 LearningRate 0.0339 Epoch: 8 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:45,716-Speed 5580.69 samples/sec Loss 6.8450 LearningRate 0.0339 Epoch: 8 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:47,569-Speed 5527.90 samples/sec Loss 6.7494 LearningRate 0.0339 Epoch: 8 Global Step: 42240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:49,418-Speed 5543.96 samples/sec Loss 6.7784 LearningRate 0.0339 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:51,276-Speed 5514.10 samples/sec Loss 6.7476 LearningRate 0.0339 Epoch: 8 Global Step: 42260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
12:59:53,110-Speed 5585.39 samples/sec Loss 6.8323 LearningRate 0.0339 Epoch: 8 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:54,941-Speed 5597.32 samples/sec Loss 6.7838 LearningRate 0.0339 Epoch: 8 Global Step: 42280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:56,774-Speed 5588.34 samples/sec Loss 6.8291 LearningRate 0.0339 Epoch: 8 Global Step: 42290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 12:59:58,621-Speed 5547.35 samples/sec Loss 6.7886 LearningRate 0.0339 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:00,468-Speed 5548.21 samples/sec Loss 6.8730 LearningRate 0.0338 Epoch: 8 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:00:02,312-Speed 5554.57 samples/sec Loss 6.7171 LearningRate 0.0338 Epoch: 8 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:00:04,156-Speed 5558.15 samples/sec Loss 6.7066 LearningRate 0.0338 Epoch: 8 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:05,994-Speed 5576.76 samples/sec Loss 6.6483 LearningRate 0.0338 Epoch: 8 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:07,843-Speed 5540.52 samples/sec Loss 6.8202 LearningRate 0.0338 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:09,703-Speed 5508.97 samples/sec Loss 6.9380 LearningRate 0.0338 Epoch: 8 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:11,542-Speed 5570.06 samples/sec Loss 6.9029 LearningRate 0.0338 Epoch: 8 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:13,405-Speed 5502.88 samples/sec Loss 6.7246 LearningRate 0.0338 Epoch: 8 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:15,257-Speed 5533.90 samples/sec Loss 6.7584 LearningRate 
0.0338 Epoch: 8 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:17,121-Speed 5494.71 samples/sec Loss 6.7873 LearningRate 0.0337 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:18,963-Speed 5565.01 samples/sec Loss 6.8838 LearningRate 0.0337 Epoch: 8 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:20,837-Speed 5466.37 samples/sec Loss 6.7079 LearningRate 0.0337 Epoch: 8 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:22,670-Speed 5588.26 samples/sec Loss 6.8712 LearningRate 0.0337 Epoch: 8 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:00:24,539-Speed 5482.79 samples/sec Loss 6.8627 LearningRate 0.0337 Epoch: 8 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:00:26,379-Speed 5567.00 samples/sec Loss 6.8254 LearningRate 0.0337 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:28,233-Speed 5526.17 samples/sec Loss 6.9395 LearningRate 0.0337 Epoch: 8 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:30,084-Speed 5538.35 samples/sec Loss 6.6583 LearningRate 0.0337 Epoch: 8 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:31,929-Speed 5551.11 samples/sec Loss 6.9545 LearningRate 0.0336 Epoch: 8 Global Step: 42480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:33,803-Speed 5502.56 samples/sec Loss 6.7555 LearningRate 0.0336 Epoch: 8 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:35,660-Speed 5517.99 samples/sec Loss 6.7420 LearningRate 0.0336 Epoch: 8 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:37,528-Speed 5484.02 samples/sec Loss 6.7120 LearningRate 0.0336 Epoch: 8 Global Step: 42510 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 13:00:39,410-Speed 5446.38 samples/sec Loss 6.7066 LearningRate 0.0336 Epoch: 8 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:41,248-Speed 5574.89 samples/sec Loss 6.8835 LearningRate 0.0336 Epoch: 8 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:43,093-Speed 5551.24 samples/sec Loss 6.7890 LearningRate 0.0336 Epoch: 8 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:44,956-Speed 5500.86 samples/sec Loss 6.7638 LearningRate 0.0336 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:00:46,784-Speed 5604.13 samples/sec Loss 6.7055 LearningRate 0.0336 Epoch: 8 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:48,664-Speed 5449.57 samples/sec Loss 6.7590 LearningRate 0.0335 Epoch: 8 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:50,504-Speed 5568.19 samples/sec Loss 6.8193 LearningRate 0.0335 Epoch: 8 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:00:52,337-Speed 5590.89 samples/sec Loss 6.6093 LearningRate 0.0335 Epoch: 8 Global Step: 42590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:00:54,199-Speed 5503.69 samples/sec Loss 6.6793 LearningRate 0.0335 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:00:56,042-Speed 5557.15 samples/sec Loss 7.0184 LearningRate 0.0335 Epoch: 8 Global Step: 42610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:00:57,878-Speed 5579.53 samples/sec Loss 6.7545 LearningRate 0.0335 Epoch: 8 Global Step: 42620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:00:59,721-Speed 5560.24 samples/sec Loss 6.7609 LearningRate 0.0335 Epoch: 8 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:01,596-Speed 5464.86 
samples/sec Loss 6.6182 LearningRate 0.0335 Epoch: 8 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:03,497-Speed 5388.78 samples/sec Loss 6.7193 LearningRate 0.0335 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:05,338-Speed 5564.31 samples/sec Loss 6.7362 LearningRate 0.0334 Epoch: 8 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:07,236-Speed 5399.36 samples/sec Loss 6.7710 LearningRate 0.0334 Epoch: 8 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:09,077-Speed 5566.23 samples/sec Loss 6.8273 LearningRate 0.0334 Epoch: 8 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:01:10,937-Speed 5507.97 samples/sec Loss 6.7523 LearningRate 0.0334 Epoch: 8 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:12,774-Speed 5575.02 samples/sec Loss 6.6902 LearningRate 0.0334 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:14,611-Speed 5577.09 samples/sec Loss 6.7339 LearningRate 0.0334 Epoch: 8 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:16,480-Speed 5484.74 samples/sec Loss 6.8472 LearningRate 0.0334 Epoch: 8 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:18,344-Speed 5495.88 samples/sec Loss 6.8331 LearningRate 0.0334 Epoch: 8 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:20,194-Speed 5538.97 samples/sec Loss 6.9060 LearningRate 0.0334 Epoch: 8 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:22,092-Speed 5397.42 samples/sec Loss 6.9642 LearningRate 0.0333 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:23,928-Speed 5581.58 samples/sec Loss 6.6711 LearningRate 0.0333 Epoch: 8 Global Step: 
42760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:25,790-Speed 5502.72 samples/sec Loss 6.8145 LearningRate 0.0333 Epoch: 8 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:27,639-Speed 5540.91 samples/sec Loss 6.7969 LearningRate 0.0333 Epoch: 8 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:29,507-Speed 5485.43 samples/sec Loss 6.7171 LearningRate 0.0333 Epoch: 8 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:01:31,345-Speed 5572.38 samples/sec Loss 6.8266 LearningRate 0.0333 Epoch: 8 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:01:33,205-Speed 5508.05 samples/sec Loss 6.8036 LearningRate 0.0333 Epoch: 8 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:01:35,062-Speed 5520.94 samples/sec Loss 6.8098 LearningRate 0.0333 Epoch: 8 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:36,897-Speed 5583.79 samples/sec Loss 6.7602 LearningRate 0.0332 Epoch: 8 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:38,778-Speed 5447.74 samples/sec Loss 6.6805 LearningRate 0.0332 Epoch: 8 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:40,653-Speed 5463.36 samples/sec Loss 6.8152 LearningRate 0.0332 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:42,519-Speed 5491.97 samples/sec Loss 6.6921 LearningRate 0.0332 Epoch: 8 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:44,367-Speed 5544.64 samples/sec Loss 6.8180 LearningRate 0.0332 Epoch: 8 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:46,205-Speed 5572.38 samples/sec Loss 6.8339 LearningRate 0.0332 Epoch: 8 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 13:01:48,062-Speed 5518.21 samples/sec Loss 6.8608 LearningRate 0.0332 Epoch: 8 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:49,973-Speed 5361.61 samples/sec Loss 6.6613 LearningRate 0.0332 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:51,810-Speed 5576.83 samples/sec Loss 6.6685 LearningRate 0.0332 Epoch: 8 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:53,658-Speed 5546.23 samples/sec Loss 6.8886 LearningRate 0.0331 Epoch: 8 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:01:55,485-Speed 5607.34 samples/sec Loss 6.8878 LearningRate 0.0331 Epoch: 8 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:57,329-Speed 5555.34 samples/sec Loss 6.7349 LearningRate 0.0331 Epoch: 8 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:01:59,164-Speed 5583.53 samples/sec Loss 6.7317 LearningRate 0.0331 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:01,010-Speed 5550.53 samples/sec Loss 6.7626 LearningRate 0.0331 Epoch: 8 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:02,888-Speed 5454.55 samples/sec Loss 6.7822 LearningRate 0.0331 Epoch: 8 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:04,747-Speed 5513.80 samples/sec Loss 6.7680 LearningRate 0.0331 Epoch: 8 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:06,616-Speed 5481.28 samples/sec Loss 6.7643 LearningRate 0.0331 Epoch: 8 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:08,455-Speed 5572.66 samples/sec Loss 6.7828 LearningRate 0.0331 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:10,299-Speed 5555.90 samples/sec Loss 6.6962 
LearningRate 0.0330 Epoch: 8 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:12,161-Speed 5500.78 samples/sec Loss 6.7630 LearningRate 0.0330 Epoch: 8 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:14,068-Speed 5373.22 samples/sec Loss 6.7574 LearningRate 0.0330 Epoch: 8 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:02:15,907-Speed 5571.68 samples/sec Loss 6.6339 LearningRate 0.0330 Epoch: 8 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:17,786-Speed 5451.01 samples/sec Loss 6.7419 LearningRate 0.0330 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:19,641-Speed 5524.45 samples/sec Loss 6.6245 LearningRate 0.0330 Epoch: 8 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:21,511-Speed 5479.30 samples/sec Loss 6.7720 LearningRate 0.0330 Epoch: 8 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:23,361-Speed 5535.91 samples/sec Loss 6.5486 LearningRate 0.0330 Epoch: 8 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:25,219-Speed 5516.87 samples/sec Loss 6.8924 LearningRate 0.0330 Epoch: 8 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:27,076-Speed 5517.47 samples/sec Loss 6.9396 LearningRate 0.0329 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:28,914-Speed 5576.21 samples/sec Loss 6.7810 LearningRate 0.0329 Epoch: 8 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:30,760-Speed 5551.58 samples/sec Loss 6.7275 LearningRate 0.0329 Epoch: 8 Global Step: 43120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:32,626-Speed 5489.90 samples/sec Loss 6.6861 LearningRate 0.0329 Epoch: 8 Global Step: 43130 Fp16 Grad Scale: 
32768 Required: 4 hours Training: 2022-04-11 13:02:34,490-Speed 5499.12 samples/sec Loss 6.7367 LearningRate 0.0329 Epoch: 8 Global Step: 43140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:36,344-Speed 5526.23 samples/sec Loss 6.7123 LearningRate 0.0329 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:38,238-Speed 5408.83 samples/sec Loss 6.7597 LearningRate 0.0329 Epoch: 8 Global Step: 43160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:40,086-Speed 5544.60 samples/sec Loss 6.9539 LearningRate 0.0329 Epoch: 8 Global Step: 43170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:41,955-Speed 5481.56 samples/sec Loss 6.7419 LearningRate 0.0329 Epoch: 8 Global Step: 43180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:43,809-Speed 5527.17 samples/sec Loss 6.8551 LearningRate 0.0328 Epoch: 8 Global Step: 43190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:45,671-Speed 5504.10 samples/sec Loss 6.7676 LearningRate 0.0328 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:47,543-Speed 5472.43 samples/sec Loss 6.6800 LearningRate 0.0328 Epoch: 8 Global Step: 43210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:02:49,378-Speed 5584.04 samples/sec Loss 6.7141 LearningRate 0.0328 Epoch: 8 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:51,248-Speed 5478.81 samples/sec Loss 6.7122 LearningRate 0.0328 Epoch: 8 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:53,099-Speed 5535.45 samples/sec Loss 6.6192 LearningRate 0.0328 Epoch: 8 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:54,967-Speed 5486.76 samples/sec Loss 6.8915 LearningRate 0.0328 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:56,812-Speed 
5551.26 samples/sec Loss 6.6949 LearningRate 0.0328 Epoch: 8 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:02:58,695-Speed 5442.90 samples/sec Loss 6.8410 LearningRate 0.0327 Epoch: 8 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:00,573-Speed 5454.48 samples/sec Loss 6.7734 LearningRate 0.0327 Epoch: 8 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:02,422-Speed 5540.19 samples/sec Loss 6.8076 LearningRate 0.0327 Epoch: 8 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:04,277-Speed 5523.51 samples/sec Loss 6.6870 LearningRate 0.0327 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:06,127-Speed 5537.39 samples/sec Loss 6.8483 LearningRate 0.0327 Epoch: 8 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:07,970-Speed 5558.77 samples/sec Loss 6.7852 LearningRate 0.0327 Epoch: 8 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:09,796-Speed 5611.20 samples/sec Loss 6.5973 LearningRate 0.0327 Epoch: 8 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:11,634-Speed 5574.78 samples/sec Loss 6.8725 LearningRate 0.0327 Epoch: 8 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:13,512-Speed 5455.24 samples/sec Loss 6.7624 LearningRate 0.0327 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:15,372-Speed 5507.27 samples/sec Loss 6.8798 LearningRate 0.0326 Epoch: 8 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:17,222-Speed 5539.99 samples/sec Loss 6.6991 LearningRate 0.0326 Epoch: 8 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:19,088-Speed 5491.84 samples/sec Loss 6.5953 LearningRate 0.0326 Epoch: 8 
Global Step: 43380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:20,934-Speed 5547.35 samples/sec Loss 6.7638 LearningRate 0.0326 Epoch: 8 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:22,812-Speed 5457.24 samples/sec Loss 6.7378 LearningRate 0.0326 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:24,663-Speed 5535.78 samples/sec Loss 6.7009 LearningRate 0.0326 Epoch: 8 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:26,502-Speed 5571.45 samples/sec Loss 6.6958 LearningRate 0.0326 Epoch: 8 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:28,369-Speed 5485.80 samples/sec Loss 6.8362 LearningRate 0.0326 Epoch: 8 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:30,197-Speed 5607.60 samples/sec Loss 6.7564 LearningRate 0.0326 Epoch: 8 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:32,066-Speed 5479.04 samples/sec Loss 6.8192 LearningRate 0.0325 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:33,943-Speed 5461.45 samples/sec Loss 6.6504 LearningRate 0.0325 Epoch: 8 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:35,780-Speed 5574.99 samples/sec Loss 6.8304 LearningRate 0.0325 Epoch: 8 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:37,624-Speed 5557.16 samples/sec Loss 6.8571 LearningRate 0.0325 Epoch: 8 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:39,474-Speed 5539.52 samples/sec Loss 6.7873 LearningRate 0.0325 Epoch: 8 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:41,309-Speed 5583.92 samples/sec Loss 6.7943 LearningRate 0.0325 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 13:03:43,177-Speed 5483.33 samples/sec Loss 6.6718 LearningRate 0.0325 Epoch: 8 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:45,012-Speed 5583.70 samples/sec Loss 6.7285 LearningRate 0.0325 Epoch: 8 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:46,887-Speed 5464.49 samples/sec Loss 6.7506 LearningRate 0.0325 Epoch: 8 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:48,727-Speed 5569.21 samples/sec Loss 6.7111 LearningRate 0.0324 Epoch: 8 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:50,585-Speed 5514.11 samples/sec Loss 6.7940 LearningRate 0.0324 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:52,432-Speed 5547.15 samples/sec Loss 6.8769 LearningRate 0.0324 Epoch: 8 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:54,288-Speed 5520.44 samples/sec Loss 6.7130 LearningRate 0.0324 Epoch: 8 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:03:56,130-Speed 5561.54 samples/sec Loss 6.6874 LearningRate 0.0324 Epoch: 8 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:57,999-Speed 5483.67 samples/sec Loss 6.6647 LearningRate 0.0324 Epoch: 8 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:03:59,843-Speed 5558.59 samples/sec Loss 6.7114 LearningRate 0.0324 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:01,724-Speed 5447.13 samples/sec Loss 6.6955 LearningRate 0.0324 Epoch: 8 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:03,576-Speed 5532.90 samples/sec Loss 6.6631 LearningRate 0.0324 Epoch: 8 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:05,412-Speed 5581.97 samples/sec 
Loss 6.6863 LearningRate 0.0323 Epoch: 8 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:07,276-Speed 5495.99 samples/sec Loss 6.7075 LearningRate 0.0323 Epoch: 8 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:09,110-Speed 5587.15 samples/sec Loss 6.7660 LearningRate 0.0323 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:10,951-Speed 5564.55 samples/sec Loss 6.7868 LearningRate 0.0323 Epoch: 8 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:12,814-Speed 5500.11 samples/sec Loss 6.8824 LearningRate 0.0323 Epoch: 8 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:14,649-Speed 5584.76 samples/sec Loss 6.8022 LearningRate 0.0323 Epoch: 8 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:16,511-Speed 5500.79 samples/sec Loss 6.7788 LearningRate 0.0323 Epoch: 8 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:18,377-Speed 5489.54 samples/sec Loss 6.9468 LearningRate 0.0323 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:20,243-Speed 5493.21 samples/sec Loss 6.8303 LearningRate 0.0323 Epoch: 8 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:22,089-Speed 5551.27 samples/sec Loss 6.7458 LearningRate 0.0322 Epoch: 8 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:23,959-Speed 5478.80 samples/sec Loss 6.7762 LearningRate 0.0322 Epoch: 8 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:25,799-Speed 5569.02 samples/sec Loss 6.8095 LearningRate 0.0322 Epoch: 8 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:27,678-Speed 5453.20 samples/sec Loss 6.8189 LearningRate 0.0322 Epoch: 8 Global Step: 43750 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:29,523-Speed 5553.43 samples/sec Loss 6.8379 LearningRate 0.0322 Epoch: 8 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:31,382-Speed 5515.37 samples/sec Loss 6.5804 LearningRate 0.0322 Epoch: 8 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:33,234-Speed 5531.95 samples/sec Loss 6.9125 LearningRate 0.0322 Epoch: 8 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:04:35,071-Speed 5577.97 samples/sec Loss 6.6765 LearningRate 0.0322 Epoch: 8 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:36,920-Speed 5542.45 samples/sec Loss 6.7678 LearningRate 0.0322 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:38,793-Speed 5470.14 samples/sec Loss 6.8545 LearningRate 0.0321 Epoch: 8 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:04:40,633-Speed 5567.97 samples/sec Loss 6.5810 LearningRate 0.0321 Epoch: 8 Global Step: 43820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:42,521-Speed 5426.42 samples/sec Loss 6.6950 LearningRate 0.0321 Epoch: 8 Global Step: 43830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:44,369-Speed 5546.34 samples/sec Loss 6.5884 LearningRate 0.0321 Epoch: 8 Global Step: 43840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:46,203-Speed 5585.53 samples/sec Loss 6.7142 LearningRate 0.0321 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:48,051-Speed 5542.28 samples/sec Loss 6.6108 LearningRate 0.0321 Epoch: 8 Global Step: 43860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:49,888-Speed 5580.52 samples/sec Loss 6.5762 LearningRate 0.0321 Epoch: 8 Global Step: 43870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
13:04:51,806-Speed 5340.55 samples/sec Loss 6.6743 LearningRate 0.0321 Epoch: 8 Global Step: 43880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:53,669-Speed 5499.36 samples/sec Loss 6.7076 LearningRate 0.0321 Epoch: 8 Global Step: 43890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:55,510-Speed 5564.60 samples/sec Loss 6.6232 LearningRate 0.0320 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:57,356-Speed 5549.33 samples/sec Loss 6.6856 LearningRate 0.0320 Epoch: 8 Global Step: 43910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:04:59,246-Speed 5422.45 samples/sec Loss 6.5932 LearningRate 0.0320 Epoch: 8 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:01,095-Speed 5541.03 samples/sec Loss 6.6491 LearningRate 0.0320 Epoch: 8 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:02,959-Speed 5495.28 samples/sec Loss 6.7652 LearningRate 0.0320 Epoch: 8 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:04,838-Speed 5455.87 samples/sec Loss 6.8078 LearningRate 0.0320 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:06,677-Speed 5569.51 samples/sec Loss 6.6820 LearningRate 0.0320 Epoch: 8 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:08,577-Speed 5393.71 samples/sec Loss 6.6173 LearningRate 0.0320 Epoch: 8 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:10,426-Speed 5542.24 samples/sec Loss 6.6847 LearningRate 0.0319 Epoch: 8 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:12,297-Speed 5475.36 samples/sec Loss 6.6059 LearningRate 0.0319 Epoch: 8 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:14,143-Speed 5552.14 samples/sec Loss 6.8991 LearningRate 
0.0319 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:05:41,473-[lfw][44000]XNorm: 22.386428 Training: 2022-04-11 13:05:41,474-[lfw][44000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 13:05:41,475-[lfw][44000]Accuracy-Highest: 0.99767 Training: 2022-04-11 13:06:13,056-[cfp_fp][44000]XNorm: 19.237414 Training: 2022-04-11 13:06:13,057-[cfp_fp][44000]Accuracy-Flip: 0.96771+-0.00698 Training: 2022-04-11 13:06:13,058-[cfp_fp][44000]Accuracy-Highest: 0.96771 Training: 2022-04-11 13:06:40,325-[agedb_30][44000]XNorm: 22.253874 Training: 2022-04-11 13:06:40,325-[agedb_30][44000]Accuracy-Flip: 0.97683+-0.00689 Training: 2022-04-11 13:06:40,326-[agedb_30][44000]Accuracy-Highest: 0.97683 Training: 2022-04-11 13:06:42,179-Speed 116.32 samples/sec Loss 6.7326 LearningRate 0.0319 Epoch: 8 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:44,032-Speed 5529.78 samples/sec Loss 6.8239 LearningRate 0.0319 Epoch: 8 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:45,859-Speed 5607.76 samples/sec Loss 6.8913 LearningRate 0.0319 Epoch: 8 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:47,712-Speed 5527.30 samples/sec Loss 6.8547 LearningRate 0.0319 Epoch: 8 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:49,561-Speed 5541.79 samples/sec Loss 6.7406 LearningRate 0.0319 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:51,405-Speed 5555.69 samples/sec Loss 6.7493 LearningRate 0.0319 Epoch: 8 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:53,259-Speed 5528.53 samples/sec Loss 6.6830 LearningRate 0.0318 Epoch: 8 Global Step: 44070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:55,123-Speed 5494.40 samples/sec Loss 6.7764 LearningRate 0.0318 Epoch: 8 Global Step: 44080 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:56,978-Speed 5527.01 samples/sec Loss 6.6421 LearningRate 0.0318 Epoch: 8 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:06:58,819-Speed 5562.73 samples/sec Loss 6.7382 LearningRate 0.0318 Epoch: 8 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:00,653-Speed 5588.10 samples/sec Loss 6.7964 LearningRate 0.0318 Epoch: 8 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:02,479-Speed 5611.67 samples/sec Loss 6.6929 LearningRate 0.0318 Epoch: 8 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:04,327-Speed 5544.84 samples/sec Loss 6.7023 LearningRate 0.0318 Epoch: 8 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:06,172-Speed 5553.10 samples/sec Loss 6.7270 LearningRate 0.0318 Epoch: 8 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:08,045-Speed 5470.36 samples/sec Loss 6.7858 LearningRate 0.0318 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:09,878-Speed 5590.70 samples/sec Loss 6.5946 LearningRate 0.0317 Epoch: 8 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:11,723-Speed 5554.03 samples/sec Loss 6.6202 LearningRate 0.0317 Epoch: 8 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:13,564-Speed 5564.81 samples/sec Loss 6.6746 LearningRate 0.0317 Epoch: 8 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:15,438-Speed 5466.13 samples/sec Loss 6.5878 LearningRate 0.0317 Epoch: 8 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:17,276-Speed 5573.98 samples/sec Loss 6.7490 LearningRate 0.0317 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
13:07:19,118-Speed 5563.20 samples/sec Loss 6.7334 LearningRate 0.0317 Epoch: 8 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:20,967-Speed 5541.45 samples/sec Loss 6.6866 LearningRate 0.0317 Epoch: 8 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:07:22,851-Speed 5438.43 samples/sec Loss 6.9000 LearningRate 0.0317 Epoch: 8 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:24,687-Speed 5580.09 samples/sec Loss 6.7226 LearningRate 0.0317 Epoch: 8 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:26,549-Speed 5500.58 samples/sec Loss 6.8437 LearningRate 0.0316 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:28,396-Speed 5548.73 samples/sec Loss 6.6481 LearningRate 0.0316 Epoch: 8 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:30,255-Speed 5513.54 samples/sec Loss 6.7457 LearningRate 0.0316 Epoch: 8 Global Step: 44270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:32,087-Speed 5590.59 samples/sec Loss 6.7793 LearningRate 0.0316 Epoch: 8 Global Step: 44280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:33,960-Speed 5471.99 samples/sec Loss 6.6992 LearningRate 0.0316 Epoch: 8 Global Step: 44290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:35,825-Speed 5495.83 samples/sec Loss 6.6044 LearningRate 0.0316 Epoch: 8 Global Step: 44300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:37,665-Speed 5568.28 samples/sec Loss 6.6188 LearningRate 0.0316 Epoch: 8 Global Step: 44310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:39,518-Speed 5525.76 samples/sec Loss 6.7471 LearningRate 0.0316 Epoch: 8 Global Step: 44320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:41,352-Speed 5588.74 samples/sec Loss 6.7097 LearningRate 
0.0316 Epoch: 8 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:07:43,179-Speed 5605.76 samples/sec Loss 6.9512 LearningRate 0.0315 Epoch: 8 Global Step: 44340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:45,022-Speed 5556.98 samples/sec Loss 6.5896 LearningRate 0.0315 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:46,863-Speed 5566.80 samples/sec Loss 6.6378 LearningRate 0.0315 Epoch: 8 Global Step: 44360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:48,713-Speed 5537.77 samples/sec Loss 6.7250 LearningRate 0.0315 Epoch: 8 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:50,570-Speed 5520.37 samples/sec Loss 6.6856 LearningRate 0.0315 Epoch: 8 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:52,418-Speed 5542.75 samples/sec Loss 6.7420 LearningRate 0.0315 Epoch: 8 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:54,249-Speed 5593.09 samples/sec Loss 6.5805 LearningRate 0.0315 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:56,080-Speed 5596.64 samples/sec Loss 6.6386 LearningRate 0.0315 Epoch: 8 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:57,912-Speed 5591.48 samples/sec Loss 6.7017 LearningRate 0.0315 Epoch: 8 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:07:59,778-Speed 5491.87 samples/sec Loss 6.7126 LearningRate 0.0314 Epoch: 8 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:01,611-Speed 5588.48 samples/sec Loss 6.5759 LearningRate 0.0314 Epoch: 8 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:03,460-Speed 5543.93 samples/sec Loss 6.6771 LearningRate 0.0314 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 13:08:05,302-Speed 5565.67 samples/sec Loss 6.7249 LearningRate 0.0314 Epoch: 8 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:07,147-Speed 5550.83 samples/sec Loss 6.7883 LearningRate 0.0314 Epoch: 8 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:08,983-Speed 5579.18 samples/sec Loss 6.7260 LearningRate 0.0314 Epoch: 8 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:10,822-Speed 5574.29 samples/sec Loss 6.6477 LearningRate 0.0314 Epoch: 8 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:12,674-Speed 5530.27 samples/sec Loss 6.5816 LearningRate 0.0314 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:14,516-Speed 5563.28 samples/sec Loss 6.5953 LearningRate 0.0314 Epoch: 8 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:16,396-Speed 5449.92 samples/sec Loss 6.6799 LearningRate 0.0313 Epoch: 8 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:18,240-Speed 5557.36 samples/sec Loss 6.6589 LearningRate 0.0313 Epoch: 8 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:20,073-Speed 5589.70 samples/sec Loss 6.7970 LearningRate 0.0313 Epoch: 8 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:08:21,914-Speed 5564.25 samples/sec Loss 6.6431 LearningRate 0.0313 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:08:23,739-Speed 5614.90 samples/sec Loss 6.7539 LearningRate 0.0313 Epoch: 8 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:25,600-Speed 5507.10 samples/sec Loss 6.8352 LearningRate 0.0313 Epoch: 8 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:27,456-Speed 5519.11 
samples/sec Loss 6.6963 LearningRate 0.0313 Epoch: 8 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:29,308-Speed 5530.49 samples/sec Loss 6.7956 LearningRate 0.0313 Epoch: 8 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:31,143-Speed 5586.80 samples/sec Loss 6.7259 LearningRate 0.0313 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:32,996-Speed 5528.95 samples/sec Loss 6.5131 LearningRate 0.0312 Epoch: 8 Global Step: 44610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:34,842-Speed 5549.57 samples/sec Loss 6.6936 LearningRate 0.0312 Epoch: 8 Global Step: 44620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:36,700-Speed 5514.87 samples/sec Loss 6.5235 LearningRate 0.0312 Epoch: 8 Global Step: 44630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:38,563-Speed 5499.32 samples/sec Loss 6.6691 LearningRate 0.0312 Epoch: 8 Global Step: 44640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:08:40,405-Speed 5562.13 samples/sec Loss 6.5196 LearningRate 0.0312 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:42,236-Speed 5596.43 samples/sec Loss 6.6361 LearningRate 0.0312 Epoch: 8 Global Step: 44660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:44,090-Speed 5525.83 samples/sec Loss 6.6451 LearningRate 0.0312 Epoch: 8 Global Step: 44670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:45,926-Speed 5582.84 samples/sec Loss 6.7580 LearningRate 0.0312 Epoch: 8 Global Step: 44680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:47,764-Speed 5571.74 samples/sec Loss 6.6453 LearningRate 0.0312 Epoch: 8 Global Step: 44690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:49,608-Speed 5555.67 samples/sec Loss 6.6466 LearningRate 0.0312 Epoch: 8 Global Step: 
44700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:51,454-Speed 5552.51 samples/sec Loss 6.5198 LearningRate 0.0311 Epoch: 8 Global Step: 44710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:53,306-Speed 5532.92 samples/sec Loss 6.5789 LearningRate 0.0311 Epoch: 8 Global Step: 44720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:55,138-Speed 5590.22 samples/sec Loss 6.7030 LearningRate 0.0311 Epoch: 8 Global Step: 44730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:56,972-Speed 5587.81 samples/sec Loss 6.6412 LearningRate 0.0311 Epoch: 8 Global Step: 44740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:08:58,821-Speed 5540.03 samples/sec Loss 6.6910 LearningRate 0.0311 Epoch: 8 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:00,660-Speed 5573.42 samples/sec Loss 6.7024 LearningRate 0.0311 Epoch: 8 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:02,520-Speed 5506.41 samples/sec Loss 6.6617 LearningRate 0.0311 Epoch: 8 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:04,392-Speed 5474.18 samples/sec Loss 6.7385 LearningRate 0.0311 Epoch: 8 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:06,254-Speed 5503.49 samples/sec Loss 6.6167 LearningRate 0.0311 Epoch: 8 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:08,115-Speed 5503.73 samples/sec Loss 6.6625 LearningRate 0.0310 Epoch: 8 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:09,949-Speed 5586.81 samples/sec Loss 6.6160 LearningRate 0.0310 Epoch: 8 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:11,818-Speed 5481.54 samples/sec Loss 6.5325 LearningRate 0.0310 Epoch: 8 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
13:09:13,662-Speed 5556.05 samples/sec Loss 6.7730 LearningRate 0.0310 Epoch: 8 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:15,502-Speed 5569.46 samples/sec Loss 6.6989 LearningRate 0.0310 Epoch: 8 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:17,348-Speed 5548.62 samples/sec Loss 6.6956 LearningRate 0.0310 Epoch: 8 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:09:19,206-Speed 5513.81 samples/sec Loss 6.5592 LearningRate 0.0310 Epoch: 8 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:09:21,073-Speed 5488.35 samples/sec Loss 6.6674 LearningRate 0.0310 Epoch: 8 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:22,912-Speed 5571.63 samples/sec Loss 6.6798 LearningRate 0.0310 Epoch: 8 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:24,750-Speed 5574.57 samples/sec Loss 6.5952 LearningRate 0.0309 Epoch: 8 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:26,604-Speed 5527.00 samples/sec Loss 6.5589 LearningRate 0.0309 Epoch: 8 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:28,443-Speed 5568.94 samples/sec Loss 6.8130 LearningRate 0.0309 Epoch: 8 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:30,287-Speed 5556.50 samples/sec Loss 6.6648 LearningRate 0.0309 Epoch: 8 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:32,125-Speed 5577.64 samples/sec Loss 6.6482 LearningRate 0.0309 Epoch: 8 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:33,964-Speed 5569.04 samples/sec Loss 6.7391 LearningRate 0.0309 Epoch: 8 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:35,799-Speed 5585.37 samples/sec Loss 6.5748 LearningRate 
0.0309 Epoch: 8 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:37,632-Speed 5588.33 samples/sec Loss 6.6336 LearningRate 0.0309 Epoch: 8 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:39,477-Speed 5555.43 samples/sec Loss 6.6400 LearningRate 0.0309 Epoch: 8 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 13:09:41,333-Speed 5519.30 samples/sec Loss 6.6867 LearningRate 0.0308 Epoch: 8 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:09:43,164-Speed 5595.40 samples/sec Loss 6.6721 LearningRate 0.0308 Epoch: 8 Global Step: 44990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:45,018-Speed 5527.35 samples/sec Loss 6.5759 LearningRate 0.0308 Epoch: 8 Global Step: 45000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:46,859-Speed 5566.02 samples/sec Loss 6.5959 LearningRate 0.0308 Epoch: 8 Global Step: 45010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:48,705-Speed 5550.39 samples/sec Loss 6.6286 LearningRate 0.0308 Epoch: 8 Global Step: 45020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:50,582-Speed 5488.30 samples/sec Loss 6.6051 LearningRate 0.0308 Epoch: 8 Global Step: 45030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:52,441-Speed 5509.32 samples/sec Loss 6.7178 LearningRate 0.0308 Epoch: 8 Global Step: 45040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:54,312-Speed 5477.34 samples/sec Loss 6.6482 LearningRate 0.0308 Epoch: 8 Global Step: 45050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:56,147-Speed 5586.09 samples/sec Loss 6.5152 LearningRate 0.0308 Epoch: 8 Global Step: 45060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:09:58,027-Speed 5447.94 samples/sec Loss 6.6214 LearningRate 0.0307 Epoch: 8 Global Step: 45070 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-11 13:09:59,902-Speed 5464.95 samples/sec Loss 6.5186 LearningRate 0.0307 Epoch: 8 Global Step: 45080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:01,739-Speed 5576.39 samples/sec Loss 6.6178 LearningRate 0.0307 Epoch: 8 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:03,579-Speed 5569.99 samples/sec Loss 6.6495 LearningRate 0.0307 Epoch: 8 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:05,423-Speed 5554.12 samples/sec Loss 6.6696 LearningRate 0.0307 Epoch: 8 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:07,263-Speed 5569.37 samples/sec Loss 6.6277 LearningRate 0.0307 Epoch: 8 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:09,103-Speed 5569.85 samples/sec Loss 6.4199 LearningRate 0.0307 Epoch: 8 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:10,972-Speed 5482.60 samples/sec Loss 6.5519 LearningRate 0.0307 Epoch: 8 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:12,817-Speed 5551.79 samples/sec Loss 6.6577 LearningRate 0.0307 Epoch: 8 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:14,694-Speed 5457.36 samples/sec Loss 6.5705 LearningRate 0.0306 Epoch: 8 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:16,535-Speed 5566.80 samples/sec Loss 6.6977 LearningRate 0.0306 Epoch: 8 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:18,395-Speed 5508.86 samples/sec Loss 6.5451 LearningRate 0.0306 Epoch: 8 Global Step: 45180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:20,227-Speed 5592.86 samples/sec Loss 6.6171 LearningRate 0.0306 Epoch: 8 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:22,066-Speed 5571.06 
samples/sec Loss 6.5480 LearningRate 0.0306 Epoch: 8 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:23,915-Speed 5541.09 samples/sec Loss 6.6105 LearningRate 0.0306 Epoch: 8 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:25,755-Speed 5568.56 samples/sec Loss 6.6943 LearningRate 0.0306 Epoch: 8 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:27,594-Speed 5572.18 samples/sec Loss 6.6509 LearningRate 0.0306 Epoch: 8 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:29,429-Speed 5581.35 samples/sec Loss 6.7697 LearningRate 0.0306 Epoch: 8 Global Step: 45240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:31,320-Speed 5420.35 samples/sec Loss 6.6766 LearningRate 0.0305 Epoch: 8 Global Step: 45250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:33,152-Speed 5589.46 samples/sec Loss 6.6160 LearningRate 0.0305 Epoch: 8 Global Step: 45260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:34,991-Speed 5570.61 samples/sec Loss 6.5437 LearningRate 0.0305 Epoch: 8 Global Step: 45270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:36,857-Speed 5491.51 samples/sec Loss 6.6718 LearningRate 0.0305 Epoch: 8 Global Step: 45280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:38,719-Speed 5501.86 samples/sec Loss 6.5218 LearningRate 0.0305 Epoch: 8 Global Step: 45290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:40,567-Speed 5545.31 samples/sec Loss 6.5661 LearningRate 0.0305 Epoch: 8 Global Step: 45300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:42,406-Speed 5569.70 samples/sec Loss 6.6692 LearningRate 0.0305 Epoch: 8 Global Step: 45310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:44,239-Speed 5587.85 samples/sec Loss 6.5695 LearningRate 0.0305 Epoch: 8 Global Step: 
45320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:46,078-Speed 5572.48 samples/sec Loss 6.7441 LearningRate 0.0305 Epoch: 8 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:47,911-Speed 5589.01 samples/sec Loss 6.5330 LearningRate 0.0304 Epoch: 8 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:49,752-Speed 5563.97 samples/sec Loss 6.4978 LearningRate 0.0304 Epoch: 8 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:51,597-Speed 5555.98 samples/sec Loss 6.5636 LearningRate 0.0304 Epoch: 8 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:10:53,441-Speed 5554.21 samples/sec Loss 6.6404 LearningRate 0.0304 Epoch: 8 Global Step: 45370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:55,298-Speed 5515.82 samples/sec Loss 6.6043 LearningRate 0.0304 Epoch: 8 Global Step: 45380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:57,139-Speed 5567.85 samples/sec Loss 6.7940 LearningRate 0.0304 Epoch: 8 Global Step: 45390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:10:59,002-Speed 5496.25 samples/sec Loss 6.7159 LearningRate 0.0304 Epoch: 8 Global Step: 45400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:00,838-Speed 5583.47 samples/sec Loss 6.4973 LearningRate 0.0304 Epoch: 8 Global Step: 45410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:02,673-Speed 5580.69 samples/sec Loss 6.5654 LearningRate 0.0304 Epoch: 8 Global Step: 45420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:04,515-Speed 5562.32 samples/sec Loss 6.6857 LearningRate 0.0304 Epoch: 8 Global Step: 45430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:06,361-Speed 5548.99 samples/sec Loss 6.6387 LearningRate 0.0303 Epoch: 8 Global Step: 45440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
13:11:08,222-Speed 5506.11 samples/sec Loss 6.6281 LearningRate 0.0303 Epoch: 8 Global Step: 45450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:10,067-Speed 5551.00 samples/sec Loss 6.7118 LearningRate 0.0303 Epoch: 8 Global Step: 45460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:11,942-Speed 5465.62 samples/sec Loss 6.6318 LearningRate 0.0303 Epoch: 8 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:13,818-Speed 5460.11 samples/sec Loss 6.6835 LearningRate 0.0303 Epoch: 8 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:15,672-Speed 5528.37 samples/sec Loss 6.7797 LearningRate 0.0303 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:17,504-Speed 5592.80 samples/sec Loss 6.6390 LearningRate 0.0303 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:19,336-Speed 5591.52 samples/sec Loss 6.6109 LearningRate 0.0303 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:21,237-Speed 5391.20 samples/sec Loss 6.5169 LearningRate 0.0303 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:32,363-Speed 920.42 samples/sec Loss 5.9587 LearningRate 0.0302 Epoch: 9 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:34,241-Speed 5455.42 samples/sec Loss 5.6053 LearningRate 0.0302 Epoch: 9 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:36,085-Speed 5555.42 samples/sec Loss 5.8099 LearningRate 0.0302 Epoch: 9 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:37,950-Speed 5494.39 samples/sec Loss 5.7144 LearningRate 0.0302 Epoch: 9 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:39,911-Speed 5224.04 samples/sec Loss 5.8874 LearningRate 
0.0302 Epoch: 9 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:41,789-Speed 5456.13 samples/sec Loss 5.7745 LearningRate 0.0302 Epoch: 9 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:43,653-Speed 5495.29 samples/sec Loss 5.8018 LearningRate 0.0302 Epoch: 9 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:45,494-Speed 5565.21 samples/sec Loss 5.8479 LearningRate 0.0302 Epoch: 9 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:47,337-Speed 5557.28 samples/sec Loss 5.9058 LearningRate 0.0302 Epoch: 9 Global Step: 45610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:49,214-Speed 5459.18 samples/sec Loss 5.6494 LearningRate 0.0301 Epoch: 9 Global Step: 45620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:51,093-Speed 5451.66 samples/sec Loss 5.7198 LearningRate 0.0301 Epoch: 9 Global Step: 45630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:11:52,945-Speed 5531.30 samples/sec Loss 5.8512 LearningRate 0.0301 Epoch: 9 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:54,787-Speed 5562.80 samples/sec Loss 5.8290 LearningRate 0.0301 Epoch: 9 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:56,655-Speed 5485.80 samples/sec Loss 5.6201 LearningRate 0.0301 Epoch: 9 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:11:58,505-Speed 5538.63 samples/sec Loss 5.8669 LearningRate 0.0301 Epoch: 9 Global Step: 45670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:00,357-Speed 5529.10 samples/sec Loss 5.9704 LearningRate 0.0301 Epoch: 9 Global Step: 45680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:02,225-Speed 5489.79 samples/sec Loss 5.8022 LearningRate 0.0301 Epoch: 9 Global Step: 45690 Fp16 Grad Scale: 32768 Required: 
4 hours Training: 2022-04-11 13:12:04,076-Speed 5533.63 samples/sec Loss 5.8275 LearningRate 0.0301 Epoch: 9 Global Step: 45700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:05,929-Speed 5530.13 samples/sec Loss 5.7434 LearningRate 0.0300 Epoch: 9 Global Step: 45710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:07,774-Speed 5552.88 samples/sec Loss 5.9307 LearningRate 0.0300 Epoch: 9 Global Step: 45720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:09,613-Speed 5569.74 samples/sec Loss 5.7834 LearningRate 0.0300 Epoch: 9 Global Step: 45730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:12:11,461-Speed 5544.78 samples/sec Loss 5.8573 LearningRate 0.0300 Epoch: 9 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:13,307-Speed 5551.29 samples/sec Loss 5.8114 LearningRate 0.0300 Epoch: 9 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:15,193-Speed 5432.01 samples/sec Loss 5.9919 LearningRate 0.0300 Epoch: 9 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:17,052-Speed 5512.55 samples/sec Loss 5.9214 LearningRate 0.0300 Epoch: 9 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:18,895-Speed 5555.56 samples/sec Loss 5.8472 LearningRate 0.0300 Epoch: 9 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:20,733-Speed 5576.41 samples/sec Loss 6.0719 LearningRate 0.0300 Epoch: 9 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:22,576-Speed 5560.67 samples/sec Loss 5.8488 LearningRate 0.0299 Epoch: 9 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:12:24,424-Speed 5541.77 samples/sec Loss 5.8297 LearningRate 0.0299 Epoch: 9 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:26,281-Speed 5517.10 samples/sec 
Loss 6.0221 LearningRate 0.0299 Epoch: 9 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:28,173-Speed 5416.38 samples/sec Loss 5.9456 LearningRate 0.0299 Epoch: 9 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:30,027-Speed 5526.74 samples/sec Loss 5.8951 LearningRate 0.0299 Epoch: 9 Global Step: 45840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:31,908-Speed 5447.34 samples/sec Loss 5.9823 LearningRate 0.0299 Epoch: 9 Global Step: 45850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:33,751-Speed 5557.33 samples/sec Loss 6.0235 LearningRate 0.0299 Epoch: 9 Global Step: 45860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:35,591-Speed 5566.59 samples/sec Loss 6.0305 LearningRate 0.0299 Epoch: 9 Global Step: 45870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:37,443-Speed 5534.88 samples/sec Loss 5.8903 LearningRate 0.0299 Epoch: 9 Global Step: 45880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:39,308-Speed 5492.89 samples/sec Loss 5.8008 LearningRate 0.0299 Epoch: 9 Global Step: 45890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:41,163-Speed 5521.29 samples/sec Loss 6.0851 LearningRate 0.0298 Epoch: 9 Global Step: 45900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:43,035-Speed 5475.10 samples/sec Loss 6.0349 LearningRate 0.0298 Epoch: 9 Global Step: 45910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:44,884-Speed 5539.54 samples/sec Loss 6.0669 LearningRate 0.0298 Epoch: 9 Global Step: 45920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:46,753-Speed 5487.56 samples/sec Loss 5.9489 LearningRate 0.0298 Epoch: 9 Global Step: 45930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:12:48,600-Speed 5544.70 samples/sec Loss 6.0742 LearningRate 0.0298 Epoch: 9 Global Step: 45940 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:50,492-Speed 5416.98 samples/sec Loss 6.1917 LearningRate 0.0298 Epoch: 9 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:52,328-Speed 5579.29 samples/sec Loss 6.0585 LearningRate 0.0298 Epoch: 9 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:54,179-Speed 5534.33 samples/sec Loss 5.9837 LearningRate 0.0298 Epoch: 9 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:56,042-Speed 5500.04 samples/sec Loss 5.8969 LearningRate 0.0298 Epoch: 9 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:57,914-Speed 5474.72 samples/sec Loss 6.1139 LearningRate 0.0297 Epoch: 9 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:12:59,757-Speed 5556.89 samples/sec Loss 6.0678 LearningRate 0.0297 Epoch: 9 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:13:27,062-[lfw][46000]XNorm: 22.744840 Training: 2022-04-11 13:13:27,063-[lfw][46000]Accuracy-Flip: 0.99633+-0.00306 Training: 2022-04-11 13:13:27,064-[lfw][46000]Accuracy-Highest: 0.99767 Training: 2022-04-11 13:13:58,395-[cfp_fp][46000]XNorm: 19.897713 Training: 2022-04-11 13:13:58,396-[cfp_fp][46000]Accuracy-Flip: 0.96714+-0.00958 Training: 2022-04-11 13:13:58,397-[cfp_fp][46000]Accuracy-Highest: 0.96771 Training: 2022-04-11 13:14:25,640-[agedb_30][46000]XNorm: 22.336008 Training: 2022-04-11 13:14:25,641-[agedb_30][46000]Accuracy-Flip: 0.97667+-0.00749 Training: 2022-04-11 13:14:25,642-[agedb_30][46000]Accuracy-Highest: 0.97683 Training: 2022-04-11 13:14:27,486-Speed 116.72 samples/sec Loss 6.0884 LearningRate 0.0297 Epoch: 9 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:14:29,332-Speed 5550.75 samples/sec Loss 6.0095 LearningRate 0.0297 Epoch: 9 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 13:14:31,181-Speed 5540.68 samples/sec Loss 6.1671 LearningRate 0.0297 Epoch: 9 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:14:33,027-Speed 5552.43 samples/sec Loss 5.9846 LearningRate 0.0297 Epoch: 9 Global Step: 46040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:34,876-Speed 5540.23 samples/sec Loss 6.0896 LearningRate 0.0297 Epoch: 9 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:36,718-Speed 5564.40 samples/sec Loss 6.1052 LearningRate 0.0297 Epoch: 9 Global Step: 46060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:38,555-Speed 5576.67 samples/sec Loss 6.1408 LearningRate 0.0297 Epoch: 9 Global Step: 46070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:40,433-Speed 5454.45 samples/sec Loss 6.1048 LearningRate 0.0296 Epoch: 9 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:42,263-Speed 5598.56 samples/sec Loss 6.1490 LearningRate 0.0296 Epoch: 9 Global Step: 46090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:44,096-Speed 5590.08 samples/sec Loss 6.1750 LearningRate 0.0296 Epoch: 9 Global Step: 46100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:45,926-Speed 5597.45 samples/sec Loss 6.1409 LearningRate 0.0296 Epoch: 9 Global Step: 46110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:47,758-Speed 5592.00 samples/sec Loss 6.1731 LearningRate 0.0296 Epoch: 9 Global Step: 46120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:49,626-Speed 5482.03 samples/sec Loss 6.1441 LearningRate 0.0296 Epoch: 9 Global Step: 46130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 13:14:51,461-Speed 5586.09 samples/sec Loss 6.1539 LearningRate 0.0296 Epoch: 9 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:14:53,318-Speed 5515.39 samples/sec Loss 5.9725 
LearningRate 0.0296 Epoch: 9 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:14:55,179-Speed 5505.75 samples/sec Loss 6.2079 LearningRate 0.0296 Epoch: 9 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 13:14:57,009-Speed 5600.37 samples/sec Loss 6.0566 LearningRate 0.0295 Epoch: 9 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:14:58,881-Speed 5471.19 samples/sec Loss 6.0996 LearningRate 0.0295 Epoch: 9 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:00,728-Speed 5547.94 samples/sec Loss 6.2194 LearningRate 0.0295 Epoch: 9 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:02,606-Speed 5455.27 samples/sec Loss 6.1373 LearningRate 0.0295 Epoch: 9 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:04,448-Speed 5561.79 samples/sec Loss 6.1146 LearningRate 0.0295 Epoch: 9 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:06,301-Speed 5531.94 samples/sec Loss 6.0939 LearningRate 0.0295 Epoch: 9 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:08,162-Speed 5505.48 samples/sec Loss 6.0765 LearningRate 0.0295 Epoch: 9 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:10,000-Speed 5573.63 samples/sec Loss 6.1983 LearningRate 0.0295 Epoch: 9 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:15:11,845-Speed 5553.14 samples/sec Loss 6.1848 LearningRate 0.0295 Epoch: 9 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:13,697-Speed 5533.51 samples/sec Loss 6.2287 LearningRate 0.0295 Epoch: 9 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:15,547-Speed 5537.30 samples/sec Loss 6.1389 LearningRate 0.0294 Epoch: 9 Global Step: 46270 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 13:15:17,408-Speed 5506.11 samples/sec Loss 6.1225 LearningRate 0.0294 Epoch: 9 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:19,249-Speed 5563.86 samples/sec Loss 6.2906 LearningRate 0.0294 Epoch: 9 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:21,133-Speed 5440.80 samples/sec Loss 6.2963 LearningRate 0.0294 Epoch: 9 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:23,010-Speed 5456.53 samples/sec Loss 6.1669 LearningRate 0.0294 Epoch: 9 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:24,867-Speed 5516.72 samples/sec Loss 6.2386 LearningRate 0.0294 Epoch: 9 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:26,735-Speed 5487.47 samples/sec Loss 6.0893 LearningRate 0.0294 Epoch: 9 Global Step: 46330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:28,588-Speed 5529.49 samples/sec Loss 6.1687 LearningRate 0.0294 Epoch: 9 Global Step: 46340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:30,449-Speed 5506.01 samples/sec Loss 6.2203 LearningRate 0.0294 Epoch: 9 Global Step: 46350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:32,289-Speed 5569.20 samples/sec Loss 6.1720 LearningRate 0.0293 Epoch: 9 Global Step: 46360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:34,161-Speed 5473.73 samples/sec Loss 6.2921 LearningRate 0.0293 Epoch: 9 Global Step: 46370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:36,044-Speed 5441.72 samples/sec Loss 6.2676 LearningRate 0.0293 Epoch: 9 Global Step: 46380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:37,901-Speed 5517.03 samples/sec Loss 6.3869 LearningRate 0.0293 Epoch: 9 Global Step: 46390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:39,779-Speed 
5453.48 samples/sec Loss 6.0655 LearningRate 0.0293 Epoch: 9 Global Step: 46400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:41,664-Speed 5437.56 samples/sec Loss 6.0280 LearningRate 0.0293 Epoch: 9 Global Step: 46410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:43,509-Speed 5551.30 samples/sec Loss 6.0627 LearningRate 0.0293 Epoch: 9 Global Step: 46420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:15:45,370-Speed 5504.47 samples/sec Loss 6.1726 LearningRate 0.0293 Epoch: 9 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:47,219-Speed 5543.21 samples/sec Loss 6.1605 LearningRate 0.0293 Epoch: 9 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:49,113-Speed 5409.52 samples/sec Loss 6.1321 LearningRate 0.0292 Epoch: 9 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:50,985-Speed 5473.53 samples/sec Loss 6.2879 LearningRate 0.0292 Epoch: 9 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:52,834-Speed 5538.75 samples/sec Loss 6.1950 LearningRate 0.0292 Epoch: 9 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:54,678-Speed 5558.09 samples/sec Loss 6.1410 LearningRate 0.0292 Epoch: 9 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:56,528-Speed 5538.62 samples/sec Loss 6.2376 LearningRate 0.0292 Epoch: 9 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:15:58,394-Speed 5490.68 samples/sec Loss 6.2512 LearningRate 0.0292 Epoch: 9 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:00,233-Speed 5569.26 samples/sec Loss 6.2704 LearningRate 0.0292 Epoch: 9 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:02,082-Speed 5542.94 samples/sec Loss 6.2874 LearningRate 0.0292 Epoch: 9 Global 
Step: 46520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:03,932-Speed 5539.16 samples/sec Loss 6.2731 LearningRate 0.0292 Epoch: 9 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:16:05,774-Speed 5561.51 samples/sec Loss 6.1908 LearningRate 0.0292 Epoch: 9 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:16:07,636-Speed 5501.43 samples/sec Loss 6.2548 LearningRate 0.0291 Epoch: 9 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:09,472-Speed 5581.58 samples/sec Loss 6.2923 LearningRate 0.0291 Epoch: 9 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:11,381-Speed 5367.61 samples/sec Loss 6.3132 LearningRate 0.0291 Epoch: 9 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:13,268-Speed 5427.44 samples/sec Loss 6.1435 LearningRate 0.0291 Epoch: 9 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:15,108-Speed 5569.11 samples/sec Loss 6.3028 LearningRate 0.0291 Epoch: 9 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:16,963-Speed 5525.05 samples/sec Loss 6.1320 LearningRate 0.0291 Epoch: 9 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:18,816-Speed 5528.06 samples/sec Loss 6.3842 LearningRate 0.0291 Epoch: 9 Global Step: 46610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:20,653-Speed 5578.93 samples/sec Loss 6.1737 LearningRate 0.0291 Epoch: 9 Global Step: 46620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:22,524-Speed 5474.19 samples/sec Loss 6.2855 LearningRate 0.0291 Epoch: 9 Global Step: 46630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:24,366-Speed 5563.58 samples/sec Loss 6.1667 LearningRate 0.0290 Epoch: 9 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:16:26,237-Speed 5476.45 samples/sec Loss 6.4839 LearningRate 0.0290 Epoch: 9 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:16:28,105-Speed 5484.34 samples/sec Loss 6.3499 LearningRate 0.0290 Epoch: 9 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:29,971-Speed 5490.40 samples/sec Loss 6.2568 LearningRate 0.0290 Epoch: 9 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:31,845-Speed 5467.24 samples/sec Loss 6.3582 LearningRate 0.0290 Epoch: 9 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:33,724-Speed 5453.46 samples/sec Loss 6.1450 LearningRate 0.0290 Epoch: 9 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:35,570-Speed 5549.94 samples/sec Loss 6.2888 LearningRate 0.0290 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:37,418-Speed 5546.48 samples/sec Loss 6.4390 LearningRate 0.0290 Epoch: 9 Global Step: 46710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:39,286-Speed 5484.56 samples/sec Loss 6.3369 LearningRate 0.0290 Epoch: 9 Global Step: 46720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:41,162-Speed 5461.70 samples/sec Loss 6.2606 LearningRate 0.0290 Epoch: 9 Global Step: 46730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:43,035-Speed 5470.71 samples/sec Loss 6.2705 LearningRate 0.0289 Epoch: 9 Global Step: 46740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:44,895-Speed 5505.68 samples/sec Loss 6.2552 LearningRate 0.0289 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:46,730-Speed 5587.10 samples/sec Loss 6.2991 LearningRate 0.0289 Epoch: 9 Global Step: 46760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:48,592-Speed 5501.33 samples/sec Loss 6.2202 
LearningRate 0.0289 Epoch: 9 Global Step: 46770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:50,427-Speed 5583.71 samples/sec Loss 6.3746 LearningRate 0.0289 Epoch: 9 Global Step: 46780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:52,273-Speed 5549.63 samples/sec Loss 6.3498 LearningRate 0.0289 Epoch: 9 Global Step: 46790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:54,123-Speed 5538.19 samples/sec Loss 6.2634 LearningRate 0.0289 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:16:55,969-Speed 5551.59 samples/sec Loss 6.1213 LearningRate 0.0289 Epoch: 9 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:57,838-Speed 5482.58 samples/sec Loss 6.1571 LearningRate 0.0289 Epoch: 9 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:16:59,680-Speed 5562.56 samples/sec Loss 6.2028 LearningRate 0.0288 Epoch: 9 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:01,537-Speed 5519.25 samples/sec Loss 6.1327 LearningRate 0.0288 Epoch: 9 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:03,389-Speed 5531.82 samples/sec Loss 6.2885 LearningRate 0.0288 Epoch: 9 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:05,251-Speed 5503.58 samples/sec Loss 6.4093 LearningRate 0.0288 Epoch: 9 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:07,088-Speed 5577.88 samples/sec Loss 6.3139 LearningRate 0.0288 Epoch: 9 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:08,964-Speed 5460.30 samples/sec Loss 6.3517 LearningRate 0.0288 Epoch: 9 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:10,810-Speed 5552.17 samples/sec Loss 6.1187 LearningRate 0.0288 Epoch: 9 Global Step: 46890 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 13:17:12,675-Speed 5492.91 samples/sec Loss 6.1784 LearningRate 0.0288 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:14,542-Speed 5487.55 samples/sec Loss 6.1524 LearningRate 0.0288 Epoch: 9 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:17:16,393-Speed 5537.21 samples/sec Loss 6.2373 LearningRate 0.0287 Epoch: 9 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:17:18,224-Speed 5593.14 samples/sec Loss 6.0927 LearningRate 0.0287 Epoch: 9 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:20,073-Speed 5540.15 samples/sec Loss 6.2627 LearningRate 0.0287 Epoch: 9 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:21,926-Speed 5530.69 samples/sec Loss 6.3013 LearningRate 0.0287 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:23,779-Speed 5527.52 samples/sec Loss 6.0969 LearningRate 0.0287 Epoch: 9 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:25,624-Speed 5553.87 samples/sec Loss 6.2739 LearningRate 0.0287 Epoch: 9 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:27,470-Speed 5549.77 samples/sec Loss 6.3358 LearningRate 0.0287 Epoch: 9 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:29,296-Speed 5611.77 samples/sec Loss 6.2852 LearningRate 0.0287 Epoch: 9 Global Step: 46990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:31,139-Speed 5559.27 samples/sec Loss 6.3263 LearningRate 0.0287 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:32,979-Speed 5565.66 samples/sec Loss 6.2760 LearningRate 0.0287 Epoch: 9 Global Step: 47010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:34,826-Speed 
5548.42 samples/sec Loss 6.3749 LearningRate 0.0286 Epoch: 9 Global Step: 47020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:36,660-Speed 5585.72 samples/sec Loss 6.0945 LearningRate 0.0286 Epoch: 9 Global Step: 47030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:38,521-Speed 5505.11 samples/sec Loss 6.1838 LearningRate 0.0286 Epoch: 9 Global Step: 47040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:40,376-Speed 5524.75 samples/sec Loss 6.1945 LearningRate 0.0286 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:42,223-Speed 5546.30 samples/sec Loss 6.3514 LearningRate 0.0286 Epoch: 9 Global Step: 47060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:44,062-Speed 5569.90 samples/sec Loss 6.3550 LearningRate 0.0286 Epoch: 9 Global Step: 47070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:45,903-Speed 5567.10 samples/sec Loss 6.5714 LearningRate 0.0286 Epoch: 9 Global Step: 47080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:17:47,737-Speed 5585.23 samples/sec Loss 6.4093 LearningRate 0.0286 Epoch: 9 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:49,574-Speed 5576.98 samples/sec Loss 6.3280 LearningRate 0.0286 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:51,433-Speed 5511.30 samples/sec Loss 6.1451 LearningRate 0.0285 Epoch: 9 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:53,291-Speed 5514.29 samples/sec Loss 6.4699 LearningRate 0.0285 Epoch: 9 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:55,141-Speed 5537.11 samples/sec Loss 6.2114 LearningRate 0.0285 Epoch: 9 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:56,998-Speed 5518.77 samples/sec Loss 6.1386 LearningRate 0.0285 Epoch: 9 Global 
Step: 47140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:17:58,871-Speed 5470.01 samples/sec Loss 6.3206 LearningRate 0.0285 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:00,709-Speed 5575.24 samples/sec Loss 6.2497 LearningRate 0.0285 Epoch: 9 Global Step: 47160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:02,563-Speed 5527.11 samples/sec Loss 6.2531 LearningRate 0.0285 Epoch: 9 Global Step: 47170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:04,413-Speed 5537.52 samples/sec Loss 6.2876 LearningRate 0.0285 Epoch: 9 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:06,271-Speed 5516.68 samples/sec Loss 6.4015 LearningRate 0.0285 Epoch: 9 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:08,132-Speed 5506.02 samples/sec Loss 6.3096 LearningRate 0.0285 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:09,983-Speed 5532.34 samples/sec Loss 6.2978 LearningRate 0.0284 Epoch: 9 Global Step: 47210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:11,872-Speed 5424.47 samples/sec Loss 6.2505 LearningRate 0.0284 Epoch: 9 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:13,721-Speed 5542.71 samples/sec Loss 6.2320 LearningRate 0.0284 Epoch: 9 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:15,606-Speed 5436.53 samples/sec Loss 6.2116 LearningRate 0.0284 Epoch: 9 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:17,460-Speed 5525.41 samples/sec Loss 6.3202 LearningRate 0.0284 Epoch: 9 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:19,314-Speed 5526.07 samples/sec Loss 6.2478 LearningRate 0.0284 Epoch: 9 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:18:21,149-Speed 5585.01 samples/sec Loss 6.1982 LearningRate 0.0284 Epoch: 9 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:23,007-Speed 5515.05 samples/sec Loss 6.2494 LearningRate 0.0284 Epoch: 9 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:24,866-Speed 5509.12 samples/sec Loss 6.2352 LearningRate 0.0284 Epoch: 9 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:18:26,813-Speed 5265.27 samples/sec Loss 6.1452 LearningRate 0.0283 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:18:28,661-Speed 5545.63 samples/sec Loss 6.3719 LearningRate 0.0283 Epoch: 9 Global Step: 47310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:30,513-Speed 5531.58 samples/sec Loss 6.2962 LearningRate 0.0283 Epoch: 9 Global Step: 47320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:32,349-Speed 5579.33 samples/sec Loss 6.2878 LearningRate 0.0283 Epoch: 9 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:34,212-Speed 5500.90 samples/sec Loss 6.1584 LearningRate 0.0283 Epoch: 9 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:36,052-Speed 5567.66 samples/sec Loss 6.3565 LearningRate 0.0283 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:37,912-Speed 5506.88 samples/sec Loss 6.3299 LearningRate 0.0283 Epoch: 9 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:39,776-Speed 5498.53 samples/sec Loss 6.2753 LearningRate 0.0283 Epoch: 9 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:41,632-Speed 5520.55 samples/sec Loss 6.1861 LearningRate 0.0283 Epoch: 9 Global Step: 47380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:43,492-Speed 5508.04 samples/sec Loss 6.2940 
LearningRate 0.0283 Epoch: 9 Global Step: 47390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:45,347-Speed 5524.25 samples/sec Loss 6.3891 LearningRate 0.0282 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:18:47,189-Speed 5562.23 samples/sec Loss 6.2796 LearningRate 0.0282 Epoch: 9 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:49,049-Speed 5508.11 samples/sec Loss 6.2838 LearningRate 0.0282 Epoch: 9 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:50,890-Speed 5563.99 samples/sec Loss 6.1760 LearningRate 0.0282 Epoch: 9 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:52,727-Speed 5578.78 samples/sec Loss 6.3433 LearningRate 0.0282 Epoch: 9 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:54,573-Speed 5549.62 samples/sec Loss 6.4229 LearningRate 0.0282 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:56,414-Speed 5564.00 samples/sec Loss 6.2531 LearningRate 0.0282 Epoch: 9 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:18:58,284-Speed 5480.57 samples/sec Loss 6.4184 LearningRate 0.0282 Epoch: 9 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:00,118-Speed 5583.64 samples/sec Loss 6.2171 LearningRate 0.0282 Epoch: 9 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:01,968-Speed 5539.17 samples/sec Loss 6.2758 LearningRate 0.0281 Epoch: 9 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:03,838-Speed 5480.44 samples/sec Loss 6.3353 LearningRate 0.0281 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:05,682-Speed 5554.60 samples/sec Loss 6.2659 LearningRate 0.0281 Epoch: 9 Global Step: 47510 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 13:19:07,553-Speed 5477.39 samples/sec Loss 6.2900 LearningRate 0.0281 Epoch: 9 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:19:09,391-Speed 5573.20 samples/sec Loss 6.2567 LearningRate 0.0281 Epoch: 9 Global Step: 47530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:11,253-Speed 5504.65 samples/sec Loss 6.3412 LearningRate 0.0281 Epoch: 9 Global Step: 47540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:13,126-Speed 5467.50 samples/sec Loss 6.3575 LearningRate 0.0281 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:14,978-Speed 5534.23 samples/sec Loss 6.3419 LearningRate 0.0281 Epoch: 9 Global Step: 47560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:16,838-Speed 5507.32 samples/sec Loss 6.2541 LearningRate 0.0281 Epoch: 9 Global Step: 47570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:18,673-Speed 5583.23 samples/sec Loss 6.2452 LearningRate 0.0281 Epoch: 9 Global Step: 47580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:20,553-Speed 5449.77 samples/sec Loss 6.1854 LearningRate 0.0280 Epoch: 9 Global Step: 47590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:22,393-Speed 5570.21 samples/sec Loss 6.2484 LearningRate 0.0280 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:24,270-Speed 5456.78 samples/sec Loss 6.4035 LearningRate 0.0280 Epoch: 9 Global Step: 47610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:26,127-Speed 5518.99 samples/sec Loss 6.3538 LearningRate 0.0280 Epoch: 9 Global Step: 47620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:27,991-Speed 5495.07 samples/sec Loss 6.4206 LearningRate 0.0280 Epoch: 9 Global Step: 47630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:29,838-Speed 
5546.71 samples/sec Loss 6.3680 LearningRate 0.0280 Epoch: 9 Global Step: 47640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:31,711-Speed 5471.19 samples/sec Loss 6.3053 LearningRate 0.0280 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:33,556-Speed 5551.42 samples/sec Loss 6.2817 LearningRate 0.0280 Epoch: 9 Global Step: 47660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:35,397-Speed 5564.86 samples/sec Loss 6.1364 LearningRate 0.0280 Epoch: 9 Global Step: 47670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:37,265-Speed 5488.58 samples/sec Loss 6.2495 LearningRate 0.0279 Epoch: 9 Global Step: 47680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:39,140-Speed 5463.86 samples/sec Loss 6.2782 LearningRate 0.0279 Epoch: 9 Global Step: 47690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:40,988-Speed 5546.22 samples/sec Loss 6.3271 LearningRate 0.0279 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:42,835-Speed 5544.31 samples/sec Loss 6.3325 LearningRate 0.0279 Epoch: 9 Global Step: 47710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:44,699-Speed 5496.02 samples/sec Loss 6.2763 LearningRate 0.0279 Epoch: 9 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:46,535-Speed 5581.58 samples/sec Loss 6.3287 LearningRate 0.0279 Epoch: 9 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:48,376-Speed 5566.89 samples/sec Loss 6.4357 LearningRate 0.0279 Epoch: 9 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:50,218-Speed 5560.64 samples/sec Loss 6.2609 LearningRate 0.0279 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:52,058-Speed 5567.22 samples/sec Loss 6.2442 LearningRate 0.0279 Epoch: 9 Global 
Step: 47760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:19:53,903-Speed 5552.04 samples/sec Loss 6.1306 LearningRate 0.0279 Epoch: 9 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:55,743-Speed 5567.96 samples/sec Loss 6.2810 LearningRate 0.0278 Epoch: 9 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:57,585-Speed 5563.15 samples/sec Loss 6.2307 LearningRate 0.0278 Epoch: 9 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:19:59,437-Speed 5534.32 samples/sec Loss 6.3213 LearningRate 0.0278 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:01,306-Speed 5480.64 samples/sec Loss 6.2876 LearningRate 0.0278 Epoch: 9 Global Step: 47810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:03,147-Speed 5568.15 samples/sec Loss 6.3618 LearningRate 0.0278 Epoch: 9 Global Step: 47820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:04,997-Speed 5539.40 samples/sec Loss 6.1041 LearningRate 0.0278 Epoch: 9 Global Step: 47830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:06,859-Speed 5504.67 samples/sec Loss 6.2542 LearningRate 0.0278 Epoch: 9 Global Step: 47840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:08,720-Speed 5503.50 samples/sec Loss 6.3757 LearningRate 0.0278 Epoch: 9 Global Step: 47850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:10,587-Speed 5487.20 samples/sec Loss 6.2078 LearningRate 0.0278 Epoch: 9 Global Step: 47860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:12,455-Speed 5488.01 samples/sec Loss 6.3619 LearningRate 0.0278 Epoch: 9 Global Step: 47870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:14,299-Speed 5555.60 samples/sec Loss 6.2414 LearningRate 0.0277 Epoch: 9 Global Step: 47880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 13:20:16,163-Speed 5498.27 samples/sec Loss 6.3277 LearningRate 0.0277 Epoch: 9 Global Step: 47890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:18,007-Speed 5555.75 samples/sec Loss 6.2398 LearningRate 0.0277 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:19,840-Speed 5588.11 samples/sec Loss 6.1585 LearningRate 0.0277 Epoch: 9 Global Step: 47910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:21,722-Speed 5446.02 samples/sec Loss 6.2456 LearningRate 0.0277 Epoch: 9 Global Step: 47920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:23,582-Speed 5508.58 samples/sec Loss 6.4065 LearningRate 0.0277 Epoch: 9 Global Step: 47930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:25,431-Speed 5542.51 samples/sec Loss 6.2701 LearningRate 0.0277 Epoch: 9 Global Step: 47940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:27,269-Speed 5574.87 samples/sec Loss 6.2398 LearningRate 0.0277 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:29,139-Speed 5477.98 samples/sec Loss 6.4434 LearningRate 0.0277 Epoch: 9 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:20:31,008-Speed 5484.69 samples/sec Loss 6.1468 LearningRate 0.0276 Epoch: 9 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:20:32,854-Speed 5551.16 samples/sec Loss 6.3850 LearningRate 0.0276 Epoch: 9 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:20:34,720-Speed 5489.10 samples/sec Loss 6.3673 LearningRate 0.0276 Epoch: 9 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:20:36,566-Speed 5550.26 samples/sec Loss 6.3474 LearningRate 0.0276 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:21:03,911-[lfw][48000]XNorm: 20.791561 Training: 
2022-04-11 13:21:03,912-[lfw][48000]Accuracy-Flip: 0.99783+-0.00236 Training: 2022-04-11 13:21:03,912-[lfw][48000]Accuracy-Highest: 0.99783 Training: 2022-04-11 13:21:35,489-[cfp_fp][48000]XNorm: 18.188981 Training: 2022-04-11 13:21:35,490-[cfp_fp][48000]Accuracy-Flip: 0.96743+-0.00787 Training: 2022-04-11 13:21:35,491-[cfp_fp][48000]Accuracy-Highest: 0.96771 Training: 2022-04-11 13:22:02,360-[agedb_30][48000]XNorm: 20.354975 Training: 2022-04-11 13:22:02,361-[agedb_30][48000]Accuracy-Flip: 0.97817+-0.00794 Training: 2022-04-11 13:22:02,361-[agedb_30][48000]Accuracy-Highest: 0.97817 Training: 2022-04-11 13:22:04,258-Speed 116.77 samples/sec Loss 6.4198 LearningRate 0.0276 Epoch: 9 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:06,090-Speed 5591.81 samples/sec Loss 6.1424 LearningRate 0.0276 Epoch: 9 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:07,924-Speed 5587.59 samples/sec Loss 6.2647 LearningRate 0.0276 Epoch: 9 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:09,755-Speed 5593.67 samples/sec Loss 6.3502 LearningRate 0.0276 Epoch: 9 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:11,608-Speed 5529.50 samples/sec Loss 6.3039 LearningRate 0.0276 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:13,438-Speed 5602.42 samples/sec Loss 6.3129 LearningRate 0.0276 Epoch: 9 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:15,268-Speed 5597.59 samples/sec Loss 6.0944 LearningRate 0.0275 Epoch: 9 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:22:17,092-Speed 5614.09 samples/sec Loss 6.2919 LearningRate 0.0275 Epoch: 9 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:18,972-Speed 5450.31 samples/sec Loss 6.3901 LearningRate 0.0275 Epoch: 9 Global Step: 48090 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:20,803-Speed 5594.91 samples/sec Loss 6.3005 LearningRate 0.0275 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:22,653-Speed 5539.10 samples/sec Loss 6.2318 LearningRate 0.0275 Epoch: 9 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:24,507-Speed 5526.80 samples/sec Loss 6.5105 LearningRate 0.0275 Epoch: 9 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:26,354-Speed 5547.15 samples/sec Loss 6.1962 LearningRate 0.0275 Epoch: 9 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:28,223-Speed 5482.10 samples/sec Loss 6.3738 LearningRate 0.0275 Epoch: 9 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:30,058-Speed 5583.42 samples/sec Loss 6.3673 LearningRate 0.0275 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:31,908-Speed 5537.74 samples/sec Loss 6.2401 LearningRate 0.0274 Epoch: 9 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:33,797-Speed 5424.75 samples/sec Loss 6.2214 LearningRate 0.0274 Epoch: 9 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:35,706-Speed 5368.20 samples/sec Loss 6.3813 LearningRate 0.0274 Epoch: 9 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:22:37,551-Speed 5553.97 samples/sec Loss 6.3050 LearningRate 0.0274 Epoch: 9 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:39,398-Speed 5546.82 samples/sec Loss 6.3704 LearningRate 0.0274 Epoch: 9 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:41,260-Speed 5501.86 samples/sec Loss 6.2786 LearningRate 0.0274 Epoch: 9 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
13:22:43,105-Speed 5554.39 samples/sec Loss 6.2584 LearningRate 0.0274 Epoch: 9 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:44,963-Speed 5516.19 samples/sec Loss 6.0964 LearningRate 0.0274 Epoch: 9 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:46,813-Speed 5538.29 samples/sec Loss 6.3063 LearningRate 0.0274 Epoch: 9 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:48,641-Speed 5605.22 samples/sec Loss 6.1985 LearningRate 0.0274 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:50,489-Speed 5541.00 samples/sec Loss 6.1287 LearningRate 0.0273 Epoch: 9 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:52,343-Speed 5528.73 samples/sec Loss 6.1227 LearningRate 0.0273 Epoch: 9 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:54,203-Speed 5508.21 samples/sec Loss 6.0873 LearningRate 0.0273 Epoch: 9 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:56,060-Speed 5514.87 samples/sec Loss 6.2064 LearningRate 0.0273 Epoch: 9 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:22:57,897-Speed 5579.59 samples/sec Loss 6.2082 LearningRate 0.0273 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:22:59,793-Speed 5402.50 samples/sec Loss 6.4348 LearningRate 0.0273 Epoch: 9 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:01,671-Speed 5454.94 samples/sec Loss 6.2583 LearningRate 0.0273 Epoch: 9 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:03,507-Speed 5582.46 samples/sec Loss 6.2341 LearningRate 0.0273 Epoch: 9 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:05,366-Speed 5508.81 samples/sec Loss 6.2205 LearningRate 
0.0273 Epoch: 9 Global Step: 48340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:07,217-Speed 5537.87 samples/sec Loss 6.3363 LearningRate 0.0273 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:09,082-Speed 5494.64 samples/sec Loss 6.2751 LearningRate 0.0272 Epoch: 9 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:10,939-Speed 5516.77 samples/sec Loss 6.2940 LearningRate 0.0272 Epoch: 9 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:12,856-Speed 5342.97 samples/sec Loss 6.1495 LearningRate 0.0272 Epoch: 9 Global Step: 48380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:14,707-Speed 5535.48 samples/sec Loss 6.1575 LearningRate 0.0272 Epoch: 9 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:16,555-Speed 5545.42 samples/sec Loss 6.2552 LearningRate 0.0272 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:18,413-Speed 5515.25 samples/sec Loss 6.2722 LearningRate 0.0272 Epoch: 9 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:20,276-Speed 5500.29 samples/sec Loss 6.2453 LearningRate 0.0272 Epoch: 9 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:22,119-Speed 5559.11 samples/sec Loss 6.0807 LearningRate 0.0272 Epoch: 9 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:23,978-Speed 5511.28 samples/sec Loss 6.2639 LearningRate 0.0272 Epoch: 9 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:25,832-Speed 5527.46 samples/sec Loss 6.2126 LearningRate 0.0271 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:27,690-Speed 5515.01 samples/sec Loss 6.2956 LearningRate 0.0271 Epoch: 9 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 
3 hours Training: 2022-04-11 13:23:29,534-Speed 5554.63 samples/sec Loss 6.2460 LearningRate 0.0271 Epoch: 9 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:31,371-Speed 5575.00 samples/sec Loss 6.2888 LearningRate 0.0271 Epoch: 9 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:33,209-Speed 5574.34 samples/sec Loss 6.2896 LearningRate 0.0271 Epoch: 9 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:35,070-Speed 5506.67 samples/sec Loss 6.0858 LearningRate 0.0271 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:36,925-Speed 5522.26 samples/sec Loss 6.2574 LearningRate 0.0271 Epoch: 9 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:38,761-Speed 5580.79 samples/sec Loss 6.3101 LearningRate 0.0271 Epoch: 9 Global Step: 48520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:40,611-Speed 5537.58 samples/sec Loss 6.3916 LearningRate 0.0271 Epoch: 9 Global Step: 48530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:42,489-Speed 5507.81 samples/sec Loss 6.3484 LearningRate 0.0271 Epoch: 9 Global Step: 48540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:44,324-Speed 5582.01 samples/sec Loss 6.1862 LearningRate 0.0270 Epoch: 9 Global Step: 48550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:46,175-Speed 5535.69 samples/sec Loss 6.3391 LearningRate 0.0270 Epoch: 9 Global Step: 48560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:48,027-Speed 5531.99 samples/sec Loss 6.2819 LearningRate 0.0270 Epoch: 9 Global Step: 48570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:49,940-Speed 5355.11 samples/sec Loss 6.2802 LearningRate 0.0270 Epoch: 9 Global Step: 48580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:51,796-Speed 5519.37 samples/sec 
Loss 6.1688 LearningRate 0.0270 Epoch: 9 Global Step: 48590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:23:53,635-Speed 5569.91 samples/sec Loss 6.2068 LearningRate 0.0270 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:55,476-Speed 5566.85 samples/sec Loss 6.1132 LearningRate 0.0270 Epoch: 9 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:57,305-Speed 5600.61 samples/sec Loss 6.1556 LearningRate 0.0270 Epoch: 9 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:23:59,172-Speed 5489.75 samples/sec Loss 6.2628 LearningRate 0.0270 Epoch: 9 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:01,005-Speed 5587.53 samples/sec Loss 6.2761 LearningRate 0.0270 Epoch: 9 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:02,844-Speed 5572.92 samples/sec Loss 6.2163 LearningRate 0.0269 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:04,706-Speed 5502.02 samples/sec Loss 6.2096 LearningRate 0.0269 Epoch: 9 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:06,540-Speed 5584.38 samples/sec Loss 6.2249 LearningRate 0.0269 Epoch: 9 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:08,379-Speed 5572.10 samples/sec Loss 6.3005 LearningRate 0.0269 Epoch: 9 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:10,216-Speed 5578.25 samples/sec Loss 6.1861 LearningRate 0.0269 Epoch: 9 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:12,057-Speed 5564.28 samples/sec Loss 6.2262 LearningRate 0.0269 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:13,921-Speed 5496.40 samples/sec Loss 6.2235 LearningRate 0.0269 Epoch: 9 Global Step: 48710 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:15,760-Speed 5570.49 samples/sec Loss 6.2422 LearningRate 0.0269 Epoch: 9 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:17,608-Speed 5543.88 samples/sec Loss 6.2758 LearningRate 0.0269 Epoch: 9 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:19,447-Speed 5570.84 samples/sec Loss 6.1557 LearningRate 0.0269 Epoch: 9 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:21,305-Speed 5512.11 samples/sec Loss 6.2712 LearningRate 0.0268 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:23,146-Speed 5565.54 samples/sec Loss 6.2062 LearningRate 0.0268 Epoch: 9 Global Step: 48760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:25,002-Speed 5520.38 samples/sec Loss 6.2109 LearningRate 0.0268 Epoch: 9 Global Step: 48770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:26,865-Speed 5497.42 samples/sec Loss 6.2622 LearningRate 0.0268 Epoch: 9 Global Step: 48780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:28,709-Speed 5557.86 samples/sec Loss 6.3417 LearningRate 0.0268 Epoch: 9 Global Step: 48790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:30,579-Speed 5479.09 samples/sec Loss 6.2234 LearningRate 0.0268 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:32,423-Speed 5558.33 samples/sec Loss 6.2885 LearningRate 0.0268 Epoch: 9 Global Step: 48810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:34,274-Speed 5535.63 samples/sec Loss 6.2244 LearningRate 0.0268 Epoch: 9 Global Step: 48820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:36,131-Speed 5516.83 samples/sec Loss 6.4010 LearningRate 0.0268 Epoch: 9 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
13:24:37,979-Speed 5545.71 samples/sec Loss 6.1020 LearningRate 0.0267 Epoch: 9 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:24:39,834-Speed 5523.05 samples/sec Loss 6.2916 LearningRate 0.0267 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:41,680-Speed 5548.18 samples/sec Loss 6.1243 LearningRate 0.0267 Epoch: 9 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:43,537-Speed 5516.58 samples/sec Loss 6.1887 LearningRate 0.0267 Epoch: 9 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:45,395-Speed 5515.37 samples/sec Loss 6.1185 LearningRate 0.0267 Epoch: 9 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:47,240-Speed 5553.77 samples/sec Loss 6.2466 LearningRate 0.0267 Epoch: 9 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:49,093-Speed 5527.46 samples/sec Loss 6.1773 LearningRate 0.0267 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:50,938-Speed 5552.04 samples/sec Loss 6.2258 LearningRate 0.0267 Epoch: 9 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:52,781-Speed 5562.16 samples/sec Loss 6.1451 LearningRate 0.0267 Epoch: 9 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:54,623-Speed 5558.94 samples/sec Loss 6.2188 LearningRate 0.0267 Epoch: 9 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:56,465-Speed 5562.17 samples/sec Loss 6.1524 LearningRate 0.0266 Epoch: 9 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:24:58,301-Speed 5581.06 samples/sec Loss 6.2771 LearningRate 0.0266 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:25:00,180-Speed 5452.70 samples/sec Loss 6.2581 LearningRate 
0.0266 Epoch: 9 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:25:02,014-Speed 5585.03 samples/sec Loss 6.3256 LearningRate 0.0266 Epoch: 9 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:03,872-Speed 5514.21 samples/sec Loss 6.2565 LearningRate 0.0266 Epoch: 9 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:05,706-Speed 5584.22 samples/sec Loss 6.2389 LearningRate 0.0266 Epoch: 9 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:07,574-Speed 5486.05 samples/sec Loss 6.2105 LearningRate 0.0266 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:09,420-Speed 5547.38 samples/sec Loss 6.2348 LearningRate 0.0266 Epoch: 9 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:11,283-Speed 5500.30 samples/sec Loss 6.3635 LearningRate 0.0266 Epoch: 9 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:13,135-Speed 5532.49 samples/sec Loss 6.2326 LearningRate 0.0266 Epoch: 9 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:15,013-Speed 5454.12 samples/sec Loss 6.1325 LearningRate 0.0265 Epoch: 9 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:16,858-Speed 5552.47 samples/sec Loss 6.1365 LearningRate 0.0265 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:18,725-Speed 5488.24 samples/sec Loss 6.1202 LearningRate 0.0265 Epoch: 9 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:20,569-Speed 5556.21 samples/sec Loss 6.3010 LearningRate 0.0265 Epoch: 9 Global Step: 49070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:25:22,415-Speed 5551.13 samples/sec Loss 6.3700 LearningRate 0.0265 Epoch: 9 Global Step: 49080 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-04-11 13:25:24,257-Speed 5560.73 samples/sec Loss 6.1229 LearningRate 0.0265 Epoch: 9 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:26,129-Speed 5472.26 samples/sec Loss 6.2250 LearningRate 0.0265 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:27,973-Speed 5559.54 samples/sec Loss 6.2774 LearningRate 0.0265 Epoch: 9 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:29,837-Speed 5495.18 samples/sec Loss 6.1934 LearningRate 0.0265 Epoch: 9 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:31,686-Speed 5539.55 samples/sec Loss 6.1787 LearningRate 0.0265 Epoch: 9 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:33,547-Speed 5506.56 samples/sec Loss 6.0925 LearningRate 0.0264 Epoch: 9 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:35,387-Speed 5567.26 samples/sec Loss 6.2963 LearningRate 0.0264 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:37,254-Speed 5488.59 samples/sec Loss 6.1041 LearningRate 0.0264 Epoch: 9 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:39,110-Speed 5520.04 samples/sec Loss 6.3502 LearningRate 0.0264 Epoch: 9 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:40,954-Speed 5557.66 samples/sec Loss 6.2156 LearningRate 0.0264 Epoch: 9 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:42,780-Speed 5612.02 samples/sec Loss 6.1611 LearningRate 0.0264 Epoch: 9 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:44,656-Speed 5460.95 samples/sec Loss 6.1695 LearningRate 0.0264 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:46,494-Speed 5571.48 
samples/sec Loss 6.2536 LearningRate 0.0264 Epoch: 9 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:48,351-Speed 5517.59 samples/sec Loss 6.2568 LearningRate 0.0264 Epoch: 9 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:50,200-Speed 5540.41 samples/sec Loss 6.2441 LearningRate 0.0264 Epoch: 9 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:52,050-Speed 5538.57 samples/sec Loss 6.1039 LearningRate 0.0263 Epoch: 9 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:53,909-Speed 5512.34 samples/sec Loss 6.0913 LearningRate 0.0263 Epoch: 9 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:55,741-Speed 5590.30 samples/sec Loss 6.1862 LearningRate 0.0263 Epoch: 9 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:57,580-Speed 5572.49 samples/sec Loss 6.2040 LearningRate 0.0263 Epoch: 9 Global Step: 49270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:25:59,463-Speed 5442.31 samples/sec Loss 6.1277 LearningRate 0.0263 Epoch: 9 Global Step: 49280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:01,299-Speed 5579.29 samples/sec Loss 6.2341 LearningRate 0.0263 Epoch: 9 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:26:03,162-Speed 5499.08 samples/sec Loss 6.2959 LearningRate 0.0263 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:05,023-Speed 5505.21 samples/sec Loss 6.1880 LearningRate 0.0263 Epoch: 9 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:06,873-Speed 5537.78 samples/sec Loss 6.2379 LearningRate 0.0263 Epoch: 9 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:08,728-Speed 5526.21 samples/sec Loss 6.1569 LearningRate 0.0263 Epoch: 9 Global Step: 
49330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:10,571-Speed 5557.62 samples/sec Loss 6.0705 LearningRate 0.0262 Epoch: 9 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:12,435-Speed 5494.81 samples/sec Loss 6.2581 LearningRate 0.0262 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:14,276-Speed 5568.74 samples/sec Loss 6.0729 LearningRate 0.0262 Epoch: 9 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:16,115-Speed 5568.91 samples/sec Loss 6.2033 LearningRate 0.0262 Epoch: 9 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:17,959-Speed 5556.59 samples/sec Loss 6.0911 LearningRate 0.0262 Epoch: 9 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:19,802-Speed 5560.36 samples/sec Loss 6.1857 LearningRate 0.0262 Epoch: 9 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:21,644-Speed 5560.61 samples/sec Loss 6.2917 LearningRate 0.0262 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:23,519-Speed 5465.17 samples/sec Loss 6.2648 LearningRate 0.0262 Epoch: 9 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:25,353-Speed 5587.69 samples/sec Loss 6.0264 LearningRate 0.0262 Epoch: 9 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:27,229-Speed 5463.62 samples/sec Loss 6.0703 LearningRate 0.0261 Epoch: 9 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:29,080-Speed 5537.71 samples/sec Loss 6.2600 LearningRate 0.0261 Epoch: 9 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:30,933-Speed 5526.98 samples/sec Loss 6.3257 LearningRate 0.0261 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
13:26:32,789-Speed 5523.39 samples/sec Loss 6.2989 LearningRate 0.0261 Epoch: 9 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:34,623-Speed 5584.98 samples/sec Loss 6.2654 LearningRate 0.0261 Epoch: 9 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:36,464-Speed 5566.18 samples/sec Loss 6.2278 LearningRate 0.0261 Epoch: 9 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:38,318-Speed 5526.39 samples/sec Loss 6.1331 LearningRate 0.0261 Epoch: 9 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:40,182-Speed 5495.61 samples/sec Loss 6.2759 LearningRate 0.0261 Epoch: 9 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:26:42,020-Speed 5575.97 samples/sec Loss 6.1566 LearningRate 0.0261 Epoch: 9 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:43,885-Speed 5491.02 samples/sec Loss 6.0749 LearningRate 0.0261 Epoch: 9 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:26:45,727-Speed 5562.58 samples/sec Loss 6.1683 LearningRate 0.0260 Epoch: 9 Global Step: 49530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:47,588-Speed 5507.32 samples/sec Loss 6.2760 LearningRate 0.0260 Epoch: 9 Global Step: 49540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:49,479-Speed 5416.59 samples/sec Loss 6.2282 LearningRate 0.0260 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:51,328-Speed 5541.03 samples/sec Loss 6.1117 LearningRate 0.0260 Epoch: 9 Global Step: 49560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:53,186-Speed 5517.04 samples/sec Loss 6.2297 LearningRate 0.0260 Epoch: 9 Global Step: 49570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:55,039-Speed 5527.74 samples/sec Loss 6.2075 LearningRate 
0.0260 Epoch: 9 Global Step: 49580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:56,912-Speed 5471.66 samples/sec Loss 6.2019 LearningRate 0.0260 Epoch: 9 Global Step: 49590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:26:58,748-Speed 5579.61 samples/sec Loss 6.2103 LearningRate 0.0260 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:00,617-Speed 5482.84 samples/sec Loss 6.1856 LearningRate 0.0260 Epoch: 9 Global Step: 49610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:02,461-Speed 5553.35 samples/sec Loss 6.2881 LearningRate 0.0260 Epoch: 9 Global Step: 49620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:04,317-Speed 5520.52 samples/sec Loss 6.1908 LearningRate 0.0259 Epoch: 9 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:06,154-Speed 5576.79 samples/sec Loss 6.1562 LearningRate 0.0259 Epoch: 9 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:08,003-Speed 5541.18 samples/sec Loss 6.1722 LearningRate 0.0259 Epoch: 9 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:09,839-Speed 5582.07 samples/sec Loss 6.1825 LearningRate 0.0259 Epoch: 9 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:11,698-Speed 5509.74 samples/sec Loss 6.2775 LearningRate 0.0259 Epoch: 9 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:13,564-Speed 5488.98 samples/sec Loss 6.2617 LearningRate 0.0259 Epoch: 9 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:15,407-Speed 5558.67 samples/sec Loss 6.2725 LearningRate 0.0259 Epoch: 9 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:17,265-Speed 5514.59 samples/sec Loss 6.2229 LearningRate 0.0259 Epoch: 9 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 
3 hours Training: 2022-04-11 13:27:19,109-Speed 5556.64 samples/sec Loss 6.1478 LearningRate 0.0259 Epoch: 9 Global Step: 49710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:20,947-Speed 5573.22 samples/sec Loss 6.2601 LearningRate 0.0259 Epoch: 9 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:22,878-Speed 5305.82 samples/sec Loss 6.1845 LearningRate 0.0258 Epoch: 9 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:27:24,736-Speed 5513.67 samples/sec Loss 6.0668 LearningRate 0.0258 Epoch: 9 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:26,586-Speed 5539.83 samples/sec Loss 6.1637 LearningRate 0.0258 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:28,435-Speed 5540.26 samples/sec Loss 6.1445 LearningRate 0.0258 Epoch: 9 Global Step: 49760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:30,311-Speed 5462.82 samples/sec Loss 6.2220 LearningRate 0.0258 Epoch: 9 Global Step: 49770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:32,157-Speed 5549.42 samples/sec Loss 6.2151 LearningRate 0.0258 Epoch: 9 Global Step: 49780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:34,040-Speed 5441.86 samples/sec Loss 6.2478 LearningRate 0.0258 Epoch: 9 Global Step: 49790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:35,878-Speed 5573.47 samples/sec Loss 6.2491 LearningRate 0.0258 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:37,730-Speed 5529.16 samples/sec Loss 6.1079 LearningRate 0.0258 Epoch: 9 Global Step: 49810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:39,595-Speed 5495.73 samples/sec Loss 6.0489 LearningRate 0.0258 Epoch: 9 Global Step: 49820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:41,430-Speed 5582.02 samples/sec 
Loss 6.2511 LearningRate 0.0257 Epoch: 9 Global Step: 49830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:43,312-Speed 5445.33 samples/sec Loss 6.1031 LearningRate 0.0257 Epoch: 9 Global Step: 49840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:27:45,151-Speed 5572.13 samples/sec Loss 6.2430 LearningRate 0.0257 Epoch: 9 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:46,990-Speed 5569.05 samples/sec Loss 6.2733 LearningRate 0.0257 Epoch: 9 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:48,875-Speed 5436.25 samples/sec Loss 6.2958 LearningRate 0.0257 Epoch: 9 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:50,715-Speed 5566.16 samples/sec Loss 6.2158 LearningRate 0.0257 Epoch: 9 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:52,570-Speed 5522.64 samples/sec Loss 6.2029 LearningRate 0.0257 Epoch: 9 Global Step: 49890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:54,436-Speed 5491.18 samples/sec Loss 6.2440 LearningRate 0.0257 Epoch: 9 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:56,276-Speed 5570.03 samples/sec Loss 6.1851 LearningRate 0.0257 Epoch: 9 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:58,109-Speed 5589.61 samples/sec Loss 6.1899 LearningRate 0.0257 Epoch: 9 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:27:59,967-Speed 5513.08 samples/sec Loss 6.2295 LearningRate 0.0256 Epoch: 9 Global Step: 49930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:28:01,808-Speed 5564.97 samples/sec Loss 6.0523 LearningRate 0.0256 Epoch: 9 Global Step: 49940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:28:03,662-Speed 5529.05 samples/sec Loss 6.0422 LearningRate 0.0256 Epoch: 9 Global Step: 49950 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:05,524-Speed 5501.34 samples/sec Loss 6.1570 LearningRate 0.0256 Epoch: 9 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:07,365-Speed 5565.54 samples/sec Loss 6.1818 LearningRate 0.0256 Epoch: 9 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:09,215-Speed 5541.08 samples/sec Loss 6.1866 LearningRate 0.0256 Epoch: 9 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:11,054-Speed 5568.90 samples/sec Loss 5.9426 LearningRate 0.0256 Epoch: 9 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:12,921-Speed 5489.01 samples/sec Loss 6.1781 LearningRate 0.0256 Epoch: 9 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:28:40,212-[lfw][50000]XNorm: 23.300236 Training: 2022-04-11 13:28:40,213-[lfw][50000]Accuracy-Flip: 0.99817+-0.00189 Training: 2022-04-11 13:28:40,213-[lfw][50000]Accuracy-Highest: 0.99817 Training: 2022-04-11 13:29:11,595-[cfp_fp][50000]XNorm: 20.925915 Training: 2022-04-11 13:29:11,600-[cfp_fp][50000]Accuracy-Flip: 0.97029+-0.00825 Training: 2022-04-11 13:29:11,601-[cfp_fp][50000]Accuracy-Highest: 0.97029 Training: 2022-04-11 13:29:38,747-[agedb_30][50000]XNorm: 23.270061 Training: 2022-04-11 13:29:38,748-[agedb_30][50000]Accuracy-Flip: 0.97700+-0.00657 Training: 2022-04-11 13:29:38,748-[agedb_30][50000]Accuracy-Highest: 0.97817 Training: 2022-04-11 13:29:40,602-Speed 116.79 samples/sec Loss 6.0598 LearningRate 0.0256 Epoch: 9 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:42,441-Speed 5568.36 samples/sec Loss 6.0536 LearningRate 0.0256 Epoch: 9 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:44,288-Speed 5546.47 samples/sec Loss 6.1205 LearningRate 0.0255 Epoch: 9 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:29:46,118-Speed 5599.42 samples/sec Loss 6.1326 LearningRate 0.0255 Epoch: 9 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:47,968-Speed 5537.71 samples/sec Loss 6.0668 LearningRate 0.0255 Epoch: 9 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:49,812-Speed 5558.90 samples/sec Loss 6.1322 LearningRate 0.0255 Epoch: 9 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:51,690-Speed 5454.69 samples/sec Loss 6.0573 LearningRate 0.0255 Epoch: 9 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:53,543-Speed 5530.87 samples/sec Loss 6.1183 LearningRate 0.0255 Epoch: 9 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:55,394-Speed 5537.11 samples/sec Loss 6.1477 LearningRate 0.0255 Epoch: 9 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:57,238-Speed 5553.49 samples/sec Loss 5.9689 LearningRate 0.0255 Epoch: 9 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:29:59,095-Speed 5520.46 samples/sec Loss 6.1289 LearningRate 0.0255 Epoch: 9 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:00,930-Speed 5583.24 samples/sec Loss 6.1445 LearningRate 0.0255 Epoch: 9 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:02,779-Speed 5539.74 samples/sec Loss 5.8781 LearningRate 0.0254 Epoch: 9 Global Step: 50130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:04,645-Speed 5492.77 samples/sec Loss 6.2205 LearningRate 0.0254 Epoch: 9 Global Step: 50140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:06,488-Speed 5556.98 samples/sec Loss 6.2336 LearningRate 0.0254 Epoch: 9 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:08,324-Speed 5578.63 samples/sec Loss 6.2484 
LearningRate 0.0254 Epoch: 9 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:10,164-Speed 5568.31 samples/sec Loss 6.0911 LearningRate 0.0254 Epoch: 9 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:12,012-Speed 5544.12 samples/sec Loss 6.1867 LearningRate 0.0254 Epoch: 9 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:13,862-Speed 5539.39 samples/sec Loss 6.1651 LearningRate 0.0254 Epoch: 9 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:15,713-Speed 5535.32 samples/sec Loss 6.3055 LearningRate 0.0254 Epoch: 9 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:17,568-Speed 5523.39 samples/sec Loss 6.2510 LearningRate 0.0254 Epoch: 9 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:19,434-Speed 5489.32 samples/sec Loss 6.0756 LearningRate 0.0254 Epoch: 9 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:30:21,269-Speed 5584.30 samples/sec Loss 6.1100 LearningRate 0.0253 Epoch: 9 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:23,118-Speed 5541.07 samples/sec Loss 6.3820 LearningRate 0.0253 Epoch: 9 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:24,953-Speed 5583.60 samples/sec Loss 6.2381 LearningRate 0.0253 Epoch: 9 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:26,795-Speed 5561.34 samples/sec Loss 6.1444 LearningRate 0.0253 Epoch: 9 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:28,649-Speed 5524.81 samples/sec Loss 5.9129 LearningRate 0.0253 Epoch: 9 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:30,491-Speed 5563.85 samples/sec Loss 6.0996 LearningRate 0.0253 Epoch: 9 Global Step: 50280 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 13:30:32,348-Speed 5515.98 samples/sec Loss 6.0883 LearningRate 0.0253 Epoch: 9 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:34,205-Speed 5518.63 samples/sec Loss 6.2804 LearningRate 0.0253 Epoch: 9 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:36,067-Speed 5503.41 samples/sec Loss 6.2622 LearningRate 0.0253 Epoch: 9 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:37,916-Speed 5540.55 samples/sec Loss 6.0884 LearningRate 0.0253 Epoch: 9 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:39,772-Speed 5520.31 samples/sec Loss 6.1825 LearningRate 0.0252 Epoch: 9 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:41,636-Speed 5494.46 samples/sec Loss 6.0882 LearningRate 0.0252 Epoch: 9 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:43,475-Speed 5571.35 samples/sec Loss 6.1434 LearningRate 0.0252 Epoch: 9 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:45,340-Speed 5495.30 samples/sec Loss 6.1913 LearningRate 0.0252 Epoch: 9 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:47,174-Speed 5584.76 samples/sec Loss 6.1601 LearningRate 0.0252 Epoch: 9 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:49,022-Speed 5546.20 samples/sec Loss 6.0693 LearningRate 0.0252 Epoch: 9 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:50,877-Speed 5521.24 samples/sec Loss 6.2961 LearningRate 0.0252 Epoch: 9 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:52,739-Speed 5503.63 samples/sec Loss 6.1852 LearningRate 0.0252 Epoch: 9 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:54,590-Speed 
5537.26 samples/sec Loss 6.0505 LearningRate 0.0252 Epoch: 9 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:56,428-Speed 5572.12 samples/sec Loss 6.1867 LearningRate 0.0252 Epoch: 9 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:30:58,286-Speed 5515.13 samples/sec Loss 6.1112 LearningRate 0.0251 Epoch: 9 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:00,134-Speed 5545.13 samples/sec Loss 6.2366 LearningRate 0.0251 Epoch: 9 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:01,997-Speed 5499.38 samples/sec Loss 6.1395 LearningRate 0.0251 Epoch: 9 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:03,846-Speed 5541.69 samples/sec Loss 6.0517 LearningRate 0.0251 Epoch: 9 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:05,687-Speed 5563.72 samples/sec Loss 6.2294 LearningRate 0.0251 Epoch: 9 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:07,532-Speed 5553.54 samples/sec Loss 6.0274 LearningRate 0.0251 Epoch: 9 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:09,384-Speed 5533.75 samples/sec Loss 5.9700 LearningRate 0.0251 Epoch: 9 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:11,244-Speed 5507.49 samples/sec Loss 6.1538 LearningRate 0.0251 Epoch: 9 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:13,105-Speed 5506.58 samples/sec Loss 6.0692 LearningRate 0.0251 Epoch: 9 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:14,958-Speed 5529.41 samples/sec Loss 6.1825 LearningRate 0.0251 Epoch: 9 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:16,829-Speed 5474.46 samples/sec Loss 6.0899 LearningRate 0.0250 Epoch: 9 Global 
Step: 50530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:31:18,660-Speed 5595.31 samples/sec Loss 5.9258 LearningRate 0.0250 Epoch: 9 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:20,516-Speed 5521.67 samples/sec Loss 6.0693 LearningRate 0.0250 Epoch: 9 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:22,374-Speed 5513.83 samples/sec Loss 6.2033 LearningRate 0.0250 Epoch: 9 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:24,297-Speed 5330.51 samples/sec Loss 6.1424 LearningRate 0.0250 Epoch: 9 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:26,107-Speed 5660.13 samples/sec Loss 6.1475 LearningRate 0.0250 Epoch: 9 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:37,616-Speed 889.84 samples/sec Loss 5.1303 LearningRate 0.0250 Epoch: 10 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:39,598-Speed 5170.69 samples/sec Loss 5.2095 LearningRate 0.0250 Epoch: 10 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:41,460-Speed 5502.36 samples/sec Loss 5.2876 LearningRate 0.0250 Epoch: 10 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:43,310-Speed 5538.54 samples/sec Loss 5.3400 LearningRate 0.0250 Epoch: 10 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:45,146-Speed 5579.16 samples/sec Loss 5.1747 LearningRate 0.0250 Epoch: 10 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:47,016-Speed 5479.60 samples/sec Loss 5.2742 LearningRate 0.0249 Epoch: 10 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:48,861-Speed 5551.52 samples/sec Loss 5.2701 LearningRate 0.0249 Epoch: 10 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 13:31:50,730-Speed 5483.09 samples/sec Loss 5.2850 LearningRate 0.0249 Epoch: 10 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:52,627-Speed 5399.76 samples/sec Loss 5.3410 LearningRate 0.0249 Epoch: 10 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:54,463-Speed 5578.34 samples/sec Loss 5.3285 LearningRate 0.0249 Epoch: 10 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:56,309-Speed 5551.99 samples/sec Loss 5.2325 LearningRate 0.0249 Epoch: 10 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:31:58,166-Speed 5516.31 samples/sec Loss 5.1572 LearningRate 0.0249 Epoch: 10 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:00,007-Speed 5564.93 samples/sec Loss 5.2718 LearningRate 0.0249 Epoch: 10 Global Step: 50710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:01,916-Speed 5366.84 samples/sec Loss 5.4409 LearningRate 0.0249 Epoch: 10 Global Step: 50720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:03,817-Speed 5387.75 samples/sec Loss 5.3327 LearningRate 0.0249 Epoch: 10 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:05,699-Speed 5444.81 samples/sec Loss 5.4437 LearningRate 0.0248 Epoch: 10 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:32:07,561-Speed 5503.53 samples/sec Loss 5.4280 LearningRate 0.0248 Epoch: 10 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:09,406-Speed 5552.70 samples/sec Loss 5.3106 LearningRate 0.0248 Epoch: 10 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:11,246-Speed 5568.62 samples/sec Loss 5.3059 LearningRate 0.0248 Epoch: 10 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:13,114-Speed 5482.31 
samples/sec Loss 5.3312 LearningRate 0.0248 Epoch: 10 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:14,989-Speed 5466.05 samples/sec Loss 5.4414 LearningRate 0.0248 Epoch: 10 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:16,857-Speed 5483.34 samples/sec Loss 5.4515 LearningRate 0.0248 Epoch: 10 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:18,701-Speed 5557.82 samples/sec Loss 5.4682 LearningRate 0.0248 Epoch: 10 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:20,542-Speed 5564.33 samples/sec Loss 5.5753 LearningRate 0.0248 Epoch: 10 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:22,393-Speed 5535.04 samples/sec Loss 5.4622 LearningRate 0.0248 Epoch: 10 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:24,249-Speed 5520.59 samples/sec Loss 5.4283 LearningRate 0.0247 Epoch: 10 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:26,087-Speed 5574.83 samples/sec Loss 5.3438 LearningRate 0.0247 Epoch: 10 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:27,979-Speed 5412.85 samples/sec Loss 5.3519 LearningRate 0.0247 Epoch: 10 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:29,828-Speed 5542.15 samples/sec Loss 5.3885 LearningRate 0.0247 Epoch: 10 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:31,694-Speed 5489.83 samples/sec Loss 5.4447 LearningRate 0.0247 Epoch: 10 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:33,547-Speed 5531.10 samples/sec Loss 5.5420 LearningRate 0.0247 Epoch: 10 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:35,414-Speed 5487.60 samples/sec Loss 5.5057 LearningRate 0.0247 Epoch: 10 
Global Step: 50900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:37,258-Speed 5558.69 samples/sec Loss 5.3686 LearningRate 0.0247 Epoch: 10 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:39,104-Speed 5550.20 samples/sec Loss 5.4240 LearningRate 0.0247 Epoch: 10 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:40,984-Speed 5448.37 samples/sec Loss 5.3697 LearningRate 0.0247 Epoch: 10 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:42,844-Speed 5509.39 samples/sec Loss 5.5471 LearningRate 0.0246 Epoch: 10 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:44,687-Speed 5558.26 samples/sec Loss 5.4601 LearningRate 0.0246 Epoch: 10 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:32:46,536-Speed 5542.40 samples/sec Loss 5.4826 LearningRate 0.0246 Epoch: 10 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:48,387-Speed 5534.47 samples/sec Loss 5.3353 LearningRate 0.0246 Epoch: 10 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:50,240-Speed 5532.00 samples/sec Loss 5.3963 LearningRate 0.0246 Epoch: 10 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:52,146-Speed 5374.87 samples/sec Loss 5.6534 LearningRate 0.0246 Epoch: 10 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:54,002-Speed 5521.59 samples/sec Loss 5.4745 LearningRate 0.0246 Epoch: 10 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:55,857-Speed 5524.05 samples/sec Loss 5.7287 LearningRate 0.0246 Epoch: 10 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:32:57,714-Speed 5514.81 samples/sec Loss 5.5916 LearningRate 0.0246 Epoch: 10 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:32:59,570-Speed 5522.02 samples/sec Loss 5.6224 LearningRate 0.0246 Epoch: 10 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:01,454-Speed 5436.01 samples/sec Loss 5.5170 LearningRate 0.0245 Epoch: 10 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:03,313-Speed 5512.56 samples/sec Loss 5.5078 LearningRate 0.0245 Epoch: 10 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:05,146-Speed 5588.86 samples/sec Loss 5.6445 LearningRate 0.0245 Epoch: 10 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:07,041-Speed 5406.14 samples/sec Loss 5.3973 LearningRate 0.0245 Epoch: 10 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:08,906-Speed 5494.17 samples/sec Loss 5.5091 LearningRate 0.0245 Epoch: 10 Global Step: 51080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:10,759-Speed 5531.33 samples/sec Loss 5.5171 LearningRate 0.0245 Epoch: 10 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:12,636-Speed 5456.95 samples/sec Loss 5.4912 LearningRate 0.0245 Epoch: 10 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:14,517-Speed 5447.43 samples/sec Loss 5.3980 LearningRate 0.0245 Epoch: 10 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:16,376-Speed 5511.63 samples/sec Loss 5.4368 LearningRate 0.0245 Epoch: 10 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:18,223-Speed 5548.31 samples/sec Loss 5.5695 LearningRate 0.0245 Epoch: 10 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:20,089-Speed 5490.17 samples/sec Loss 5.7616 LearningRate 0.0244 Epoch: 10 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:21,926-Speed 5577.11 
samples/sec Loss 5.4973 LearningRate 0.0244 Epoch: 10 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:23,756-Speed 5597.54 samples/sec Loss 5.6054 LearningRate 0.0244 Epoch: 10 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:25,613-Speed 5516.63 samples/sec Loss 5.6486 LearningRate 0.0244 Epoch: 10 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:27,484-Speed 5476.38 samples/sec Loss 5.5849 LearningRate 0.0244 Epoch: 10 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:29,326-Speed 5562.10 samples/sec Loss 5.7270 LearningRate 0.0244 Epoch: 10 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:31,181-Speed 5524.92 samples/sec Loss 5.6926 LearningRate 0.0244 Epoch: 10 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:33,037-Speed 5518.31 samples/sec Loss 5.5584 LearningRate 0.0244 Epoch: 10 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:34,896-Speed 5513.47 samples/sec Loss 5.6899 LearningRate 0.0244 Epoch: 10 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:36,780-Speed 5438.35 samples/sec Loss 5.6838 LearningRate 0.0244 Epoch: 10 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:38,623-Speed 5555.71 samples/sec Loss 5.8815 LearningRate 0.0244 Epoch: 10 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:40,513-Speed 5422.05 samples/sec Loss 5.7593 LearningRate 0.0243 Epoch: 10 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:42,359-Speed 5548.82 samples/sec Loss 5.7256 LearningRate 0.0243 Epoch: 10 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:44,230-Speed 5479.69 samples/sec Loss 5.6466 LearningRate 0.0243 Epoch: 10 
Global Step: 51270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:46,077-Speed 5544.80 samples/sec Loss 5.6143 LearningRate 0.0243 Epoch: 10 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:47,928-Speed 5536.78 samples/sec Loss 5.5550 LearningRate 0.0243 Epoch: 10 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:49,793-Speed 5493.30 samples/sec Loss 5.5994 LearningRate 0.0243 Epoch: 10 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:51,658-Speed 5495.00 samples/sec Loss 5.5051 LearningRate 0.0243 Epoch: 10 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:53,517-Speed 5511.69 samples/sec Loss 5.5752 LearningRate 0.0243 Epoch: 10 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:55,363-Speed 5551.97 samples/sec Loss 5.6289 LearningRate 0.0243 Epoch: 10 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:57,201-Speed 5574.33 samples/sec Loss 5.6971 LearningRate 0.0243 Epoch: 10 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:33:59,081-Speed 5450.41 samples/sec Loss 5.7905 LearningRate 0.0242 Epoch: 10 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:00,923-Speed 5563.58 samples/sec Loss 5.6365 LearningRate 0.0242 Epoch: 10 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:34:02,783-Speed 5506.44 samples/sec Loss 5.7226 LearningRate 0.0242 Epoch: 10 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:04,629-Speed 5553.01 samples/sec Loss 5.6789 LearningRate 0.0242 Epoch: 10 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:06,473-Speed 5555.08 samples/sec Loss 5.7514 LearningRate 0.0242 Epoch: 10 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:34:08,309-Speed 5578.91 samples/sec Loss 5.6872 LearningRate 0.0242 Epoch: 10 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:10,170-Speed 5506.78 samples/sec Loss 5.5221 LearningRate 0.0242 Epoch: 10 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:12,023-Speed 5527.61 samples/sec Loss 5.6558 LearningRate 0.0242 Epoch: 10 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:13,862-Speed 5570.58 samples/sec Loss 5.7777 LearningRate 0.0242 Epoch: 10 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:15,697-Speed 5583.56 samples/sec Loss 5.6372 LearningRate 0.0242 Epoch: 10 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:17,544-Speed 5548.16 samples/sec Loss 5.7446 LearningRate 0.0241 Epoch: 10 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:19,400-Speed 5519.32 samples/sec Loss 5.7353 LearningRate 0.0241 Epoch: 10 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:21,234-Speed 5585.57 samples/sec Loss 5.8049 LearningRate 0.0241 Epoch: 10 Global Step: 51470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:23,078-Speed 5556.21 samples/sec Loss 5.6780 LearningRate 0.0241 Epoch: 10 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:24,949-Speed 5476.93 samples/sec Loss 5.7400 LearningRate 0.0241 Epoch: 10 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:26,784-Speed 5583.13 samples/sec Loss 5.6257 LearningRate 0.0241 Epoch: 10 Global Step: 51500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:28,634-Speed 5538.87 samples/sec Loss 5.7227 LearningRate 0.0241 Epoch: 10 Global Step: 51510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:30,495-Speed 5504.39 
samples/sec Loss 5.7913 LearningRate 0.0241 Epoch: 10 Global Step: 51520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:32,363-Speed 5484.28 samples/sec Loss 5.6528 LearningRate 0.0241 Epoch: 10 Global Step: 51530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:34,207-Speed 5556.96 samples/sec Loss 5.6754 LearningRate 0.0241 Epoch: 10 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:36,056-Speed 5541.61 samples/sec Loss 5.7359 LearningRate 0.0241 Epoch: 10 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:37,896-Speed 5565.54 samples/sec Loss 5.6180 LearningRate 0.0240 Epoch: 10 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:39,780-Speed 5441.18 samples/sec Loss 5.5339 LearningRate 0.0240 Epoch: 10 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:41,623-Speed 5558.64 samples/sec Loss 5.7339 LearningRate 0.0240 Epoch: 10 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:43,493-Speed 5477.51 samples/sec Loss 5.7482 LearningRate 0.0240 Epoch: 10 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:34:45,338-Speed 5553.19 samples/sec Loss 5.6550 LearningRate 0.0240 Epoch: 10 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:47,208-Speed 5479.04 samples/sec Loss 5.5799 LearningRate 0.0240 Epoch: 10 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:49,043-Speed 5583.63 samples/sec Loss 5.5659 LearningRate 0.0240 Epoch: 10 Global Step: 51620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:50,884-Speed 5563.28 samples/sec Loss 5.7024 LearningRate 0.0240 Epoch: 10 Global Step: 51630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:52,725-Speed 5565.03 samples/sec Loss 5.7303 LearningRate 0.0240 Epoch: 10 
Global Step: 51640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:54,602-Speed 5459.14 samples/sec Loss 5.6975 LearningRate 0.0240 Epoch: 10 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:56,450-Speed 5543.49 samples/sec Loss 5.5216 LearningRate 0.0239 Epoch: 10 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:34:58,293-Speed 5561.50 samples/sec Loss 5.6110 LearningRate 0.0239 Epoch: 10 Global Step: 51670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:35:00,146-Speed 5529.56 samples/sec Loss 5.8459 LearningRate 0.0239 Epoch: 10 Global Step: 51680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:35:02,000-Speed 5524.85 samples/sec Loss 5.7419 LearningRate 0.0239 Epoch: 10 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:35:03,856-Speed 5519.35 samples/sec Loss 5.7849 LearningRate 0.0239 Epoch: 10 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:05,764-Speed 5370.58 samples/sec Loss 5.8097 LearningRate 0.0239 Epoch: 10 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:07,617-Speed 5529.96 samples/sec Loss 5.7243 LearningRate 0.0239 Epoch: 10 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:09,474-Speed 5518.92 samples/sec Loss 5.7829 LearningRate 0.0239 Epoch: 10 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:11,313-Speed 5569.05 samples/sec Loss 5.7626 LearningRate 0.0239 Epoch: 10 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:13,180-Speed 5488.10 samples/sec Loss 5.7750 LearningRate 0.0239 Epoch: 10 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:15,047-Speed 5489.26 samples/sec Loss 5.7927 LearningRate 0.0238 Epoch: 10 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:35:16,892-Speed 5552.52 samples/sec Loss 5.7847 LearningRate 0.0238 Epoch: 10 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:18,734-Speed 5561.44 samples/sec Loss 5.7081 LearningRate 0.0238 Epoch: 10 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:20,582-Speed 5544.70 samples/sec Loss 5.8778 LearningRate 0.0238 Epoch: 10 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:22,442-Speed 5506.63 samples/sec Loss 5.7472 LearningRate 0.0238 Epoch: 10 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:24,289-Speed 5548.54 samples/sec Loss 5.7271 LearningRate 0.0238 Epoch: 10 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:26,131-Speed 5559.45 samples/sec Loss 5.6983 LearningRate 0.0238 Epoch: 10 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:27,969-Speed 5573.85 samples/sec Loss 5.7607 LearningRate 0.0238 Epoch: 10 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:29,815-Speed 5548.64 samples/sec Loss 5.6435 LearningRate 0.0238 Epoch: 10 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:31,655-Speed 5568.92 samples/sec Loss 5.6690 LearningRate 0.0238 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:33,512-Speed 5517.26 samples/sec Loss 5.8676 LearningRate 0.0238 Epoch: 10 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:35,367-Speed 5524.27 samples/sec Loss 5.7130 LearningRate 0.0237 Epoch: 10 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:37,228-Speed 5505.42 samples/sec Loss 5.7868 LearningRate 0.0237 Epoch: 10 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:39,080-Speed 5530.12 
samples/sec Loss 5.7549 LearningRate 0.0237 Epoch: 10 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:40,928-Speed 5546.71 samples/sec Loss 5.7271 LearningRate 0.0237 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:35:42,773-Speed 5550.93 samples/sec Loss 5.7043 LearningRate 0.0237 Epoch: 10 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:44,628-Speed 5522.85 samples/sec Loss 5.7400 LearningRate 0.0237 Epoch: 10 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:46,470-Speed 5563.32 samples/sec Loss 5.7138 LearningRate 0.0237 Epoch: 10 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:48,315-Speed 5552.46 samples/sec Loss 5.7402 LearningRate 0.0237 Epoch: 10 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:50,167-Speed 5530.45 samples/sec Loss 5.8021 LearningRate 0.0237 Epoch: 10 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:52,012-Speed 5553.22 samples/sec Loss 5.8820 LearningRate 0.0237 Epoch: 10 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:53,856-Speed 5555.75 samples/sec Loss 5.8749 LearningRate 0.0236 Epoch: 10 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:55,707-Speed 5535.87 samples/sec Loss 5.8083 LearningRate 0.0236 Epoch: 10 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:57,582-Speed 5463.59 samples/sec Loss 5.6832 LearningRate 0.0236 Epoch: 10 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:35:59,427-Speed 5552.58 samples/sec Loss 5.6450 LearningRate 0.0236 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:36:26,568-[lfw][52000]XNorm: 21.797578 Training: 2022-04-11 
13:36:26,569-[lfw][52000]Accuracy-Flip: 0.99717+-0.00334 Training: 2022-04-11 13:36:26,569-[lfw][52000]Accuracy-Highest: 0.99817 Training: 2022-04-11 13:36:58,181-[cfp_fp][52000]XNorm: 18.944493 Training: 2022-04-11 13:36:58,182-[cfp_fp][52000]Accuracy-Flip: 0.96586+-0.00839 Training: 2022-04-11 13:36:58,183-[cfp_fp][52000]Accuracy-Highest: 0.97029 Training: 2022-04-11 13:37:25,405-[agedb_30][52000]XNorm: 21.262291 Training: 2022-04-11 13:37:25,405-[agedb_30][52000]Accuracy-Flip: 0.97383+-0.00695 Training: 2022-04-11 13:37:25,406-[agedb_30][52000]Accuracy-Highest: 0.97817 Training: 2022-04-11 13:37:27,260-Speed 116.59 samples/sec Loss 5.6838 LearningRate 0.0236 Epoch: 10 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:37:29,083-Speed 5618.94 samples/sec Loss 5.8508 LearningRate 0.0236 Epoch: 10 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:30,913-Speed 5599.62 samples/sec Loss 5.6542 LearningRate 0.0236 Epoch: 10 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:32,744-Speed 5593.19 samples/sec Loss 5.7509 LearningRate 0.0236 Epoch: 10 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:34,579-Speed 5582.47 samples/sec Loss 5.6976 LearningRate 0.0236 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:36,418-Speed 5573.54 samples/sec Loss 5.8405 LearningRate 0.0236 Epoch: 10 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:38,250-Speed 5590.22 samples/sec Loss 5.7805 LearningRate 0.0235 Epoch: 10 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:40,105-Speed 5524.00 samples/sec Loss 5.8060 LearningRate 0.0235 Epoch: 10 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:41,949-Speed 5559.36 samples/sec Loss 5.7258 LearningRate 0.0235 Epoch: 10 Global Step: 52090 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:43,793-Speed 5556.38 samples/sec Loss 5.7463 LearningRate 0.0235 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:45,660-Speed 5485.87 samples/sec Loss 5.7722 LearningRate 0.0235 Epoch: 10 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:47,497-Speed 5577.00 samples/sec Loss 5.6701 LearningRate 0.0235 Epoch: 10 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:49,344-Speed 5547.83 samples/sec Loss 5.6989 LearningRate 0.0235 Epoch: 10 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:51,259-Speed 5351.57 samples/sec Loss 5.7983 LearningRate 0.0235 Epoch: 10 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:53,092-Speed 5592.30 samples/sec Loss 5.7580 LearningRate 0.0235 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:54,926-Speed 5583.59 samples/sec Loss 5.7040 LearningRate 0.0235 Epoch: 10 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:56,780-Speed 5526.70 samples/sec Loss 5.7102 LearningRate 0.0235 Epoch: 10 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:37:58,613-Speed 5591.15 samples/sec Loss 5.7459 LearningRate 0.0234 Epoch: 10 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:00,468-Speed 5523.23 samples/sec Loss 5.8408 LearningRate 0.0234 Epoch: 10 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:02,325-Speed 5515.62 samples/sec Loss 5.8196 LearningRate 0.0234 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:04,162-Speed 5577.33 samples/sec Loss 5.6866 LearningRate 0.0234 Epoch: 10 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:38:06,018-Speed 5522.27 samples/sec Loss 5.7341 LearningRate 0.0234 Epoch: 10 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:07,854-Speed 5578.48 samples/sec Loss 5.8757 LearningRate 0.0234 Epoch: 10 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:09,724-Speed 5479.89 samples/sec Loss 5.7715 LearningRate 0.0234 Epoch: 10 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:11,565-Speed 5566.16 samples/sec Loss 5.6527 LearningRate 0.0234 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:13,428-Speed 5499.19 samples/sec Loss 5.7002 LearningRate 0.0234 Epoch: 10 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:15,346-Speed 5342.61 samples/sec Loss 5.8580 LearningRate 0.0234 Epoch: 10 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:17,251-Speed 5378.57 samples/sec Loss 5.7196 LearningRate 0.0233 Epoch: 10 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:19,100-Speed 5541.32 samples/sec Loss 5.7879 LearningRate 0.0233 Epoch: 10 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:20,943-Speed 5557.99 samples/sec Loss 5.7888 LearningRate 0.0233 Epoch: 10 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:22,813-Speed 5479.75 samples/sec Loss 5.6608 LearningRate 0.0233 Epoch: 10 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:24,664-Speed 5532.91 samples/sec Loss 5.7650 LearningRate 0.0233 Epoch: 10 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:38:26,522-Speed 5513.22 samples/sec Loss 5.7522 LearningRate 0.0233 Epoch: 10 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:28,376-Speed 5526.18 samples/sec Loss 
5.7068 LearningRate 0.0233 Epoch: 10 Global Step: 52340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:30,214-Speed 5574.63 samples/sec Loss 5.7362 LearningRate 0.0233 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:32,063-Speed 5542.53 samples/sec Loss 5.8470 LearningRate 0.0233 Epoch: 10 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:33,895-Speed 5592.34 samples/sec Loss 5.7533 LearningRate 0.0233 Epoch: 10 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:35,744-Speed 5541.82 samples/sec Loss 5.7281 LearningRate 0.0233 Epoch: 10 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:37,614-Speed 5478.76 samples/sec Loss 5.5591 LearningRate 0.0232 Epoch: 10 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:39,462-Speed 5541.76 samples/sec Loss 5.7810 LearningRate 0.0232 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:41,319-Speed 5518.26 samples/sec Loss 5.7307 LearningRate 0.0232 Epoch: 10 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:43,185-Speed 5489.70 samples/sec Loss 5.8571 LearningRate 0.0232 Epoch: 10 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:45,055-Speed 5480.73 samples/sec Loss 5.8080 LearningRate 0.0232 Epoch: 10 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:38:46,898-Speed 5558.42 samples/sec Loss 5.7259 LearningRate 0.0232 Epoch: 10 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:48,778-Speed 5449.25 samples/sec Loss 5.8520 LearningRate 0.0232 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:50,640-Speed 5504.75 samples/sec Loss 5.7744 LearningRate 0.0232 Epoch: 10 Global Step: 52460 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:52,491-Speed 5534.71 samples/sec Loss 5.7044 LearningRate 0.0232 Epoch: 10 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:54,335-Speed 5556.27 samples/sec Loss 5.7912 LearningRate 0.0232 Epoch: 10 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:56,181-Speed 5550.61 samples/sec Loss 5.7446 LearningRate 0.0231 Epoch: 10 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:58,038-Speed 5516.08 samples/sec Loss 5.6655 LearningRate 0.0231 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:38:59,896-Speed 5512.61 samples/sec Loss 5.6522 LearningRate 0.0231 Epoch: 10 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:01,762-Speed 5492.41 samples/sec Loss 5.8373 LearningRate 0.0231 Epoch: 10 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:03,612-Speed 5539.72 samples/sec Loss 5.8097 LearningRate 0.0231 Epoch: 10 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:05,479-Speed 5486.17 samples/sec Loss 5.8515 LearningRate 0.0231 Epoch: 10 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:39:07,329-Speed 5540.27 samples/sec Loss 5.8425 LearningRate 0.0231 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:09,181-Speed 5529.42 samples/sec Loss 5.7387 LearningRate 0.0231 Epoch: 10 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:11,035-Speed 5526.58 samples/sec Loss 5.7857 LearningRate 0.0231 Epoch: 10 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:12,904-Speed 5481.76 samples/sec Loss 5.7866 LearningRate 0.0231 Epoch: 10 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:39:14,746-Speed 5561.32 samples/sec Loss 5.7343 LearningRate 0.0231 Epoch: 10 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:16,607-Speed 5507.32 samples/sec Loss 5.6772 LearningRate 0.0230 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:18,449-Speed 5560.91 samples/sec Loss 5.7838 LearningRate 0.0230 Epoch: 10 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:20,295-Speed 5549.42 samples/sec Loss 5.7554 LearningRate 0.0230 Epoch: 10 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:22,133-Speed 5578.02 samples/sec Loss 5.7649 LearningRate 0.0230 Epoch: 10 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:23,990-Speed 5514.86 samples/sec Loss 5.8373 LearningRate 0.0230 Epoch: 10 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:25,823-Speed 5590.82 samples/sec Loss 5.7671 LearningRate 0.0230 Epoch: 10 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:27,665-Speed 5561.02 samples/sec Loss 5.7526 LearningRate 0.0230 Epoch: 10 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:29,544-Speed 5454.22 samples/sec Loss 5.6654 LearningRate 0.0230 Epoch: 10 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:31,391-Speed 5547.33 samples/sec Loss 5.9179 LearningRate 0.0230 Epoch: 10 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:33,247-Speed 5518.74 samples/sec Loss 5.6338 LearningRate 0.0230 Epoch: 10 Global Step: 52690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:35,107-Speed 5506.91 samples/sec Loss 5.8413 LearningRate 0.0229 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:36,968-Speed 5508.13 samples/sec Loss 
5.7575 LearningRate 0.0229 Epoch: 10 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:38,841-Speed 5469.42 samples/sec Loss 5.6526 LearningRate 0.0229 Epoch: 10 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:39:40,684-Speed 5560.92 samples/sec Loss 5.7847 LearningRate 0.0229 Epoch: 10 Global Step: 52730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:42,582-Speed 5396.78 samples/sec Loss 5.6633 LearningRate 0.0229 Epoch: 10 Global Step: 52740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:44,423-Speed 5564.20 samples/sec Loss 5.6285 LearningRate 0.0229 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:46,302-Speed 5454.39 samples/sec Loss 5.8385 LearningRate 0.0229 Epoch: 10 Global Step: 52760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:48,152-Speed 5536.56 samples/sec Loss 5.6890 LearningRate 0.0229 Epoch: 10 Global Step: 52770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:50,021-Speed 5480.74 samples/sec Loss 5.6753 LearningRate 0.0229 Epoch: 10 Global Step: 52780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:51,874-Speed 5530.53 samples/sec Loss 5.8054 LearningRate 0.0229 Epoch: 10 Global Step: 52790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:53,748-Speed 5467.15 samples/sec Loss 5.8317 LearningRate 0.0229 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:55,594-Speed 5551.97 samples/sec Loss 5.7341 LearningRate 0.0228 Epoch: 10 Global Step: 52810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:57,452-Speed 5514.36 samples/sec Loss 5.8469 LearningRate 0.0228 Epoch: 10 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:39:59,303-Speed 5532.17 samples/sec Loss 5.7997 LearningRate 0.0228 Epoch: 10 Global Step: 52830 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:01,187-Speed 5439.31 samples/sec Loss 5.8445 LearningRate 0.0228 Epoch: 10 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:03,027-Speed 5569.38 samples/sec Loss 5.8615 LearningRate 0.0228 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:04,889-Speed 5502.10 samples/sec Loss 5.8786 LearningRate 0.0228 Epoch: 10 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:06,734-Speed 5553.47 samples/sec Loss 5.7795 LearningRate 0.0228 Epoch: 10 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:08,604-Speed 5477.86 samples/sec Loss 5.8147 LearningRate 0.0228 Epoch: 10 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:10,449-Speed 5552.40 samples/sec Loss 5.8707 LearningRate 0.0228 Epoch: 10 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:12,319-Speed 5481.30 samples/sec Loss 5.8866 LearningRate 0.0228 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:14,158-Speed 5569.73 samples/sec Loss 5.6144 LearningRate 0.0227 Epoch: 10 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:16,034-Speed 5462.24 samples/sec Loss 5.7861 LearningRate 0.0227 Epoch: 10 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:17,894-Speed 5508.21 samples/sec Loss 5.7628 LearningRate 0.0227 Epoch: 10 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:19,742-Speed 5544.66 samples/sec Loss 5.6922 LearningRate 0.0227 Epoch: 10 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:21,591-Speed 5542.59 samples/sec Loss 5.8598 LearningRate 0.0227 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:40:23,466-Speed 5462.39 samples/sec Loss 5.8808 LearningRate 0.0227 Epoch: 10 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:25,307-Speed 5565.28 samples/sec Loss 5.6855 LearningRate 0.0227 Epoch: 10 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:27,148-Speed 5566.29 samples/sec Loss 5.9144 LearningRate 0.0227 Epoch: 10 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:29,000-Speed 5530.34 samples/sec Loss 5.9113 LearningRate 0.0227 Epoch: 10 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:30,853-Speed 5529.48 samples/sec Loss 5.7086 LearningRate 0.0227 Epoch: 10 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:32,726-Speed 5469.29 samples/sec Loss 5.8245 LearningRate 0.0227 Epoch: 10 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:34,563-Speed 5577.30 samples/sec Loss 5.7978 LearningRate 0.0226 Epoch: 10 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:36,396-Speed 5591.72 samples/sec Loss 5.8962 LearningRate 0.0226 Epoch: 10 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:38,240-Speed 5554.51 samples/sec Loss 5.7447 LearningRate 0.0226 Epoch: 10 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:40,087-Speed 5546.20 samples/sec Loss 5.7280 LearningRate 0.0226 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:41,989-Speed 5386.64 samples/sec Loss 5.7866 LearningRate 0.0226 Epoch: 10 Global Step: 53060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:43,837-Speed 5545.95 samples/sec Loss 5.7707 LearningRate 0.0226 Epoch: 10 Global Step: 53070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:45,703-Speed 5490.60 samples/sec Loss 
5.8063 LearningRate 0.0226 Epoch: 10 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:47,547-Speed 5556.75 samples/sec Loss 5.7439 LearningRate 0.0226 Epoch: 10 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:49,406-Speed 5509.79 samples/sec Loss 5.6707 LearningRate 0.0226 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:51,288-Speed 5445.61 samples/sec Loss 5.7937 LearningRate 0.0226 Epoch: 10 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:53,136-Speed 5543.59 samples/sec Loss 5.8524 LearningRate 0.0226 Epoch: 10 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:40:54,980-Speed 5553.43 samples/sec Loss 5.8172 LearningRate 0.0225 Epoch: 10 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:40:56,851-Speed 5476.47 samples/sec Loss 5.7199 LearningRate 0.0225 Epoch: 10 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:40:58,732-Speed 5447.33 samples/sec Loss 5.8467 LearningRate 0.0225 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:00,598-Speed 5491.14 samples/sec Loss 5.7315 LearningRate 0.0225 Epoch: 10 Global Step: 53160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:02,454-Speed 5522.90 samples/sec Loss 5.9080 LearningRate 0.0225 Epoch: 10 Global Step: 53170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:04,321-Speed 5486.63 samples/sec Loss 5.7060 LearningRate 0.0225 Epoch: 10 Global Step: 53180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:06,161-Speed 5567.99 samples/sec Loss 5.7226 LearningRate 0.0225 Epoch: 10 Global Step: 53190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:08,030-Speed 5482.66 samples/sec Loss 5.8014 LearningRate 0.0225 Epoch: 10 Global Step: 53200 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:09,886-Speed 5518.61 samples/sec Loss 5.7669 LearningRate 0.0225 Epoch: 10 Global Step: 53210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:11,765-Speed 5453.59 samples/sec Loss 5.7604 LearningRate 0.0225 Epoch: 10 Global Step: 53220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:13,623-Speed 5513.93 samples/sec Loss 5.7729 LearningRate 0.0224 Epoch: 10 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:15,488-Speed 5493.52 samples/sec Loss 5.8007 LearningRate 0.0224 Epoch: 10 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:17,345-Speed 5518.51 samples/sec Loss 5.8082 LearningRate 0.0224 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:19,194-Speed 5540.41 samples/sec Loss 5.7654 LearningRate 0.0224 Epoch: 10 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:21,030-Speed 5582.36 samples/sec Loss 5.7835 LearningRate 0.0224 Epoch: 10 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:22,867-Speed 5575.99 samples/sec Loss 5.6649 LearningRate 0.0224 Epoch: 10 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:24,741-Speed 5466.99 samples/sec Loss 5.7344 LearningRate 0.0224 Epoch: 10 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:26,585-Speed 5557.47 samples/sec Loss 5.7629 LearningRate 0.0224 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:28,465-Speed 5447.47 samples/sec Loss 5.8999 LearningRate 0.0224 Epoch: 10 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:30,312-Speed 5546.42 samples/sec Loss 5.6172 LearningRate 0.0224 Epoch: 10 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 13:41:32,179-Speed 5489.82 samples/sec Loss 5.7652 LearningRate 0.0224 Epoch: 10 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:34,051-Speed 5473.42 samples/sec Loss 5.7465 LearningRate 0.0223 Epoch: 10 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:35,887-Speed 5579.70 samples/sec Loss 5.8172 LearningRate 0.0223 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:37,736-Speed 5541.73 samples/sec Loss 5.7947 LearningRate 0.0223 Epoch: 10 Global Step: 53360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:39,570-Speed 5585.73 samples/sec Loss 5.6645 LearningRate 0.0223 Epoch: 10 Global Step: 53370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:41,429-Speed 5512.16 samples/sec Loss 5.6774 LearningRate 0.0223 Epoch: 10 Global Step: 53380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:43,269-Speed 5567.87 samples/sec Loss 5.6934 LearningRate 0.0223 Epoch: 10 Global Step: 53390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:45,110-Speed 5564.06 samples/sec Loss 5.7294 LearningRate 0.0223 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:46,957-Speed 5550.22 samples/sec Loss 5.5792 LearningRate 0.0223 Epoch: 10 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:48,847-Speed 5420.13 samples/sec Loss 5.8713 LearningRate 0.0223 Epoch: 10 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:50,700-Speed 5529.76 samples/sec Loss 5.9179 LearningRate 0.0223 Epoch: 10 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:41:52,562-Speed 5500.24 samples/sec Loss 5.8255 LearningRate 0.0223 Epoch: 10 Global Step: 53440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:54,405-Speed 5559.81 samples/sec Loss 
5.9072 LearningRate 0.0222 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:56,248-Speed 5558.87 samples/sec Loss 5.6209 LearningRate 0.0222 Epoch: 10 Global Step: 53460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:58,116-Speed 5485.81 samples/sec Loss 5.7548 LearningRate 0.0222 Epoch: 10 Global Step: 53470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:41:59,964-Speed 5543.86 samples/sec Loss 5.8057 LearningRate 0.0222 Epoch: 10 Global Step: 53480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:01,832-Speed 5484.59 samples/sec Loss 5.9212 LearningRate 0.0222 Epoch: 10 Global Step: 53490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:03,675-Speed 5560.84 samples/sec Loss 5.8456 LearningRate 0.0222 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:05,530-Speed 5521.55 samples/sec Loss 5.8002 LearningRate 0.0222 Epoch: 10 Global Step: 53510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:07,379-Speed 5541.26 samples/sec Loss 5.7054 LearningRate 0.0222 Epoch: 10 Global Step: 53520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:09,216-Speed 5577.27 samples/sec Loss 5.9624 LearningRate 0.0222 Epoch: 10 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:42:11,056-Speed 5566.37 samples/sec Loss 5.6352 LearningRate 0.0222 Epoch: 10 Global Step: 53540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:12,905-Speed 5542.67 samples/sec Loss 5.8572 LearningRate 0.0222 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:14,742-Speed 5575.22 samples/sec Loss 5.6869 LearningRate 0.0221 Epoch: 10 Global Step: 53560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:16,605-Speed 5496.97 samples/sec Loss 5.8512 LearningRate 0.0221 Epoch: 10 Global Step: 53570 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:18,449-Speed 5558.58 samples/sec Loss 5.7756 LearningRate 0.0221 Epoch: 10 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:20,284-Speed 5580.14 samples/sec Loss 5.7938 LearningRate 0.0221 Epoch: 10 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:22,126-Speed 5564.40 samples/sec Loss 5.6666 LearningRate 0.0221 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:23,966-Speed 5565.22 samples/sec Loss 5.6841 LearningRate 0.0221 Epoch: 10 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:25,838-Speed 5473.71 samples/sec Loss 5.7660 LearningRate 0.0221 Epoch: 10 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:27,687-Speed 5540.14 samples/sec Loss 5.7133 LearningRate 0.0221 Epoch: 10 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:29,562-Speed 5462.49 samples/sec Loss 5.8061 LearningRate 0.0221 Epoch: 10 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:42:31,412-Speed 5539.74 samples/sec Loss 5.8839 LearningRate 0.0221 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:33,256-Speed 5555.01 samples/sec Loss 5.8115 LearningRate 0.0220 Epoch: 10 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:35,097-Speed 5564.79 samples/sec Loss 5.8337 LearningRate 0.0220 Epoch: 10 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:36,948-Speed 5535.08 samples/sec Loss 5.8878 LearningRate 0.0220 Epoch: 10 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:38,840-Speed 5414.08 samples/sec Loss 5.6656 LearningRate 0.0220 Epoch: 10 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:42:40,716-Speed 5458.85 samples/sec Loss 5.7294 LearningRate 0.0220 Epoch: 10 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:42,575-Speed 5512.91 samples/sec Loss 5.7705 LearningRate 0.0220 Epoch: 10 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:44,424-Speed 5540.80 samples/sec Loss 5.6668 LearningRate 0.0220 Epoch: 10 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:46,276-Speed 5533.50 samples/sec Loss 5.6854 LearningRate 0.0220 Epoch: 10 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:48,120-Speed 5555.07 samples/sec Loss 5.6753 LearningRate 0.0220 Epoch: 10 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:49,991-Speed 5475.92 samples/sec Loss 5.6945 LearningRate 0.0220 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:51,850-Speed 5510.01 samples/sec Loss 5.7620 LearningRate 0.0220 Epoch: 10 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:53,715-Speed 5493.51 samples/sec Loss 5.7301 LearningRate 0.0219 Epoch: 10 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:55,560-Speed 5553.53 samples/sec Loss 5.8394 LearningRate 0.0219 Epoch: 10 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:57,405-Speed 5553.85 samples/sec Loss 5.7692 LearningRate 0.0219 Epoch: 10 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:42:59,255-Speed 5537.87 samples/sec Loss 5.7216 LearningRate 0.0219 Epoch: 10 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:01,113-Speed 5515.44 samples/sec Loss 5.7042 LearningRate 0.0219 Epoch: 10 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:02,970-Speed 5516.66 samples/sec Loss 
5.6566 LearningRate 0.0219 Epoch: 10 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:04,857-Speed 5429.23 samples/sec Loss 5.8542 LearningRate 0.0219 Epoch: 10 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:06,707-Speed 5539.34 samples/sec Loss 5.7806 LearningRate 0.0219 Epoch: 10 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:08,551-Speed 5555.01 samples/sec Loss 5.7051 LearningRate 0.0219 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:10,425-Speed 5468.36 samples/sec Loss 5.7538 LearningRate 0.0219 Epoch: 10 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:12,287-Speed 5502.50 samples/sec Loss 5.8133 LearningRate 0.0219 Epoch: 10 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:14,161-Speed 5468.24 samples/sec Loss 5.7583 LearningRate 0.0218 Epoch: 10 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:15,998-Speed 5575.36 samples/sec Loss 5.7999 LearningRate 0.0218 Epoch: 10 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:17,856-Speed 5515.86 samples/sec Loss 5.9414 LearningRate 0.0218 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:19,697-Speed 5565.25 samples/sec Loss 5.6751 LearningRate 0.0218 Epoch: 10 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:43:21,557-Speed 5507.65 samples/sec Loss 5.8503 LearningRate 0.0218 Epoch: 10 Global Step: 53920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:23,403-Speed 5549.95 samples/sec Loss 5.7812 LearningRate 0.0218 Epoch: 10 Global Step: 53930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:25,262-Speed 5510.74 samples/sec Loss 5.8014 LearningRate 0.0218 Epoch: 10 Global Step: 53940 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:27,141-Speed 5451.56 samples/sec Loss 5.6876 LearningRate 0.0218 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:29,033-Speed 5417.66 samples/sec Loss 5.9654 LearningRate 0.0218 Epoch: 10 Global Step: 53960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:30,940-Speed 5370.73 samples/sec Loss 5.7887 LearningRate 0.0218 Epoch: 10 Global Step: 53970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:32,797-Speed 5517.31 samples/sec Loss 5.6957 LearningRate 0.0218 Epoch: 10 Global Step: 53980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:34,637-Speed 5569.03 samples/sec Loss 5.8523 LearningRate 0.0217 Epoch: 10 Global Step: 53990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:43:36,517-Speed 5449.15 samples/sec Loss 5.7996 LearningRate 0.0217 Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:44:03,632-[lfw][54000]XNorm: 22.709581 Training: 2022-04-11 13:44:03,633-[lfw][54000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 13:44:03,634-[lfw][54000]Accuracy-Highest: 0.99817 Training: 2022-04-11 13:44:35,029-[cfp_fp][54000]XNorm: 20.050151 Training: 2022-04-11 13:44:35,030-[cfp_fp][54000]Accuracy-Flip: 0.96186+-0.01034 Training: 2022-04-11 13:44:35,031-[cfp_fp][54000]Accuracy-Highest: 0.97029 Training: 2022-04-11 13:45:02,161-[agedb_30][54000]XNorm: 22.348742 Training: 2022-04-11 13:45:02,162-[agedb_30][54000]Accuracy-Flip: 0.97783+-0.00760 Training: 2022-04-11 13:45:02,162-[agedb_30][54000]Accuracy-Highest: 0.97817 Training: 2022-04-11 13:45:04,024-Speed 117.02 samples/sec Loss 5.9189 LearningRate 0.0217 Epoch: 10 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:05,857-Speed 5589.35 samples/sec Loss 5.7533 LearningRate 0.0217 Epoch: 10 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 13:45:07,703-Speed 5547.17 samples/sec Loss 5.6040 LearningRate 0.0217 Epoch: 10 Global Step: 54030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:09,540-Speed 5576.90 samples/sec Loss 5.7507 LearningRate 0.0217 Epoch: 10 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:11,372-Speed 5593.62 samples/sec Loss 5.6199 LearningRate 0.0217 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:13,203-Speed 5594.16 samples/sec Loss 5.7403 LearningRate 0.0217 Epoch: 10 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:15,034-Speed 5594.69 samples/sec Loss 5.7391 LearningRate 0.0217 Epoch: 10 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:16,864-Speed 5597.64 samples/sec Loss 5.6273 LearningRate 0.0217 Epoch: 10 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:18,700-Speed 5582.66 samples/sec Loss 5.6505 LearningRate 0.0217 Epoch: 10 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:20,553-Speed 5528.76 samples/sec Loss 5.8474 LearningRate 0.0216 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:22,401-Speed 5545.79 samples/sec Loss 5.5895 LearningRate 0.0216 Epoch: 10 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:24,247-Speed 5549.18 samples/sec Loss 5.7268 LearningRate 0.0216 Epoch: 10 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:26,074-Speed 5608.97 samples/sec Loss 5.7342 LearningRate 0.0216 Epoch: 10 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:27,939-Speed 5495.56 samples/sec Loss 5.6365 LearningRate 0.0216 Epoch: 10 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:29,787-Speed 5541.35 
samples/sec Loss 5.6630 LearningRate 0.0216 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:31,625-Speed 5573.50 samples/sec Loss 5.5985 LearningRate 0.0216 Epoch: 10 Global Step: 54160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:33,470-Speed 5554.14 samples/sec Loss 5.7321 LearningRate 0.0216 Epoch: 10 Global Step: 54170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:35,309-Speed 5571.46 samples/sec Loss 5.7372 LearningRate 0.0216 Epoch: 10 Global Step: 54180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:37,167-Speed 5515.08 samples/sec Loss 5.7052 LearningRate 0.0216 Epoch: 10 Global Step: 54190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:39,024-Speed 5517.41 samples/sec Loss 5.7340 LearningRate 0.0215 Epoch: 10 Global Step: 54200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:40,883-Speed 5511.25 samples/sec Loss 5.7311 LearningRate 0.0215 Epoch: 10 Global Step: 54210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:42,724-Speed 5566.08 samples/sec Loss 5.8080 LearningRate 0.0215 Epoch: 10 Global Step: 54220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:44,595-Speed 5473.41 samples/sec Loss 5.8639 LearningRate 0.0215 Epoch: 10 Global Step: 54230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:46,439-Speed 5556.30 samples/sec Loss 5.6249 LearningRate 0.0215 Epoch: 10 Global Step: 54240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:48,310-Speed 5476.20 samples/sec Loss 5.6632 LearningRate 0.0215 Epoch: 10 Global Step: 54250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:45:50,184-Speed 5466.66 samples/sec Loss 5.7268 LearningRate 0.0215 Epoch: 10 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:52,044-Speed 5511.12 samples/sec Loss 5.7376 LearningRate 0.0215 Epoch: 10 
Global Step: 54270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:53,895-Speed 5534.71 samples/sec Loss 5.7608 LearningRate 0.0215 Epoch: 10 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:55,750-Speed 5522.34 samples/sec Loss 5.6530 LearningRate 0.0215 Epoch: 10 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:57,593-Speed 5562.43 samples/sec Loss 5.7394 LearningRate 0.0215 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:45:59,445-Speed 5532.46 samples/sec Loss 5.6920 LearningRate 0.0214 Epoch: 10 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:01,291-Speed 5551.07 samples/sec Loss 5.7849 LearningRate 0.0214 Epoch: 10 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:03,169-Speed 5454.79 samples/sec Loss 5.9050 LearningRate 0.0214 Epoch: 10 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:05,007-Speed 5575.76 samples/sec Loss 5.6897 LearningRate 0.0214 Epoch: 10 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:06,874-Speed 5488.31 samples/sec Loss 5.5696 LearningRate 0.0214 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:08,708-Speed 5586.02 samples/sec Loss 5.8646 LearningRate 0.0214 Epoch: 10 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:10,565-Speed 5516.71 samples/sec Loss 5.8399 LearningRate 0.0214 Epoch: 10 Global Step: 54370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:12,436-Speed 5478.42 samples/sec Loss 5.7278 LearningRate 0.0214 Epoch: 10 Global Step: 54380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:14,294-Speed 5514.62 samples/sec Loss 5.6689 LearningRate 0.0214 Epoch: 10 Global Step: 54390 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 13:46:16,138-Speed 5554.12 samples/sec Loss 5.8713 LearningRate 0.0214 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:18,009-Speed 5478.49 samples/sec Loss 5.7891 LearningRate 0.0214 Epoch: 10 Global Step: 54410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:46:19,842-Speed 5591.71 samples/sec Loss 5.7810 LearningRate 0.0213 Epoch: 10 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:21,692-Speed 5535.30 samples/sec Loss 5.6521 LearningRate 0.0213 Epoch: 10 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:23,535-Speed 5562.58 samples/sec Loss 5.6939 LearningRate 0.0213 Epoch: 10 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:25,419-Speed 5436.69 samples/sec Loss 5.6754 LearningRate 0.0213 Epoch: 10 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:27,283-Speed 5498.35 samples/sec Loss 5.6394 LearningRate 0.0213 Epoch: 10 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:29,138-Speed 5521.75 samples/sec Loss 5.7549 LearningRate 0.0213 Epoch: 10 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:30,999-Speed 5507.08 samples/sec Loss 5.7935 LearningRate 0.0213 Epoch: 10 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:32,841-Speed 5562.12 samples/sec Loss 5.6489 LearningRate 0.0213 Epoch: 10 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:34,711-Speed 5480.92 samples/sec Loss 5.7283 LearningRate 0.0213 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:36,559-Speed 5544.09 samples/sec Loss 5.7868 LearningRate 0.0213 Epoch: 10 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:38,405-Speed 5549.34 
samples/sec Loss 5.6961 LearningRate 0.0213 Epoch: 10 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:40,247-Speed 5560.53 samples/sec Loss 5.6003 LearningRate 0.0212 Epoch: 10 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:42,109-Speed 5503.30 samples/sec Loss 5.7545 LearningRate 0.0212 Epoch: 10 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:43,949-Speed 5569.11 samples/sec Loss 5.6955 LearningRate 0.0212 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:45,807-Speed 5513.83 samples/sec Loss 5.6564 LearningRate 0.0212 Epoch: 10 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:47,667-Speed 5510.55 samples/sec Loss 5.6574 LearningRate 0.0212 Epoch: 10 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:49,551-Speed 5438.96 samples/sec Loss 5.8508 LearningRate 0.0212 Epoch: 10 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:51,387-Speed 5581.98 samples/sec Loss 5.6973 LearningRate 0.0212 Epoch: 10 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:53,255-Speed 5485.58 samples/sec Loss 5.7041 LearningRate 0.0212 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:55,088-Speed 5589.24 samples/sec Loss 5.7443 LearningRate 0.0212 Epoch: 10 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:46:56,924-Speed 5581.17 samples/sec Loss 5.6193 LearningRate 0.0212 Epoch: 10 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:46:58,759-Speed 5585.09 samples/sec Loss 5.7329 LearningRate 0.0212 Epoch: 10 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:00,599-Speed 5568.74 samples/sec Loss 5.7094 LearningRate 0.0211 Epoch: 10 
Global Step: 54640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:02,440-Speed 5563.59 samples/sec Loss 5.4920 LearningRate 0.0211 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:04,276-Speed 5581.28 samples/sec Loss 5.7769 LearningRate 0.0211 Epoch: 10 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:06,109-Speed 5588.32 samples/sec Loss 5.6320 LearningRate 0.0211 Epoch: 10 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:07,952-Speed 5559.13 samples/sec Loss 5.8249 LearningRate 0.0211 Epoch: 10 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:09,799-Speed 5545.00 samples/sec Loss 5.6858 LearningRate 0.0211 Epoch: 10 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:11,658-Speed 5510.23 samples/sec Loss 5.5567 LearningRate 0.0211 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:13,539-Speed 5447.11 samples/sec Loss 5.6349 LearningRate 0.0211 Epoch: 10 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:15,397-Speed 5512.06 samples/sec Loss 5.8096 LearningRate 0.0211 Epoch: 10 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:17,232-Speed 5582.74 samples/sec Loss 5.6639 LearningRate 0.0211 Epoch: 10 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:19,073-Speed 5566.34 samples/sec Loss 5.6324 LearningRate 0.0211 Epoch: 10 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:20,909-Speed 5579.34 samples/sec Loss 5.8911 LearningRate 0.0210 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:22,743-Speed 5587.12 samples/sec Loss 5.6507 LearningRate 0.0210 Epoch: 10 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 13:47:24,574-Speed 5593.64 samples/sec Loss 5.6838 LearningRate 0.0210 Epoch: 10 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:26,418-Speed 5553.35 samples/sec Loss 5.7050 LearningRate 0.0210 Epoch: 10 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:28,274-Speed 5523.81 samples/sec Loss 5.5862 LearningRate 0.0210 Epoch: 10 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:30,111-Speed 5574.29 samples/sec Loss 5.8139 LearningRate 0.0210 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:31,946-Speed 5583.03 samples/sec Loss 5.7551 LearningRate 0.0210 Epoch: 10 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:33,780-Speed 5588.00 samples/sec Loss 5.7423 LearningRate 0.0210 Epoch: 10 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:35,632-Speed 5532.05 samples/sec Loss 5.6422 LearningRate 0.0210 Epoch: 10 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:37,477-Speed 5551.20 samples/sec Loss 5.6937 LearningRate 0.0210 Epoch: 10 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:39,322-Speed 5554.01 samples/sec Loss 5.7168 LearningRate 0.0210 Epoch: 10 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:41,171-Speed 5541.38 samples/sec Loss 5.7707 LearningRate 0.0209 Epoch: 10 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:47:43,003-Speed 5591.05 samples/sec Loss 5.6153 LearningRate 0.0209 Epoch: 10 Global Step: 54870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:44,856-Speed 5530.25 samples/sec Loss 5.6833 LearningRate 0.0209 Epoch: 10 Global Step: 54880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:46,721-Speed 5492.97 
samples/sec Loss 5.6826 LearningRate 0.0209 Epoch: 10 Global Step: 54890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:48,563-Speed 5564.26 samples/sec Loss 5.7865 LearningRate 0.0209 Epoch: 10 Global Step: 54900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:50,403-Speed 5567.26 samples/sec Loss 5.6192 LearningRate 0.0209 Epoch: 10 Global Step: 54910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:52,249-Speed 5550.07 samples/sec Loss 5.6057 LearningRate 0.0209 Epoch: 10 Global Step: 54920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:54,086-Speed 5578.61 samples/sec Loss 5.6434 LearningRate 0.0209 Epoch: 10 Global Step: 54930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:55,937-Speed 5534.64 samples/sec Loss 5.6767 LearningRate 0.0209 Epoch: 10 Global Step: 54940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:57,767-Speed 5599.80 samples/sec Loss 5.6864 LearningRate 0.0209 Epoch: 10 Global Step: 54950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:47:59,601-Speed 5585.89 samples/sec Loss 5.7692 LearningRate 0.0209 Epoch: 10 Global Step: 54960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:01,441-Speed 5565.98 samples/sec Loss 5.7004 LearningRate 0.0208 Epoch: 10 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:03,279-Speed 5574.27 samples/sec Loss 5.7020 LearningRate 0.0208 Epoch: 10 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:05,122-Speed 5558.64 samples/sec Loss 5.6455 LearningRate 0.0208 Epoch: 10 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:06,958-Speed 5580.34 samples/sec Loss 5.7164 LearningRate 0.0208 Epoch: 10 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:08,795-Speed 5576.66 samples/sec Loss 5.7892 LearningRate 0.0208 Epoch: 10 
Global Step: 55010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:10,656-Speed 5508.37 samples/sec Loss 5.7022 LearningRate 0.0208 Epoch: 10 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:12,484-Speed 5606.17 samples/sec Loss 5.7236 LearningRate 0.0208 Epoch: 10 Global Step: 55030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:14,343-Speed 5511.56 samples/sec Loss 5.6259 LearningRate 0.0208 Epoch: 10 Global Step: 55040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:16,178-Speed 5581.33 samples/sec Loss 5.6528 LearningRate 0.0208 Epoch: 10 Global Step: 55050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:18,036-Speed 5518.11 samples/sec Loss 5.6803 LearningRate 0.0208 Epoch: 10 Global Step: 55060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:19,878-Speed 5560.91 samples/sec Loss 5.5741 LearningRate 0.0208 Epoch: 10 Global Step: 55070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:21,756-Speed 5456.79 samples/sec Loss 5.6715 LearningRate 0.0207 Epoch: 10 Global Step: 55080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:23,621-Speed 5493.20 samples/sec Loss 5.6759 LearningRate 0.0207 Epoch: 10 Global Step: 55090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:25,486-Speed 5493.41 samples/sec Loss 5.6615 LearningRate 0.0207 Epoch: 10 Global Step: 55100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:27,340-Speed 5526.51 samples/sec Loss 5.7286 LearningRate 0.0207 Epoch: 10 Global Step: 55110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:29,195-Speed 5523.00 samples/sec Loss 5.6487 LearningRate 0.0207 Epoch: 10 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:48:31,044-Speed 5540.97 samples/sec Loss 5.4630 LearningRate 0.0207 Epoch: 10 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:48:32,924-Speed 5452.37 samples/sec Loss 5.6324 LearningRate 0.0207 Epoch: 10 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:34,785-Speed 5503.35 samples/sec Loss 5.7635 LearningRate 0.0207 Epoch: 10 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:36,618-Speed 5589.27 samples/sec Loss 5.6174 LearningRate 0.0207 Epoch: 10 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:38,487-Speed 5483.47 samples/sec Loss 5.6854 LearningRate 0.0207 Epoch: 10 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:40,368-Speed 5447.41 samples/sec Loss 5.5463 LearningRate 0.0207 Epoch: 10 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:42,216-Speed 5543.55 samples/sec Loss 5.6347 LearningRate 0.0207 Epoch: 10 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:44,081-Speed 5494.39 samples/sec Loss 5.5807 LearningRate 0.0206 Epoch: 10 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:45,918-Speed 5577.22 samples/sec Loss 5.7519 LearningRate 0.0206 Epoch: 10 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:47,789-Speed 5476.05 samples/sec Loss 5.6653 LearningRate 0.0206 Epoch: 10 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:49,629-Speed 5568.25 samples/sec Loss 5.6181 LearningRate 0.0206 Epoch: 10 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:51,465-Speed 5578.24 samples/sec Loss 5.4731 LearningRate 0.0206 Epoch: 10 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:53,314-Speed 5541.57 samples/sec Loss 5.6735 LearningRate 0.0206 Epoch: 10 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:55,156-Speed 5563.88 
samples/sec Loss 5.6742 LearningRate 0.0206 Epoch: 10 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:57,026-Speed 5476.76 samples/sec Loss 5.6727 LearningRate 0.0206 Epoch: 10 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:48:58,873-Speed 5549.31 samples/sec Loss 5.6279 LearningRate 0.0206 Epoch: 10 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:00,758-Speed 5433.53 samples/sec Loss 5.5920 LearningRate 0.0206 Epoch: 10 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:02,608-Speed 5539.42 samples/sec Loss 5.6800 LearningRate 0.0206 Epoch: 10 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:04,458-Speed 5536.93 samples/sec Loss 5.6449 LearningRate 0.0205 Epoch: 10 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:06,330-Speed 5473.70 samples/sec Loss 5.5979 LearningRate 0.0205 Epoch: 10 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:08,216-Speed 5432.72 samples/sec Loss 5.6765 LearningRate 0.0205 Epoch: 10 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:10,068-Speed 5531.93 samples/sec Loss 5.4374 LearningRate 0.0205 Epoch: 10 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:11,911-Speed 5561.22 samples/sec Loss 5.6005 LearningRate 0.0205 Epoch: 10 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:13,751-Speed 5568.79 samples/sec Loss 5.5765 LearningRate 0.0205 Epoch: 10 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:15,621-Speed 5477.81 samples/sec Loss 5.6002 LearningRate 0.0205 Epoch: 10 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:17,464-Speed 5558.83 samples/sec Loss 5.6818 LearningRate 0.0205 Epoch: 10 
Global Step: 55380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:19,321-Speed 5517.58 samples/sec Loss 5.5366 LearningRate 0.0205 Epoch: 10 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:21,173-Speed 5532.25 samples/sec Loss 5.7954 LearningRate 0.0205 Epoch: 10 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:23,045-Speed 5472.42 samples/sec Loss 5.7020 LearningRate 0.0205 Epoch: 10 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:24,894-Speed 5540.36 samples/sec Loss 5.5624 LearningRate 0.0204 Epoch: 10 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:26,760-Speed 5490.69 samples/sec Loss 5.7360 LearningRate 0.0204 Epoch: 10 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:49:28,601-Speed 5566.19 samples/sec Loss 5.5792 LearningRate 0.0204 Epoch: 10 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:30,463-Speed 5501.87 samples/sec Loss 5.5037 LearningRate 0.0204 Epoch: 10 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:32,308-Speed 5553.34 samples/sec Loss 5.8643 LearningRate 0.0204 Epoch: 10 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:34,162-Speed 5527.22 samples/sec Loss 5.6429 LearningRate 0.0204 Epoch: 10 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:36,035-Speed 5469.86 samples/sec Loss 5.5198 LearningRate 0.0204 Epoch: 10 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:37,898-Speed 5500.47 samples/sec Loss 5.5862 LearningRate 0.0204 Epoch: 10 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:39,755-Speed 5514.78 samples/sec Loss 5.5750 LearningRate 0.0204 Epoch: 10 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:49:41,615-Speed 5509.17 samples/sec Loss 5.7029 LearningRate 0.0204 Epoch: 10 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:49:43,459-Speed 5555.18 samples/sec Loss 5.6983 LearningRate 0.0204 Epoch: 10 Global Step: 55520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:45,319-Speed 5509.19 samples/sec Loss 5.6516 LearningRate 0.0203 Epoch: 10 Global Step: 55530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:47,167-Speed 5545.59 samples/sec Loss 5.5986 LearningRate 0.0203 Epoch: 10 Global Step: 55540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:49,022-Speed 5523.28 samples/sec Loss 5.5851 LearningRate 0.0203 Epoch: 10 Global Step: 55550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:50,865-Speed 5559.01 samples/sec Loss 5.7020 LearningRate 0.0203 Epoch: 10 Global Step: 55560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:52,718-Speed 5530.23 samples/sec Loss 5.7410 LearningRate 0.0203 Epoch: 10 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:54,568-Speed 5538.00 samples/sec Loss 5.6379 LearningRate 0.0203 Epoch: 10 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:56,429-Speed 5507.29 samples/sec Loss 5.6747 LearningRate 0.0203 Epoch: 10 Global Step: 55590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:49:58,267-Speed 5574.55 samples/sec Loss 5.5762 LearningRate 0.0203 Epoch: 10 Global Step: 55600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:00,151-Speed 5437.57 samples/sec Loss 5.6034 LearningRate 0.0203 Epoch: 10 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:01,993-Speed 5561.13 samples/sec Loss 5.4683 LearningRate 0.0203 Epoch: 10 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:03,918-Speed 5322.49 
samples/sec Loss 5.6437 LearningRate 0.0203 Epoch: 10 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:15,448-Speed 888.29 samples/sec Loss 5.4565 LearningRate 0.0202 Epoch: 11 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:17,336-Speed 5427.77 samples/sec Loss 4.8766 LearningRate 0.0202 Epoch: 11 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:19,207-Speed 5477.25 samples/sec Loss 4.8261 LearningRate 0.0202 Epoch: 11 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:21,181-Speed 5188.49 samples/sec Loss 4.7381 LearningRate 0.0202 Epoch: 11 Global Step: 55670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:23,060-Speed 5454.76 samples/sec Loss 4.7083 LearningRate 0.0202 Epoch: 11 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:24,966-Speed 5375.99 samples/sec Loss 4.8162 LearningRate 0.0202 Epoch: 11 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:26,831-Speed 5494.35 samples/sec Loss 4.7462 LearningRate 0.0202 Epoch: 11 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:28,687-Speed 5518.91 samples/sec Loss 4.7164 LearningRate 0.0202 Epoch: 11 Global Step: 55710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:30,550-Speed 5505.95 samples/sec Loss 4.8561 LearningRate 0.0202 Epoch: 11 Global Step: 55720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:32,397-Speed 5551.61 samples/sec Loss 4.6588 LearningRate 0.0202 Epoch: 11 Global Step: 55730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:34,254-Speed 5517.10 samples/sec Loss 4.7921 LearningRate 0.0202 Epoch: 11 Global Step: 55740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:36,112-Speed 5514.08 samples/sec Loss 4.8635 LearningRate 0.0202 Epoch: 11 
Global Step: 55750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:37,988-Speed 5463.55 samples/sec Loss 4.8222 LearningRate 0.0201 Epoch: 11 Global Step: 55760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:50:39,875-Speed 5429.70 samples/sec Loss 4.7493 LearningRate 0.0201 Epoch: 11 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:41,756-Speed 5447.61 samples/sec Loss 4.9538 LearningRate 0.0201 Epoch: 11 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:43,597-Speed 5564.34 samples/sec Loss 4.8864 LearningRate 0.0201 Epoch: 11 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:45,448-Speed 5536.81 samples/sec Loss 4.7755 LearningRate 0.0201 Epoch: 11 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:47,288-Speed 5566.76 samples/sec Loss 4.7998 LearningRate 0.0201 Epoch: 11 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:49,153-Speed 5494.80 samples/sec Loss 4.8333 LearningRate 0.0201 Epoch: 11 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:51,011-Speed 5513.14 samples/sec Loss 4.9341 LearningRate 0.0201 Epoch: 11 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:52,890-Speed 5452.96 samples/sec Loss 4.9448 LearningRate 0.0201 Epoch: 11 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:54,742-Speed 5531.51 samples/sec Loss 4.7695 LearningRate 0.0201 Epoch: 11 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:56,622-Speed 5448.99 samples/sec Loss 4.8345 LearningRate 0.0201 Epoch: 11 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:50:58,479-Speed 5516.75 samples/sec Loss 4.9084 LearningRate 0.0200 Epoch: 11 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 13:51:00,342-Speed 5501.06 samples/sec Loss 4.8106 LearningRate 0.0200 Epoch: 11 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:51:02,228-Speed 5432.97 samples/sec Loss 4.8785 LearningRate 0.0200 Epoch: 11 Global Step: 55890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:04,090-Speed 5502.86 samples/sec Loss 4.9176 LearningRate 0.0200 Epoch: 11 Global Step: 55900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:05,964-Speed 5466.71 samples/sec Loss 4.8908 LearningRate 0.0200 Epoch: 11 Global Step: 55910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:07,832-Speed 5485.95 samples/sec Loss 4.9080 LearningRate 0.0200 Epoch: 11 Global Step: 55920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:09,692-Speed 5506.66 samples/sec Loss 4.9049 LearningRate 0.0200 Epoch: 11 Global Step: 55930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:11,555-Speed 5500.13 samples/sec Loss 4.9196 LearningRate 0.0200 Epoch: 11 Global Step: 55940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:13,413-Speed 5518.28 samples/sec Loss 4.9004 LearningRate 0.0200 Epoch: 11 Global Step: 55950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:15,266-Speed 5528.08 samples/sec Loss 5.0980 LearningRate 0.0200 Epoch: 11 Global Step: 55960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:17,133-Speed 5487.70 samples/sec Loss 4.8800 LearningRate 0.0200 Epoch: 11 Global Step: 55970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:18,988-Speed 5523.03 samples/sec Loss 4.9868 LearningRate 0.0199 Epoch: 11 Global Step: 55980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:51:20,869-Speed 5448.97 samples/sec Loss 4.9955 LearningRate 0.0199 Epoch: 11 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:51:22,720-Speed 5533.16 
samples/sec Loss 5.0226 LearningRate 0.0199 Epoch: 11 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:51:49,901-[lfw][56000]XNorm: 22.698495 Training: 2022-04-11 13:51:49,902-[lfw][56000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 13:51:49,903-[lfw][56000]Accuracy-Highest: 0.99817 Training: 2022-04-11 13:52:21,409-[cfp_fp][56000]XNorm: 20.081252 Training: 2022-04-11 13:52:21,410-[cfp_fp][56000]Accuracy-Flip: 0.97257+-0.00859 Training: 2022-04-11 13:52:21,411-[cfp_fp][56000]Accuracy-Highest: 0.97257 Training: 2022-04-11 13:52:48,523-[agedb_30][56000]XNorm: 22.154971 Training: 2022-04-11 13:52:48,524-[agedb_30][56000]Accuracy-Flip: 0.97733+-0.00651 Training: 2022-04-11 13:52:48,525-[agedb_30][56000]Accuracy-Highest: 0.97817 Training: 2022-04-11 13:52:50,408-Speed 116.78 samples/sec Loss 5.0368 LearningRate 0.0199 Epoch: 11 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:52:52,269-Speed 5507.43 samples/sec Loss 5.0220 LearningRate 0.0199 Epoch: 11 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:52:54,130-Speed 5504.77 samples/sec Loss 4.9117 LearningRate 0.0199 Epoch: 11 Global Step: 56030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:52:55,971-Speed 5565.46 samples/sec Loss 5.0721 LearningRate 0.0199 Epoch: 11 Global Step: 56040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:52:57,834-Speed 5499.91 samples/sec Loss 5.0192 LearningRate 0.0199 Epoch: 11 Global Step: 56050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:52:59,692-Speed 5515.35 samples/sec Loss 5.0632 LearningRate 0.0199 Epoch: 11 Global Step: 56060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:01,598-Speed 5373.68 samples/sec Loss 5.0544 LearningRate 0.0199 Epoch: 11 Global Step: 56070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:03,438-Speed 5571.55 samples/sec Loss 5.0412 LearningRate 
0.0199 Epoch: 11 Global Step: 56080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:05,293-Speed 5521.55 samples/sec Loss 4.9415 LearningRate 0.0198 Epoch: 11 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:07,140-Speed 5549.96 samples/sec Loss 5.0997 LearningRate 0.0198 Epoch: 11 Global Step: 56100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:08,990-Speed 5536.12 samples/sec Loss 4.9981 LearningRate 0.0198 Epoch: 11 Global Step: 56110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:10,830-Speed 5569.73 samples/sec Loss 5.0542 LearningRate 0.0198 Epoch: 11 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:12,669-Speed 5568.13 samples/sec Loss 5.0114 LearningRate 0.0198 Epoch: 11 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:14,518-Speed 5543.98 samples/sec Loss 5.0307 LearningRate 0.0198 Epoch: 11 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:16,366-Speed 5545.16 samples/sec Loss 4.8961 LearningRate 0.0198 Epoch: 11 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:18,262-Speed 5403.47 samples/sec Loss 5.0311 LearningRate 0.0198 Epoch: 11 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:20,101-Speed 5571.92 samples/sec Loss 5.0841 LearningRate 0.0198 Epoch: 11 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:21,935-Speed 5584.49 samples/sec Loss 5.0687 LearningRate 0.0198 Epoch: 11 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:23,778-Speed 5561.53 samples/sec Loss 5.0130 LearningRate 0.0198 Epoch: 11 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:25,646-Speed 5485.53 samples/sec Loss 5.1587 LearningRate 0.0198 Epoch: 11 Global Step: 56200 Fp16 Grad Scale: 
32768 Required: 3 hours Training: 2022-04-11 13:53:27,531-Speed 5433.88 samples/sec Loss 5.1617 LearningRate 0.0197 Epoch: 11 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:29,381-Speed 5536.91 samples/sec Loss 5.1047 LearningRate 0.0197 Epoch: 11 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:31,247-Speed 5492.65 samples/sec Loss 5.1309 LearningRate 0.0197 Epoch: 11 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:33,090-Speed 5558.74 samples/sec Loss 4.9600 LearningRate 0.0197 Epoch: 11 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:34,958-Speed 5487.88 samples/sec Loss 4.9563 LearningRate 0.0197 Epoch: 11 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:36,815-Speed 5518.55 samples/sec Loss 5.0720 LearningRate 0.0197 Epoch: 11 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:38,688-Speed 5471.26 samples/sec Loss 5.1117 LearningRate 0.0197 Epoch: 11 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:40,544-Speed 5519.63 samples/sec Loss 4.9760 LearningRate 0.0197 Epoch: 11 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:42,398-Speed 5524.12 samples/sec Loss 4.9858 LearningRate 0.0197 Epoch: 11 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:44,230-Speed 5595.96 samples/sec Loss 5.1267 LearningRate 0.0197 Epoch: 11 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:46,112-Speed 5443.60 samples/sec Loss 5.1778 LearningRate 0.0197 Epoch: 11 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:47,948-Speed 5581.20 samples/sec Loss 5.0369 LearningRate 0.0196 Epoch: 11 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
13:53:49,830-Speed 5442.99 samples/sec Loss 4.9593 LearningRate 0.0196 Epoch: 11 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:51,699-Speed 5482.16 samples/sec Loss 4.9747 LearningRate 0.0196 Epoch: 11 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:53,546-Speed 5550.42 samples/sec Loss 4.9728 LearningRate 0.0196 Epoch: 11 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:55,398-Speed 5532.57 samples/sec Loss 5.0192 LearningRate 0.0196 Epoch: 11 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:53:57,243-Speed 5552.52 samples/sec Loss 5.1488 LearningRate 0.0196 Epoch: 11 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:53:59,098-Speed 5522.87 samples/sec Loss 5.2059 LearningRate 0.0196 Epoch: 11 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:00,952-Speed 5526.48 samples/sec Loss 5.0333 LearningRate 0.0196 Epoch: 11 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:02,829-Speed 5458.04 samples/sec Loss 5.0774 LearningRate 0.0196 Epoch: 11 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:04,697-Speed 5485.85 samples/sec Loss 4.9367 LearningRate 0.0196 Epoch: 11 Global Step: 56410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:06,537-Speed 5566.39 samples/sec Loss 5.1707 LearningRate 0.0196 Epoch: 11 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:08,391-Speed 5529.33 samples/sec Loss 5.0469 LearningRate 0.0196 Epoch: 11 Global Step: 56430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:10,244-Speed 5531.75 samples/sec Loss 5.0769 LearningRate 0.0195 Epoch: 11 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:12,115-Speed 5475.61 samples/sec Loss 5.1499 
LearningRate 0.0195 Epoch: 11 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:13,973-Speed 5517.33 samples/sec Loss 5.3538 LearningRate 0.0195 Epoch: 11 Global Step: 56460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:54:15,885-Speed 5360.95 samples/sec Loss 5.1565 LearningRate 0.0195 Epoch: 11 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:17,725-Speed 5567.34 samples/sec Loss 5.1710 LearningRate 0.0195 Epoch: 11 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:19,603-Speed 5458.64 samples/sec Loss 4.9558 LearningRate 0.0195 Epoch: 11 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:21,451-Speed 5542.82 samples/sec Loss 5.0597 LearningRate 0.0195 Epoch: 11 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:23,317-Speed 5494.45 samples/sec Loss 5.2198 LearningRate 0.0195 Epoch: 11 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:25,162-Speed 5555.46 samples/sec Loss 5.1907 LearningRate 0.0195 Epoch: 11 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:27,038-Speed 5461.51 samples/sec Loss 5.1977 LearningRate 0.0195 Epoch: 11 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:28,908-Speed 5482.15 samples/sec Loss 5.1429 LearningRate 0.0195 Epoch: 11 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:30,797-Speed 5425.12 samples/sec Loss 5.0642 LearningRate 0.0194 Epoch: 11 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:32,652-Speed 5522.05 samples/sec Loss 5.1963 LearningRate 0.0194 Epoch: 11 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:34,503-Speed 5537.48 samples/sec Loss 5.2143 LearningRate 0.0194 Epoch: 11 Global Step: 56570 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:54:36,380-Speed 5458.88 samples/sec Loss 5.1822 LearningRate 0.0194 Epoch: 11 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:38,219-Speed 5570.67 samples/sec Loss 5.2703 LearningRate 0.0194 Epoch: 11 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:40,104-Speed 5435.88 samples/sec Loss 5.2754 LearningRate 0.0194 Epoch: 11 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:41,969-Speed 5491.45 samples/sec Loss 5.0775 LearningRate 0.0194 Epoch: 11 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:43,837-Speed 5486.59 samples/sec Loss 5.1119 LearningRate 0.0194 Epoch: 11 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:45,679-Speed 5562.33 samples/sec Loss 5.1683 LearningRate 0.0194 Epoch: 11 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:47,551-Speed 5474.85 samples/sec Loss 5.1847 LearningRate 0.0194 Epoch: 11 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:49,394-Speed 5559.34 samples/sec Loss 5.2792 LearningRate 0.0194 Epoch: 11 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:51,292-Speed 5397.26 samples/sec Loss 5.0243 LearningRate 0.0194 Epoch: 11 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:53,214-Speed 5329.72 samples/sec Loss 5.2386 LearningRate 0.0193 Epoch: 11 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:55,212-Speed 5129.80 samples/sec Loss 5.1173 LearningRate 0.0193 Epoch: 11 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:54:57,055-Speed 5558.51 samples/sec Loss 5.0834 LearningRate 0.0193 Epoch: 11 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:54:58,914-Speed 5515.84 samples/sec Loss 5.1004 LearningRate 0.0193 Epoch: 11 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:00,774-Speed 5507.25 samples/sec Loss 5.1337 LearningRate 0.0193 Epoch: 11 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:02,633-Speed 5510.43 samples/sec Loss 5.1408 LearningRate 0.0193 Epoch: 11 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:04,476-Speed 5561.70 samples/sec Loss 5.2336 LearningRate 0.0193 Epoch: 11 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:06,339-Speed 5498.31 samples/sec Loss 5.0605 LearningRate 0.0193 Epoch: 11 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:08,174-Speed 5581.71 samples/sec Loss 5.1411 LearningRate 0.0193 Epoch: 11 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:10,020-Speed 5551.34 samples/sec Loss 5.1433 LearningRate 0.0193 Epoch: 11 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:11,892-Speed 5472.37 samples/sec Loss 5.2229 LearningRate 0.0193 Epoch: 11 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:13,772-Speed 5451.19 samples/sec Loss 5.1995 LearningRate 0.0192 Epoch: 11 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:15,616-Speed 5557.24 samples/sec Loss 5.1443 LearningRate 0.0192 Epoch: 11 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:17,488-Speed 5470.40 samples/sec Loss 5.1652 LearningRate 0.0192 Epoch: 11 Global Step: 56800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:19,332-Speed 5558.46 samples/sec Loss 5.3388 LearningRate 0.0192 Epoch: 11 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:21,169-Speed 5575.39 samples/sec Loss 
5.2346 LearningRate 0.0192 Epoch: 11 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:23,024-Speed 5522.91 samples/sec Loss 5.2074 LearningRate 0.0192 Epoch: 11 Global Step: 56830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:24,913-Speed 5425.68 samples/sec Loss 5.2893 LearningRate 0.0192 Epoch: 11 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:26,777-Speed 5496.03 samples/sec Loss 5.1598 LearningRate 0.0192 Epoch: 11 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:28,630-Speed 5525.40 samples/sec Loss 5.1656 LearningRate 0.0192 Epoch: 11 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:30,523-Speed 5414.78 samples/sec Loss 5.2706 LearningRate 0.0192 Epoch: 11 Global Step: 56870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:32,360-Speed 5576.82 samples/sec Loss 5.2635 LearningRate 0.0192 Epoch: 11 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:34,227-Speed 5488.92 samples/sec Loss 5.1104 LearningRate 0.0192 Epoch: 11 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:36,080-Speed 5527.15 samples/sec Loss 5.3019 LearningRate 0.0191 Epoch: 11 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:37,963-Speed 5442.30 samples/sec Loss 5.2049 LearningRate 0.0191 Epoch: 11 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:39,819-Speed 5521.29 samples/sec Loss 5.1693 LearningRate 0.0191 Epoch: 11 Global Step: 56920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:41,681-Speed 5500.47 samples/sec Loss 5.2200 LearningRate 0.0191 Epoch: 11 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:43,522-Speed 5567.62 samples/sec Loss 5.2132 LearningRate 0.0191 Epoch: 11 Global Step: 56940 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:45,388-Speed 5489.14 samples/sec Loss 5.1438 LearningRate 0.0191 Epoch: 11 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:47,249-Speed 5506.14 samples/sec Loss 5.2103 LearningRate 0.0191 Epoch: 11 Global Step: 56960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:49,109-Speed 5511.52 samples/sec Loss 5.0598 LearningRate 0.0191 Epoch: 11 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:50,986-Speed 5459.82 samples/sec Loss 5.2323 LearningRate 0.0191 Epoch: 11 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:52,833-Speed 5544.39 samples/sec Loss 5.0846 LearningRate 0.0191 Epoch: 11 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:55:54,707-Speed 5468.63 samples/sec Loss 5.2632 LearningRate 0.0191 Epoch: 11 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:56,554-Speed 5546.71 samples/sec Loss 5.3143 LearningRate 0.0190 Epoch: 11 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:55:58,390-Speed 5578.95 samples/sec Loss 5.2309 LearningRate 0.0190 Epoch: 11 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:00,253-Speed 5500.46 samples/sec Loss 5.2073 LearningRate 0.0190 Epoch: 11 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:02,095-Speed 5562.79 samples/sec Loss 5.2878 LearningRate 0.0190 Epoch: 11 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:03,967-Speed 5472.78 samples/sec Loss 5.1894 LearningRate 0.0190 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:05,801-Speed 5587.28 samples/sec Loss 5.2012 LearningRate 0.0190 Epoch: 11 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 13:56:07,646-Speed 5550.82 samples/sec Loss 5.1947 LearningRate 0.0190 Epoch: 11 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:09,493-Speed 5547.75 samples/sec Loss 5.1657 LearningRate 0.0190 Epoch: 11 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:11,363-Speed 5478.05 samples/sec Loss 5.2611 LearningRate 0.0190 Epoch: 11 Global Step: 57090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:13,217-Speed 5525.85 samples/sec Loss 5.2955 LearningRate 0.0190 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:15,091-Speed 5468.72 samples/sec Loss 5.1347 LearningRate 0.0190 Epoch: 11 Global Step: 57110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:16,953-Speed 5502.63 samples/sec Loss 5.3464 LearningRate 0.0190 Epoch: 11 Global Step: 57120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:18,808-Speed 5523.59 samples/sec Loss 5.2014 LearningRate 0.0189 Epoch: 11 Global Step: 57130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:20,648-Speed 5568.28 samples/sec Loss 5.2178 LearningRate 0.0189 Epoch: 11 Global Step: 57140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:22,507-Speed 5509.73 samples/sec Loss 5.1779 LearningRate 0.0189 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:24,360-Speed 5531.35 samples/sec Loss 5.1715 LearningRate 0.0189 Epoch: 11 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:26,204-Speed 5554.17 samples/sec Loss 5.3239 LearningRate 0.0189 Epoch: 11 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:28,063-Speed 5510.78 samples/sec Loss 5.2199 LearningRate 0.0189 Epoch: 11 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:29,902-Speed 5573.00 samples/sec Loss 
5.2332 LearningRate 0.0189 Epoch: 11 Global Step: 57190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:31,773-Speed 5477.10 samples/sec Loss 5.2913 LearningRate 0.0189 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:33,615-Speed 5560.77 samples/sec Loss 5.1978 LearningRate 0.0189 Epoch: 11 Global Step: 57210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:35,485-Speed 5480.26 samples/sec Loss 5.1838 LearningRate 0.0189 Epoch: 11 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:37,344-Speed 5509.53 samples/sec Loss 5.2105 LearningRate 0.0189 Epoch: 11 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:39,223-Speed 5452.86 samples/sec Loss 5.1171 LearningRate 0.0188 Epoch: 11 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:56:41,092-Speed 5481.19 samples/sec Loss 5.3231 LearningRate 0.0188 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:42,943-Speed 5534.95 samples/sec Loss 5.2371 LearningRate 0.0188 Epoch: 11 Global Step: 57260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:44,814-Speed 5520.14 samples/sec Loss 5.2273 LearningRate 0.0188 Epoch: 11 Global Step: 57270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:46,654-Speed 5570.33 samples/sec Loss 5.1047 LearningRate 0.0188 Epoch: 11 Global Step: 57280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:48,504-Speed 5535.03 samples/sec Loss 5.2167 LearningRate 0.0188 Epoch: 11 Global Step: 57290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:50,364-Speed 5509.50 samples/sec Loss 5.3135 LearningRate 0.0188 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:52,220-Speed 5520.00 samples/sec Loss 5.1562 LearningRate 0.0188 Epoch: 11 Global Step: 57310 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:54,069-Speed 5543.47 samples/sec Loss 5.2181 LearningRate 0.0188 Epoch: 11 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:55,940-Speed 5476.74 samples/sec Loss 5.2316 LearningRate 0.0188 Epoch: 11 Global Step: 57330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:57,782-Speed 5559.97 samples/sec Loss 5.2226 LearningRate 0.0188 Epoch: 11 Global Step: 57340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:56:59,650-Speed 5487.62 samples/sec Loss 5.4202 LearningRate 0.0188 Epoch: 11 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:01,494-Speed 5553.62 samples/sec Loss 5.3441 LearningRate 0.0187 Epoch: 11 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:03,358-Speed 5496.69 samples/sec Loss 5.2950 LearningRate 0.0187 Epoch: 11 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:05,228-Speed 5481.09 samples/sec Loss 5.1390 LearningRate 0.0187 Epoch: 11 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:07,076-Speed 5541.55 samples/sec Loss 5.2847 LearningRate 0.0187 Epoch: 11 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:08,938-Speed 5503.05 samples/sec Loss 5.2874 LearningRate 0.0187 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:10,792-Speed 5528.47 samples/sec Loss 5.2004 LearningRate 0.0187 Epoch: 11 Global Step: 57410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:12,653-Speed 5505.12 samples/sec Loss 5.2390 LearningRate 0.0187 Epoch: 11 Global Step: 57420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:14,526-Speed 5470.30 samples/sec Loss 5.2204 LearningRate 0.0187 Epoch: 11 Global Step: 57430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 13:57:16,381-Speed 5522.17 samples/sec Loss 5.3566 LearningRate 0.0187 Epoch: 11 Global Step: 57440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:18,221-Speed 5567.57 samples/sec Loss 5.2728 LearningRate 0.0187 Epoch: 11 Global Step: 57450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:20,075-Speed 5527.16 samples/sec Loss 5.2827 LearningRate 0.0187 Epoch: 11 Global Step: 57460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:21,924-Speed 5541.70 samples/sec Loss 5.2125 LearningRate 0.0187 Epoch: 11 Global Step: 57470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:23,767-Speed 5557.88 samples/sec Loss 5.3858 LearningRate 0.0186 Epoch: 11 Global Step: 57480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:25,658-Speed 5418.64 samples/sec Loss 5.2641 LearningRate 0.0186 Epoch: 11 Global Step: 57490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:27,515-Speed 5517.62 samples/sec Loss 5.3542 LearningRate 0.0186 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:29,368-Speed 5532.58 samples/sec Loss 5.2786 LearningRate 0.0186 Epoch: 11 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:31,210-Speed 5560.83 samples/sec Loss 5.3003 LearningRate 0.0186 Epoch: 11 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:33,054-Speed 5556.97 samples/sec Loss 5.2973 LearningRate 0.0186 Epoch: 11 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:34,906-Speed 5533.70 samples/sec Loss 5.0813 LearningRate 0.0186 Epoch: 11 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:36,745-Speed 5570.37 samples/sec Loss 5.2694 LearningRate 0.0186 Epoch: 11 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:38,625-Speed 5449.92 samples/sec Loss 
5.1571 LearningRate 0.0186 Epoch: 11 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:40,474-Speed 5542.19 samples/sec Loss 5.2293 LearningRate 0.0186 Epoch: 11 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:42,344-Speed 5478.79 samples/sec Loss 5.3060 LearningRate 0.0186 Epoch: 11 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:44,191-Speed 5546.21 samples/sec Loss 5.2729 LearningRate 0.0186 Epoch: 11 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:46,041-Speed 5538.05 samples/sec Loss 5.2918 LearningRate 0.0185 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:47,883-Speed 5561.75 samples/sec Loss 5.1600 LearningRate 0.0185 Epoch: 11 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:49,777-Speed 5409.98 samples/sec Loss 5.2322 LearningRate 0.0185 Epoch: 11 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:51,630-Speed 5530.64 samples/sec Loss 5.2872 LearningRate 0.0185 Epoch: 11 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:57:53,487-Speed 5516.15 samples/sec Loss 5.2342 LearningRate 0.0185 Epoch: 11 Global Step: 57640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:55,338-Speed 5536.69 samples/sec Loss 5.3955 LearningRate 0.0185 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:57,208-Speed 5479.45 samples/sec Loss 5.2178 LearningRate 0.0185 Epoch: 11 Global Step: 57660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:57:59,070-Speed 5505.41 samples/sec Loss 5.3225 LearningRate 0.0185 Epoch: 11 Global Step: 57670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:00,910-Speed 5568.16 samples/sec Loss 5.3002 LearningRate 0.0185 Epoch: 11 Global Step: 57680 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:02,752-Speed 5560.59 samples/sec Loss 5.1745 LearningRate 0.0185 Epoch: 11 Global Step: 57690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:04,624-Speed 5473.62 samples/sec Loss 5.2734 LearningRate 0.0185 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:06,493-Speed 5482.19 samples/sec Loss 5.3403 LearningRate 0.0184 Epoch: 11 Global Step: 57710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:08,346-Speed 5527.29 samples/sec Loss 5.3031 LearningRate 0.0184 Epoch: 11 Global Step: 57720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:10,204-Speed 5515.48 samples/sec Loss 5.3202 LearningRate 0.0184 Epoch: 11 Global Step: 57730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:12,056-Speed 5532.27 samples/sec Loss 5.3653 LearningRate 0.0184 Epoch: 11 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:13,942-Speed 5432.90 samples/sec Loss 5.2589 LearningRate 0.0184 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:15,788-Speed 5547.52 samples/sec Loss 5.2402 LearningRate 0.0184 Epoch: 11 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:17,634-Speed 5551.28 samples/sec Loss 5.3333 LearningRate 0.0184 Epoch: 11 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:19,481-Speed 5546.20 samples/sec Loss 5.2912 LearningRate 0.0184 Epoch: 11 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:21,327-Speed 5549.00 samples/sec Loss 5.2963 LearningRate 0.0184 Epoch: 11 Global Step: 57790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:23,177-Speed 5536.77 samples/sec Loss 5.3221 LearningRate 0.0184 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 13:58:25,047-Speed 5484.30 samples/sec Loss 5.2341 LearningRate 0.0184 Epoch: 11 Global Step: 57810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:26,883-Speed 5578.40 samples/sec Loss 5.1913 LearningRate 0.0184 Epoch: 11 Global Step: 57820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:28,728-Speed 5552.98 samples/sec Loss 5.2734 LearningRate 0.0183 Epoch: 11 Global Step: 57830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:30,577-Speed 5539.57 samples/sec Loss 5.4204 LearningRate 0.0183 Epoch: 11 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 13:58:32,411-Speed 5586.91 samples/sec Loss 5.2244 LearningRate 0.0183 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:34,262-Speed 5535.55 samples/sec Loss 5.1822 LearningRate 0.0183 Epoch: 11 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:36,115-Speed 5528.98 samples/sec Loss 5.2571 LearningRate 0.0183 Epoch: 11 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:37,981-Speed 5489.11 samples/sec Loss 5.2627 LearningRate 0.0183 Epoch: 11 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:39,841-Speed 5508.97 samples/sec Loss 5.2782 LearningRate 0.0183 Epoch: 11 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:41,702-Speed 5506.10 samples/sec Loss 5.3433 LearningRate 0.0183 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 13:58:43,549-Speed 5546.03 samples/sec Loss 5.2234 LearningRate 0.0183 Epoch: 11 Global Step: 57910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:45,396-Speed 5546.12 samples/sec Loss 5.2581 LearningRate 0.0183 Epoch: 11 Global Step: 57920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:47,253-Speed 5518.41 samples/sec Loss 
5.3636 LearningRate 0.0183 Epoch: 11 Global Step: 57930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:49,111-Speed 5513.51 samples/sec Loss 5.3622 LearningRate 0.0183 Epoch: 11 Global Step: 57940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:50,957-Speed 5550.68 samples/sec Loss 5.2782 LearningRate 0.0182 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:52,829-Speed 5474.04 samples/sec Loss 5.3166 LearningRate 0.0182 Epoch: 11 Global Step: 57960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:54,680-Speed 5534.61 samples/sec Loss 5.3564 LearningRate 0.0182 Epoch: 11 Global Step: 57970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:56,537-Speed 5518.89 samples/sec Loss 5.2957 LearningRate 0.0182 Epoch: 11 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:58:58,384-Speed 5544.43 samples/sec Loss 5.3294 LearningRate 0.0182 Epoch: 11 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 13:59:00,217-Speed 5590.53 samples/sec Loss 5.3350 LearningRate 0.0182 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 13:59:27,638-[lfw][58000]XNorm: 22.861997 Training: 2022-04-11 13:59:27,639-[lfw][58000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 13:59:27,639-[lfw][58000]Accuracy-Highest: 0.99817 Training: 2022-04-11 13:59:59,210-[cfp_fp][58000]XNorm: 20.234653 Training: 2022-04-11 13:59:59,211-[cfp_fp][58000]Accuracy-Flip: 0.97371+-0.00721 Training: 2022-04-11 13:59:59,212-[cfp_fp][58000]Accuracy-Highest: 0.97371 Training: 2022-04-11 14:00:26,480-[agedb_30][58000]XNorm: 22.841101 Training: 2022-04-11 14:00:26,481-[agedb_30][58000]Accuracy-Flip: 0.97767+-0.00659 Training: 2022-04-11 14:00:26,482-[agedb_30][58000]Accuracy-Highest: 0.97817 Training: 2022-04-11 14:00:28,331-Speed 116.22 samples/sec Loss 5.2523 LearningRate 0.0182 Epoch: 11 
Global Step: 58010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:30,171-Speed 5566.63 samples/sec Loss 5.4575 LearningRate 0.0182 Epoch: 11 Global Step: 58020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:32,000-Speed 5602.33 samples/sec Loss 5.3829 LearningRate 0.0182 Epoch: 11 Global Step: 58030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:33,845-Speed 5551.99 samples/sec Loss 5.2061 LearningRate 0.0182 Epoch: 11 Global Step: 58040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:35,671-Speed 5612.80 samples/sec Loss 5.2360 LearningRate 0.0182 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:37,541-Speed 5479.79 samples/sec Loss 5.3561 LearningRate 0.0182 Epoch: 11 Global Step: 58060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:39,375-Speed 5585.12 samples/sec Loss 5.3345 LearningRate 0.0181 Epoch: 11 Global Step: 58070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:41,223-Speed 5545.84 samples/sec Loss 5.2834 LearningRate 0.0181 Epoch: 11 Global Step: 58080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:43,061-Speed 5572.90 samples/sec Loss 5.4563 LearningRate 0.0181 Epoch: 11 Global Step: 58090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:00:44,893-Speed 5593.30 samples/sec Loss 5.1529 LearningRate 0.0181 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:46,731-Speed 5573.67 samples/sec Loss 5.2414 LearningRate 0.0181 Epoch: 11 Global Step: 58110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:48,620-Speed 5425.97 samples/sec Loss 5.2433 LearningRate 0.0181 Epoch: 11 Global Step: 58120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:50,458-Speed 5572.46 samples/sec Loss 5.4309 LearningRate 0.0181 Epoch: 11 Global Step: 58130 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 14:00:52,322-Speed 5498.06 samples/sec Loss 5.2795 LearningRate 0.0181 Epoch: 11 Global Step: 58140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:54,158-Speed 5580.79 samples/sec Loss 5.2789 LearningRate 0.0181 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:56,022-Speed 5493.67 samples/sec Loss 5.2987 LearningRate 0.0181 Epoch: 11 Global Step: 58160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:57,854-Speed 5594.31 samples/sec Loss 5.2817 LearningRate 0.0181 Epoch: 11 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:00:59,715-Speed 5506.28 samples/sec Loss 5.2079 LearningRate 0.0181 Epoch: 11 Global Step: 58180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:01,581-Speed 5491.56 samples/sec Loss 5.2639 LearningRate 0.0180 Epoch: 11 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:03,453-Speed 5471.52 samples/sec Loss 5.3517 LearningRate 0.0180 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:05,312-Speed 5512.68 samples/sec Loss 5.2969 LearningRate 0.0180 Epoch: 11 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:07,151-Speed 5570.61 samples/sec Loss 5.1033 LearningRate 0.0180 Epoch: 11 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:09,012-Speed 5506.85 samples/sec Loss 5.2108 LearningRate 0.0180 Epoch: 11 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:10,894-Speed 5445.16 samples/sec Loss 5.1943 LearningRate 0.0180 Epoch: 11 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:12,738-Speed 5558.75 samples/sec Loss 5.2037 LearningRate 0.0180 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:14,603-Speed 5493.67 
samples/sec Loss 5.2664 LearningRate 0.0180 Epoch: 11 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:16,442-Speed 5573.33 samples/sec Loss 5.3228 LearningRate 0.0180 Epoch: 11 Global Step: 58270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:18,319-Speed 5458.03 samples/sec Loss 5.3406 LearningRate 0.0180 Epoch: 11 Global Step: 58280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:20,161-Speed 5563.29 samples/sec Loss 5.2387 LearningRate 0.0180 Epoch: 11 Global Step: 58290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:22,025-Speed 5495.70 samples/sec Loss 5.3063 LearningRate 0.0180 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:23,868-Speed 5560.49 samples/sec Loss 5.1461 LearningRate 0.0179 Epoch: 11 Global Step: 58310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:25,727-Speed 5509.19 samples/sec Loss 5.2855 LearningRate 0.0179 Epoch: 11 Global Step: 58320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:27,578-Speed 5537.03 samples/sec Loss 5.3131 LearningRate 0.0179 Epoch: 11 Global Step: 58330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:29,461-Speed 5439.96 samples/sec Loss 5.1686 LearningRate 0.0179 Epoch: 11 Global Step: 58340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:31,305-Speed 5554.59 samples/sec Loss 5.3719 LearningRate 0.0179 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:33,173-Speed 5486.84 samples/sec Loss 5.1635 LearningRate 0.0179 Epoch: 11 Global Step: 58360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:35,014-Speed 5564.86 samples/sec Loss 5.3242 LearningRate 0.0179 Epoch: 11 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:36,870-Speed 5521.84 samples/sec Loss 5.3519 LearningRate 0.0179 Epoch: 11 
Global Step: 58380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:38,726-Speed 5519.61 samples/sec Loss 5.1801 LearningRate 0.0179 Epoch: 11 Global Step: 58390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:40,582-Speed 5520.09 samples/sec Loss 5.3671 LearningRate 0.0179 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:42,433-Speed 5534.53 samples/sec Loss 5.3789 LearningRate 0.0179 Epoch: 11 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:01:44,283-Speed 5539.11 samples/sec Loss 5.1224 LearningRate 0.0179 Epoch: 11 Global Step: 58420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:46,116-Speed 5588.76 samples/sec Loss 5.1608 LearningRate 0.0178 Epoch: 11 Global Step: 58430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:47,994-Speed 5455.88 samples/sec Loss 5.3382 LearningRate 0.0178 Epoch: 11 Global Step: 58440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:49,842-Speed 5544.98 samples/sec Loss 5.3036 LearningRate 0.0178 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:51,703-Speed 5509.71 samples/sec Loss 5.2943 LearningRate 0.0178 Epoch: 11 Global Step: 58460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:53,570-Speed 5486.51 samples/sec Loss 5.3046 LearningRate 0.0178 Epoch: 11 Global Step: 58470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:01:55,406-Speed 5582.12 samples/sec Loss 5.1696 LearningRate 0.0178 Epoch: 11 Global Step: 58480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:01:57,270-Speed 5495.38 samples/sec Loss 5.1854 LearningRate 0.0178 Epoch: 11 Global Step: 58490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:01:59,124-Speed 5527.77 samples/sec Loss 5.2803 LearningRate 0.0178 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-11 14:02:01,002-Speed 5455.74 samples/sec Loss 5.3543 LearningRate 0.0178 Epoch: 11 Global Step: 58510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:02,851-Speed 5541.29 samples/sec Loss 5.3380 LearningRate 0.0178 Epoch: 11 Global Step: 58520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:04,709-Speed 5513.21 samples/sec Loss 5.1811 LearningRate 0.0178 Epoch: 11 Global Step: 58530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:06,546-Speed 5576.94 samples/sec Loss 5.2931 LearningRate 0.0178 Epoch: 11 Global Step: 58540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:08,404-Speed 5515.57 samples/sec Loss 5.2169 LearningRate 0.0177 Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:10,246-Speed 5561.40 samples/sec Loss 5.4143 LearningRate 0.0177 Epoch: 11 Global Step: 58560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:12,109-Speed 5501.37 samples/sec Loss 5.3068 LearningRate 0.0177 Epoch: 11 Global Step: 58570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:02:13,975-Speed 5490.66 samples/sec Loss 5.4858 LearningRate 0.0177 Epoch: 11 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:15,814-Speed 5568.21 samples/sec Loss 5.2511 LearningRate 0.0177 Epoch: 11 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:17,653-Speed 5574.27 samples/sec Loss 5.0967 LearningRate 0.0177 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:19,517-Speed 5494.86 samples/sec Loss 5.3011 LearningRate 0.0177 Epoch: 11 Global Step: 58610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:21,359-Speed 5562.45 samples/sec Loss 5.2258 LearningRate 0.0177 Epoch: 11 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:23,215-Speed 5520.62 
samples/sec Loss 5.3039 LearningRate 0.0177 Epoch: 11 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:25,102-Speed 5428.38 samples/sec Loss 5.2485 LearningRate 0.0177 Epoch: 11 Global Step: 58640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:26,940-Speed 5574.96 samples/sec Loss 5.4165 LearningRate 0.0177 Epoch: 11 Global Step: 58650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:28,792-Speed 5531.69 samples/sec Loss 5.3023 LearningRate 0.0177 Epoch: 11 Global Step: 58660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:30,637-Speed 5552.80 samples/sec Loss 5.3006 LearningRate 0.0176 Epoch: 11 Global Step: 58670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:32,507-Speed 5481.56 samples/sec Loss 5.2382 LearningRate 0.0176 Epoch: 11 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:02:34,341-Speed 5585.62 samples/sec Loss 5.3512 LearningRate 0.0176 Epoch: 11 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:02:36,218-Speed 5459.53 samples/sec Loss 5.2825 LearningRate 0.0176 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:02:38,067-Speed 5542.75 samples/sec Loss 5.2588 LearningRate 0.0176 Epoch: 11 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:02:39,917-Speed 5536.50 samples/sec Loss 5.3700 LearningRate 0.0176 Epoch: 11 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:02:41,767-Speed 5538.24 samples/sec Loss 5.1588 LearningRate 0.0176 Epoch: 11 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:43,607-Speed 5567.58 samples/sec Loss 5.2051 LearningRate 0.0176 Epoch: 11 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:45,445-Speed 5573.75 samples/sec Loss 5.1018 LearningRate 0.0176 Epoch: 11 
Global Step: 58750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:47,310-Speed 5494.97 samples/sec Loss 5.2005 LearningRate 0.0176 Epoch: 11 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:49,154-Speed 5556.29 samples/sec Loss 5.2609 LearningRate 0.0176 Epoch: 11 Global Step: 58770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:50,986-Speed 5592.27 samples/sec Loss 5.1655 LearningRate 0.0176 Epoch: 11 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:52,855-Speed 5480.24 samples/sec Loss 5.2600 LearningRate 0.0175 Epoch: 11 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:54,720-Speed 5492.00 samples/sec Loss 5.2698 LearningRate 0.0175 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:56,570-Speed 5539.43 samples/sec Loss 5.1911 LearningRate 0.0175 Epoch: 11 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:02:58,437-Speed 5488.74 samples/sec Loss 5.3378 LearningRate 0.0175 Epoch: 11 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:00,268-Speed 5591.59 samples/sec Loss 5.2018 LearningRate 0.0175 Epoch: 11 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:02,111-Speed 5561.16 samples/sec Loss 5.2082 LearningRate 0.0175 Epoch: 11 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:03,950-Speed 5572.35 samples/sec Loss 5.1519 LearningRate 0.0175 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:05,798-Speed 5544.48 samples/sec Loss 5.3206 LearningRate 0.0175 Epoch: 11 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:07,653-Speed 5524.42 samples/sec Loss 5.1951 LearningRate 0.0175 Epoch: 11 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 14:03:09,485-Speed 5589.32 samples/sec Loss 5.1549 LearningRate 0.0175 Epoch: 11 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:11,365-Speed 5449.32 samples/sec Loss 5.3242 LearningRate 0.0175 Epoch: 11 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:13,204-Speed 5573.83 samples/sec Loss 5.1915 LearningRate 0.0175 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:15,075-Speed 5475.76 samples/sec Loss 5.2449 LearningRate 0.0174 Epoch: 11 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:16,916-Speed 5566.87 samples/sec Loss 5.2697 LearningRate 0.0174 Epoch: 11 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:18,764-Speed 5540.40 samples/sec Loss 5.1163 LearningRate 0.0174 Epoch: 11 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:20,613-Speed 5542.01 samples/sec Loss 5.3396 LearningRate 0.0174 Epoch: 11 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:22,471-Speed 5516.01 samples/sec Loss 5.1259 LearningRate 0.0174 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:24,316-Speed 5552.42 samples/sec Loss 5.2247 LearningRate 0.0174 Epoch: 11 Global Step: 58960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:26,151-Speed 5583.11 samples/sec Loss 5.1877 LearningRate 0.0174 Epoch: 11 Global Step: 58970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:28,010-Speed 5512.96 samples/sec Loss 5.1383 LearningRate 0.0174 Epoch: 11 Global Step: 58980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:29,843-Speed 5587.51 samples/sec Loss 5.1005 LearningRate 0.0174 Epoch: 11 Global Step: 58990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:31,682-Speed 5571.49 
samples/sec Loss 5.3231 LearningRate 0.0174 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:33,544-Speed 5504.15 samples/sec Loss 5.2541 LearningRate 0.0174 Epoch: 11 Global Step: 59010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:35,383-Speed 5570.16 samples/sec Loss 5.2841 LearningRate 0.0174 Epoch: 11 Global Step: 59020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:37,238-Speed 5526.15 samples/sec Loss 5.2703 LearningRate 0.0173 Epoch: 11 Global Step: 59030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:39,096-Speed 5514.45 samples/sec Loss 5.2325 LearningRate 0.0173 Epoch: 11 Global Step: 59040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:40,967-Speed 5476.86 samples/sec Loss 5.3050 LearningRate 0.0173 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:42,812-Speed 5553.54 samples/sec Loss 5.1469 LearningRate 0.0173 Epoch: 11 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:03:44,663-Speed 5535.57 samples/sec Loss 5.4270 LearningRate 0.0173 Epoch: 11 Global Step: 59070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:46,538-Speed 5465.66 samples/sec Loss 5.3491 LearningRate 0.0173 Epoch: 11 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:48,414-Speed 5461.11 samples/sec Loss 5.0896 LearningRate 0.0173 Epoch: 11 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:50,311-Speed 5402.56 samples/sec Loss 5.2485 LearningRate 0.0173 Epoch: 11 Global Step: 59100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:52,170-Speed 5511.08 samples/sec Loss 5.2595 LearningRate 0.0173 Epoch: 11 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:54,036-Speed 5492.12 samples/sec Loss 5.3145 LearningRate 0.0173 Epoch: 11 
Global Step: 59120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:55,884-Speed 5543.01 samples/sec Loss 5.0376 LearningRate 0.0173 Epoch: 11 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:57,748-Speed 5500.32 samples/sec Loss 5.2571 LearningRate 0.0173 Epoch: 11 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:03:59,607-Speed 5510.68 samples/sec Loss 5.1882 LearningRate 0.0172 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:01,446-Speed 5571.31 samples/sec Loss 5.1662 LearningRate 0.0172 Epoch: 11 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:03,333-Speed 5429.43 samples/sec Loss 5.2233 LearningRate 0.0172 Epoch: 11 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:05,179-Speed 5552.50 samples/sec Loss 5.2324 LearningRate 0.0172 Epoch: 11 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:07,082-Speed 5384.62 samples/sec Loss 5.1578 LearningRate 0.0172 Epoch: 11 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:08,913-Speed 5595.77 samples/sec Loss 5.2627 LearningRate 0.0172 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:10,777-Speed 5495.64 samples/sec Loss 5.1264 LearningRate 0.0172 Epoch: 11 Global Step: 59210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:12,699-Speed 5332.54 samples/sec Loss 5.2335 LearningRate 0.0172 Epoch: 11 Global Step: 59220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:14,563-Speed 5498.26 samples/sec Loss 5.2875 LearningRate 0.0172 Epoch: 11 Global Step: 59230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:16,400-Speed 5576.16 samples/sec Loss 5.2110 LearningRate 0.0172 Epoch: 11 Global Step: 59240 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 14:04:18,241-Speed 5568.52 samples/sec Loss 5.2516 LearningRate 0.0172 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:20,077-Speed 5579.95 samples/sec Loss 5.3318 LearningRate 0.0172 Epoch: 11 Global Step: 59260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:21,947-Speed 5478.61 samples/sec Loss 5.2702 LearningRate 0.0171 Epoch: 11 Global Step: 59270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:23,793-Speed 5589.03 samples/sec Loss 5.2822 LearningRate 0.0171 Epoch: 11 Global Step: 59280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:25,663-Speed 5478.48 samples/sec Loss 5.1617 LearningRate 0.0171 Epoch: 11 Global Step: 59290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:27,511-Speed 5546.48 samples/sec Loss 5.3365 LearningRate 0.0171 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:29,357-Speed 5548.33 samples/sec Loss 5.2786 LearningRate 0.0171 Epoch: 11 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:31,236-Speed 5451.09 samples/sec Loss 5.3341 LearningRate 0.0171 Epoch: 11 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:33,073-Speed 5579.47 samples/sec Loss 5.3778 LearningRate 0.0171 Epoch: 11 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:34,918-Speed 5550.56 samples/sec Loss 5.1893 LearningRate 0.0171 Epoch: 11 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:36,778-Speed 5508.26 samples/sec Loss 5.2022 LearningRate 0.0171 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:38,614-Speed 5579.83 samples/sec Loss 5.2556 LearningRate 0.0171 Epoch: 11 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:40,478-Speed 5497.37 
samples/sec Loss 5.3498 LearningRate 0.0171 Epoch: 11 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:42,321-Speed 5557.97 samples/sec Loss 5.2044 LearningRate 0.0171 Epoch: 11 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:44,177-Speed 5521.61 samples/sec Loss 5.3228 LearningRate 0.0170 Epoch: 11 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:04:46,014-Speed 5576.59 samples/sec Loss 5.1072 LearningRate 0.0170 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 14:04:47,835-Speed 5626.87 samples/sec Loss 5.1549 LearningRate 0.0170 Epoch: 11 Global Step: 59410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:49,681-Speed 5547.19 samples/sec Loss 5.1600 LearningRate 0.0170 Epoch: 11 Global Step: 59420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:51,529-Speed 5545.00 samples/sec Loss 5.2027 LearningRate 0.0170 Epoch: 11 Global Step: 59430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:53,420-Speed 5419.70 samples/sec Loss 5.1868 LearningRate 0.0170 Epoch: 11 Global Step: 59440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:55,259-Speed 5570.22 samples/sec Loss 5.1690 LearningRate 0.0170 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:57,120-Speed 5507.64 samples/sec Loss 5.3898 LearningRate 0.0170 Epoch: 11 Global Step: 59460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:04:58,967-Speed 5546.12 samples/sec Loss 5.0968 LearningRate 0.0170 Epoch: 11 Global Step: 59470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:00,886-Speed 5339.95 samples/sec Loss 5.3073 LearningRate 0.0170 Epoch: 11 Global Step: 59480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:02,732-Speed 5550.10 samples/sec Loss 5.0015 LearningRate 0.0170 Epoch: 11 
Global Step: 59490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:04,634-Speed 5385.71 samples/sec Loss 5.1964 LearningRate 0.0170 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:06,513-Speed 5451.56 samples/sec Loss 5.0861 LearningRate 0.0170 Epoch: 11 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:08,352-Speed 5572.49 samples/sec Loss 5.2467 LearningRate 0.0169 Epoch: 11 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:10,212-Speed 5510.46 samples/sec Loss 5.1481 LearningRate 0.0169 Epoch: 11 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:12,071-Speed 5512.28 samples/sec Loss 5.1641 LearningRate 0.0169 Epoch: 11 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:13,947-Speed 5460.87 samples/sec Loss 5.2495 LearningRate 0.0169 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:15,804-Speed 5517.50 samples/sec Loss 5.2998 LearningRate 0.0169 Epoch: 11 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:17,651-Speed 5546.59 samples/sec Loss 5.1997 LearningRate 0.0169 Epoch: 11 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:19,537-Speed 5432.88 samples/sec Loss 5.2982 LearningRate 0.0169 Epoch: 11 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:21,382-Speed 5555.64 samples/sec Loss 5.1070 LearningRate 0.0169 Epoch: 11 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:23,248-Speed 5493.25 samples/sec Loss 5.2454 LearningRate 0.0169 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:25,080-Speed 5589.31 samples/sec Loss 5.1889 LearningRate 0.0169 Epoch: 11 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 14:05:26,943-Speed 5502.73 samples/sec Loss 5.1083 LearningRate 0.0169 Epoch: 11 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:28,778-Speed 5584.62 samples/sec Loss 5.1036 LearningRate 0.0169 Epoch: 11 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:30,632-Speed 5524.20 samples/sec Loss 5.2134 LearningRate 0.0168 Epoch: 11 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:32,474-Speed 5562.04 samples/sec Loss 5.2660 LearningRate 0.0168 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:34,327-Speed 5527.21 samples/sec Loss 5.1514 LearningRate 0.0168 Epoch: 11 Global Step: 59660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:36,173-Speed 5552.96 samples/sec Loss 5.3745 LearningRate 0.0168 Epoch: 11 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:38,026-Speed 5529.02 samples/sec Loss 5.1265 LearningRate 0.0168 Epoch: 11 Global Step: 59680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:05:39,872-Speed 5549.51 samples/sec Loss 5.1653 LearningRate 0.0168 Epoch: 11 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:41,730-Speed 5514.59 samples/sec Loss 5.3313 LearningRate 0.0168 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:43,573-Speed 5557.60 samples/sec Loss 5.1034 LearningRate 0.0168 Epoch: 11 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:45,439-Speed 5493.43 samples/sec Loss 5.0250 LearningRate 0.0168 Epoch: 11 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:47,274-Speed 5583.11 samples/sec Loss 5.1642 LearningRate 0.0168 Epoch: 11 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:49,147-Speed 5469.75 
samples/sec Loss 5.2001 LearningRate 0.0168 Epoch: 11 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:50,993-Speed 5549.19 samples/sec Loss 5.2676 LearningRate 0.0168 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:52,880-Speed 5432.35 samples/sec Loss 5.1251 LearningRate 0.0167 Epoch: 11 Global Step: 59760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:54,718-Speed 5573.51 samples/sec Loss 5.0927 LearningRate 0.0167 Epoch: 11 Global Step: 59770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:56,573-Speed 5521.58 samples/sec Loss 5.0700 LearningRate 0.0167 Epoch: 11 Global Step: 59780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:05:58,425-Speed 5533.93 samples/sec Loss 5.3023 LearningRate 0.0167 Epoch: 11 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:00,291-Speed 5491.30 samples/sec Loss 5.1946 LearningRate 0.0167 Epoch: 11 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:02,135-Speed 5554.71 samples/sec Loss 5.2117 LearningRate 0.0167 Epoch: 11 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:04,017-Speed 5444.16 samples/sec Loss 5.1140 LearningRate 0.0167 Epoch: 11 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:05,885-Speed 5484.14 samples/sec Loss 5.1079 LearningRate 0.0167 Epoch: 11 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:07,727-Speed 5562.55 samples/sec Loss 5.2891 LearningRate 0.0167 Epoch: 11 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:09,594-Speed 5485.48 samples/sec Loss 5.0879 LearningRate 0.0167 Epoch: 11 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:11,442-Speed 5544.98 samples/sec Loss 5.0774 LearningRate 0.0167 Epoch: 11 
Global Step: 59860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:13,318-Speed 5462.03 samples/sec Loss 5.1108 LearningRate 0.0167 Epoch: 11 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:15,197-Speed 5452.87 samples/sec Loss 5.1542 LearningRate 0.0167 Epoch: 11 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:17,066-Speed 5482.20 samples/sec Loss 5.1566 LearningRate 0.0166 Epoch: 11 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:18,932-Speed 5491.21 samples/sec Loss 5.2892 LearningRate 0.0166 Epoch: 11 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:20,790-Speed 5514.11 samples/sec Loss 5.2381 LearningRate 0.0166 Epoch: 11 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:22,650-Speed 5509.00 samples/sec Loss 5.2525 LearningRate 0.0166 Epoch: 11 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:24,547-Speed 5399.15 samples/sec Loss 5.0366 LearningRate 0.0166 Epoch: 11 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:26,494-Speed 5262.00 samples/sec Loss 5.1894 LearningRate 0.0166 Epoch: 11 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:28,343-Speed 5542.37 samples/sec Loss 5.2244 LearningRate 0.0166 Epoch: 11 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:30,210-Speed 5486.37 samples/sec Loss 5.3106 LearningRate 0.0166 Epoch: 11 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:32,060-Speed 5541.59 samples/sec Loss 5.2435 LearningRate 0.0166 Epoch: 11 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:33,922-Speed 5501.95 samples/sec Loss 5.2576 LearningRate 0.0166 Epoch: 11 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 14:06:35,770-Speed 5542.35 samples/sec Loss 5.3226 LearningRate 0.0166 Epoch: 11 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:06:37,628-Speed 5516.15 samples/sec Loss 5.1387 LearningRate 0.0166 Epoch: 11 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:07:04,938-[lfw][60000]XNorm: 22.662538 Training: 2022-04-11 14:07:04,939-[lfw][60000]Accuracy-Flip: 0.99733+-0.00238 Training: 2022-04-11 14:07:04,940-[lfw][60000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:07:36,483-[cfp_fp][60000]XNorm: 20.125682 Training: 2022-04-11 14:07:36,484-[cfp_fp][60000]Accuracy-Flip: 0.97443+-0.00692 Training: 2022-04-11 14:07:36,485-[cfp_fp][60000]Accuracy-Highest: 0.97443 Training: 2022-04-11 14:08:03,752-[agedb_30][60000]XNorm: 22.528334 Training: 2022-04-11 14:08:03,753-[agedb_30][60000]Accuracy-Flip: 0.97867+-0.00748 Training: 2022-04-11 14:08:03,753-[agedb_30][60000]Accuracy-Highest: 0.97867 Training: 2022-04-11 14:08:05,623-Speed 116.37 samples/sec Loss 5.1254 LearningRate 0.0165 Epoch: 11 Global Step: 60010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:07,479-Speed 5518.25 samples/sec Loss 5.2929 LearningRate 0.0165 Epoch: 11 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:09,308-Speed 5603.49 samples/sec Loss 5.1819 LearningRate 0.0165 Epoch: 11 Global Step: 60030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:11,154-Speed 5548.97 samples/sec Loss 5.1316 LearningRate 0.0165 Epoch: 11 Global Step: 60040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:13,011-Speed 5519.15 samples/sec Loss 5.1836 LearningRate 0.0165 Epoch: 11 Global Step: 60050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:14,843-Speed 5591.14 samples/sec Loss 5.1787 LearningRate 0.0165 Epoch: 11 Global Step: 60060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
14:08:16,685-Speed 5561.93 samples/sec Loss 5.1325 LearningRate 0.0165 Epoch: 11 Global Step: 60070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:18,524-Speed 5569.71 samples/sec Loss 5.2634 LearningRate 0.0165 Epoch: 11 Global Step: 60080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:20,376-Speed 5534.15 samples/sec Loss 5.1093 LearningRate 0.0165 Epoch: 11 Global Step: 60090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:22,210-Speed 5586.14 samples/sec Loss 5.0939 LearningRate 0.0165 Epoch: 11 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:08:24,059-Speed 5539.65 samples/sec Loss 5.4828 LearningRate 0.0165 Epoch: 11 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:08:25,892-Speed 5589.59 samples/sec Loss 5.1568 LearningRate 0.0165 Epoch: 11 Global Step: 60120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:27,740-Speed 5543.04 samples/sec Loss 5.2697 LearningRate 0.0165 Epoch: 11 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:29,592-Speed 5533.49 samples/sec Loss 5.2542 LearningRate 0.0164 Epoch: 11 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:31,463-Speed 5474.96 samples/sec Loss 5.1760 LearningRate 0.0164 Epoch: 11 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:33,304-Speed 5567.25 samples/sec Loss 5.2566 LearningRate 0.0164 Epoch: 11 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:35,149-Speed 5552.40 samples/sec Loss 5.0148 LearningRate 0.0164 Epoch: 11 Global Step: 60170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:36,997-Speed 5546.69 samples/sec Loss 5.2721 LearningRate 0.0164 Epoch: 11 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:38,837-Speed 5570.42 samples/sec Loss 5.0658 
LearningRate 0.0164 Epoch: 11 Global Step: 60190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:40,701-Speed 5494.15 samples/sec Loss 4.9649 LearningRate 0.0164 Epoch: 11 Global Step: 60200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:42,545-Speed 5556.65 samples/sec Loss 5.0809 LearningRate 0.0164 Epoch: 11 Global Step: 60210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:44,375-Speed 5600.24 samples/sec Loss 5.1172 LearningRate 0.0164 Epoch: 11 Global Step: 60220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:46,212-Speed 5577.85 samples/sec Loss 5.1726 LearningRate 0.0164 Epoch: 11 Global Step: 60230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:48,095-Speed 5442.73 samples/sec Loss 5.1660 LearningRate 0.0164 Epoch: 11 Global Step: 60240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:49,946-Speed 5534.92 samples/sec Loss 5.1456 LearningRate 0.0164 Epoch: 11 Global Step: 60250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:51,802-Speed 5520.83 samples/sec Loss 4.9987 LearningRate 0.0163 Epoch: 11 Global Step: 60260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:53,654-Speed 5531.51 samples/sec Loss 5.1519 LearningRate 0.0163 Epoch: 11 Global Step: 60270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:55,589-Speed 5293.47 samples/sec Loss 5.1364 LearningRate 0.0163 Epoch: 11 Global Step: 60280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:57,445-Speed 5521.12 samples/sec Loss 5.2884 LearningRate 0.0163 Epoch: 11 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:08:59,281-Speed 5581.76 samples/sec Loss 5.1469 LearningRate 0.0163 Epoch: 11 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:09:01,142-Speed 5502.74 samples/sec Loss 5.1199 LearningRate 0.0163 Epoch: 11 Global Step: 60310 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:09:03,026-Speed 5438.62 samples/sec Loss 5.2621 LearningRate 0.0163 Epoch: 11 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:04,903-Speed 5460.91 samples/sec Loss 5.2207 LearningRate 0.0163 Epoch: 11 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:06,762-Speed 5511.09 samples/sec Loss 5.1756 LearningRate 0.0163 Epoch: 11 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:08,630-Speed 5484.05 samples/sec Loss 5.1756 LearningRate 0.0163 Epoch: 11 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:10,475-Speed 5554.24 samples/sec Loss 5.0935 LearningRate 0.0163 Epoch: 11 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:12,324-Speed 5539.20 samples/sec Loss 5.2411 LearningRate 0.0163 Epoch: 11 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:14,169-Speed 5553.71 samples/sec Loss 5.1947 LearningRate 0.0163 Epoch: 11 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:16,039-Speed 5479.25 samples/sec Loss 5.1366 LearningRate 0.0162 Epoch: 11 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:17,877-Speed 5575.08 samples/sec Loss 5.1070 LearningRate 0.0162 Epoch: 11 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:19,742-Speed 5493.83 samples/sec Loss 5.2671 LearningRate 0.0162 Epoch: 11 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:21,588-Speed 5550.40 samples/sec Loss 5.1572 LearningRate 0.0162 Epoch: 11 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 14:09:23,418-Speed 5598.69 samples/sec Loss 5.2894 LearningRate 0.0162 Epoch: 11 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 14:09:25,268-Speed 5539.68 samples/sec Loss 5.1785 LearningRate 0.0162 Epoch: 11 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:27,117-Speed 5540.36 samples/sec Loss 5.1572 LearningRate 0.0162 Epoch: 11 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:28,958-Speed 5564.66 samples/sec Loss 5.2314 LearningRate 0.0162 Epoch: 11 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:30,797-Speed 5568.59 samples/sec Loss 5.1879 LearningRate 0.0162 Epoch: 11 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:32,665-Speed 5484.41 samples/sec Loss 5.1563 LearningRate 0.0162 Epoch: 11 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:34,511-Speed 5553.67 samples/sec Loss 4.9895 LearningRate 0.0162 Epoch: 11 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:36,370-Speed 5508.92 samples/sec Loss 5.1713 LearningRate 0.0162 Epoch: 11 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:38,224-Speed 5526.55 samples/sec Loss 5.0806 LearningRate 0.0161 Epoch: 11 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:40,071-Speed 5548.84 samples/sec Loss 5.1835 LearningRate 0.0161 Epoch: 11 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:41,907-Speed 5577.69 samples/sec Loss 5.0365 LearningRate 0.0161 Epoch: 11 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:43,773-Speed 5492.88 samples/sec Loss 5.0851 LearningRate 0.0161 Epoch: 11 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:45,624-Speed 5533.81 samples/sec Loss 5.1250 LearningRate 0.0161 Epoch: 11 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:47,478-Speed 5527.02 samples/sec Loss 
5.1297 LearningRate 0.0161 Epoch: 11 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:49,364-Speed 5432.45 samples/sec Loss 5.1729 LearningRate 0.0161 Epoch: 11 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:51,234-Speed 5477.75 samples/sec Loss 5.1520 LearningRate 0.0161 Epoch: 11 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:53,099-Speed 5493.39 samples/sec Loss 5.2921 LearningRate 0.0161 Epoch: 11 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:54,970-Speed 5477.69 samples/sec Loss 5.1000 LearningRate 0.0161 Epoch: 11 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:56,819-Speed 5540.53 samples/sec Loss 5.1472 LearningRate 0.0161 Epoch: 11 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:09:58,666-Speed 5545.60 samples/sec Loss 5.1389 LearningRate 0.0161 Epoch: 11 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:10:00,522-Speed 5522.71 samples/sec Loss 5.2430 LearningRate 0.0161 Epoch: 11 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:10:02,355-Speed 5589.08 samples/sec Loss 5.2157 LearningRate 0.0160 Epoch: 11 Global Step: 60640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:04,214-Speed 5511.23 samples/sec Loss 5.1279 LearningRate 0.0160 Epoch: 11 Global Step: 60650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:06,060-Speed 5550.02 samples/sec Loss 5.3132 LearningRate 0.0160 Epoch: 11 Global Step: 60660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:07,914-Speed 5525.43 samples/sec Loss 5.1244 LearningRate 0.0160 Epoch: 11 Global Step: 60670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:09,750-Speed 5581.76 samples/sec Loss 5.1396 LearningRate 0.0160 Epoch: 11 Global Step: 60680 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:11,670-Speed 5334.91 samples/sec Loss 5.2076 LearningRate 0.0160 Epoch: 11 Global Step: 60690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:23,802-Speed 844.20 samples/sec Loss 4.7788 LearningRate 0.0160 Epoch: 12 Global Step: 60700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:25,706-Speed 5381.47 samples/sec Loss 4.1145 LearningRate 0.0160 Epoch: 12 Global Step: 60710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:27,640-Speed 5299.60 samples/sec Loss 4.2252 LearningRate 0.0160 Epoch: 12 Global Step: 60720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:29,524-Speed 5441.17 samples/sec Loss 4.1748 LearningRate 0.0160 Epoch: 12 Global Step: 60730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:31,452-Speed 5313.53 samples/sec Loss 4.2830 LearningRate 0.0160 Epoch: 12 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:10:33,679-Speed 4600.74 samples/sec Loss 4.2633 LearningRate 0.0160 Epoch: 12 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:10:35,521-Speed 5563.34 samples/sec Loss 4.1771 LearningRate 0.0159 Epoch: 12 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:37,437-Speed 5345.85 samples/sec Loss 4.3699 LearningRate 0.0159 Epoch: 12 Global Step: 60770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:39,282-Speed 5556.69 samples/sec Loss 4.2891 LearningRate 0.0159 Epoch: 12 Global Step: 60780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:41,145-Speed 5497.83 samples/sec Loss 4.3044 LearningRate 0.0159 Epoch: 12 Global Step: 60790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:42,994-Speed 5542.05 samples/sec Loss 4.4495 LearningRate 0.0159 Epoch: 12 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 14:10:44,870-Speed 5463.25 samples/sec Loss 4.2491 LearningRate 0.0159 Epoch: 12 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:46,706-Speed 5578.90 samples/sec Loss 4.3093 LearningRate 0.0159 Epoch: 12 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:48,563-Speed 5518.32 samples/sec Loss 4.3386 LearningRate 0.0159 Epoch: 12 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:50,421-Speed 5514.78 samples/sec Loss 4.3483 LearningRate 0.0159 Epoch: 12 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:10:52,269-Speed 5543.24 samples/sec Loss 4.4680 LearningRate 0.0159 Epoch: 12 Global Step: 60850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:10:54,157-Speed 5427.45 samples/sec Loss 4.3892 LearningRate 0.0159 Epoch: 12 Global Step: 60860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:10:55,997-Speed 5567.34 samples/sec Loss 4.2900 LearningRate 0.0159 Epoch: 12 Global Step: 60870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:10:57,889-Speed 5415.84 samples/sec Loss 4.3427 LearningRate 0.0159 Epoch: 12 Global Step: 60880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:10:59,734-Speed 5552.85 samples/sec Loss 4.2420 LearningRate 0.0158 Epoch: 12 Global Step: 60890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:01,619-Speed 5437.21 samples/sec Loss 4.3920 LearningRate 0.0158 Epoch: 12 Global Step: 60900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:03,486-Speed 5487.21 samples/sec Loss 4.4917 LearningRate 0.0158 Epoch: 12 Global Step: 60910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:05,345-Speed 5512.28 samples/sec Loss 4.2717 LearningRate 0.0158 Epoch: 12 Global Step: 60920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:07,197-Speed 5530.08 samples/sec Loss 
4.3966 LearningRate 0.0158 Epoch: 12 Global Step: 60930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:09,111-Speed 5355.06 samples/sec Loss 4.3845 LearningRate 0.0158 Epoch: 12 Global Step: 60940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-11 14:11:10,973-Speed 5501.17 samples/sec Loss 4.4085 LearningRate 0.0158 Epoch: 12 Global Step: 60950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:12,837-Speed 5497.05 samples/sec Loss 4.3473 LearningRate 0.0158 Epoch: 12 Global Step: 60960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:14,687-Speed 5538.12 samples/sec Loss 4.3760 LearningRate 0.0158 Epoch: 12 Global Step: 60970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:16,548-Speed 5506.33 samples/sec Loss 4.5464 LearningRate 0.0158 Epoch: 12 Global Step: 60980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:18,425-Speed 5458.36 samples/sec Loss 4.3909 LearningRate 0.0158 Epoch: 12 Global Step: 60990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:20,278-Speed 5527.84 samples/sec Loss 4.3568 LearningRate 0.0158 Epoch: 12 Global Step: 61000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:22,143-Speed 5495.73 samples/sec Loss 4.5706 LearningRate 0.0158 Epoch: 12 Global Step: 61010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:24,002-Speed 5509.67 samples/sec Loss 4.4852 LearningRate 0.0157 Epoch: 12 Global Step: 61020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:25,890-Speed 5427.68 samples/sec Loss 4.4716 LearningRate 0.0157 Epoch: 12 Global Step: 61030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:27,731-Speed 5563.77 samples/sec Loss 4.4718 LearningRate 0.0157 Epoch: 12 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:29,596-Speed 5496.22 samples/sec Loss 4.4406 LearningRate 0.0157 Epoch: 12 Global Step: 61050 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:31,443-Speed 5546.96 samples/sec Loss 4.3693 LearningRate 0.0157 Epoch: 12 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:33,315-Speed 5471.81 samples/sec Loss 4.5140 LearningRate 0.0157 Epoch: 12 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:35,171-Speed 5518.64 samples/sec Loss 4.4568 LearningRate 0.0157 Epoch: 12 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:37,029-Speed 5515.78 samples/sec Loss 4.5651 LearningRate 0.0157 Epoch: 12 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:38,872-Speed 5559.76 samples/sec Loss 4.5141 LearningRate 0.0157 Epoch: 12 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:40,776-Speed 5380.64 samples/sec Loss 4.3464 LearningRate 0.0157 Epoch: 12 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:42,638-Speed 5502.96 samples/sec Loss 4.4292 LearningRate 0.0157 Epoch: 12 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:44,563-Speed 5322.65 samples/sec Loss 4.5818 LearningRate 0.0157 Epoch: 12 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:46,412-Speed 5538.98 samples/sec Loss 4.4463 LearningRate 0.0157 Epoch: 12 Global Step: 61140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:48,300-Speed 5428.26 samples/sec Loss 4.4590 LearningRate 0.0156 Epoch: 12 Global Step: 61150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:50,164-Speed 5498.19 samples/sec Loss 4.4999 LearningRate 0.0156 Epoch: 12 Global Step: 61160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:52,041-Speed 5457.25 samples/sec Loss 4.5321 LearningRate 0.0156 Epoch: 12 Global Step: 61170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 14:11:53,902-Speed 5503.32 samples/sec Loss 4.5952 LearningRate 0.0156 Epoch: 12 Global Step: 61180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:11:55,804-Speed 5388.64 samples/sec Loss 4.5659 LearningRate 0.0156 Epoch: 12 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:57,645-Speed 5563.28 samples/sec Loss 4.6092 LearningRate 0.0156 Epoch: 12 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:11:59,521-Speed 5464.03 samples/sec Loss 4.3627 LearningRate 0.0156 Epoch: 12 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:01,396-Speed 5462.57 samples/sec Loss 4.6223 LearningRate 0.0156 Epoch: 12 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:03,261-Speed 5493.74 samples/sec Loss 4.4275 LearningRate 0.0156 Epoch: 12 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:05,120-Speed 5511.66 samples/sec Loss 4.3241 LearningRate 0.0156 Epoch: 12 Global Step: 61240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:06,997-Speed 5457.37 samples/sec Loss 4.5693 LearningRate 0.0156 Epoch: 12 Global Step: 61250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:08,839-Speed 5562.79 samples/sec Loss 4.3674 LearningRate 0.0156 Epoch: 12 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:10,725-Speed 5432.09 samples/sec Loss 4.5742 LearningRate 0.0155 Epoch: 12 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:12,567-Speed 5561.05 samples/sec Loss 4.6540 LearningRate 0.0155 Epoch: 12 Global Step: 61280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:14,421-Speed 5527.05 samples/sec Loss 4.3376 LearningRate 0.0155 Epoch: 12 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:16,327-Speed 5376.64 samples/sec Loss 
4.6129 LearningRate 0.0155 Epoch: 12 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:18,183-Speed 5518.94 samples/sec Loss 4.3244 LearningRate 0.0155 Epoch: 12 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:20,063-Speed 5448.92 samples/sec Loss 4.5986 LearningRate 0.0155 Epoch: 12 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:21,934-Speed 5475.59 samples/sec Loss 4.4104 LearningRate 0.0155 Epoch: 12 Global Step: 61330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:23,811-Speed 5459.37 samples/sec Loss 4.6441 LearningRate 0.0155 Epoch: 12 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:25,661-Speed 5536.60 samples/sec Loss 4.4822 LearningRate 0.0155 Epoch: 12 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:27,509-Speed 5542.26 samples/sec Loss 4.6403 LearningRate 0.0155 Epoch: 12 Global Step: 61360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:29,378-Speed 5482.52 samples/sec Loss 4.6077 LearningRate 0.0155 Epoch: 12 Global Step: 61370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:31,218-Speed 5568.56 samples/sec Loss 4.6005 LearningRate 0.0155 Epoch: 12 Global Step: 61380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:33,062-Speed 5554.32 samples/sec Loss 4.6518 LearningRate 0.0155 Epoch: 12 Global Step: 61390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:34,914-Speed 5533.09 samples/sec Loss 4.7093 LearningRate 0.0154 Epoch: 12 Global Step: 61400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:36,788-Speed 5466.64 samples/sec Loss 4.6215 LearningRate 0.0154 Epoch: 12 Global Step: 61410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:38,650-Speed 5500.97 samples/sec Loss 4.4268 LearningRate 0.0154 Epoch: 12 Global Step: 61420 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:40,552-Speed 5387.85 samples/sec Loss 4.4706 LearningRate 0.0154 Epoch: 12 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:42,410-Speed 5514.38 samples/sec Loss 4.7488 LearningRate 0.0154 Epoch: 12 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:44,313-Speed 5384.10 samples/sec Loss 4.6346 LearningRate 0.0154 Epoch: 12 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:46,166-Speed 5528.55 samples/sec Loss 4.6618 LearningRate 0.0154 Epoch: 12 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:48,028-Speed 5502.96 samples/sec Loss 4.5805 LearningRate 0.0154 Epoch: 12 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:49,871-Speed 5556.27 samples/sec Loss 4.5318 LearningRate 0.0154 Epoch: 12 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:51,767-Speed 5404.92 samples/sec Loss 4.5699 LearningRate 0.0154 Epoch: 12 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:12:53,621-Speed 5523.64 samples/sec Loss 4.5745 LearningRate 0.0154 Epoch: 12 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:55,470-Speed 5543.76 samples/sec Loss 4.7373 LearningRate 0.0154 Epoch: 12 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:57,310-Speed 5565.34 samples/sec Loss 4.6246 LearningRate 0.0154 Epoch: 12 Global Step: 61520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:12:59,144-Speed 5587.03 samples/sec Loss 4.6409 LearningRate 0.0153 Epoch: 12 Global Step: 61530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:01,044-Speed 5392.29 samples/sec Loss 4.6965 LearningRate 0.0153 Epoch: 12 Global Step: 61540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 14:13:02,880-Speed 5579.75 samples/sec Loss 4.6832 LearningRate 0.0153 Epoch: 12 Global Step: 61550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:04,725-Speed 5553.14 samples/sec Loss 4.5407 LearningRate 0.0153 Epoch: 12 Global Step: 61560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:06,624-Speed 5396.12 samples/sec Loss 4.5623 LearningRate 0.0153 Epoch: 12 Global Step: 61570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:08,465-Speed 5561.31 samples/sec Loss 4.6642 LearningRate 0.0153 Epoch: 12 Global Step: 61580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:10,330-Speed 5495.61 samples/sec Loss 4.6871 LearningRate 0.0153 Epoch: 12 Global Step: 61590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:12,214-Speed 5436.20 samples/sec Loss 4.7728 LearningRate 0.0153 Epoch: 12 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:14,054-Speed 5567.50 samples/sec Loss 4.6118 LearningRate 0.0153 Epoch: 12 Global Step: 61610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:15,944-Speed 5423.66 samples/sec Loss 4.5693 LearningRate 0.0153 Epoch: 12 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:17,830-Speed 5430.82 samples/sec Loss 4.6699 LearningRate 0.0153 Epoch: 12 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:19,699-Speed 5481.63 samples/sec Loss 4.5239 LearningRate 0.0153 Epoch: 12 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:21,551-Speed 5532.35 samples/sec Loss 4.6437 LearningRate 0.0153 Epoch: 12 Global Step: 61650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:23,410-Speed 5508.84 samples/sec Loss 4.7319 LearningRate 0.0152 Epoch: 12 Global Step: 61660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:25,293-Speed 5442.34 samples/sec Loss 
4.6177 LearningRate 0.0152 Epoch: 12 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:27,155-Speed 5502.49 samples/sec Loss 4.8331 LearningRate 0.0152 Epoch: 12 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:28,998-Speed 5557.26 samples/sec Loss 4.6507 LearningRate 0.0152 Epoch: 12 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:30,854-Speed 5518.96 samples/sec Loss 4.7323 LearningRate 0.0152 Epoch: 12 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 14:13:32,713-Speed 5513.08 samples/sec Loss 4.6817 LearningRate 0.0152 Epoch: 12 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:34,562-Speed 5541.42 samples/sec Loss 4.6097 LearningRate 0.0152 Epoch: 12 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:36,428-Speed 5488.36 samples/sec Loss 4.5465 LearningRate 0.0152 Epoch: 12 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:38,266-Speed 5574.87 samples/sec Loss 4.7427 LearningRate 0.0152 Epoch: 12 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:40,106-Speed 5565.72 samples/sec Loss 4.4728 LearningRate 0.0152 Epoch: 12 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 14:13:41,991-Speed 5436.60 samples/sec Loss 4.5886 LearningRate 0.0152 Epoch: 12 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:43,837-Speed 5549.48 samples/sec Loss 4.6513 LearningRate 0.0152 Epoch: 12 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:45,676-Speed 5570.24 samples/sec Loss 4.5731 LearningRate 0.0152 Epoch: 12 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:47,537-Speed 5504.99 samples/sec Loss 4.6721 LearningRate 0.0151 Epoch: 12 Global Step: 61790 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:49,373-Speed 5582.19 samples/sec Loss 4.6293 LearningRate 0.0151 Epoch: 12 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:51,223-Speed 5535.72 samples/sec Loss 4.7339 LearningRate 0.0151 Epoch: 12 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 14:13:53,094-Speed 5477.04 samples/sec Loss 4.7389 LearningRate 0.0151 Epoch: 12 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:54,937-Speed 5559.74 samples/sec Loss 4.7286 LearningRate 0.0151 Epoch: 12 Global Step: 61830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:56,784-Speed 5545.56 samples/sec Loss 4.6889 LearningRate 0.0151 Epoch: 12 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:13:58,636-Speed 5530.59 samples/sec Loss 4.7741 LearningRate 0.0151 Epoch: 12 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:00,484-Speed 5543.72 samples/sec Loss 4.6530 LearningRate 0.0151 Epoch: 12 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:02,324-Speed 5569.67 samples/sec Loss 4.7980 LearningRate 0.0151 Epoch: 12 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:04,184-Speed 5508.40 samples/sec Loss 4.6974 LearningRate 0.0151 Epoch: 12 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:06,052-Speed 5481.88 samples/sec Loss 4.7166 LearningRate 0.0151 Epoch: 12 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:07,889-Speed 5576.84 samples/sec Loss 4.8404 LearningRate 0.0151 Epoch: 12 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:14:09,722-Speed 5588.72 samples/sec Loss 4.6763 LearningRate 0.0151 Epoch: 12 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:14:11,572-Speed 5537.64 samples/sec Loss 4.7345 LearningRate 0.0150 Epoch: 12 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:13,430-Speed 5513.04 samples/sec Loss 4.7499 LearningRate 0.0150 Epoch: 12 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:15,285-Speed 5522.87 samples/sec Loss 4.7613 LearningRate 0.0150 Epoch: 12 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:17,127-Speed 5561.81 samples/sec Loss 4.7766 LearningRate 0.0150 Epoch: 12 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:18,970-Speed 5558.94 samples/sec Loss 4.7553 LearningRate 0.0150 Epoch: 12 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:20,804-Speed 5584.96 samples/sec Loss 4.7811 LearningRate 0.0150 Epoch: 12 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:22,655-Speed 5535.40 samples/sec Loss 4.6492 LearningRate 0.0150 Epoch: 12 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:24,495-Speed 5567.24 samples/sec Loss 4.6925 LearningRate 0.0150 Epoch: 12 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:26,344-Speed 5542.04 samples/sec Loss 4.6648 LearningRate 0.0150 Epoch: 12 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:14:53,883-[lfw][62000]XNorm: 22.733218 Training: 2022-04-11 14:14:53,883-[lfw][62000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 14:14:53,884-[lfw][62000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:15:25,200-[cfp_fp][62000]XNorm: 20.083805 Training: 2022-04-11 14:15:25,201-[cfp_fp][62000]Accuracy-Flip: 0.97771+-0.00816 Training: 2022-04-11 14:15:25,201-[cfp_fp][62000]Accuracy-Highest: 0.97771 Training: 2022-04-11 14:15:52,370-[agedb_30][62000]XNorm: 22.227357 Training: 2022-04-11 
14:15:52,371-[agedb_30][62000]Accuracy-Flip: 0.97867+-0.00694 Training: 2022-04-11 14:15:52,372-[agedb_30][62000]Accuracy-Highest: 0.97867 Training: 2022-04-11 14:15:54,249-Speed 116.49 samples/sec Loss 4.7526 LearningRate 0.0150 Epoch: 12 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:15:56,091-Speed 5560.63 samples/sec Loss 4.7187 LearningRate 0.0150 Epoch: 12 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:15:57,937-Speed 5550.46 samples/sec Loss 4.7228 LearningRate 0.0150 Epoch: 12 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:15:59,797-Speed 5509.44 samples/sec Loss 4.7437 LearningRate 0.0150 Epoch: 12 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:01,662-Speed 5495.17 samples/sec Loss 4.5443 LearningRate 0.0149 Epoch: 12 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:03,521-Speed 5511.25 samples/sec Loss 4.5602 LearningRate 0.0149 Epoch: 12 Global Step: 62060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:05,405-Speed 5437.18 samples/sec Loss 4.6718 LearningRate 0.0149 Epoch: 12 Global Step: 62070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:07,240-Speed 5585.33 samples/sec Loss 4.7972 LearningRate 0.0149 Epoch: 12 Global Step: 62080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:09,095-Speed 5522.23 samples/sec Loss 4.6767 LearningRate 0.0149 Epoch: 12 Global Step: 62090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:10,930-Speed 5582.93 samples/sec Loss 4.6693 LearningRate 0.0149 Epoch: 12 Global Step: 62100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:12,833-Speed 5386.54 samples/sec Loss 4.6611 LearningRate 0.0149 Epoch: 12 Global Step: 62110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:14,704-Speed 5474.99 samples/sec Loss 4.6850 
LearningRate 0.0149 Epoch: 12 Global Step: 62120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:16,565-Speed 5504.12 samples/sec Loss 4.6601 LearningRate 0.0149 Epoch: 12 Global Step: 62130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:18,429-Speed 5497.52 samples/sec Loss 4.7689 LearningRate 0.0149 Epoch: 12 Global Step: 62140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:20,266-Speed 5576.24 samples/sec Loss 4.6672 LearningRate 0.0149 Epoch: 12 Global Step: 62150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:16:22,123-Speed 5517.84 samples/sec Loss 4.6760 LearningRate 0.0149 Epoch: 12 Global Step: 62160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:23,960-Speed 5577.63 samples/sec Loss 4.7109 LearningRate 0.0149 Epoch: 12 Global Step: 62170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:25,794-Speed 5587.55 samples/sec Loss 4.6156 LearningRate 0.0148 Epoch: 12 Global Step: 62180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:27,656-Speed 5504.50 samples/sec Loss 4.7574 LearningRate 0.0148 Epoch: 12 Global Step: 62190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:29,528-Speed 5472.12 samples/sec Loss 4.7557 LearningRate 0.0148 Epoch: 12 Global Step: 62200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:31,366-Speed 5576.64 samples/sec Loss 4.7692 LearningRate 0.0148 Epoch: 12 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:33,228-Speed 5507.18 samples/sec Loss 4.6420 LearningRate 0.0148 Epoch: 12 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:35,085-Speed 5516.54 samples/sec Loss 4.5510 LearningRate 0.0148 Epoch: 12 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:36,959-Speed 5466.86 samples/sec Loss 4.8429 LearningRate 0.0148 Epoch: 12 Global Step: 62240 Fp16 
Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:38,886-Speed 5317.44 samples/sec Loss 4.7066 LearningRate 0.0148 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:16:40,766-Speed 5451.16 samples/sec Loss 4.9051 LearningRate 0.0148 Epoch: 12 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:42,632-Speed 5491.44 samples/sec Loss 4.6814 LearningRate 0.0148 Epoch: 12 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:44,507-Speed 5462.69 samples/sec Loss 4.7725 LearningRate 0.0148 Epoch: 12 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:46,383-Speed 5464.11 samples/sec Loss 4.7336 LearningRate 0.0148 Epoch: 12 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:48,293-Speed 5362.87 samples/sec Loss 4.7421 LearningRate 0.0148 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:50,134-Speed 5566.87 samples/sec Loss 4.7459 LearningRate 0.0147 Epoch: 12 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:51,981-Speed 5545.08 samples/sec Loss 4.7309 LearningRate 0.0147 Epoch: 12 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:53,864-Speed 5444.78 samples/sec Loss 4.7217 LearningRate 0.0147 Epoch: 12 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:55,713-Speed 5541.40 samples/sec Loss 4.8001 LearningRate 0.0147 Epoch: 12 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:57,568-Speed 5523.34 samples/sec Loss 4.8055 LearningRate 0.0147 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:16:59,460-Speed 5415.63 samples/sec Loss 4.6976 LearningRate 0.0147 Epoch: 12 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 14:17:01,302-Speed 5561.28 samples/sec Loss 4.8236 LearningRate 0.0147 Epoch: 12 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:03,203-Speed 5390.18 samples/sec Loss 4.7997 LearningRate 0.0147 Epoch: 12 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:05,084-Speed 5447.15 samples/sec Loss 4.7615 LearningRate 0.0147 Epoch: 12 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:06,941-Speed 5516.15 samples/sec Loss 4.5926 LearningRate 0.0147 Epoch: 12 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:08,777-Speed 5585.16 samples/sec Loss 4.6714 LearningRate 0.0147 Epoch: 12 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:10,621-Speed 5557.39 samples/sec Loss 4.7450 LearningRate 0.0147 Epoch: 12 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:12,493-Speed 5472.88 samples/sec Loss 4.6841 LearningRate 0.0147 Epoch: 12 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:14,393-Speed 5392.10 samples/sec Loss 4.6978 LearningRate 0.0147 Epoch: 12 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:16,245-Speed 5531.27 samples/sec Loss 4.7832 LearningRate 0.0146 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:18,106-Speed 5508.00 samples/sec Loss 4.7030 LearningRate 0.0146 Epoch: 12 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:19,969-Speed 5498.16 samples/sec Loss 4.7344 LearningRate 0.0146 Epoch: 12 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:21,829-Speed 5508.91 samples/sec Loss 4.8097 LearningRate 0.0146 Epoch: 12 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:23,691-Speed 5504.83 samples/sec Loss 
4.8469 LearningRate 0.0146 Epoch: 12 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:25,588-Speed 5400.77 samples/sec Loss 4.8115 LearningRate 0.0146 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:27,452-Speed 5496.65 samples/sec Loss 4.7413 LearningRate 0.0146 Epoch: 12 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:29,351-Speed 5395.22 samples/sec Loss 4.6670 LearningRate 0.0146 Epoch: 12 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:31,204-Speed 5533.66 samples/sec Loss 4.7502 LearningRate 0.0146 Epoch: 12 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:33,057-Speed 5531.01 samples/sec Loss 4.7579 LearningRate 0.0146 Epoch: 12 Global Step: 62540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:34,901-Speed 5560.90 samples/sec Loss 4.8226 LearningRate 0.0146 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:36,752-Speed 5535.03 samples/sec Loss 4.7071 LearningRate 0.0146 Epoch: 12 Global Step: 62560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:38,600-Speed 5544.67 samples/sec Loss 4.5988 LearningRate 0.0146 Epoch: 12 Global Step: 62570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:40,470-Speed 5480.09 samples/sec Loss 4.8133 LearningRate 0.0145 Epoch: 12 Global Step: 62580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:42,339-Speed 5480.64 samples/sec Loss 4.8087 LearningRate 0.0145 Epoch: 12 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:44,177-Speed 5575.17 samples/sec Loss 4.7914 LearningRate 0.0145 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:46,037-Speed 5510.77 samples/sec Loss 4.8610 LearningRate 0.0145 Epoch: 12 Global Step: 62610 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:47,907-Speed 5487.55 samples/sec Loss 4.6669 LearningRate 0.0145 Epoch: 12 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:49,772-Speed 5491.31 samples/sec Loss 4.6912 LearningRate 0.0145 Epoch: 12 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:17:51,634-Speed 5502.96 samples/sec Loss 4.7198 LearningRate 0.0145 Epoch: 12 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:53,491-Speed 5520.20 samples/sec Loss 4.7829 LearningRate 0.0145 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:55,359-Speed 5482.28 samples/sec Loss 4.7200 LearningRate 0.0145 Epoch: 12 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:57,214-Speed 5524.62 samples/sec Loss 4.7396 LearningRate 0.0145 Epoch: 12 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:17:59,075-Speed 5507.64 samples/sec Loss 4.5683 LearningRate 0.0145 Epoch: 12 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:00,946-Speed 5473.23 samples/sec Loss 4.7401 LearningRate 0.0145 Epoch: 12 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:02,827-Speed 5449.96 samples/sec Loss 4.7315 LearningRate 0.0145 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:04,666-Speed 5568.81 samples/sec Loss 4.8538 LearningRate 0.0144 Epoch: 12 Global Step: 62710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:06,516-Speed 5538.07 samples/sec Loss 4.8214 LearningRate 0.0144 Epoch: 12 Global Step: 62720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:08,409-Speed 5413.10 samples/sec Loss 4.7582 LearningRate 0.0144 Epoch: 12 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:18:10,268-Speed 5509.64 samples/sec Loss 4.8892 LearningRate 0.0144 Epoch: 12 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:12,146-Speed 5455.40 samples/sec Loss 4.6895 LearningRate 0.0144 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:13,990-Speed 5557.87 samples/sec Loss 4.8520 LearningRate 0.0144 Epoch: 12 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:15,884-Speed 5409.24 samples/sec Loss 4.8259 LearningRate 0.0144 Epoch: 12 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:17,746-Speed 5504.56 samples/sec Loss 4.8194 LearningRate 0.0144 Epoch: 12 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:19,598-Speed 5531.71 samples/sec Loss 4.7387 LearningRate 0.0144 Epoch: 12 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:21,460-Speed 5500.94 samples/sec Loss 4.7953 LearningRate 0.0144 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:23,298-Speed 5576.45 samples/sec Loss 4.7584 LearningRate 0.0144 Epoch: 12 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:25,190-Speed 5415.11 samples/sec Loss 4.8117 LearningRate 0.0144 Epoch: 12 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:27,032-Speed 5561.88 samples/sec Loss 4.6803 LearningRate 0.0144 Epoch: 12 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:28,877-Speed 5552.62 samples/sec Loss 4.7416 LearningRate 0.0143 Epoch: 12 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:30,757-Speed 5451.86 samples/sec Loss 4.7879 LearningRate 0.0143 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:32,601-Speed 5555.85 samples/sec Loss 
4.6569 LearningRate 0.0143 Epoch: 12 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:34,461-Speed 5507.10 samples/sec Loss 4.7374 LearningRate 0.0143 Epoch: 12 Global Step: 62870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:36,315-Speed 5524.95 samples/sec Loss 4.7226 LearningRate 0.0143 Epoch: 12 Global Step: 62880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:38,216-Speed 5392.36 samples/sec Loss 4.6641 LearningRate 0.0143 Epoch: 12 Global Step: 62890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:18:40,054-Speed 5574.57 samples/sec Loss 4.6979 LearningRate 0.0143 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:41,932-Speed 5454.79 samples/sec Loss 4.7508 LearningRate 0.0143 Epoch: 12 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:43,791-Speed 5511.78 samples/sec Loss 4.8112 LearningRate 0.0143 Epoch: 12 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:45,626-Speed 5582.64 samples/sec Loss 4.5974 LearningRate 0.0143 Epoch: 12 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:47,484-Speed 5514.26 samples/sec Loss 4.7562 LearningRate 0.0143 Epoch: 12 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:49,332-Speed 5545.17 samples/sec Loss 4.7860 LearningRate 0.0143 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:51,215-Speed 5440.68 samples/sec Loss 4.8520 LearningRate 0.0143 Epoch: 12 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:53,050-Speed 5583.70 samples/sec Loss 4.7208 LearningRate 0.0143 Epoch: 12 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:54,892-Speed 5561.67 samples/sec Loss 4.7465 LearningRate 0.0142 Epoch: 12 Global Step: 62980 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:56,733-Speed 5567.71 samples/sec Loss 4.7586 LearningRate 0.0142 Epoch: 12 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:18:58,633-Speed 5392.80 samples/sec Loss 4.7225 LearningRate 0.0142 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:00,487-Speed 5525.53 samples/sec Loss 4.7098 LearningRate 0.0142 Epoch: 12 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:02,365-Speed 5454.88 samples/sec Loss 4.8162 LearningRate 0.0142 Epoch: 12 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:04,219-Speed 5529.08 samples/sec Loss 4.6317 LearningRate 0.0142 Epoch: 12 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:06,065-Speed 5549.34 samples/sec Loss 4.8151 LearningRate 0.0142 Epoch: 12 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:07,924-Speed 5509.98 samples/sec Loss 4.7461 LearningRate 0.0142 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:09,777-Speed 5528.04 samples/sec Loss 4.8296 LearningRate 0.0142 Epoch: 12 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:11,646-Speed 5483.66 samples/sec Loss 4.7776 LearningRate 0.0142 Epoch: 12 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:13,519-Speed 5470.78 samples/sec Loss 4.6941 LearningRate 0.0142 Epoch: 12 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:15,365-Speed 5548.94 samples/sec Loss 4.7099 LearningRate 0.0142 Epoch: 12 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:17,220-Speed 5522.45 samples/sec Loss 4.8068 LearningRate 0.0142 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:19:19,066-Speed 5551.10 samples/sec Loss 4.7024 LearningRate 0.0141 Epoch: 12 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:20,927-Speed 5504.43 samples/sec Loss 4.8328 LearningRate 0.0141 Epoch: 12 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:22,810-Speed 5445.34 samples/sec Loss 4.6618 LearningRate 0.0141 Epoch: 12 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:24,659-Speed 5537.95 samples/sec Loss 4.8012 LearningRate 0.0141 Epoch: 12 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:26,514-Speed 5523.23 samples/sec Loss 4.7565 LearningRate 0.0141 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:28,375-Speed 5506.86 samples/sec Loss 4.7554 LearningRate 0.0141 Epoch: 12 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:30,232-Speed 5517.87 samples/sec Loss 4.7799 LearningRate 0.0141 Epoch: 12 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:32,097-Speed 5492.59 samples/sec Loss 4.6954 LearningRate 0.0141 Epoch: 12 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:33,957-Speed 5509.82 samples/sec Loss 4.7426 LearningRate 0.0141 Epoch: 12 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:35,804-Speed 5546.99 samples/sec Loss 4.7981 LearningRate 0.0141 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:37,681-Speed 5458.54 samples/sec Loss 4.8456 LearningRate 0.0141 Epoch: 12 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:39,513-Speed 5593.47 samples/sec Loss 4.6864 LearningRate 0.0141 Epoch: 12 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:41,395-Speed 5443.35 samples/sec Loss 
4.7695 LearningRate 0.0141 Epoch: 12 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:43,238-Speed 5562.33 samples/sec Loss 4.7566 LearningRate 0.0141 Epoch: 12 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:45,117-Speed 5451.84 samples/sec Loss 4.8225 LearningRate 0.0140 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:46,968-Speed 5533.50 samples/sec Loss 4.6258 LearningRate 0.0140 Epoch: 12 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:48,833-Speed 5494.43 samples/sec Loss 4.7539 LearningRate 0.0140 Epoch: 12 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:50,672-Speed 5573.01 samples/sec Loss 4.6981 LearningRate 0.0140 Epoch: 12 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:52,537-Speed 5492.41 samples/sec Loss 4.8503 LearningRate 0.0140 Epoch: 12 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:54,383-Speed 5551.56 samples/sec Loss 4.7683 LearningRate 0.0140 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:56,246-Speed 5499.61 samples/sec Loss 4.7873 LearningRate 0.0140 Epoch: 12 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:19:58,091-Speed 5555.85 samples/sec Loss 4.6603 LearningRate 0.0140 Epoch: 12 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:19:59,961-Speed 5477.80 samples/sec Loss 4.7086 LearningRate 0.0140 Epoch: 12 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:01,807-Speed 5552.44 samples/sec Loss 4.7906 LearningRate 0.0140 Epoch: 12 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:03,684-Speed 5458.35 samples/sec Loss 4.8122 LearningRate 0.0140 Epoch: 12 Global Step: 63350 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:05,525-Speed 5565.80 samples/sec Loss 4.7562 LearningRate 0.0140 Epoch: 12 Global Step: 63360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:07,364-Speed 5569.92 samples/sec Loss 4.6648 LearningRate 0.0140 Epoch: 12 Global Step: 63370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:09,204-Speed 5570.48 samples/sec Loss 4.7263 LearningRate 0.0139 Epoch: 12 Global Step: 63380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:11,075-Speed 5476.34 samples/sec Loss 4.6921 LearningRate 0.0139 Epoch: 12 Global Step: 63390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:12,919-Speed 5555.57 samples/sec Loss 4.7656 LearningRate 0.0139 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:14,785-Speed 5491.14 samples/sec Loss 4.7529 LearningRate 0.0139 Epoch: 12 Global Step: 63410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:16,667-Speed 5444.67 samples/sec Loss 4.9289 LearningRate 0.0139 Epoch: 12 Global Step: 63420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:18,506-Speed 5571.14 samples/sec Loss 4.7854 LearningRate 0.0139 Epoch: 12 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:20,363-Speed 5519.06 samples/sec Loss 4.6645 LearningRate 0.0139 Epoch: 12 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:22,211-Speed 5542.05 samples/sec Loss 4.8388 LearningRate 0.0139 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:24,066-Speed 5523.69 samples/sec Loss 4.8976 LearningRate 0.0139 Epoch: 12 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:25,936-Speed 5479.57 samples/sec Loss 4.7431 LearningRate 0.0139 Epoch: 12 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 14:20:27,817-Speed 5447.50 samples/sec Loss 4.7933 LearningRate 0.0139 Epoch: 12 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:29,657-Speed 5567.71 samples/sec Loss 4.8266 LearningRate 0.0139 Epoch: 12 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:31,542-Speed 5435.40 samples/sec Loss 4.6832 LearningRate 0.0139 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:33,381-Speed 5570.89 samples/sec Loss 4.7640 LearningRate 0.0139 Epoch: 12 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:35,266-Speed 5435.50 samples/sec Loss 4.7239 LearningRate 0.0138 Epoch: 12 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:37,160-Speed 5410.72 samples/sec Loss 4.6082 LearningRate 0.0138 Epoch: 12 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:39,033-Speed 5468.18 samples/sec Loss 4.8641 LearningRate 0.0138 Epoch: 12 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:40,867-Speed 5586.11 samples/sec Loss 4.7758 LearningRate 0.0138 Epoch: 12 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:42,717-Speed 5539.60 samples/sec Loss 4.7625 LearningRate 0.0138 Epoch: 12 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:20:44,568-Speed 5534.10 samples/sec Loss 4.7926 LearningRate 0.0138 Epoch: 12 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:46,423-Speed 5523.29 samples/sec Loss 4.6198 LearningRate 0.0138 Epoch: 12 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:48,279-Speed 5519.96 samples/sec Loss 4.9008 LearningRate 0.0138 Epoch: 12 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:50,173-Speed 5408.65 samples/sec Loss 
4.6252 LearningRate 0.0138 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:20:52,006-Speed 5587.87 samples/sec Loss 4.7735 LearningRate 0.0138 Epoch: 12 Global Step: 63610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:20:53,901-Speed 5408.12 samples/sec Loss 4.6104 LearningRate 0.0138 Epoch: 12 Global Step: 63620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:20:55,746-Speed 5551.76 samples/sec Loss 4.6824 LearningRate 0.0138 Epoch: 12 Global Step: 63630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:20:57,621-Speed 5464.41 samples/sec Loss 4.7562 LearningRate 0.0138 Epoch: 12 Global Step: 63640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:20:59,487-Speed 5492.28 samples/sec Loss 4.7538 LearningRate 0.0137 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:01,350-Speed 5500.42 samples/sec Loss 4.6563 LearningRate 0.0137 Epoch: 12 Global Step: 63660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:03,250-Speed 5392.42 samples/sec Loss 4.6497 LearningRate 0.0137 Epoch: 12 Global Step: 63670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:05,156-Speed 5373.72 samples/sec Loss 4.7826 LearningRate 0.0137 Epoch: 12 Global Step: 63680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:07,001-Speed 5554.95 samples/sec Loss 4.6908 LearningRate 0.0137 Epoch: 12 Global Step: 63690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:08,862-Speed 5505.66 samples/sec Loss 4.8807 LearningRate 0.0137 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:21:10,706-Speed 5556.68 samples/sec Loss 4.8143 LearningRate 0.0137 Epoch: 12 Global Step: 63710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:12,556-Speed 5535.76 samples/sec Loss 4.6425 LearningRate 0.0137 Epoch: 12 Global Step: 63720 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:14,461-Speed 5378.09 samples/sec Loss 4.6634 LearningRate 0.0137 Epoch: 12 Global Step: 63730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:16,301-Speed 5570.58 samples/sec Loss 4.6190 LearningRate 0.0137 Epoch: 12 Global Step: 63740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:18,208-Speed 5372.19 samples/sec Loss 4.6966 LearningRate 0.0137 Epoch: 12 Global Step: 63750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:20,051-Speed 5560.72 samples/sec Loss 4.6943 LearningRate 0.0137 Epoch: 12 Global Step: 63760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:21,918-Speed 5485.51 samples/sec Loss 4.6755 LearningRate 0.0137 Epoch: 12 Global Step: 63770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:23,763-Speed 5557.12 samples/sec Loss 4.6339 LearningRate 0.0137 Epoch: 12 Global Step: 63780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:25,637-Speed 5464.86 samples/sec Loss 4.8446 LearningRate 0.0136 Epoch: 12 Global Step: 63790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:27,500-Speed 5499.67 samples/sec Loss 4.9290 LearningRate 0.0136 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:29,341-Speed 5566.30 samples/sec Loss 4.8991 LearningRate 0.0136 Epoch: 12 Global Step: 63810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:31,180-Speed 5570.10 samples/sec Loss 4.6986 LearningRate 0.0136 Epoch: 12 Global Step: 63820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:33,033-Speed 5531.90 samples/sec Loss 4.6795 LearningRate 0.0136 Epoch: 12 Global Step: 63830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:34,914-Speed 5445.11 samples/sec Loss 4.7600 LearningRate 0.0136 Epoch: 12 Global Step: 63840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:21:36,780-Speed 5488.53 samples/sec Loss 4.7633 LearningRate 0.0136 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:38,670-Speed 5422.00 samples/sec Loss 4.8027 LearningRate 0.0136 Epoch: 12 Global Step: 63860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:40,542-Speed 5474.22 samples/sec Loss 4.7866 LearningRate 0.0136 Epoch: 12 Global Step: 63870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:42,461-Speed 5338.66 samples/sec Loss 4.8258 LearningRate 0.0136 Epoch: 12 Global Step: 63880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:44,303-Speed 5564.71 samples/sec Loss 4.7248 LearningRate 0.0136 Epoch: 12 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:46,149-Speed 5548.52 samples/sec Loss 4.8362 LearningRate 0.0136 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:48,026-Speed 5458.28 samples/sec Loss 4.7586 LearningRate 0.0136 Epoch: 12 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:49,898-Speed 5474.03 samples/sec Loss 4.7272 LearningRate 0.0136 Epoch: 12 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:51,757-Speed 5513.22 samples/sec Loss 4.6988 LearningRate 0.0135 Epoch: 12 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:53,641-Speed 5439.21 samples/sec Loss 4.6152 LearningRate 0.0135 Epoch: 12 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:55,483-Speed 5561.61 samples/sec Loss 4.5510 LearningRate 0.0135 Epoch: 12 Global Step: 63950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:57,331-Speed 5543.27 samples/sec Loss 4.7802 LearningRate 0.0135 Epoch: 12 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:21:59,182-Speed 5537.78 samples/sec Loss 
4.7483 LearningRate 0.0135 Epoch: 12 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:22:01,049-Speed 5490.26 samples/sec Loss 4.6874 LearningRate 0.0135 Epoch: 12 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:22:02,897-Speed 5545.36 samples/sec Loss 4.7563 LearningRate 0.0135 Epoch: 12 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:22:04,766-Speed 5480.27 samples/sec Loss 4.5929 LearningRate 0.0135 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:22:32,046-[lfw][64000]XNorm: 23.746053 Training: 2022-04-11 14:22:32,047-[lfw][64000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 14:22:32,048-[lfw][64000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:23:03,218-[cfp_fp][64000]XNorm: 21.213482 Training: 2022-04-11 14:23:03,219-[cfp_fp][64000]Accuracy-Flip: 0.97471+-0.00748 Training: 2022-04-11 14:23:03,220-[cfp_fp][64000]Accuracy-Highest: 0.97771 Training: 2022-04-11 14:23:30,454-[agedb_30][64000]XNorm: 23.290081 Training: 2022-04-11 14:23:30,454-[agedb_30][64000]Accuracy-Flip: 0.97850+-0.00754 Training: 2022-04-11 14:23:30,455-[agedb_30][64000]Accuracy-Highest: 0.97867 Training: 2022-04-11 14:23:32,302-Speed 116.98 samples/sec Loss 4.7613 LearningRate 0.0135 Epoch: 12 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:34,170-Speed 5484.34 samples/sec Loss 4.7140 LearningRate 0.0135 Epoch: 12 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:36,001-Speed 5594.05 samples/sec Loss 4.7202 LearningRate 0.0135 Epoch: 12 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:37,893-Speed 5415.93 samples/sec Loss 4.8728 LearningRate 0.0135 Epoch: 12 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:39,733-Speed 5566.87 samples/sec Loss 4.7712 LearningRate 0.0135 Epoch: 12 
Global Step: 64050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:41,581-Speed 5546.67 samples/sec Loss 4.7062 LearningRate 0.0135 Epoch: 12 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:43,440-Speed 5511.30 samples/sec Loss 4.7106 LearningRate 0.0134 Epoch: 12 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:45,332-Speed 5414.73 samples/sec Loss 4.6451 LearningRate 0.0134 Epoch: 12 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:47,177-Speed 5555.10 samples/sec Loss 4.8150 LearningRate 0.0134 Epoch: 12 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:23:49,055-Speed 5458.89 samples/sec Loss 4.6827 LearningRate 0.0134 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:23:50,967-Speed 5356.61 samples/sec Loss 4.6264 LearningRate 0.0134 Epoch: 12 Global Step: 64110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:23:52,900-Speed 5302.97 samples/sec Loss 4.6758 LearningRate 0.0134 Epoch: 12 Global Step: 64120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:23:54,786-Speed 5429.00 samples/sec Loss 4.7850 LearningRate 0.0134 Epoch: 12 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:23:56,637-Speed 5538.06 samples/sec Loss 4.8231 LearningRate 0.0134 Epoch: 12 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:23:58,531-Speed 5408.53 samples/sec Loss 4.6909 LearningRate 0.0134 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:00,363-Speed 5593.60 samples/sec Loss 4.6977 LearningRate 0.0134 Epoch: 12 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:02,227-Speed 5496.14 samples/sec Loss 4.7420 LearningRate 0.0134 Epoch: 12 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:24:04,070-Speed 5562.02 samples/sec Loss 4.7279 LearningRate 0.0134 Epoch: 12 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:05,941-Speed 5474.37 samples/sec Loss 4.7529 LearningRate 0.0134 Epoch: 12 Global Step: 64190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:07,775-Speed 5586.03 samples/sec Loss 4.6171 LearningRate 0.0133 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:09,609-Speed 5587.59 samples/sec Loss 4.6200 LearningRate 0.0133 Epoch: 12 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:11,454-Speed 5553.19 samples/sec Loss 4.8103 LearningRate 0.0133 Epoch: 12 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:13,291-Speed 5578.89 samples/sec Loss 4.6269 LearningRate 0.0133 Epoch: 12 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:15,143-Speed 5529.80 samples/sec Loss 4.7121 LearningRate 0.0133 Epoch: 12 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:17,003-Speed 5508.47 samples/sec Loss 4.7341 LearningRate 0.0133 Epoch: 12 Global Step: 64250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:18,889-Speed 5435.35 samples/sec Loss 4.6746 LearningRate 0.0133 Epoch: 12 Global Step: 64260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:20,734-Speed 5556.12 samples/sec Loss 4.7414 LearningRate 0.0133 Epoch: 12 Global Step: 64270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:22,610-Speed 5459.77 samples/sec Loss 4.7655 LearningRate 0.0133 Epoch: 12 Global Step: 64280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:24,489-Speed 5453.14 samples/sec Loss 4.6674 LearningRate 0.0133 Epoch: 12 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:26,341-Speed 5532.46 
samples/sec Loss 4.7452 LearningRate 0.0133 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:28,179-Speed 5574.36 samples/sec Loss 4.7426 LearningRate 0.0133 Epoch: 12 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:30,056-Speed 5459.48 samples/sec Loss 4.7575 LearningRate 0.0133 Epoch: 12 Global Step: 64320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:31,933-Speed 5459.41 samples/sec Loss 4.6878 LearningRate 0.0133 Epoch: 12 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:33,773-Speed 5567.26 samples/sec Loss 4.6819 LearningRate 0.0132 Epoch: 12 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:35,640-Speed 5490.38 samples/sec Loss 4.6242 LearningRate 0.0132 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:37,507-Speed 5486.01 samples/sec Loss 4.8598 LearningRate 0.0132 Epoch: 12 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:39,357-Speed 5538.18 samples/sec Loss 4.6661 LearningRate 0.0132 Epoch: 12 Global Step: 64370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:41,237-Speed 5451.05 samples/sec Loss 4.7043 LearningRate 0.0132 Epoch: 12 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:43,092-Speed 5523.57 samples/sec Loss 4.7306 LearningRate 0.0132 Epoch: 12 Global Step: 64390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:44,975-Speed 5439.52 samples/sec Loss 4.7617 LearningRate 0.0132 Epoch: 12 Global Step: 64400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:46,842-Speed 5487.66 samples/sec Loss 4.6605 LearningRate 0.0132 Epoch: 12 Global Step: 64410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:48,702-Speed 5509.96 samples/sec Loss 4.7874 LearningRate 0.0132 Epoch: 12 
Global Step: 64420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:50,582-Speed 5453.09 samples/sec Loss 4.7400 LearningRate 0.0132 Epoch: 12 Global Step: 64430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:52,426-Speed 5556.82 samples/sec Loss 4.7218 LearningRate 0.0132 Epoch: 12 Global Step: 64440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:54,287-Speed 5505.28 samples/sec Loss 4.5419 LearningRate 0.0132 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:56,128-Speed 5567.01 samples/sec Loss 4.6443 LearningRate 0.0132 Epoch: 12 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:24:58,009-Speed 5447.47 samples/sec Loss 4.6499 LearningRate 0.0132 Epoch: 12 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:24:59,848-Speed 5570.75 samples/sec Loss 4.7411 LearningRate 0.0131 Epoch: 12 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:01,746-Speed 5398.46 samples/sec Loss 4.6172 LearningRate 0.0131 Epoch: 12 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:03,598-Speed 5532.37 samples/sec Loss 4.7479 LearningRate 0.0131 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:05,455-Speed 5517.78 samples/sec Loss 4.7092 LearningRate 0.0131 Epoch: 12 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:07,332-Speed 5456.81 samples/sec Loss 4.7098 LearningRate 0.0131 Epoch: 12 Global Step: 64520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:09,171-Speed 5571.95 samples/sec Loss 4.6488 LearningRate 0.0131 Epoch: 12 Global Step: 64530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:11,063-Speed 5417.66 samples/sec Loss 4.7780 LearningRate 0.0131 Epoch: 12 Global Step: 64540 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:25:12,913-Speed 5537.16 samples/sec Loss 4.6057 LearningRate 0.0131 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:14,750-Speed 5577.01 samples/sec Loss 4.6927 LearningRate 0.0131 Epoch: 12 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:16,620-Speed 5479.16 samples/sec Loss 4.7717 LearningRate 0.0131 Epoch: 12 Global Step: 64570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:18,480-Speed 5510.40 samples/sec Loss 4.7113 LearningRate 0.0131 Epoch: 12 Global Step: 64580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:20,354-Speed 5464.61 samples/sec Loss 4.7648 LearningRate 0.0131 Epoch: 12 Global Step: 64590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:22,211-Speed 5520.28 samples/sec Loss 4.6969 LearningRate 0.0131 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:24,101-Speed 5420.39 samples/sec Loss 4.7307 LearningRate 0.0131 Epoch: 12 Global Step: 64610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:26,004-Speed 5384.11 samples/sec Loss 4.6525 LearningRate 0.0130 Epoch: 12 Global Step: 64620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:27,871-Speed 5487.21 samples/sec Loss 4.6995 LearningRate 0.0130 Epoch: 12 Global Step: 64630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:29,758-Speed 5429.83 samples/sec Loss 4.7804 LearningRate 0.0130 Epoch: 12 Global Step: 64640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:31,603-Speed 5554.44 samples/sec Loss 4.5535 LearningRate 0.0130 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:33,483-Speed 5450.39 samples/sec Loss 4.6747 LearningRate 0.0130 Epoch: 12 Global Step: 64660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:35,338-Speed 5523.11 
samples/sec Loss 4.7027 LearningRate 0.0130 Epoch: 12 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:37,231-Speed 5411.07 samples/sec Loss 4.7071 LearningRate 0.0130 Epoch: 12 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:39,074-Speed 5560.88 samples/sec Loss 4.5264 LearningRate 0.0130 Epoch: 12 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:40,955-Speed 5446.58 samples/sec Loss 4.5857 LearningRate 0.0130 Epoch: 12 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:42,800-Speed 5552.91 samples/sec Loss 4.6518 LearningRate 0.0130 Epoch: 12 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:44,665-Speed 5494.76 samples/sec Loss 4.7534 LearningRate 0.0130 Epoch: 12 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:46,514-Speed 5540.97 samples/sec Loss 4.8085 LearningRate 0.0130 Epoch: 12 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:48,393-Speed 5453.62 samples/sec Loss 4.7604 LearningRate 0.0130 Epoch: 12 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:25:50,258-Speed 5493.19 samples/sec Loss 4.6959 LearningRate 0.0130 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:52,111-Speed 5529.08 samples/sec Loss 4.6959 LearningRate 0.0129 Epoch: 12 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:53,965-Speed 5528.49 samples/sec Loss 4.6337 LearningRate 0.0129 Epoch: 12 Global Step: 64770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:55,803-Speed 5574.72 samples/sec Loss 4.6739 LearningRate 0.0129 Epoch: 12 Global Step: 64780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:57,692-Speed 5422.98 samples/sec Loss 4.7533 LearningRate 0.0129 Epoch: 12 
Global Step: 64790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:25:59,537-Speed 5555.06 samples/sec Loss 4.6432 LearningRate 0.0129 Epoch: 12 Global Step: 64800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:01,412-Speed 5463.48 samples/sec Loss 4.8700 LearningRate 0.0129 Epoch: 12 Global Step: 64810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:03,265-Speed 5530.93 samples/sec Loss 4.7734 LearningRate 0.0129 Epoch: 12 Global Step: 64820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:05,147-Speed 5443.44 samples/sec Loss 4.6327 LearningRate 0.0129 Epoch: 12 Global Step: 64830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:06,994-Speed 5548.15 samples/sec Loss 4.7085 LearningRate 0.0129 Epoch: 12 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:08,880-Speed 5431.88 samples/sec Loss 4.6587 LearningRate 0.0129 Epoch: 12 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:10,731-Speed 5536.73 samples/sec Loss 4.5671 LearningRate 0.0129 Epoch: 12 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:12,698-Speed 5208.19 samples/sec Loss 4.6318 LearningRate 0.0129 Epoch: 12 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:14,579-Speed 5447.73 samples/sec Loss 4.6555 LearningRate 0.0129 Epoch: 12 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:16,475-Speed 5403.67 samples/sec Loss 4.7400 LearningRate 0.0129 Epoch: 12 Global Step: 64890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:18,344-Speed 5482.77 samples/sec Loss 4.7494 LearningRate 0.0128 Epoch: 12 Global Step: 64900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:20,189-Speed 5554.33 samples/sec Loss 4.6748 LearningRate 0.0128 Epoch: 12 Global Step: 64910 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:26:22,036-Speed 5547.80 samples/sec Loss 4.4907 LearningRate 0.0128 Epoch: 12 Global Step: 64920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:23,886-Speed 5537.21 samples/sec Loss 4.8125 LearningRate 0.0128 Epoch: 12 Global Step: 64930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:25,744-Speed 5516.40 samples/sec Loss 4.5804 LearningRate 0.0128 Epoch: 12 Global Step: 64940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:27,610-Speed 5491.37 samples/sec Loss 4.7666 LearningRate 0.0128 Epoch: 12 Global Step: 64950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:29,469-Speed 5511.27 samples/sec Loss 4.6506 LearningRate 0.0128 Epoch: 12 Global Step: 64960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:31,338-Speed 5481.26 samples/sec Loss 4.7193 LearningRate 0.0128 Epoch: 12 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:33,180-Speed 5563.70 samples/sec Loss 4.7161 LearningRate 0.0128 Epoch: 12 Global Step: 64980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:35,051-Speed 5475.51 samples/sec Loss 4.5145 LearningRate 0.0128 Epoch: 12 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:36,939-Speed 5429.42 samples/sec Loss 4.6096 LearningRate 0.0128 Epoch: 12 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:38,816-Speed 5455.44 samples/sec Loss 4.7717 LearningRate 0.0128 Epoch: 12 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:40,680-Speed 5499.69 samples/sec Loss 4.6802 LearningRate 0.0128 Epoch: 12 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:26:42,536-Speed 5518.46 samples/sec Loss 4.8100 LearningRate 0.0128 Epoch: 12 Global Step: 65030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:44,375-Speed 5572.94 
samples/sec Loss 4.7153 LearningRate 0.0127 Epoch: 12 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:46,234-Speed 5509.27 samples/sec Loss 4.7028 LearningRate 0.0127 Epoch: 12 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:48,077-Speed 5560.19 samples/sec Loss 4.7711 LearningRate 0.0127 Epoch: 12 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:49,965-Speed 5426.94 samples/sec Loss 4.6258 LearningRate 0.0127 Epoch: 12 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:51,834-Speed 5482.54 samples/sec Loss 4.6701 LearningRate 0.0127 Epoch: 12 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:53,702-Speed 5483.86 samples/sec Loss 4.6217 LearningRate 0.0127 Epoch: 12 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:55,586-Speed 5439.89 samples/sec Loss 4.5634 LearningRate 0.0127 Epoch: 12 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:57,459-Speed 5471.10 samples/sec Loss 4.5179 LearningRate 0.0127 Epoch: 12 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:26:59,360-Speed 5389.31 samples/sec Loss 4.6346 LearningRate 0.0127 Epoch: 12 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:01,203-Speed 5560.59 samples/sec Loss 4.6772 LearningRate 0.0127 Epoch: 12 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:03,113-Speed 5363.80 samples/sec Loss 4.6063 LearningRate 0.0127 Epoch: 12 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:04,963-Speed 5537.28 samples/sec Loss 4.7064 LearningRate 0.0127 Epoch: 12 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:06,840-Speed 5460.44 samples/sec Loss 4.7727 LearningRate 0.0127 Epoch: 12 
Global Step: 65160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:08,680-Speed 5568.15 samples/sec Loss 4.5741 LearningRate 0.0127 Epoch: 12 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:10,569-Speed 5421.84 samples/sec Loss 4.6374 LearningRate 0.0127 Epoch: 12 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:12,412-Speed 5561.93 samples/sec Loss 4.6970 LearningRate 0.0126 Epoch: 12 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:14,276-Speed 5493.85 samples/sec Loss 4.7344 LearningRate 0.0126 Epoch: 12 Global Step: 65200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:16,148-Speed 5474.69 samples/sec Loss 4.7780 LearningRate 0.0126 Epoch: 12 Global Step: 65210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:18,012-Speed 5498.68 samples/sec Loss 4.7416 LearningRate 0.0126 Epoch: 12 Global Step: 65220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:19,868-Speed 5520.23 samples/sec Loss 4.6161 LearningRate 0.0126 Epoch: 12 Global Step: 65230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:21,731-Speed 5500.23 samples/sec Loss 4.5898 LearningRate 0.0126 Epoch: 12 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:23,579-Speed 5544.07 samples/sec Loss 4.5401 LearningRate 0.0126 Epoch: 12 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:25,425-Speed 5547.20 samples/sec Loss 4.6085 LearningRate 0.0126 Epoch: 12 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:27,300-Speed 5466.83 samples/sec Loss 4.6873 LearningRate 0.0126 Epoch: 12 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:29,162-Speed 5501.92 samples/sec Loss 4.8021 LearningRate 0.0126 Epoch: 12 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:27:31,050-Speed 5428.99 samples/sec Loss 4.7122 LearningRate 0.0126 Epoch: 12 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:32,889-Speed 5568.70 samples/sec Loss 4.6998 LearningRate 0.0126 Epoch: 12 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:34,732-Speed 5560.42 samples/sec Loss 4.6948 LearningRate 0.0126 Epoch: 12 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:36,600-Speed 5484.68 samples/sec Loss 4.6442 LearningRate 0.0126 Epoch: 12 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:38,446-Speed 5553.78 samples/sec Loss 4.7632 LearningRate 0.0125 Epoch: 12 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:40,342-Speed 5404.24 samples/sec Loss 4.6368 LearningRate 0.0125 Epoch: 12 Global Step: 65340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:42,184-Speed 5563.87 samples/sec Loss 4.6666 LearningRate 0.0125 Epoch: 12 Global Step: 65350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:44,063-Speed 5452.37 samples/sec Loss 4.5683 LearningRate 0.0125 Epoch: 12 Global Step: 65360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:45,901-Speed 5574.38 samples/sec Loss 4.7809 LearningRate 0.0125 Epoch: 12 Global Step: 65370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:47,772-Speed 5476.70 samples/sec Loss 4.5354 LearningRate 0.0125 Epoch: 12 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:49,668-Speed 5404.13 samples/sec Loss 4.6032 LearningRate 0.0125 Epoch: 12 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:51,556-Speed 5425.65 samples/sec Loss 4.6164 LearningRate 0.0125 Epoch: 12 Global Step: 65400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:53,403-Speed 5546.18 
samples/sec Loss 4.5080 LearningRate 0.0125 Epoch: 12 Global Step: 65410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:27:55,298-Speed 5406.20 samples/sec Loss 4.7001 LearningRate 0.0125 Epoch: 12 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:57,147-Speed 5543.53 samples/sec Loss 4.5572 LearningRate 0.0125 Epoch: 12 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:27:58,982-Speed 5581.50 samples/sec Loss 4.4869 LearningRate 0.0125 Epoch: 12 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:00,864-Speed 5445.37 samples/sec Loss 4.6166 LearningRate 0.0125 Epoch: 12 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:02,718-Speed 5528.06 samples/sec Loss 4.5699 LearningRate 0.0125 Epoch: 12 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:04,594-Speed 5460.28 samples/sec Loss 4.6568 LearningRate 0.0124 Epoch: 12 Global Step: 65470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:06,435-Speed 5565.20 samples/sec Loss 4.6920 LearningRate 0.0124 Epoch: 12 Global Step: 65480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:08,311-Speed 5460.59 samples/sec Loss 4.7800 LearningRate 0.0124 Epoch: 12 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:10,163-Speed 5531.71 samples/sec Loss 4.5087 LearningRate 0.0124 Epoch: 12 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:11,998-Speed 5583.37 samples/sec Loss 4.6860 LearningRate 0.0124 Epoch: 12 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:13,969-Speed 5199.13 samples/sec Loss 4.6332 LearningRate 0.0124 Epoch: 12 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:15,815-Speed 5550.99 samples/sec Loss 4.5869 LearningRate 0.0124 Epoch: 12 
Global Step: 65530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:17,697-Speed 5442.03 samples/sec Loss 4.7913 LearningRate 0.0124 Epoch: 12 Global Step: 65540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:19,555-Speed 5513.19 samples/sec Loss 4.5176 LearningRate 0.0124 Epoch: 12 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:21,392-Speed 5579.15 samples/sec Loss 4.6384 LearningRate 0.0124 Epoch: 12 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:23,298-Speed 5374.93 samples/sec Loss 4.4999 LearningRate 0.0124 Epoch: 12 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:25,140-Speed 5561.37 samples/sec Loss 4.6839 LearningRate 0.0124 Epoch: 12 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:27,013-Speed 5469.87 samples/sec Loss 4.6803 LearningRate 0.0124 Epoch: 12 Global Step: 65590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:28,848-Speed 5583.61 samples/sec Loss 4.6482 LearningRate 0.0124 Epoch: 12 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:30,733-Speed 5435.53 samples/sec Loss 4.5858 LearningRate 0.0123 Epoch: 12 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:32,582-Speed 5543.16 samples/sec Loss 4.6457 LearningRate 0.0123 Epoch: 12 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:34,476-Speed 5408.68 samples/sec Loss 4.7293 LearningRate 0.0123 Epoch: 12 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:36,339-Speed 5501.52 samples/sec Loss 4.6478 LearningRate 0.0123 Epoch: 12 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:38,219-Speed 5451.18 samples/sec Loss 4.6543 LearningRate 0.0123 Epoch: 12 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:28:40,074-Speed 5524.30 samples/sec Loss 4.5234 LearningRate 0.0123 Epoch: 12 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:41,936-Speed 5505.02 samples/sec Loss 4.6945 LearningRate 0.0123 Epoch: 12 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:43,777-Speed 5563.35 samples/sec Loss 4.7435 LearningRate 0.0123 Epoch: 12 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:45,655-Speed 5454.91 samples/sec Loss 4.5286 LearningRate 0.0123 Epoch: 12 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:47,510-Speed 5525.61 samples/sec Loss 4.6242 LearningRate 0.0123 Epoch: 12 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:49,393-Speed 5440.62 samples/sec Loss 4.5406 LearningRate 0.0123 Epoch: 12 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:28:51,231-Speed 5577.32 samples/sec Loss 4.5725 LearningRate 0.0123 Epoch: 12 Global Step: 65720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:53,112-Speed 5445.77 samples/sec Loss 4.7997 LearningRate 0.0123 Epoch: 12 Global Step: 65730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:54,983-Speed 5475.37 samples/sec Loss 4.6728 LearningRate 0.0123 Epoch: 12 Global Step: 65740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:28:56,887-Speed 5383.81 samples/sec Loss 4.7202 LearningRate 0.0123 Epoch: 12 Global Step: 65750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:08,258-Speed 900.67 samples/sec Loss 4.1357 LearningRate 0.0122 Epoch: 13 Global Step: 65760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:10,103-Speed 5553.89 samples/sec Loss 3.7468 LearningRate 0.0122 Epoch: 13 Global Step: 65770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:12,029-Speed 5321.74 
samples/sec Loss 3.8559 LearningRate 0.0122 Epoch: 13 Global Step: 65780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:13,961-Speed 5300.62 samples/sec Loss 3.6855 LearningRate 0.0122 Epoch: 13 Global Step: 65790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:15,827-Speed 5496.09 samples/sec Loss 3.7482 LearningRate 0.0122 Epoch: 13 Global Step: 65800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:17,739-Speed 5358.55 samples/sec Loss 3.7452 LearningRate 0.0122 Epoch: 13 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:19,604-Speed 5495.73 samples/sec Loss 3.8483 LearningRate 0.0122 Epoch: 13 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:21,461-Speed 5517.56 samples/sec Loss 3.6690 LearningRate 0.0122 Epoch: 13 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:23,378-Speed 5345.17 samples/sec Loss 3.7680 LearningRate 0.0122 Epoch: 13 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:25,260-Speed 5441.44 samples/sec Loss 3.8158 LearningRate 0.0122 Epoch: 13 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:27,124-Speed 5495.56 samples/sec Loss 3.8556 LearningRate 0.0122 Epoch: 13 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:28,982-Speed 5515.84 samples/sec Loss 3.8454 LearningRate 0.0122 Epoch: 13 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:30,840-Speed 5513.64 samples/sec Loss 3.7195 LearningRate 0.0122 Epoch: 13 Global Step: 65880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:32,720-Speed 5450.05 samples/sec Loss 3.8080 LearningRate 0.0122 Epoch: 13 Global Step: 65890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:34,572-Speed 5528.47 samples/sec Loss 3.8409 LearningRate 0.0121 Epoch: 13 
Global Step: 65900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:36,426-Speed 5527.71 samples/sec Loss 3.9080 LearningRate 0.0121 Epoch: 13 Global Step: 65910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:38,299-Speed 5468.01 samples/sec Loss 3.8098 LearningRate 0.0121 Epoch: 13 Global Step: 65920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:40,146-Speed 5545.54 samples/sec Loss 3.9566 LearningRate 0.0121 Epoch: 13 Global Step: 65930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:42,047-Speed 5390.06 samples/sec Loss 3.8100 LearningRate 0.0121 Epoch: 13 Global Step: 65940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:43,883-Speed 5579.19 samples/sec Loss 3.8712 LearningRate 0.0121 Epoch: 13 Global Step: 65950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:45,726-Speed 5558.38 samples/sec Loss 3.9478 LearningRate 0.0121 Epoch: 13 Global Step: 65960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:47,596-Speed 5479.54 samples/sec Loss 3.7663 LearningRate 0.0121 Epoch: 13 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:49,487-Speed 5416.39 samples/sec Loss 3.9307 LearningRate 0.0121 Epoch: 13 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:29:51,326-Speed 5571.21 samples/sec Loss 3.8977 LearningRate 0.0121 Epoch: 13 Global Step: 65990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:29:53,219-Speed 5414.30 samples/sec Loss 3.9059 LearningRate 0.0121 Epoch: 13 Global Step: 66000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:30:20,604-[lfw][66000]XNorm: 21.880126 Training: 2022-04-11 14:30:20,605-[lfw][66000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 14:30:20,605-[lfw][66000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:30:52,172-[cfp_fp][66000]XNorm: 19.674214 Training: 2022-04-11 
14:30:52,174-[cfp_fp][66000]Accuracy-Flip: 0.97771+-0.00698 Training: 2022-04-11 14:30:52,174-[cfp_fp][66000]Accuracy-Highest: 0.97771 Training: 2022-04-11 14:31:19,344-[agedb_30][66000]XNorm: 21.918721 Training: 2022-04-11 14:31:19,345-[agedb_30][66000]Accuracy-Flip: 0.97917+-0.00704 Training: 2022-04-11 14:31:19,346-[agedb_30][66000]Accuracy-Highest: 0.97917 Training: 2022-04-11 14:31:21,246-Speed 116.33 samples/sec Loss 3.8155 LearningRate 0.0121 Epoch: 13 Global Step: 66010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:23,096-Speed 5539.60 samples/sec Loss 3.8461 LearningRate 0.0121 Epoch: 13 Global Step: 66020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:24,957-Speed 5504.83 samples/sec Loss 3.8517 LearningRate 0.0121 Epoch: 13 Global Step: 66030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:26,797-Speed 5570.66 samples/sec Loss 3.9264 LearningRate 0.0121 Epoch: 13 Global Step: 66040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:28,665-Speed 5484.05 samples/sec Loss 3.9725 LearningRate 0.0120 Epoch: 13 Global Step: 66050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:30,569-Speed 5386.27 samples/sec Loss 3.8912 LearningRate 0.0120 Epoch: 13 Global Step: 66060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:32,422-Speed 5528.55 samples/sec Loss 3.7526 LearningRate 0.0120 Epoch: 13 Global Step: 66070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:34,281-Speed 5511.26 samples/sec Loss 3.9204 LearningRate 0.0120 Epoch: 13 Global Step: 66080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:36,122-Speed 5563.95 samples/sec Loss 3.7896 LearningRate 0.0120 Epoch: 13 Global Step: 66090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:38,043-Speed 5334.98 samples/sec Loss 4.0096 LearningRate 0.0120 Epoch: 13 Global Step: 66100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-11 14:31:39,880-Speed 5577.04 samples/sec Loss 3.8844 LearningRate 0.0120 Epoch: 13 Global Step: 66110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:31:41,775-Speed 5406.28 samples/sec Loss 4.1297 LearningRate 0.0120 Epoch: 13 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:43,610-Speed 5581.87 samples/sec Loss 3.9182 LearningRate 0.0120 Epoch: 13 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:45,515-Speed 5380.02 samples/sec Loss 3.9638 LearningRate 0.0120 Epoch: 13 Global Step: 66140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:47,358-Speed 5557.72 samples/sec Loss 3.8326 LearningRate 0.0120 Epoch: 13 Global Step: 66150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:49,216-Speed 5514.82 samples/sec Loss 3.9170 LearningRate 0.0120 Epoch: 13 Global Step: 66160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:51,067-Speed 5536.76 samples/sec Loss 4.0140 LearningRate 0.0120 Epoch: 13 Global Step: 66170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:52,929-Speed 5501.19 samples/sec Loss 3.8904 LearningRate 0.0120 Epoch: 13 Global Step: 66180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:54,793-Speed 5499.00 samples/sec Loss 3.9290 LearningRate 0.0120 Epoch: 13 Global Step: 66190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:56,653-Speed 5508.29 samples/sec Loss 4.0331 LearningRate 0.0119 Epoch: 13 Global Step: 66200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:31:58,513-Speed 5510.28 samples/sec Loss 3.9999 LearningRate 0.0119 Epoch: 13 Global Step: 66210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:00,370-Speed 5519.21 samples/sec Loss 3.7697 LearningRate 0.0119 Epoch: 13 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:02,265-Speed 5404.17 samples/sec Loss 
4.0090 LearningRate 0.0119 Epoch: 13 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:04,120-Speed 5523.98 samples/sec Loss 3.9316 LearningRate 0.0119 Epoch: 13 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:05,970-Speed 5541.22 samples/sec Loss 3.8835 LearningRate 0.0119 Epoch: 13 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:07,842-Speed 5473.52 samples/sec Loss 3.8897 LearningRate 0.0119 Epoch: 13 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:09,693-Speed 5535.64 samples/sec Loss 4.0725 LearningRate 0.0119 Epoch: 13 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:11,549-Speed 5518.21 samples/sec Loss 4.0323 LearningRate 0.0119 Epoch: 13 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:13,433-Speed 5438.99 samples/sec Loss 3.9105 LearningRate 0.0119 Epoch: 13 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:15,350-Speed 5346.07 samples/sec Loss 3.9492 LearningRate 0.0119 Epoch: 13 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:17,230-Speed 5448.63 samples/sec Loss 4.0488 LearningRate 0.0119 Epoch: 13 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:19,079-Speed 5540.62 samples/sec Loss 3.9379 LearningRate 0.0119 Epoch: 13 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:20,950-Speed 5475.48 samples/sec Loss 4.0670 LearningRate 0.0119 Epoch: 13 Global Step: 66330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:22,821-Speed 5476.21 samples/sec Loss 3.9474 LearningRate 0.0118 Epoch: 13 Global Step: 66340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:24,702-Speed 5446.59 samples/sec Loss 3.9496 LearningRate 0.0118 Epoch: 13 Global Step: 66350 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:26,561-Speed 5513.03 samples/sec Loss 4.0355 LearningRate 0.0118 Epoch: 13 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:28,415-Speed 5526.19 samples/sec Loss 4.0609 LearningRate 0.0118 Epoch: 13 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:30,280-Speed 5496.19 samples/sec Loss 4.0765 LearningRate 0.0118 Epoch: 13 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:32,123-Speed 5557.39 samples/sec Loss 4.0659 LearningRate 0.0118 Epoch: 13 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:33,986-Speed 5499.77 samples/sec Loss 4.0594 LearningRate 0.0118 Epoch: 13 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:32:35,828-Speed 5564.50 samples/sec Loss 3.9908 LearningRate 0.0118 Epoch: 13 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:37,708-Speed 5448.18 samples/sec Loss 4.0227 LearningRate 0.0118 Epoch: 13 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:39,561-Speed 5529.88 samples/sec Loss 3.8873 LearningRate 0.0118 Epoch: 13 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:41,403-Speed 5562.85 samples/sec Loss 4.0360 LearningRate 0.0118 Epoch: 13 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:43,241-Speed 5574.92 samples/sec Loss 4.0039 LearningRate 0.0118 Epoch: 13 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:45,078-Speed 5574.17 samples/sec Loss 4.0821 LearningRate 0.0118 Epoch: 13 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:46,947-Speed 5481.92 samples/sec Loss 4.0201 LearningRate 0.0118 Epoch: 13 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 14:32:48,826-Speed 5454.33 samples/sec Loss 4.1068 LearningRate 0.0118 Epoch: 13 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:50,676-Speed 5537.50 samples/sec Loss 4.0018 LearningRate 0.0117 Epoch: 13 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:52,554-Speed 5455.98 samples/sec Loss 3.9654 LearningRate 0.0117 Epoch: 13 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:54,403-Speed 5544.27 samples/sec Loss 4.0713 LearningRate 0.0117 Epoch: 13 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:56,271-Speed 5484.16 samples/sec Loss 4.0856 LearningRate 0.0117 Epoch: 13 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:32:58,178-Speed 5373.00 samples/sec Loss 3.9740 LearningRate 0.0117 Epoch: 13 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:00,070-Speed 5416.82 samples/sec Loss 3.9570 LearningRate 0.0117 Epoch: 13 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:01,997-Speed 5317.72 samples/sec Loss 4.0148 LearningRate 0.0117 Epoch: 13 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:03,868-Speed 5475.31 samples/sec Loss 4.0635 LearningRate 0.0117 Epoch: 13 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:05,732-Speed 5497.47 samples/sec Loss 4.1256 LearningRate 0.0117 Epoch: 13 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:07,577-Speed 5553.04 samples/sec Loss 4.0284 LearningRate 0.0117 Epoch: 13 Global Step: 66580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:09,443-Speed 5490.68 samples/sec Loss 3.9673 LearningRate 0.0117 Epoch: 13 Global Step: 66590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:11,345-Speed 5389.15 samples/sec Loss 
4.1801 LearningRate 0.0117 Epoch: 13 Global Step: 66600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:13,221-Speed 5464.00 samples/sec Loss 4.1901 LearningRate 0.0117 Epoch: 13 Global Step: 66610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:15,089-Speed 5485.23 samples/sec Loss 4.1116 LearningRate 0.0117 Epoch: 13 Global Step: 66620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:16,949-Speed 5509.93 samples/sec Loss 4.1359 LearningRate 0.0117 Epoch: 13 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:18,791-Speed 5561.55 samples/sec Loss 4.1914 LearningRate 0.0116 Epoch: 13 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:20,687-Speed 5405.78 samples/sec Loss 4.0352 LearningRate 0.0116 Epoch: 13 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:22,556-Speed 5480.56 samples/sec Loss 4.0326 LearningRate 0.0116 Epoch: 13 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:24,400-Speed 5559.42 samples/sec Loss 3.8869 LearningRate 0.0116 Epoch: 13 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:26,259-Speed 5512.07 samples/sec Loss 4.0117 LearningRate 0.0116 Epoch: 13 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:28,148-Speed 5423.72 samples/sec Loss 4.0926 LearningRate 0.0116 Epoch: 13 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:30,053-Speed 5380.82 samples/sec Loss 4.0472 LearningRate 0.0116 Epoch: 13 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:31,901-Speed 5543.06 samples/sec Loss 4.2154 LearningRate 0.0116 Epoch: 13 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:33,755-Speed 5525.90 samples/sec Loss 4.1350 LearningRate 0.0116 Epoch: 13 Global Step: 66720 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:35,593-Speed 5575.73 samples/sec Loss 4.0787 LearningRate 0.0116 Epoch: 13 Global Step: 66730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:37,436-Speed 5557.63 samples/sec Loss 4.0438 LearningRate 0.0116 Epoch: 13 Global Step: 66740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:39,328-Speed 5413.89 samples/sec Loss 4.0771 LearningRate 0.0116 Epoch: 13 Global Step: 66750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:41,168-Speed 5571.51 samples/sec Loss 4.1115 LearningRate 0.0116 Epoch: 13 Global Step: 66760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:43,015-Speed 5546.18 samples/sec Loss 4.0593 LearningRate 0.0116 Epoch: 13 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:44,909-Speed 5409.31 samples/sec Loss 4.0115 LearningRate 0.0116 Epoch: 13 Global Step: 66780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:46,751-Speed 5561.73 samples/sec Loss 4.1227 LearningRate 0.0115 Epoch: 13 Global Step: 66790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:48,628-Speed 5459.39 samples/sec Loss 4.1141 LearningRate 0.0115 Epoch: 13 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:50,478-Speed 5539.81 samples/sec Loss 4.1522 LearningRate 0.0115 Epoch: 13 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:33:52,350-Speed 5472.60 samples/sec Loss 4.1459 LearningRate 0.0115 Epoch: 13 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:54,257-Speed 5370.78 samples/sec Loss 4.0624 LearningRate 0.0115 Epoch: 13 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:56,135-Speed 5460.37 samples/sec Loss 4.0983 LearningRate 0.0115 Epoch: 13 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 14:33:58,023-Speed 5426.97 samples/sec Loss 4.2030 LearningRate 0.0115 Epoch: 13 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:33:59,915-Speed 5417.18 samples/sec Loss 4.1392 LearningRate 0.0115 Epoch: 13 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:01,768-Speed 5526.93 samples/sec Loss 4.1673 LearningRate 0.0115 Epoch: 13 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:03,658-Speed 5424.12 samples/sec Loss 4.0805 LearningRate 0.0115 Epoch: 13 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:05,578-Speed 5336.31 samples/sec Loss 4.1179 LearningRate 0.0115 Epoch: 13 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:07,434-Speed 5521.12 samples/sec Loss 4.0957 LearningRate 0.0115 Epoch: 13 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:09,317-Speed 5440.35 samples/sec Loss 3.9951 LearningRate 0.0115 Epoch: 13 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:11,131-Speed 5649.93 samples/sec Loss 4.2436 LearningRate 0.0115 Epoch: 13 Global Step: 66920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:12,975-Speed 5556.44 samples/sec Loss 4.1368 LearningRate 0.0114 Epoch: 13 Global Step: 66930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:14,837-Speed 5502.45 samples/sec Loss 4.2616 LearningRate 0.0114 Epoch: 13 Global Step: 66940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:16,727-Speed 5421.17 samples/sec Loss 4.1127 LearningRate 0.0114 Epoch: 13 Global Step: 66950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:18,593-Speed 5491.28 samples/sec Loss 4.0774 LearningRate 0.0114 Epoch: 13 Global Step: 66960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:20,496-Speed 5384.54 samples/sec Loss 
4.0076 LearningRate 0.0114 Epoch: 13 Global Step: 66970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:22,337-Speed 5565.34 samples/sec Loss 4.1918 LearningRate 0.0114 Epoch: 13 Global Step: 66980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:24,179-Speed 5561.52 samples/sec Loss 4.2360 LearningRate 0.0114 Epoch: 13 Global Step: 66990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:26,055-Speed 5462.89 samples/sec Loss 4.0671 LearningRate 0.0114 Epoch: 13 Global Step: 67000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:27,907-Speed 5531.68 samples/sec Loss 4.0806 LearningRate 0.0114 Epoch: 13 Global Step: 67010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:34:29,776-Speed 5485.04 samples/sec Loss 4.1770 LearningRate 0.0114 Epoch: 13 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:31,619-Speed 5558.57 samples/sec Loss 4.0944 LearningRate 0.0114 Epoch: 13 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:33,492-Speed 5469.74 samples/sec Loss 4.1016 LearningRate 0.0114 Epoch: 13 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:35,341-Speed 5542.86 samples/sec Loss 4.1466 LearningRate 0.0114 Epoch: 13 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:37,243-Speed 5386.28 samples/sec Loss 4.0943 LearningRate 0.0114 Epoch: 13 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:39,084-Speed 5563.47 samples/sec Loss 4.1521 LearningRate 0.0114 Epoch: 13 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:40,975-Speed 5417.99 samples/sec Loss 4.1380 LearningRate 0.0113 Epoch: 13 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:42,819-Speed 5556.98 samples/sec Loss 4.3175 LearningRate 0.0113 Epoch: 13 Global Step: 67090 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:44,689-Speed 5478.00 samples/sec Loss 4.2292 LearningRate 0.0113 Epoch: 13 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:46,548-Speed 5511.51 samples/sec Loss 4.1569 LearningRate 0.0113 Epoch: 13 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:34:48,428-Speed 5451.14 samples/sec Loss 4.0476 LearningRate 0.0113 Epoch: 13 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:50,268-Speed 5569.65 samples/sec Loss 4.1525 LearningRate 0.0113 Epoch: 13 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:52,120-Speed 5533.03 samples/sec Loss 4.0515 LearningRate 0.0113 Epoch: 13 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:53,982-Speed 5500.37 samples/sec Loss 4.1247 LearningRate 0.0113 Epoch: 13 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:55,825-Speed 5561.06 samples/sec Loss 4.1346 LearningRate 0.0113 Epoch: 13 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:57,707-Speed 5444.08 samples/sec Loss 4.2185 LearningRate 0.0113 Epoch: 13 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:34:59,550-Speed 5562.16 samples/sec Loss 4.0696 LearningRate 0.0113 Epoch: 13 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:01,424-Speed 5465.10 samples/sec Loss 4.2493 LearningRate 0.0113 Epoch: 13 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:03,304-Speed 5449.41 samples/sec Loss 4.1929 LearningRate 0.0113 Epoch: 13 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:05,144-Speed 5568.81 samples/sec Loss 4.2452 LearningRate 0.0113 Epoch: 13 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 14:35:07,004-Speed 5508.42 samples/sec Loss 4.1278 LearningRate 0.0113 Epoch: 13 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:08,859-Speed 5522.60 samples/sec Loss 4.0745 LearningRate 0.0112 Epoch: 13 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:10,738-Speed 5455.95 samples/sec Loss 4.1756 LearningRate 0.0112 Epoch: 13 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:12,611-Speed 5467.43 samples/sec Loss 4.0360 LearningRate 0.0112 Epoch: 13 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:14,494-Speed 5443.24 samples/sec Loss 4.1258 LearningRate 0.0112 Epoch: 13 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:16,350-Speed 5519.59 samples/sec Loss 4.1791 LearningRate 0.0112 Epoch: 13 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:18,208-Speed 5516.62 samples/sec Loss 4.0995 LearningRate 0.0112 Epoch: 13 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:20,053-Speed 5554.05 samples/sec Loss 4.0307 LearningRate 0.0112 Epoch: 13 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:21,936-Speed 5441.53 samples/sec Loss 4.1069 LearningRate 0.0112 Epoch: 13 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:23,784-Speed 5542.69 samples/sec Loss 4.2484 LearningRate 0.0112 Epoch: 13 Global Step: 67310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:25,639-Speed 5526.14 samples/sec Loss 4.1866 LearningRate 0.0112 Epoch: 13 Global Step: 67320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:27,498-Speed 5510.01 samples/sec Loss 4.2994 LearningRate 0.0112 Epoch: 13 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:29,398-Speed 5392.68 samples/sec Loss 
4.2570 LearningRate 0.0112 Epoch: 13 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:31,287-Speed 5424.02 samples/sec Loss 4.1705 LearningRate 0.0112 Epoch: 13 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:33,130-Speed 5560.03 samples/sec Loss 4.0962 LearningRate 0.0112 Epoch: 13 Global Step: 67360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:34,997-Speed 5487.89 samples/sec Loss 4.2767 LearningRate 0.0112 Epoch: 13 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:36,881-Speed 5438.89 samples/sec Loss 4.1611 LearningRate 0.0112 Epoch: 13 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:38,748-Speed 5486.43 samples/sec Loss 4.2119 LearningRate 0.0111 Epoch: 13 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:40,597-Speed 5545.67 samples/sec Loss 4.2426 LearningRate 0.0111 Epoch: 13 Global Step: 67400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:42,480-Speed 5440.35 samples/sec Loss 4.1740 LearningRate 0.0111 Epoch: 13 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:44,321-Speed 5567.33 samples/sec Loss 4.2069 LearningRate 0.0111 Epoch: 13 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:46,166-Speed 5554.25 samples/sec Loss 4.1791 LearningRate 0.0111 Epoch: 13 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:48,044-Speed 5453.45 samples/sec Loss 4.1526 LearningRate 0.0111 Epoch: 13 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:49,925-Speed 5448.95 samples/sec Loss 4.2264 LearningRate 0.0111 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:35:51,801-Speed 5461.09 samples/sec Loss 4.1289 LearningRate 0.0111 Epoch: 13 Global Step: 67460 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:53,676-Speed 5464.25 samples/sec Loss 4.3987 LearningRate 0.0111 Epoch: 13 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:55,523-Speed 5546.68 samples/sec Loss 4.2028 LearningRate 0.0111 Epoch: 13 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:57,366-Speed 5559.38 samples/sec Loss 4.1519 LearningRate 0.0111 Epoch: 13 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:35:59,238-Speed 5473.13 samples/sec Loss 4.3414 LearningRate 0.0111 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:01,075-Speed 5577.67 samples/sec Loss 4.1051 LearningRate 0.0111 Epoch: 13 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:02,935-Speed 5507.31 samples/sec Loss 4.2515 LearningRate 0.0111 Epoch: 13 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:04,797-Speed 5502.57 samples/sec Loss 4.1574 LearningRate 0.0111 Epoch: 13 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:06,640-Speed 5558.61 samples/sec Loss 4.1359 LearningRate 0.0110 Epoch: 13 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:08,496-Speed 5520.32 samples/sec Loss 4.1381 LearningRate 0.0110 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:10,372-Speed 5464.42 samples/sec Loss 4.3001 LearningRate 0.0110 Epoch: 13 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:12,246-Speed 5466.33 samples/sec Loss 4.1859 LearningRate 0.0110 Epoch: 13 Global Step: 67570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:14,103-Speed 5516.61 samples/sec Loss 4.1774 LearningRate 0.0110 Epoch: 13 Global Step: 67580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:36:15,967-Speed 5497.77 samples/sec Loss 4.1687 LearningRate 0.0110 Epoch: 13 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:17,832-Speed 5492.61 samples/sec Loss 4.2918 LearningRate 0.0110 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:19,674-Speed 5563.93 samples/sec Loss 4.1992 LearningRate 0.0110 Epoch: 13 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:21,515-Speed 5563.71 samples/sec Loss 4.1210 LearningRate 0.0110 Epoch: 13 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:23,391-Speed 5462.15 samples/sec Loss 4.1382 LearningRate 0.0110 Epoch: 13 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:25,310-Speed 5339.04 samples/sec Loss 4.3432 LearningRate 0.0110 Epoch: 13 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:27,160-Speed 5540.22 samples/sec Loss 4.3349 LearningRate 0.0110 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:29,014-Speed 5523.90 samples/sec Loss 4.2466 LearningRate 0.0110 Epoch: 13 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:30,852-Speed 5575.72 samples/sec Loss 4.1079 LearningRate 0.0110 Epoch: 13 Global Step: 67670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:32,747-Speed 5407.19 samples/sec Loss 4.2628 LearningRate 0.0110 Epoch: 13 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:34,614-Speed 5486.63 samples/sec Loss 4.2787 LearningRate 0.0109 Epoch: 13 Global Step: 67690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:36,484-Speed 5481.54 samples/sec Loss 4.3350 LearningRate 0.0109 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:38,327-Speed 5558.89 samples/sec Loss 
4.2180 LearningRate 0.0109 Epoch: 13 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:40,187-Speed 5509.57 samples/sec Loss 4.2091 LearningRate 0.0109 Epoch: 13 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:42,030-Speed 5557.95 samples/sec Loss 4.1618 LearningRate 0.0109 Epoch: 13 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:43,886-Speed 5519.82 samples/sec Loss 4.3754 LearningRate 0.0109 Epoch: 13 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:45,767-Speed 5448.77 samples/sec Loss 4.2696 LearningRate 0.0109 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:47,615-Speed 5545.86 samples/sec Loss 4.0752 LearningRate 0.0109 Epoch: 13 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:49,485-Speed 5477.61 samples/sec Loss 4.2247 LearningRate 0.0109 Epoch: 13 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:51,374-Speed 5423.82 samples/sec Loss 4.1913 LearningRate 0.0109 Epoch: 13 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:36:53,256-Speed 5446.11 samples/sec Loss 4.1642 LearningRate 0.0109 Epoch: 13 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:55,097-Speed 5566.21 samples/sec Loss 4.0879 LearningRate 0.0109 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:56,956-Speed 5511.97 samples/sec Loss 4.2527 LearningRate 0.0109 Epoch: 13 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:36:58,854-Speed 5398.90 samples/sec Loss 4.1304 LearningRate 0.0109 Epoch: 13 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:00,724-Speed 5479.28 samples/sec Loss 4.2438 LearningRate 0.0109 Epoch: 13 Global Step: 67830 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:02,582-Speed 5512.19 samples/sec Loss 4.0108 LearningRate 0.0108 Epoch: 13 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:04,447-Speed 5495.16 samples/sec Loss 4.1848 LearningRate 0.0108 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:06,302-Speed 5526.87 samples/sec Loss 4.1909 LearningRate 0.0108 Epoch: 13 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:08,180-Speed 5455.15 samples/sec Loss 4.2387 LearningRate 0.0108 Epoch: 13 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:10,018-Speed 5575.49 samples/sec Loss 4.1728 LearningRate 0.0108 Epoch: 13 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:11,871-Speed 5526.01 samples/sec Loss 4.2741 LearningRate 0.0108 Epoch: 13 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:37:13,761-Speed 5422.62 samples/sec Loss 4.1842 LearningRate 0.0108 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:37:15,653-Speed 5419.28 samples/sec Loss 4.1553 LearningRate 0.0108 Epoch: 13 Global Step: 67910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:17,529-Speed 5461.27 samples/sec Loss 4.3066 LearningRate 0.0108 Epoch: 13 Global Step: 67920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:19,419-Speed 5420.00 samples/sec Loss 4.1912 LearningRate 0.0108 Epoch: 13 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:21,261-Speed 5564.35 samples/sec Loss 4.4101 LearningRate 0.0108 Epoch: 13 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:23,144-Speed 5440.27 samples/sec Loss 4.3714 LearningRate 0.0108 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:37:24,993-Speed 5542.02 samples/sec Loss 4.1427 LearningRate 0.0108 Epoch: 13 Global Step: 67960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:26,881-Speed 5427.68 samples/sec Loss 4.1337 LearningRate 0.0108 Epoch: 13 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:28,729-Speed 5544.91 samples/sec Loss 4.3434 LearningRate 0.0108 Epoch: 13 Global Step: 67980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:30,595-Speed 5492.47 samples/sec Loss 4.2041 LearningRate 0.0108 Epoch: 13 Global Step: 67990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:32,469-Speed 5467.34 samples/sec Loss 4.2030 LearningRate 0.0107 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:37:59,885-[lfw][68000]XNorm: 21.546903 Training: 2022-04-11 14:37:59,886-[lfw][68000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 14:37:59,886-[lfw][68000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:38:31,289-[cfp_fp][68000]XNorm: 19.608410 Training: 2022-04-11 14:38:31,290-[cfp_fp][68000]Accuracy-Flip: 0.97886+-0.00820 Training: 2022-04-11 14:38:31,291-[cfp_fp][68000]Accuracy-Highest: 0.97886 Training: 2022-04-11 14:38:58,319-[agedb_30][68000]XNorm: 21.858776 Training: 2022-04-11 14:38:58,320-[agedb_30][68000]Accuracy-Flip: 0.97867+-0.00718 Training: 2022-04-11 14:38:58,321-[agedb_30][68000]Accuracy-Highest: 0.97917 Training: 2022-04-11 14:39:00,207-Speed 116.71 samples/sec Loss 4.2049 LearningRate 0.0107 Epoch: 13 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:02,086-Speed 5450.62 samples/sec Loss 4.2409 LearningRate 0.0107 Epoch: 13 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:03,981-Speed 5408.95 samples/sec Loss 4.0797 LearningRate 0.0107 Epoch: 13 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:05,849-Speed 5486.36 
samples/sec Loss 4.2234 LearningRate 0.0107 Epoch: 13 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:07,737-Speed 5427.24 samples/sec Loss 4.3585 LearningRate 0.0107 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:09,585-Speed 5546.49 samples/sec Loss 4.4099 LearningRate 0.0107 Epoch: 13 Global Step: 68060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:11,449-Speed 5495.81 samples/sec Loss 4.2501 LearningRate 0.0107 Epoch: 13 Global Step: 68070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:13,326-Speed 5461.50 samples/sec Loss 4.2014 LearningRate 0.0107 Epoch: 13 Global Step: 68080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:15,180-Speed 5525.14 samples/sec Loss 4.2604 LearningRate 0.0107 Epoch: 13 Global Step: 68090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:17,074-Speed 5411.84 samples/sec Loss 4.1760 LearningRate 0.0107 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:18,904-Speed 5597.93 samples/sec Loss 4.1880 LearningRate 0.0107 Epoch: 13 Global Step: 68110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:20,741-Speed 5574.75 samples/sec Loss 4.2958 LearningRate 0.0107 Epoch: 13 Global Step: 68120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:22,621-Speed 5452.21 samples/sec Loss 4.1629 LearningRate 0.0107 Epoch: 13 Global Step: 68130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:24,497-Speed 5458.94 samples/sec Loss 4.2192 LearningRate 0.0107 Epoch: 13 Global Step: 68140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:26,326-Speed 5603.82 samples/sec Loss 4.1989 LearningRate 0.0106 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:28,222-Speed 5404.70 samples/sec Loss 4.1857 LearningRate 0.0106 Epoch: 13 
Global Step: 68160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:30,075-Speed 5527.84 samples/sec Loss 4.2126 LearningRate 0.0106 Epoch: 13 Global Step: 68170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:31,944-Speed 5482.56 samples/sec Loss 4.3220 LearningRate 0.0106 Epoch: 13 Global Step: 68180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:33,804-Speed 5509.85 samples/sec Loss 4.2084 LearningRate 0.0106 Epoch: 13 Global Step: 68190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:35,672-Speed 5486.15 samples/sec Loss 4.2133 LearningRate 0.0106 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:37,516-Speed 5553.66 samples/sec Loss 4.0206 LearningRate 0.0106 Epoch: 13 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:39,373-Speed 5518.96 samples/sec Loss 4.1756 LearningRate 0.0106 Epoch: 13 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:41,258-Speed 5435.33 samples/sec Loss 4.1807 LearningRate 0.0106 Epoch: 13 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:43,103-Speed 5554.02 samples/sec Loss 4.2309 LearningRate 0.0106 Epoch: 13 Global Step: 68240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:44,962-Speed 5511.82 samples/sec Loss 4.1488 LearningRate 0.0106 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:46,808-Speed 5548.64 samples/sec Loss 4.1832 LearningRate 0.0106 Epoch: 13 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:48,693-Speed 5438.38 samples/sec Loss 4.3255 LearningRate 0.0106 Epoch: 13 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:50,573-Speed 5448.72 samples/sec Loss 4.0900 LearningRate 0.0106 Epoch: 13 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 14:39:52,456-Speed 5440.81 samples/sec Loss 4.2798 LearningRate 0.0106 Epoch: 13 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:39:54,287-Speed 5596.26 samples/sec Loss 4.2361 LearningRate 0.0106 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:56,157-Speed 5479.53 samples/sec Loss 4.2361 LearningRate 0.0105 Epoch: 13 Global Step: 68310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:58,002-Speed 5555.22 samples/sec Loss 4.3030 LearningRate 0.0105 Epoch: 13 Global Step: 68320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:39:59,836-Speed 5584.94 samples/sec Loss 4.1454 LearningRate 0.0105 Epoch: 13 Global Step: 68330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:01,722-Speed 5431.19 samples/sec Loss 4.2533 LearningRate 0.0105 Epoch: 13 Global Step: 68340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:03,584-Speed 5505.41 samples/sec Loss 4.1852 LearningRate 0.0105 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:05,696-Speed 4850.19 samples/sec Loss 4.2310 LearningRate 0.0105 Epoch: 13 Global Step: 68360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:07,541-Speed 5552.20 samples/sec Loss 4.2268 LearningRate 0.0105 Epoch: 13 Global Step: 68370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:10,610-Speed 3337.57 samples/sec Loss 4.2220 LearningRate 0.0105 Epoch: 13 Global Step: 68380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:12,477-Speed 5488.29 samples/sec Loss 4.2140 LearningRate 0.0105 Epoch: 13 Global Step: 68390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:14,350-Speed 5472.68 samples/sec Loss 4.2933 LearningRate 0.0105 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:40:16,239-Speed 5422.76 
samples/sec Loss 4.0653 LearningRate 0.0105 Epoch: 13 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:40:18,114-Speed 5465.97 samples/sec Loss 4.1842 LearningRate 0.0105 Epoch: 13 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:40:19,965-Speed 5536.60 samples/sec Loss 4.1757 LearningRate 0.0105 Epoch: 13 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:40:21,815-Speed 5537.59 samples/sec Loss 4.2046 LearningRate 0.0105 Epoch: 13 Global Step: 68440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:23,718-Speed 5385.33 samples/sec Loss 4.2201 LearningRate 0.0105 Epoch: 13 Global Step: 68450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:25,588-Speed 5480.29 samples/sec Loss 4.3180 LearningRate 0.0104 Epoch: 13 Global Step: 68460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:27,468-Speed 5450.77 samples/sec Loss 4.1838 LearningRate 0.0104 Epoch: 13 Global Step: 68470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:29,321-Speed 5529.01 samples/sec Loss 4.0703 LearningRate 0.0104 Epoch: 13 Global Step: 68480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:31,180-Speed 5509.40 samples/sec Loss 4.1207 LearningRate 0.0104 Epoch: 13 Global Step: 68490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:33,032-Speed 5533.15 samples/sec Loss 4.1849 LearningRate 0.0104 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:34,878-Speed 5549.71 samples/sec Loss 4.1536 LearningRate 0.0104 Epoch: 13 Global Step: 68510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:36,845-Speed 5210.19 samples/sec Loss 4.2071 LearningRate 0.0104 Epoch: 13 Global Step: 68520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:38,759-Speed 5354.49 samples/sec Loss 4.0894 LearningRate 0.0104 Epoch: 13 
Global Step: 68530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:40,751-Speed 5142.98 samples/sec Loss 4.0119 LearningRate 0.0104 Epoch: 13 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:40:42,612-Speed 5504.25 samples/sec Loss 4.2303 LearningRate 0.0104 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:40:44,482-Speed 5481.52 samples/sec Loss 4.1464 LearningRate 0.0104 Epoch: 13 Global Step: 68560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:46,341-Speed 5511.74 samples/sec Loss 4.2595 LearningRate 0.0104 Epoch: 13 Global Step: 68570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:48,221-Speed 5450.88 samples/sec Loss 4.1859 LearningRate 0.0104 Epoch: 13 Global Step: 68580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:50,076-Speed 5523.28 samples/sec Loss 4.2394 LearningRate 0.0104 Epoch: 13 Global Step: 68590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:51,977-Speed 5390.01 samples/sec Loss 4.1553 LearningRate 0.0104 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:53,832-Speed 5524.42 samples/sec Loss 4.2568 LearningRate 0.0104 Epoch: 13 Global Step: 68610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:55,676-Speed 5553.96 samples/sec Loss 4.2299 LearningRate 0.0103 Epoch: 13 Global Step: 68620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:57,582-Speed 5375.79 samples/sec Loss 4.2097 LearningRate 0.0103 Epoch: 13 Global Step: 68630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:40:59,451-Speed 5481.92 samples/sec Loss 4.2560 LearningRate 0.0103 Epoch: 13 Global Step: 68640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:01,347-Speed 5402.99 samples/sec Loss 4.2672 LearningRate 0.0103 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-11 14:41:03,199-Speed 5530.72 samples/sec Loss 4.1793 LearningRate 0.0103 Epoch: 13 Global Step: 68660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:05,073-Speed 5465.77 samples/sec Loss 4.4287 LearningRate 0.0103 Epoch: 13 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:06,938-Speed 5492.68 samples/sec Loss 4.3255 LearningRate 0.0103 Epoch: 13 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:08,793-Speed 5525.87 samples/sec Loss 4.2230 LearningRate 0.0103 Epoch: 13 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:10,666-Speed 5467.58 samples/sec Loss 4.2866 LearningRate 0.0103 Epoch: 13 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:12,541-Speed 5462.00 samples/sec Loss 4.2665 LearningRate 0.0103 Epoch: 13 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:14,418-Speed 5458.30 samples/sec Loss 4.3526 LearningRate 0.0103 Epoch: 13 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:16,282-Speed 5495.97 samples/sec Loss 4.1213 LearningRate 0.0103 Epoch: 13 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:18,129-Speed 5545.14 samples/sec Loss 4.1793 LearningRate 0.0103 Epoch: 13 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:19,989-Speed 5509.21 samples/sec Loss 4.1601 LearningRate 0.0103 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:21,831-Speed 5560.15 samples/sec Loss 4.3055 LearningRate 0.0103 Epoch: 13 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:41:23,688-Speed 5516.99 samples/sec Loss 4.2166 LearningRate 0.0103 Epoch: 13 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:41:25,543-Speed 5524.00 
samples/sec Loss 4.2072 LearningRate 0.0102 Epoch: 13 Global Step: 68780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:27,392-Speed 5539.95 samples/sec Loss 4.1311 LearningRate 0.0102 Epoch: 13 Global Step: 68790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:29,242-Speed 5535.52 samples/sec Loss 4.1757 LearningRate 0.0102 Epoch: 13 Global Step: 68800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:31,104-Speed 5503.04 samples/sec Loss 4.3196 LearningRate 0.0102 Epoch: 13 Global Step: 68810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:32,959-Speed 5520.10 samples/sec Loss 4.2137 LearningRate 0.0102 Epoch: 13 Global Step: 68820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:34,798-Speed 5570.11 samples/sec Loss 4.2094 LearningRate 0.0102 Epoch: 13 Global Step: 68830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:36,663-Speed 5492.99 samples/sec Loss 4.1799 LearningRate 0.0102 Epoch: 13 Global Step: 68840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:38,510-Speed 5547.54 samples/sec Loss 4.2154 LearningRate 0.0102 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:40,392-Speed 5442.26 samples/sec Loss 4.1803 LearningRate 0.0102 Epoch: 13 Global Step: 68860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:42,241-Speed 5539.10 samples/sec Loss 4.2758 LearningRate 0.0102 Epoch: 13 Global Step: 68870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:44,085-Speed 5558.40 samples/sec Loss 4.1931 LearningRate 0.0102 Epoch: 13 Global Step: 68880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:45,933-Speed 5542.08 samples/sec Loss 4.1708 LearningRate 0.0102 Epoch: 13 Global Step: 68890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:47,811-Speed 5455.32 samples/sec Loss 4.2109 LearningRate 0.0102 Epoch: 13 
Global Step: 68900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:41:49,759-Speed 5258.28 samples/sec Loss 4.2044 LearningRate 0.0102 Epoch: 13 Global Step: 68910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:51,614-Speed 5521.01 samples/sec Loss 4.2170 LearningRate 0.0102 Epoch: 13 Global Step: 68920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:53,463-Speed 5543.59 samples/sec Loss 4.2097 LearningRate 0.0102 Epoch: 13 Global Step: 68930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:55,311-Speed 5543.31 samples/sec Loss 4.2076 LearningRate 0.0101 Epoch: 13 Global Step: 68940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:57,158-Speed 5546.40 samples/sec Loss 4.2144 LearningRate 0.0101 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:41:59,009-Speed 5533.17 samples/sec Loss 4.3838 LearningRate 0.0101 Epoch: 13 Global Step: 68960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:00,884-Speed 5463.06 samples/sec Loss 4.1461 LearningRate 0.0101 Epoch: 13 Global Step: 68970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:02,752-Speed 5483.62 samples/sec Loss 4.2820 LearningRate 0.0101 Epoch: 13 Global Step: 68980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:04,643-Speed 5420.24 samples/sec Loss 4.3020 LearningRate 0.0101 Epoch: 13 Global Step: 68990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:06,493-Speed 5537.20 samples/sec Loss 4.2194 LearningRate 0.0101 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:08,338-Speed 5551.57 samples/sec Loss 4.2413 LearningRate 0.0101 Epoch: 13 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:10,197-Speed 5511.33 samples/sec Loss 4.2648 LearningRate 0.0101 Epoch: 13 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 14:42:12,055-Speed 5512.81 samples/sec Loss 4.2923 LearningRate 0.0101 Epoch: 13 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:13,990-Speed 5293.42 samples/sec Loss 4.3378 LearningRate 0.0101 Epoch: 13 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:15,847-Speed 5516.71 samples/sec Loss 4.1995 LearningRate 0.0101 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:17,709-Speed 5498.58 samples/sec Loss 4.2897 LearningRate 0.0101 Epoch: 13 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:19,574-Speed 5493.43 samples/sec Loss 4.0791 LearningRate 0.0101 Epoch: 13 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:21,420-Speed 5551.56 samples/sec Loss 4.2794 LearningRate 0.0101 Epoch: 13 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:23,291-Speed 5472.89 samples/sec Loss 4.1036 LearningRate 0.0101 Epoch: 13 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:25,140-Speed 5543.10 samples/sec Loss 4.0958 LearningRate 0.0100 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:27,020-Speed 5448.13 samples/sec Loss 4.3307 LearningRate 0.0100 Epoch: 13 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:28,866-Speed 5550.55 samples/sec Loss 4.2130 LearningRate 0.0100 Epoch: 13 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:30,717-Speed 5534.08 samples/sec Loss 4.2197 LearningRate 0.0100 Epoch: 13 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:32,555-Speed 5571.72 samples/sec Loss 4.2518 LearningRate 0.0100 Epoch: 13 Global Step: 69140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:34,404-Speed 5540.56 
samples/sec Loss 4.2569 LearningRate 0.0100 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:36,261-Speed 5516.00 samples/sec Loss 4.1214 LearningRate 0.0100 Epoch: 13 Global Step: 69160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:38,130-Speed 5481.61 samples/sec Loss 4.1384 LearningRate 0.0100 Epoch: 13 Global Step: 69170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:39,987-Speed 5516.10 samples/sec Loss 4.3087 LearningRate 0.0100 Epoch: 13 Global Step: 69180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:41,847-Speed 5505.85 samples/sec Loss 4.2280 LearningRate 0.0100 Epoch: 13 Global Step: 69190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:43,710-Speed 5500.30 samples/sec Loss 4.2971 LearningRate 0.0100 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:45,560-Speed 5536.25 samples/sec Loss 4.1835 LearningRate 0.0100 Epoch: 13 Global Step: 69210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:47,415-Speed 5522.65 samples/sec Loss 4.0767 LearningRate 0.0100 Epoch: 13 Global Step: 69220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:49,278-Speed 5499.58 samples/sec Loss 4.3397 LearningRate 0.0100 Epoch: 13 Global Step: 69230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:42:51,124-Speed 5549.89 samples/sec Loss 4.2451 LearningRate 0.0100 Epoch: 13 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:52,980-Speed 5518.93 samples/sec Loss 4.2160 LearningRate 0.0100 Epoch: 13 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:54,845-Speed 5494.00 samples/sec Loss 4.2404 LearningRate 0.0099 Epoch: 13 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:56,695-Speed 5534.92 samples/sec Loss 4.2083 LearningRate 0.0099 Epoch: 13 
Global Step: 69270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:42:58,563-Speed 5483.10 samples/sec Loss 4.2626 LearningRate 0.0099 Epoch: 13 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:43:00,402-Speed 5571.12 samples/sec Loss 4.2868 LearningRate 0.0099 Epoch: 13 Global Step: 69290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:43:02,284-Speed 5443.66 samples/sec Loss 4.1740 LearningRate 0.0099 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:43:04,129-Speed 5552.19 samples/sec Loss 4.1712 LearningRate 0.0099 Epoch: 13 Global Step: 69310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:43:05,991-Speed 5500.69 samples/sec Loss 4.2866 LearningRate 0.0099 Epoch: 13 Global Step: 69320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:43:07,845-Speed 5527.15 samples/sec Loss 4.1996 LearningRate 0.0099 Epoch: 13 Global Step: 69330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:43:09,706-Speed 5505.49 samples/sec Loss 4.2680 LearningRate 0.0099 Epoch: 13 Global Step: 69340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:11,564-Speed 5512.72 samples/sec Loss 4.2305 LearningRate 0.0099 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:13,449-Speed 5434.36 samples/sec Loss 4.1895 LearningRate 0.0099 Epoch: 13 Global Step: 69360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:15,317-Speed 5481.66 samples/sec Loss 4.2936 LearningRate 0.0099 Epoch: 13 Global Step: 69370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:17,171-Speed 5525.81 samples/sec Loss 4.2570 LearningRate 0.0099 Epoch: 13 Global Step: 69380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:19,014-Speed 5559.37 samples/sec Loss 4.1670 LearningRate 0.0099 Epoch: 13 Global Step: 69390 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-11 14:43:20,855-Speed 5565.10 samples/sec Loss 4.2988 LearningRate 0.0099 Epoch: 13 Global Step: 69400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:22,706-Speed 5532.84 samples/sec Loss 4.2869 LearningRate 0.0099 Epoch: 13 Global Step: 69410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:24,580-Speed 5467.45 samples/sec Loss 4.2454 LearningRate 0.0098 Epoch: 13 Global Step: 69420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:26,447-Speed 5486.32 samples/sec Loss 4.1632 LearningRate 0.0098 Epoch: 13 Global Step: 69430 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:28,293-Speed 5548.85 samples/sec Loss 4.1865 LearningRate 0.0098 Epoch: 13 Global Step: 69440 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:30,144-Speed 5534.07 samples/sec Loss 4.1445 LearningRate 0.0098 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:32,003-Speed 5511.42 samples/sec Loss 4.3475 LearningRate 0.0098 Epoch: 13 Global Step: 69460 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:33,862-Speed 5510.91 samples/sec Loss 4.0860 LearningRate 0.0098 Epoch: 13 Global Step: 69470 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:35,713-Speed 5535.23 samples/sec Loss 4.2126 LearningRate 0.0098 Epoch: 13 Global Step: 69480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:37,558-Speed 5550.04 samples/sec Loss 4.2776 LearningRate 0.0098 Epoch: 13 Global Step: 69490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:39,407-Speed 5541.20 samples/sec Loss 4.1226 LearningRate 0.0098 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:41,260-Speed 5527.15 samples/sec Loss 4.1192 LearningRate 0.0098 Epoch: 13 Global Step: 69510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:43,110-Speed 5538.78 
samples/sec Loss 4.2289 LearningRate 0.0098 Epoch: 13 Global Step: 69520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-11 14:43:44,967-Speed 5515.45 samples/sec Loss 4.0615 LearningRate 0.0098 Epoch: 13 Global Step: 69530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:46,828-Speed 5505.00 samples/sec Loss 4.2024 LearningRate 0.0098 Epoch: 13 Global Step: 69540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:48,688-Speed 5510.37 samples/sec Loss 4.0860 LearningRate 0.0098 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:50,545-Speed 5515.45 samples/sec Loss 4.1162 LearningRate 0.0098 Epoch: 13 Global Step: 69560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:52,401-Speed 5517.32 samples/sec Loss 4.2755 LearningRate 0.0098 Epoch: 13 Global Step: 69570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:54,266-Speed 5493.92 samples/sec Loss 4.2076 LearningRate 0.0097 Epoch: 13 Global Step: 69580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:56,120-Speed 5525.10 samples/sec Loss 4.2908 LearningRate 0.0097 Epoch: 13 Global Step: 69590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:57,981-Speed 5504.47 samples/sec Loss 4.1072 LearningRate 0.0097 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:43:59,846-Speed 5491.57 samples/sec Loss 4.3328 LearningRate 0.0097 Epoch: 13 Global Step: 69610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:01,718-Speed 5472.49 samples/sec Loss 4.1736 LearningRate 0.0097 Epoch: 13 Global Step: 69620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:03,584-Speed 5490.22 samples/sec Loss 4.1945 LearningRate 0.0097 Epoch: 13 Global Step: 69630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:05,453-Speed 5483.07 samples/sec Loss 4.1415 LearningRate 0.0097 Epoch: 13 
Global Step: 69640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:07,300-Speed 5544.28 samples/sec Loss 4.1176 LearningRate 0.0097 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:09,143-Speed 5559.60 samples/sec Loss 4.2343 LearningRate 0.0097 Epoch: 13 Global Step: 69660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:10,991-Speed 5543.09 samples/sec Loss 4.3173 LearningRate 0.0097 Epoch: 13 Global Step: 69670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:12,878-Speed 5427.09 samples/sec Loss 4.0894 LearningRate 0.0097 Epoch: 13 Global Step: 69680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:14,747-Speed 5481.34 samples/sec Loss 4.1479 LearningRate 0.0097 Epoch: 13 Global Step: 69690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:16,609-Speed 5503.14 samples/sec Loss 4.0978 LearningRate 0.0097 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:18,475-Speed 5488.81 samples/sec Loss 4.1115 LearningRate 0.0097 Epoch: 13 Global Step: 69710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:20,319-Speed 5555.83 samples/sec Loss 4.2069 LearningRate 0.0097 Epoch: 13 Global Step: 69720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:22,165-Speed 5548.39 samples/sec Loss 3.9989 LearningRate 0.0097 Epoch: 13 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:44:24,017-Speed 5530.43 samples/sec Loss 4.2375 LearningRate 0.0096 Epoch: 13 Global Step: 69740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:25,891-Speed 5466.80 samples/sec Loss 4.1716 LearningRate 0.0096 Epoch: 13 Global Step: 69750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:27,822-Speed 5306.63 samples/sec Loss 4.3268 LearningRate 0.0096 Epoch: 13 Global Step: 69760 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-11 14:44:29,666-Speed 5554.85 samples/sec Loss 4.2204 LearningRate 0.0096 Epoch: 13 Global Step: 69770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:31,531-Speed 5491.75 samples/sec Loss 4.1152 LearningRate 0.0096 Epoch: 13 Global Step: 69780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:33,380-Speed 5540.38 samples/sec Loss 4.0092 LearningRate 0.0096 Epoch: 13 Global Step: 69790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:35,236-Speed 5521.69 samples/sec Loss 4.1637 LearningRate 0.0096 Epoch: 13 Global Step: 69800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:37,115-Speed 5450.84 samples/sec Loss 4.2529 LearningRate 0.0096 Epoch: 13 Global Step: 69810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:38,977-Speed 5502.00 samples/sec Loss 4.0252 LearningRate 0.0096 Epoch: 13 Global Step: 69820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:40,866-Speed 5422.92 samples/sec Loss 4.2303 LearningRate 0.0096 Epoch: 13 Global Step: 69830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:44:42,747-Speed 5443.75 samples/sec Loss 4.0350 LearningRate 0.0096 Epoch: 13 Global Step: 69840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:44,604-Speed 5515.86 samples/sec Loss 4.3097 LearningRate 0.0096 Epoch: 13 Global Step: 69850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:46,463-Speed 5510.78 samples/sec Loss 4.1898 LearningRate 0.0096 Epoch: 13 Global Step: 69860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:48,309-Speed 5550.49 samples/sec Loss 4.2345 LearningRate 0.0096 Epoch: 13 Global Step: 69870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:50,152-Speed 5558.84 samples/sec Loss 4.2592 LearningRate 0.0096 Epoch: 13 Global Step: 69880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:52,011-Speed 5510.23 
samples/sec Loss 4.1824 LearningRate 0.0096 Epoch: 13 Global Step: 69890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:53,862-Speed 5535.28 samples/sec Loss 4.0838 LearningRate 0.0095 Epoch: 13 Global Step: 69900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:55,712-Speed 5537.96 samples/sec Loss 4.0118 LearningRate 0.0095 Epoch: 13 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:57,549-Speed 5575.32 samples/sec Loss 4.0893 LearningRate 0.0095 Epoch: 13 Global Step: 69920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:44:59,396-Speed 5545.69 samples/sec Loss 4.1242 LearningRate 0.0095 Epoch: 13 Global Step: 69930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:45:01,239-Speed 5558.11 samples/sec Loss 4.0218 LearningRate 0.0095 Epoch: 13 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:45:03,102-Speed 5497.92 samples/sec Loss 4.1816 LearningRate 0.0095 Epoch: 13 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:45:04,970-Speed 5486.65 samples/sec Loss 4.1826 LearningRate 0.0095 Epoch: 13 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:45:06,848-Speed 5453.53 samples/sec Loss 4.1822 LearningRate 0.0095 Epoch: 13 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:45:08,708-Speed 5506.80 samples/sec Loss 4.1381 LearningRate 0.0095 Epoch: 13 Global Step: 69980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:45:10,559-Speed 5536.95 samples/sec Loss 4.1897 LearningRate 0.0095 Epoch: 13 Global Step: 69990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:45:12,426-Speed 5485.21 samples/sec Loss 4.1523 LearningRate 0.0095 Epoch: 13 Global Step: 70000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:45:39,467-[lfw][70000]XNorm: 23.398558 Training: 2022-04-11 
14:45:39,468-[lfw][70000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 14:45:39,469-[lfw][70000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:46:12,258-[cfp_fp][70000]XNorm: 21.524082 Training: 2022-04-11 14:46:12,260-[cfp_fp][70000]Accuracy-Flip: 0.98057+-0.00636 Training: 2022-04-11 14:46:12,260-[cfp_fp][70000]Accuracy-Highest: 0.98057 Training: 2022-04-11 14:46:40,150-[agedb_30][70000]XNorm: 23.294387 Training: 2022-04-11 14:46:40,151-[agedb_30][70000]Accuracy-Flip: 0.98050+-0.00658 Training: 2022-04-11 14:46:40,151-[agedb_30][70000]Accuracy-Highest: 0.98050 Training: 2022-04-11 14:46:42,035-Speed 114.28 samples/sec Loss 4.1100 LearningRate 0.0095 Epoch: 13 Global Step: 70010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:46:43,919-Speed 5438.38 samples/sec Loss 4.1549 LearningRate 0.0095 Epoch: 13 Global Step: 70020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:46:45,761-Speed 5561.16 samples/sec Loss 4.1915 LearningRate 0.0095 Epoch: 13 Global Step: 70030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:46:47,605-Speed 5555.43 samples/sec Loss 4.0625 LearningRate 0.0095 Epoch: 13 Global Step: 70040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:49,491-Speed 5433.96 samples/sec Loss 4.0821 LearningRate 0.0095 Epoch: 13 Global Step: 70050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:51,349-Speed 5515.47 samples/sec Loss 4.1457 LearningRate 0.0095 Epoch: 13 Global Step: 70060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:53,269-Speed 5335.94 samples/sec Loss 4.2263 LearningRate 0.0094 Epoch: 13 Global Step: 70070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:55,119-Speed 5540.88 samples/sec Loss 4.0287 LearningRate 0.0094 Epoch: 13 Global Step: 70080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:57,007-Speed 5425.62 samples/sec Loss 4.2500 LearningRate 0.0094 Epoch: 13 Global Step: 70090 Fp16 
Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:46:58,851-Speed 5556.54 samples/sec Loss 4.2819 LearningRate 0.0094 Epoch: 13 Global Step: 70100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:47:00,760-Speed 5369.80 samples/sec Loss 4.1949 LearningRate 0.0094 Epoch: 13 Global Step: 70110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:47:02,654-Speed 5406.48 samples/sec Loss 4.0872 LearningRate 0.0094 Epoch: 13 Global Step: 70120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:47:04,570-Speed 5348.94 samples/sec Loss 4.0930 LearningRate 0.0094 Epoch: 13 Global Step: 70130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:47:06,457-Speed 5428.44 samples/sec Loss 4.0324 LearningRate 0.0094 Epoch: 13 Global Step: 70140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:08,357-Speed 5392.24 samples/sec Loss 4.1960 LearningRate 0.0094 Epoch: 13 Global Step: 70150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:10,231-Speed 5471.39 samples/sec Loss 4.1251 LearningRate 0.0094 Epoch: 13 Global Step: 70160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:12,157-Speed 5317.04 samples/sec Loss 4.2765 LearningRate 0.0094 Epoch: 13 Global Step: 70170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:14,011-Speed 5528.12 samples/sec Loss 4.1023 LearningRate 0.0094 Epoch: 13 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:15,916-Speed 5377.26 samples/sec Loss 4.1908 LearningRate 0.0094 Epoch: 13 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:17,760-Speed 5556.96 samples/sec Loss 4.1640 LearningRate 0.0094 Epoch: 13 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:19,606-Speed 5546.78 samples/sec Loss 4.1074 LearningRate 0.0094 Epoch: 13 Global Step: 70210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:47:21,482-Speed 5462.70 samples/sec Loss 4.2014 LearningRate 0.0094 Epoch: 13 Global Step: 70220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:23,332-Speed 5538.22 samples/sec Loss 4.1088 LearningRate 0.0093 Epoch: 13 Global Step: 70230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:25,175-Speed 5559.70 samples/sec Loss 4.2140 LearningRate 0.0093 Epoch: 13 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:47:27,017-Speed 5559.25 samples/sec Loss 4.1887 LearningRate 0.0093 Epoch: 13 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:47:28,863-Speed 5549.49 samples/sec Loss 4.1912 LearningRate 0.0093 Epoch: 13 Global Step: 70260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:30,728-Speed 5492.71 samples/sec Loss 4.2086 LearningRate 0.0093 Epoch: 13 Global Step: 70270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:32,566-Speed 5574.37 samples/sec Loss 4.1652 LearningRate 0.0093 Epoch: 13 Global Step: 70280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:34,407-Speed 5562.72 samples/sec Loss 4.0135 LearningRate 0.0093 Epoch: 13 Global Step: 70290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:36,255-Speed 5544.79 samples/sec Loss 3.9820 LearningRate 0.0093 Epoch: 13 Global Step: 70300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:38,099-Speed 5555.86 samples/sec Loss 4.1510 LearningRate 0.0093 Epoch: 13 Global Step: 70310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:39,945-Speed 5548.42 samples/sec Loss 4.1205 LearningRate 0.0093 Epoch: 13 Global Step: 70320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:41,798-Speed 5529.37 samples/sec Loss 4.0627 LearningRate 0.0093 Epoch: 13 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:43,641-Speed 5556.10 samples/sec Loss 
4.0113 LearningRate 0.0093 Epoch: 13 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:45,491-Speed 5537.39 samples/sec Loss 4.2372 LearningRate 0.0093 Epoch: 13 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:47,348-Speed 5519.73 samples/sec Loss 4.2120 LearningRate 0.0093 Epoch: 13 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:49,198-Speed 5536.06 samples/sec Loss 4.0251 LearningRate 0.0093 Epoch: 13 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:51,054-Speed 5519.94 samples/sec Loss 4.1609 LearningRate 0.0093 Epoch: 13 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:52,960-Speed 5372.94 samples/sec Loss 4.1703 LearningRate 0.0093 Epoch: 13 Global Step: 70390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:54,808-Speed 5543.95 samples/sec Loss 4.1351 LearningRate 0.0092 Epoch: 13 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:56,649-Speed 5563.95 samples/sec Loss 4.1736 LearningRate 0.0092 Epoch: 13 Global Step: 70410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:47:58,495-Speed 5548.34 samples/sec Loss 4.2027 LearningRate 0.0092 Epoch: 13 Global Step: 70420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:00,353-Speed 5515.63 samples/sec Loss 4.2172 LearningRate 0.0092 Epoch: 13 Global Step: 70430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:02,212-Speed 5508.05 samples/sec Loss 3.9642 LearningRate 0.0092 Epoch: 13 Global Step: 70440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:04,074-Speed 5501.46 samples/sec Loss 4.0505 LearningRate 0.0092 Epoch: 13 Global Step: 70450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:05,940-Speed 5490.78 samples/sec Loss 4.1250 LearningRate 0.0092 Epoch: 13 Global Step: 70460 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:07,775-Speed 5582.42 samples/sec Loss 4.2323 LearningRate 0.0092 Epoch: 13 Global Step: 70470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:09,625-Speed 5538.43 samples/sec Loss 3.9639 LearningRate 0.0092 Epoch: 13 Global Step: 70480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:11,590-Speed 5211.60 samples/sec Loss 3.9378 LearningRate 0.0092 Epoch: 13 Global Step: 70490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:13,547-Speed 5235.04 samples/sec Loss 4.0912 LearningRate 0.0092 Epoch: 13 Global Step: 70500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:15,439-Speed 5414.74 samples/sec Loss 4.1550 LearningRate 0.0092 Epoch: 13 Global Step: 70510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:17,286-Speed 5545.38 samples/sec Loss 4.1659 LearningRate 0.0092 Epoch: 13 Global Step: 70520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:19,132-Speed 5550.26 samples/sec Loss 3.9466 LearningRate 0.0092 Epoch: 13 Global Step: 70530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:20,976-Speed 5554.25 samples/sec Loss 4.1833 LearningRate 0.0092 Epoch: 13 Global Step: 70540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:22,821-Speed 5551.40 samples/sec Loss 4.1218 LearningRate 0.0092 Epoch: 13 Global Step: 70550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:24,670-Speed 5542.44 samples/sec Loss 4.1293 LearningRate 0.0092 Epoch: 13 Global Step: 70560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:26,557-Speed 5428.92 samples/sec Loss 4.2258 LearningRate 0.0091 Epoch: 13 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:28,415-Speed 5512.55 samples/sec Loss 4.0875 LearningRate 0.0091 Epoch: 13 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:48:30,285-Speed 5480.75 samples/sec Loss 4.0535 LearningRate 0.0091 Epoch: 13 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:32,128-Speed 5556.25 samples/sec Loss 4.1828 LearningRate 0.0091 Epoch: 13 Global Step: 70600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:33,975-Speed 5547.50 samples/sec Loss 4.2042 LearningRate 0.0091 Epoch: 13 Global Step: 70610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:35,826-Speed 5532.64 samples/sec Loss 4.1632 LearningRate 0.0091 Epoch: 13 Global Step: 70620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:37,673-Speed 5547.40 samples/sec Loss 3.9997 LearningRate 0.0091 Epoch: 13 Global Step: 70630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:39,519-Speed 5548.82 samples/sec Loss 4.1292 LearningRate 0.0091 Epoch: 13 Global Step: 70640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:41,366-Speed 5547.39 samples/sec Loss 4.0502 LearningRate 0.0091 Epoch: 13 Global Step: 70650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:43,208-Speed 5558.88 samples/sec Loss 4.0862 LearningRate 0.0091 Epoch: 13 Global Step: 70660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:45,047-Speed 5569.89 samples/sec Loss 4.2399 LearningRate 0.0091 Epoch: 13 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:46,891-Speed 5557.70 samples/sec Loss 4.0451 LearningRate 0.0091 Epoch: 13 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:48,750-Speed 5508.45 samples/sec Loss 4.1488 LearningRate 0.0091 Epoch: 13 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:50,602-Speed 5533.36 samples/sec Loss 3.9720 LearningRate 0.0091 Epoch: 13 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:52,452-Speed 5537.64 samples/sec Loss 
4.1419 LearningRate 0.0091 Epoch: 13 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:48:54,291-Speed 5570.51 samples/sec Loss 4.2247 LearningRate 0.0091 Epoch: 13 Global Step: 70720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:56,135-Speed 5554.23 samples/sec Loss 4.2335 LearningRate 0.0090 Epoch: 13 Global Step: 70730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:57,981-Speed 5549.60 samples/sec Loss 4.1420 LearningRate 0.0090 Epoch: 13 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:48:59,836-Speed 5520.31 samples/sec Loss 4.0596 LearningRate 0.0090 Epoch: 13 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:01,698-Speed 5502.46 samples/sec Loss 4.1340 LearningRate 0.0090 Epoch: 13 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:03,580-Speed 5444.73 samples/sec Loss 4.2798 LearningRate 0.0090 Epoch: 13 Global Step: 70770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:05,445-Speed 5490.50 samples/sec Loss 4.0662 LearningRate 0.0090 Epoch: 13 Global Step: 70780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:07,293-Speed 5544.24 samples/sec Loss 4.1381 LearningRate 0.0090 Epoch: 13 Global Step: 70790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:09,138-Speed 5552.73 samples/sec Loss 4.1456 LearningRate 0.0090 Epoch: 13 Global Step: 70800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:11,085-Speed 5262.20 samples/sec Loss 4.1010 LearningRate 0.0090 Epoch: 13 Global Step: 70810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:22,063-Speed 932.84 samples/sec Loss 3.4596 LearningRate 0.0090 Epoch: 14 Global Step: 70820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:23,940-Speed 5460.30 samples/sec Loss 3.2745 LearningRate 0.0090 Epoch: 14 Global Step: 70830 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:25,829-Speed 5420.44 samples/sec Loss 3.1980 LearningRate 0.0090 Epoch: 14 Global Step: 70840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:27,704-Speed 5464.05 samples/sec Loss 3.2938 LearningRate 0.0090 Epoch: 14 Global Step: 70850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:29,569-Speed 5492.93 samples/sec Loss 3.2384 LearningRate 0.0090 Epoch: 14 Global Step: 70860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:31,412-Speed 5557.56 samples/sec Loss 3.3431 LearningRate 0.0090 Epoch: 14 Global Step: 70870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:33,270-Speed 5512.22 samples/sec Loss 3.2982 LearningRate 0.0090 Epoch: 14 Global Step: 70880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:35,146-Speed 5460.87 samples/sec Loss 3.3008 LearningRate 0.0090 Epoch: 14 Global Step: 70890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:37,008-Speed 5500.12 samples/sec Loss 3.2277 LearningRate 0.0089 Epoch: 14 Global Step: 70900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:38,850-Speed 5562.67 samples/sec Loss 3.2129 LearningRate 0.0089 Epoch: 14 Global Step: 70910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:49:40,711-Speed 5502.99 samples/sec Loss 3.4601 LearningRate 0.0089 Epoch: 14 Global Step: 70920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:42,581-Speed 5478.88 samples/sec Loss 3.3423 LearningRate 0.0089 Epoch: 14 Global Step: 70930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:44,440-Speed 5510.83 samples/sec Loss 3.3486 LearningRate 0.0089 Epoch: 14 Global Step: 70940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:46,290-Speed 5537.51 samples/sec Loss 3.3942 LearningRate 0.0089 Epoch: 14 Global Step: 70950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:49:48,134-Speed 5553.82 samples/sec Loss 3.2857 LearningRate 0.0089 Epoch: 14 Global Step: 70960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:49,983-Speed 5540.65 samples/sec Loss 3.2967 LearningRate 0.0089 Epoch: 14 Global Step: 70970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:51,849-Speed 5489.96 samples/sec Loss 3.2378 LearningRate 0.0089 Epoch: 14 Global Step: 70980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:53,705-Speed 5519.56 samples/sec Loss 3.3119 LearningRate 0.0089 Epoch: 14 Global Step: 70990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:55,555-Speed 5538.41 samples/sec Loss 3.2968 LearningRate 0.0089 Epoch: 14 Global Step: 71000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:57,405-Speed 5536.46 samples/sec Loss 3.3233 LearningRate 0.0089 Epoch: 14 Global Step: 71010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:49:59,252-Speed 5545.21 samples/sec Loss 3.4551 LearningRate 0.0089 Epoch: 14 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:50:01,150-Speed 5396.57 samples/sec Loss 3.3144 LearningRate 0.0089 Epoch: 14 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:50:03,029-Speed 5451.88 samples/sec Loss 3.3685 LearningRate 0.0089 Epoch: 14 Global Step: 71040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:04,880-Speed 5533.07 samples/sec Loss 3.2748 LearningRate 0.0089 Epoch: 14 Global Step: 71050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:06,752-Speed 5474.82 samples/sec Loss 3.3467 LearningRate 0.0089 Epoch: 14 Global Step: 71060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:08,596-Speed 5555.25 samples/sec Loss 3.2811 LearningRate 0.0088 Epoch: 14 Global Step: 71070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:10,445-Speed 5541.27 samples/sec Loss 
3.2737 LearningRate 0.0088 Epoch: 14 Global Step: 71080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:12,352-Speed 5372.12 samples/sec Loss 3.3778 LearningRate 0.0088 Epoch: 14 Global Step: 71090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:14,207-Speed 5520.31 samples/sec Loss 3.4504 LearningRate 0.0088 Epoch: 14 Global Step: 71100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:16,077-Speed 5479.54 samples/sec Loss 3.2747 LearningRate 0.0088 Epoch: 14 Global Step: 71110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:17,949-Speed 5469.89 samples/sec Loss 3.3701 LearningRate 0.0088 Epoch: 14 Global Step: 71120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:19,791-Speed 5561.88 samples/sec Loss 3.3549 LearningRate 0.0088 Epoch: 14 Global Step: 71130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:21,640-Speed 5542.07 samples/sec Loss 3.4475 LearningRate 0.0088 Epoch: 14 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:50:23,472-Speed 5589.13 samples/sec Loss 3.4578 LearningRate 0.0088 Epoch: 14 Global Step: 71150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:25,319-Speed 5547.94 samples/sec Loss 3.4805 LearningRate 0.0088 Epoch: 14 Global Step: 71160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:27,174-Speed 5522.77 samples/sec Loss 3.4459 LearningRate 0.0088 Epoch: 14 Global Step: 71170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:29,028-Speed 5526.75 samples/sec Loss 3.3835 LearningRate 0.0088 Epoch: 14 Global Step: 71180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:30,881-Speed 5527.74 samples/sec Loss 3.2046 LearningRate 0.0088 Epoch: 14 Global Step: 71190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:32,728-Speed 5546.33 samples/sec Loss 3.3630 LearningRate 0.0088 Epoch: 14 Global Step: 71200 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:34,579-Speed 5532.71 samples/sec Loss 3.3930 LearningRate 0.0088 Epoch: 14 Global Step: 71210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:36,428-Speed 5539.63 samples/sec Loss 3.4309 LearningRate 0.0088 Epoch: 14 Global Step: 71220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:38,280-Speed 5532.92 samples/sec Loss 3.4181 LearningRate 0.0088 Epoch: 14 Global Step: 71230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:40,124-Speed 5555.40 samples/sec Loss 3.4776 LearningRate 0.0087 Epoch: 14 Global Step: 71240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:41,971-Speed 5545.68 samples/sec Loss 3.4029 LearningRate 0.0087 Epoch: 14 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:50:43,821-Speed 5535.92 samples/sec Loss 3.3877 LearningRate 0.0087 Epoch: 14 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:50:45,660-Speed 5572.07 samples/sec Loss 3.4546 LearningRate 0.0087 Epoch: 14 Global Step: 71270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:47,505-Speed 5551.78 samples/sec Loss 3.3725 LearningRate 0.0087 Epoch: 14 Global Step: 71280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:49,359-Speed 5524.82 samples/sec Loss 3.4579 LearningRate 0.0087 Epoch: 14 Global Step: 71290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:51,205-Speed 5550.36 samples/sec Loss 3.5022 LearningRate 0.0087 Epoch: 14 Global Step: 71300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:53,061-Speed 5520.12 samples/sec Loss 3.3202 LearningRate 0.0087 Epoch: 14 Global Step: 71310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:54,928-Speed 5485.02 samples/sec Loss 3.3849 LearningRate 0.0087 Epoch: 14 Global Step: 71320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:50:56,778-Speed 5537.28 samples/sec Loss 3.4439 LearningRate 0.0087 Epoch: 14 Global Step: 71330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:50:58,626-Speed 5543.13 samples/sec Loss 3.4112 LearningRate 0.0087 Epoch: 14 Global Step: 71340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:00,509-Speed 5441.90 samples/sec Loss 3.3757 LearningRate 0.0087 Epoch: 14 Global Step: 71350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:02,385-Speed 5459.47 samples/sec Loss 3.4963 LearningRate 0.0087 Epoch: 14 Global Step: 71360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:04,260-Speed 5461.72 samples/sec Loss 3.5169 LearningRate 0.0087 Epoch: 14 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:06,153-Speed 5414.97 samples/sec Loss 3.5538 LearningRate 0.0087 Epoch: 14 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:08,005-Speed 5529.39 samples/sec Loss 3.3298 LearningRate 0.0087 Epoch: 14 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:09,859-Speed 5526.30 samples/sec Loss 3.5031 LearningRate 0.0087 Epoch: 14 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:11,708-Speed 5538.94 samples/sec Loss 3.4210 LearningRate 0.0086 Epoch: 14 Global Step: 71410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:13,566-Speed 5514.52 samples/sec Loss 3.4885 LearningRate 0.0086 Epoch: 14 Global Step: 71420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:15,426-Speed 5507.41 samples/sec Loss 3.5095 LearningRate 0.0086 Epoch: 14 Global Step: 71430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:17,294-Speed 5483.21 samples/sec Loss 3.4501 LearningRate 0.0086 Epoch: 14 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:19,145-Speed 5534.11 samples/sec Loss 
3.4644 LearningRate 0.0086 Epoch: 14 Global Step: 71450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:21,010-Speed 5492.75 samples/sec Loss 3.4844 LearningRate 0.0086 Epoch: 14 Global Step: 71460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:22,910-Speed 5393.83 samples/sec Loss 3.4220 LearningRate 0.0086 Epoch: 14 Global Step: 71470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:24,813-Speed 5382.07 samples/sec Loss 3.5002 LearningRate 0.0086 Epoch: 14 Global Step: 71480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:26,710-Speed 5398.64 samples/sec Loss 3.4496 LearningRate 0.0086 Epoch: 14 Global Step: 71490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:28,558-Speed 5546.38 samples/sec Loss 3.4282 LearningRate 0.0086 Epoch: 14 Global Step: 71500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:51:30,402-Speed 5552.57 samples/sec Loss 3.5054 LearningRate 0.0086 Epoch: 14 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:32,263-Speed 5506.85 samples/sec Loss 3.4369 LearningRate 0.0086 Epoch: 14 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:34,108-Speed 5551.58 samples/sec Loss 3.3452 LearningRate 0.0086 Epoch: 14 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:35,971-Speed 5499.54 samples/sec Loss 3.4010 LearningRate 0.0086 Epoch: 14 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:37,820-Speed 5539.87 samples/sec Loss 3.5369 LearningRate 0.0086 Epoch: 14 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:39,668-Speed 5541.47 samples/sec Loss 3.5011 LearningRate 0.0086 Epoch: 14 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:41,532-Speed 5499.19 samples/sec Loss 3.5728 LearningRate 0.0086 Epoch: 14 Global Step: 71570 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:43,387-Speed 5524.01 samples/sec Loss 3.4977 LearningRate 0.0086 Epoch: 14 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:45,249-Speed 5500.24 samples/sec Loss 3.6371 LearningRate 0.0085 Epoch: 14 Global Step: 71590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:47,125-Speed 5461.72 samples/sec Loss 3.5448 LearningRate 0.0085 Epoch: 14 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:48,976-Speed 5533.42 samples/sec Loss 3.6053 LearningRate 0.0085 Epoch: 14 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:50,881-Speed 5378.66 samples/sec Loss 3.5322 LearningRate 0.0085 Epoch: 14 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:52,760-Speed 5451.57 samples/sec Loss 3.5415 LearningRate 0.0085 Epoch: 14 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:54,605-Speed 5551.31 samples/sec Loss 3.5069 LearningRate 0.0085 Epoch: 14 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:56,450-Speed 5551.89 samples/sec Loss 3.3895 LearningRate 0.0085 Epoch: 14 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:51:58,299-Speed 5539.73 samples/sec Loss 3.5727 LearningRate 0.0085 Epoch: 14 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:52:00,146-Speed 5546.32 samples/sec Loss 3.4826 LearningRate 0.0085 Epoch: 14 Global Step: 71670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:02,017-Speed 5474.57 samples/sec Loss 3.6146 LearningRate 0.0085 Epoch: 14 Global Step: 71680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:03,869-Speed 5533.35 samples/sec Loss 3.6037 LearningRate 0.0085 Epoch: 14 Global Step: 71690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 14:52:05,718-Speed 5538.40 samples/sec Loss 3.5380 LearningRate 0.0085 Epoch: 14 Global Step: 71700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:07,568-Speed 5537.84 samples/sec Loss 3.4267 LearningRate 0.0085 Epoch: 14 Global Step: 71710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:09,414-Speed 5550.62 samples/sec Loss 3.6074 LearningRate 0.0085 Epoch: 14 Global Step: 71720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:11,265-Speed 5534.55 samples/sec Loss 3.6630 LearningRate 0.0085 Epoch: 14 Global Step: 71730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:13,119-Speed 5524.35 samples/sec Loss 3.5692 LearningRate 0.0085 Epoch: 14 Global Step: 71740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:14,970-Speed 5532.92 samples/sec Loss 3.4892 LearningRate 0.0085 Epoch: 14 Global Step: 71750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:16,816-Speed 5550.71 samples/sec Loss 3.6916 LearningRate 0.0084 Epoch: 14 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:18,678-Speed 5502.86 samples/sec Loss 3.5221 LearningRate 0.0084 Epoch: 14 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:52:20,522-Speed 5553.05 samples/sec Loss 3.4601 LearningRate 0.0084 Epoch: 14 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:52:22,382-Speed 5507.99 samples/sec Loss 3.5765 LearningRate 0.0084 Epoch: 14 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:24,243-Speed 5505.80 samples/sec Loss 3.6304 LearningRate 0.0084 Epoch: 14 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:26,113-Speed 5477.93 samples/sec Loss 3.4878 LearningRate 0.0084 Epoch: 14 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:27,973-Speed 5505.54 samples/sec Loss 
3.5952 LearningRate 0.0084 Epoch: 14 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:29,823-Speed 5538.66 samples/sec Loss 3.5187 LearningRate 0.0084 Epoch: 14 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:31,666-Speed 5561.20 samples/sec Loss 3.5926 LearningRate 0.0084 Epoch: 14 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:33,512-Speed 5547.81 samples/sec Loss 3.6140 LearningRate 0.0084 Epoch: 14 Global Step: 71850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:35,396-Speed 5437.32 samples/sec Loss 3.7099 LearningRate 0.0084 Epoch: 14 Global Step: 71860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:37,265-Speed 5480.68 samples/sec Loss 3.5888 LearningRate 0.0084 Epoch: 14 Global Step: 71870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:39,133-Speed 5484.15 samples/sec Loss 3.6833 LearningRate 0.0084 Epoch: 14 Global Step: 71880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:40,995-Speed 5503.13 samples/sec Loss 3.5040 LearningRate 0.0084 Epoch: 14 Global Step: 71890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:42,853-Speed 5511.30 samples/sec Loss 3.5778 LearningRate 0.0084 Epoch: 14 Global Step: 71900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:44,705-Speed 5529.76 samples/sec Loss 3.7318 LearningRate 0.0084 Epoch: 14 Global Step: 71910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:46,557-Speed 5531.76 samples/sec Loss 3.5095 LearningRate 0.0084 Epoch: 14 Global Step: 71920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:48,407-Speed 5539.50 samples/sec Loss 3.5176 LearningRate 0.0083 Epoch: 14 Global Step: 71930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:50,274-Speed 5487.49 samples/sec Loss 3.6060 LearningRate 0.0083 Epoch: 14 Global Step: 71940 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:52,128-Speed 5525.36 samples/sec Loss 3.4878 LearningRate 0.0083 Epoch: 14 Global Step: 71950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:53,976-Speed 5545.24 samples/sec Loss 3.7024 LearningRate 0.0083 Epoch: 14 Global Step: 71960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:55,825-Speed 5541.43 samples/sec Loss 3.6240 LearningRate 0.0083 Epoch: 14 Global Step: 71970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:57,686-Speed 5502.30 samples/sec Loss 3.4624 LearningRate 0.0083 Epoch: 14 Global Step: 71980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:52:59,542-Speed 5519.60 samples/sec Loss 3.6188 LearningRate 0.0083 Epoch: 14 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:53:01,393-Speed 5533.42 samples/sec Loss 3.5677 LearningRate 0.0083 Epoch: 14 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:53:30,698-[lfw][72000]XNorm: 22.745616 Training: 2022-04-11 14:53:30,698-[lfw][72000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-04-11 14:53:30,699-[lfw][72000]Accuracy-Highest: 0.99817 Training: 2022-04-11 14:54:01,540-[cfp_fp][72000]XNorm: 20.677433 Training: 2022-04-11 14:54:01,541-[cfp_fp][72000]Accuracy-Flip: 0.97686+-0.00622 Training: 2022-04-11 14:54:01,541-[cfp_fp][72000]Accuracy-Highest: 0.98057 Training: 2022-04-11 14:54:28,158-[agedb_30][72000]XNorm: 22.580363 Training: 2022-04-11 14:54:28,158-[agedb_30][72000]Accuracy-Flip: 0.98217+-0.00619 Training: 2022-04-11 14:54:28,159-[agedb_30][72000]Accuracy-Highest: 0.98217 Training: 2022-04-11 14:54:30,042-Speed 115.51 samples/sec Loss 3.6505 LearningRate 0.0083 Epoch: 14 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:54:31,883-Speed 5566.12 samples/sec Loss 3.5904 LearningRate 0.0083 Epoch: 14 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 14:54:33,735-Speed 5530.34 samples/sec Loss 3.5700 LearningRate 0.0083 Epoch: 14 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:54:35,577-Speed 5566.75 samples/sec Loss 3.5577 LearningRate 0.0083 Epoch: 14 Global Step: 72040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:37,447-Speed 5478.81 samples/sec Loss 3.5716 LearningRate 0.0083 Epoch: 14 Global Step: 72050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:39,294-Speed 5548.19 samples/sec Loss 3.5341 LearningRate 0.0083 Epoch: 14 Global Step: 72060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:41,141-Speed 5546.89 samples/sec Loss 3.5399 LearningRate 0.0083 Epoch: 14 Global Step: 72070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:43,022-Speed 5446.38 samples/sec Loss 3.5483 LearningRate 0.0083 Epoch: 14 Global Step: 72080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:44,864-Speed 5562.80 samples/sec Loss 3.5305 LearningRate 0.0083 Epoch: 14 Global Step: 72090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:46,712-Speed 5543.01 samples/sec Loss 3.5456 LearningRate 0.0083 Epoch: 14 Global Step: 72100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:48,590-Speed 5456.18 samples/sec Loss 3.5733 LearningRate 0.0082 Epoch: 14 Global Step: 72110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:50,508-Speed 5343.85 samples/sec Loss 3.5964 LearningRate 0.0082 Epoch: 14 Global Step: 72120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:52,358-Speed 5539.39 samples/sec Loss 3.6696 LearningRate 0.0082 Epoch: 14 Global Step: 72130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:54,242-Speed 5438.05 samples/sec Loss 3.5166 LearningRate 0.0082 Epoch: 14 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:54:56,074-Speed 5592.59 
samples/sec Loss 3.5006 LearningRate 0.0082 Epoch: 14 Global Step: 72150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:57,957-Speed 5444.39 samples/sec Loss 3.5089 LearningRate 0.0082 Epoch: 14 Global Step: 72160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:54:59,838-Speed 5446.84 samples/sec Loss 3.5872 LearningRate 0.0082 Epoch: 14 Global Step: 72170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:01,744-Speed 5374.06 samples/sec Loss 3.6836 LearningRate 0.0082 Epoch: 14 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:03,606-Speed 5504.54 samples/sec Loss 3.5877 LearningRate 0.0082 Epoch: 14 Global Step: 72190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:05,491-Speed 5436.64 samples/sec Loss 3.5711 LearningRate 0.0082 Epoch: 14 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:07,402-Speed 5358.92 samples/sec Loss 3.5393 LearningRate 0.0082 Epoch: 14 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:09,271-Speed 5482.81 samples/sec Loss 3.5558 LearningRate 0.0082 Epoch: 14 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:11,131-Speed 5509.36 samples/sec Loss 3.5921 LearningRate 0.0082 Epoch: 14 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:12,998-Speed 5485.89 samples/sec Loss 3.6073 LearningRate 0.0082 Epoch: 14 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:14,875-Speed 5460.25 samples/sec Loss 3.5104 LearningRate 0.0082 Epoch: 14 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:55:16,738-Speed 5497.73 samples/sec Loss 3.6796 LearningRate 0.0082 Epoch: 14 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:55:18,594-Speed 5520.87 samples/sec Loss 3.6236 LearningRate 0.0082 Epoch: 14 
Global Step: 72270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:55:20,456-Speed 5504.69 samples/sec Loss 3.6289 LearningRate 0.0082 Epoch: 14 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:55:22,302-Speed 5549.49 samples/sec Loss 3.7108 LearningRate 0.0081 Epoch: 14 Global Step: 72290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:24,201-Speed 5395.83 samples/sec Loss 3.5905 LearningRate 0.0081 Epoch: 14 Global Step: 72300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:26,050-Speed 5542.03 samples/sec Loss 3.5077 LearningRate 0.0081 Epoch: 14 Global Step: 72310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:27,902-Speed 5529.55 samples/sec Loss 3.5355 LearningRate 0.0081 Epoch: 14 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:29,772-Speed 5479.13 samples/sec Loss 3.5957 LearningRate 0.0081 Epoch: 14 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:31,635-Speed 5497.32 samples/sec Loss 3.5480 LearningRate 0.0081 Epoch: 14 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:33,504-Speed 5481.89 samples/sec Loss 3.5922 LearningRate 0.0081 Epoch: 14 Global Step: 72350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:35,353-Speed 5540.48 samples/sec Loss 3.6134 LearningRate 0.0081 Epoch: 14 Global Step: 72360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:37,218-Speed 5493.61 samples/sec Loss 3.5706 LearningRate 0.0081 Epoch: 14 Global Step: 72370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:39,079-Speed 5503.14 samples/sec Loss 3.6306 LearningRate 0.0081 Epoch: 14 Global Step: 72380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:40,921-Speed 5563.29 samples/sec Loss 3.6621 LearningRate 0.0081 Epoch: 14 Global Step: 72390 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:55:42,773-Speed 5528.28 samples/sec Loss 3.5137 LearningRate 0.0081 Epoch: 14 Global Step: 72400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:44,625-Speed 5534.05 samples/sec Loss 3.5361 LearningRate 0.0081 Epoch: 14 Global Step: 72410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:46,477-Speed 5528.22 samples/sec Loss 3.6583 LearningRate 0.0081 Epoch: 14 Global Step: 72420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:48,374-Speed 5401.06 samples/sec Loss 3.6044 LearningRate 0.0081 Epoch: 14 Global Step: 72430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:50,279-Speed 5379.72 samples/sec Loss 3.7555 LearningRate 0.0081 Epoch: 14 Global Step: 72440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:52,153-Speed 5464.77 samples/sec Loss 3.7113 LearningRate 0.0081 Epoch: 14 Global Step: 72450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:54,025-Speed 5472.85 samples/sec Loss 3.6029 LearningRate 0.0080 Epoch: 14 Global Step: 72460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:55,870-Speed 5551.85 samples/sec Loss 3.6015 LearningRate 0.0080 Epoch: 14 Global Step: 72470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:57,711-Speed 5562.50 samples/sec Loss 3.6314 LearningRate 0.0080 Epoch: 14 Global Step: 72480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:55:59,555-Speed 5556.91 samples/sec Loss 3.5267 LearningRate 0.0080 Epoch: 14 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:56:01,395-Speed 5567.47 samples/sec Loss 3.5757 LearningRate 0.0080 Epoch: 14 Global Step: 72500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:03,247-Speed 5531.40 samples/sec Loss 3.6431 LearningRate 0.0080 Epoch: 14 Global Step: 72510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:05,128-Speed 5445.25 
samples/sec Loss 3.6103 LearningRate 0.0080 Epoch: 14 Global Step: 72520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:06,981-Speed 5527.52 samples/sec Loss 3.5858 LearningRate 0.0080 Epoch: 14 Global Step: 72530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:08,825-Speed 5556.65 samples/sec Loss 3.6647 LearningRate 0.0080 Epoch: 14 Global Step: 72540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:10,679-Speed 5525.55 samples/sec Loss 3.6888 LearningRate 0.0080 Epoch: 14 Global Step: 72550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:12,545-Speed 5489.55 samples/sec Loss 3.6165 LearningRate 0.0080 Epoch: 14 Global Step: 72560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:14,425-Speed 5450.53 samples/sec Loss 3.7861 LearningRate 0.0080 Epoch: 14 Global Step: 72570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:16,289-Speed 5495.10 samples/sec Loss 3.5077 LearningRate 0.0080 Epoch: 14 Global Step: 72580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:18,149-Speed 5506.74 samples/sec Loss 3.5884 LearningRate 0.0080 Epoch: 14 Global Step: 72590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:20,012-Speed 5497.34 samples/sec Loss 3.7876 LearningRate 0.0080 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:21,868-Speed 5519.07 samples/sec Loss 3.5060 LearningRate 0.0080 Epoch: 14 Global Step: 72610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:23,725-Speed 5517.56 samples/sec Loss 3.7237 LearningRate 0.0080 Epoch: 14 Global Step: 72620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:56:25,599-Speed 5466.67 samples/sec Loss 3.5564 LearningRate 0.0080 Epoch: 14 Global Step: 72630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:27,490-Speed 5415.35 samples/sec Loss 3.5986 LearningRate 0.0079 Epoch: 14 
Global Step: 72640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:29,336-Speed 5549.71 samples/sec Loss 3.6892 LearningRate 0.0079 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:31,190-Speed 5525.49 samples/sec Loss 3.6168 LearningRate 0.0079 Epoch: 14 Global Step: 72660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:33,034-Speed 5555.83 samples/sec Loss 3.5530 LearningRate 0.0079 Epoch: 14 Global Step: 72670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:34,888-Speed 5525.72 samples/sec Loss 3.6561 LearningRate 0.0079 Epoch: 14 Global Step: 72680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:36,736-Speed 5544.41 samples/sec Loss 3.5711 LearningRate 0.0079 Epoch: 14 Global Step: 72690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:38,576-Speed 5565.28 samples/sec Loss 3.6025 LearningRate 0.0079 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:40,417-Speed 5563.84 samples/sec Loss 3.6698 LearningRate 0.0079 Epoch: 14 Global Step: 72710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:42,261-Speed 5556.02 samples/sec Loss 3.6594 LearningRate 0.0079 Epoch: 14 Global Step: 72720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:44,106-Speed 5552.21 samples/sec Loss 3.6177 LearningRate 0.0079 Epoch: 14 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:56:45,980-Speed 5467.53 samples/sec Loss 3.6436 LearningRate 0.0079 Epoch: 14 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:56:47,845-Speed 5491.21 samples/sec Loss 3.6294 LearningRate 0.0079 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:56:49,700-Speed 5522.23 samples/sec Loss 3.6471 LearningRate 0.0079 Epoch: 14 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 14:56:51,552-Speed 5532.48 samples/sec Loss 3.6595 LearningRate 0.0079 Epoch: 14 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:56:53,394-Speed 5562.75 samples/sec Loss 3.6335 LearningRate 0.0079 Epoch: 14 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:55,248-Speed 5523.22 samples/sec Loss 3.5847 LearningRate 0.0079 Epoch: 14 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:57,100-Speed 5532.38 samples/sec Loss 3.5850 LearningRate 0.0079 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:56:58,971-Speed 5474.86 samples/sec Loss 3.5984 LearningRate 0.0079 Epoch: 14 Global Step: 72810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:00,845-Speed 5465.77 samples/sec Loss 3.6253 LearningRate 0.0078 Epoch: 14 Global Step: 72820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:02,733-Speed 5426.15 samples/sec Loss 3.6176 LearningRate 0.0078 Epoch: 14 Global Step: 72830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:04,600-Speed 5487.30 samples/sec Loss 3.5963 LearningRate 0.0078 Epoch: 14 Global Step: 72840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:06,466-Speed 5488.99 samples/sec Loss 3.6106 LearningRate 0.0078 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:08,321-Speed 5521.64 samples/sec Loss 3.5633 LearningRate 0.0078 Epoch: 14 Global Step: 72860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:10,178-Speed 5517.63 samples/sec Loss 3.6845 LearningRate 0.0078 Epoch: 14 Global Step: 72870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:12,033-Speed 5522.95 samples/sec Loss 3.7428 LearningRate 0.0078 Epoch: 14 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:57:13,883-Speed 5537.67 
samples/sec Loss 3.7113 LearningRate 0.0078 Epoch: 14 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:57:15,751-Speed 5482.76 samples/sec Loss 3.6295 LearningRate 0.0078 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:17,613-Speed 5501.98 samples/sec Loss 3.6986 LearningRate 0.0078 Epoch: 14 Global Step: 72910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:19,463-Speed 5535.71 samples/sec Loss 3.6841 LearningRate 0.0078 Epoch: 14 Global Step: 72920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:21,311-Speed 5542.48 samples/sec Loss 3.6820 LearningRate 0.0078 Epoch: 14 Global Step: 72930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:23,189-Speed 5455.24 samples/sec Loss 3.6623 LearningRate 0.0078 Epoch: 14 Global Step: 72940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:25,050-Speed 5506.84 samples/sec Loss 3.7122 LearningRate 0.0078 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:26,935-Speed 5434.17 samples/sec Loss 3.6217 LearningRate 0.0078 Epoch: 14 Global Step: 72960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:28,790-Speed 5520.95 samples/sec Loss 3.7007 LearningRate 0.0078 Epoch: 14 Global Step: 72970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:30,631-Speed 5565.24 samples/sec Loss 3.6588 LearningRate 0.0078 Epoch: 14 Global Step: 72980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:32,490-Speed 5508.74 samples/sec Loss 3.6605 LearningRate 0.0078 Epoch: 14 Global Step: 72990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:34,346-Speed 5522.49 samples/sec Loss 3.7055 LearningRate 0.0077 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:36,193-Speed 5544.51 samples/sec Loss 3.6966 LearningRate 0.0077 Epoch: 14 
Global Step: 73010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:38,034-Speed 5563.61 samples/sec Loss 3.6505 LearningRate 0.0077 Epoch: 14 Global Step: 73020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:39,875-Speed 5564.73 samples/sec Loss 3.5096 LearningRate 0.0077 Epoch: 14 Global Step: 73030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:41,721-Speed 5550.06 samples/sec Loss 3.6298 LearningRate 0.0077 Epoch: 14 Global Step: 73040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:43,566-Speed 5553.63 samples/sec Loss 3.6238 LearningRate 0.0077 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:45,415-Speed 5536.91 samples/sec Loss 3.6781 LearningRate 0.0077 Epoch: 14 Global Step: 73060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:47,280-Speed 5493.26 samples/sec Loss 3.6442 LearningRate 0.0077 Epoch: 14 Global Step: 73070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:57:49,208-Speed 5313.68 samples/sec Loss 3.6977 LearningRate 0.0077 Epoch: 14 Global Step: 73080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:51,115-Speed 5371.56 samples/sec Loss 3.7812 LearningRate 0.0077 Epoch: 14 Global Step: 73090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:52,970-Speed 5523.99 samples/sec Loss 3.7339 LearningRate 0.0077 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:54,819-Speed 5541.88 samples/sec Loss 3.6713 LearningRate 0.0077 Epoch: 14 Global Step: 73110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:56,662-Speed 5555.25 samples/sec Loss 3.6645 LearningRate 0.0077 Epoch: 14 Global Step: 73120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:57:58,509-Speed 5547.83 samples/sec Loss 3.7295 LearningRate 0.0077 Epoch: 14 Global Step: 73130 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:58:00,359-Speed 5538.41 samples/sec Loss 3.6522 LearningRate 0.0077 Epoch: 14 Global Step: 73140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:02,206-Speed 5543.69 samples/sec Loss 3.5896 LearningRate 0.0077 Epoch: 14 Global Step: 73150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:04,055-Speed 5541.53 samples/sec Loss 3.6089 LearningRate 0.0077 Epoch: 14 Global Step: 73160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:05,900-Speed 5552.01 samples/sec Loss 3.6897 LearningRate 0.0077 Epoch: 14 Global Step: 73170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:07,753-Speed 5528.05 samples/sec Loss 3.7070 LearningRate 0.0077 Epoch: 14 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:09,613-Speed 5507.37 samples/sec Loss 3.6508 LearningRate 0.0076 Epoch: 14 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:11,487-Speed 5464.79 samples/sec Loss 3.5679 LearningRate 0.0076 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:13,399-Speed 5358.39 samples/sec Loss 3.7252 LearningRate 0.0076 Epoch: 14 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:15,265-Speed 5491.98 samples/sec Loss 3.6313 LearningRate 0.0076 Epoch: 14 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:17,119-Speed 5525.41 samples/sec Loss 3.7732 LearningRate 0.0076 Epoch: 14 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:18,965-Speed 5548.59 samples/sec Loss 3.6160 LearningRate 0.0076 Epoch: 14 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:20,810-Speed 5552.97 samples/sec Loss 3.5793 LearningRate 0.0076 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:22,678-Speed 5482.52 
samples/sec Loss 3.5337 LearningRate 0.0076 Epoch: 14 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:24,553-Speed 5463.01 samples/sec Loss 3.6204 LearningRate 0.0076 Epoch: 14 Global Step: 73270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:26,418-Speed 5492.17 samples/sec Loss 3.6748 LearningRate 0.0076 Epoch: 14 Global Step: 73280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:28,275-Speed 5516.71 samples/sec Loss 3.5959 LearningRate 0.0076 Epoch: 14 Global Step: 73290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:30,135-Speed 5508.12 samples/sec Loss 3.6996 LearningRate 0.0076 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:32,001-Speed 5489.40 samples/sec Loss 3.5892 LearningRate 0.0076 Epoch: 14 Global Step: 73310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:33,868-Speed 5487.10 samples/sec Loss 3.6689 LearningRate 0.0076 Epoch: 14 Global Step: 73320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:35,721-Speed 5526.79 samples/sec Loss 3.6027 LearningRate 0.0076 Epoch: 14 Global Step: 73330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:37,579-Speed 5514.27 samples/sec Loss 3.6114 LearningRate 0.0076 Epoch: 14 Global Step: 73340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:39,442-Speed 5498.16 samples/sec Loss 3.6263 LearningRate 0.0076 Epoch: 14 Global Step: 73350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:41,302-Speed 5509.43 samples/sec Loss 3.5555 LearningRate 0.0076 Epoch: 14 Global Step: 73360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:43,161-Speed 5507.45 samples/sec Loss 3.6577 LearningRate 0.0075 Epoch: 14 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:58:45,001-Speed 5570.40 samples/sec Loss 3.6994 LearningRate 0.0075 Epoch: 14 
Global Step: 73380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:46,845-Speed 5553.19 samples/sec Loss 3.6250 LearningRate 0.0075 Epoch: 14 Global Step: 73390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:48,706-Speed 5504.17 samples/sec Loss 3.6517 LearningRate 0.0075 Epoch: 14 Global Step: 73400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:50,557-Speed 5536.12 samples/sec Loss 3.6089 LearningRate 0.0075 Epoch: 14 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:52,420-Speed 5498.50 samples/sec Loss 3.7286 LearningRate 0.0075 Epoch: 14 Global Step: 73420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:54,277-Speed 5517.51 samples/sec Loss 3.6905 LearningRate 0.0075 Epoch: 14 Global Step: 73430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:56,124-Speed 5544.55 samples/sec Loss 3.6455 LearningRate 0.0075 Epoch: 14 Global Step: 73440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:57,982-Speed 5513.76 samples/sec Loss 3.6657 LearningRate 0.0075 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:58:59,862-Speed 5449.50 samples/sec Loss 3.6665 LearningRate 0.0075 Epoch: 14 Global Step: 73460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:01,725-Speed 5497.30 samples/sec Loss 3.7476 LearningRate 0.0075 Epoch: 14 Global Step: 73470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:03,564-Speed 5572.65 samples/sec Loss 3.7321 LearningRate 0.0075 Epoch: 14 Global Step: 73480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:05,418-Speed 5524.01 samples/sec Loss 3.5652 LearningRate 0.0075 Epoch: 14 Global Step: 73490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:07,262-Speed 5555.74 samples/sec Loss 3.5572 LearningRate 0.0075 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 14:59:09,116-Speed 5524.02 samples/sec Loss 3.7049 LearningRate 0.0075 Epoch: 14 Global Step: 73510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:10,989-Speed 5470.36 samples/sec Loss 3.6533 LearningRate 0.0075 Epoch: 14 Global Step: 73520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:12,870-Speed 5444.86 samples/sec Loss 3.7319 LearningRate 0.0075 Epoch: 14 Global Step: 73530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:14,774-Speed 5382.10 samples/sec Loss 3.5970 LearningRate 0.0075 Epoch: 14 Global Step: 73540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:16,676-Speed 5386.56 samples/sec Loss 3.7321 LearningRate 0.0074 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:18,524-Speed 5542.01 samples/sec Loss 3.7120 LearningRate 0.0074 Epoch: 14 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:20,370-Speed 5549.99 samples/sec Loss 3.6200 LearningRate 0.0074 Epoch: 14 Global Step: 73570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:22,224-Speed 5524.12 samples/sec Loss 3.7249 LearningRate 0.0074 Epoch: 14 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:59:24,073-Speed 5540.41 samples/sec Loss 3.7881 LearningRate 0.0074 Epoch: 14 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 14:59:25,911-Speed 5573.85 samples/sec Loss 3.6088 LearningRate 0.0074 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:27,760-Speed 5540.70 samples/sec Loss 3.6845 LearningRate 0.0074 Epoch: 14 Global Step: 73610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:29,606-Speed 5548.60 samples/sec Loss 3.5020 LearningRate 0.0074 Epoch: 14 Global Step: 73620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:31,477-Speed 5475.24 
samples/sec Loss 3.8572 LearningRate 0.0074 Epoch: 14 Global Step: 73630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:33,330-Speed 5526.51 samples/sec Loss 3.6927 LearningRate 0.0074 Epoch: 14 Global Step: 73640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:35,167-Speed 5577.48 samples/sec Loss 3.7403 LearningRate 0.0074 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:37,014-Speed 5545.60 samples/sec Loss 3.6962 LearningRate 0.0074 Epoch: 14 Global Step: 73660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:38,858-Speed 5556.92 samples/sec Loss 3.5671 LearningRate 0.0074 Epoch: 14 Global Step: 73670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:40,714-Speed 5518.09 samples/sec Loss 3.7016 LearningRate 0.0074 Epoch: 14 Global Step: 73680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:42,563-Speed 5540.96 samples/sec Loss 3.6868 LearningRate 0.0074 Epoch: 14 Global Step: 73690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:44,408-Speed 5552.35 samples/sec Loss 3.6222 LearningRate 0.0074 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:46,260-Speed 5530.32 samples/sec Loss 3.6707 LearningRate 0.0074 Epoch: 14 Global Step: 73710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:48,145-Speed 5436.18 samples/sec Loss 3.6158 LearningRate 0.0074 Epoch: 14 Global Step: 73720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:50,031-Speed 5431.70 samples/sec Loss 3.7173 LearningRate 0.0074 Epoch: 14 Global Step: 73730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:51,888-Speed 5515.44 samples/sec Loss 3.7273 LearningRate 0.0073 Epoch: 14 Global Step: 73740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 14:59:53,739-Speed 5534.29 samples/sec Loss 3.7627 LearningRate 0.0073 Epoch: 14 
Global Step: 73750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:55,584-Speed 5554.13 samples/sec Loss 3.6268 LearningRate 0.0073 Epoch: 14 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:57,426-Speed 5560.48 samples/sec Loss 3.7715 LearningRate 0.0073 Epoch: 14 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 14:59:59,271-Speed 5551.71 samples/sec Loss 3.7480 LearningRate 0.0073 Epoch: 14 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:01,122-Speed 5533.05 samples/sec Loss 3.5972 LearningRate 0.0073 Epoch: 14 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:02,972-Speed 5539.71 samples/sec Loss 3.6363 LearningRate 0.0073 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:04,829-Speed 5515.17 samples/sec Loss 3.5416 LearningRate 0.0073 Epoch: 14 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:06,680-Speed 5533.68 samples/sec Loss 3.5891 LearningRate 0.0073 Epoch: 14 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:08,540-Speed 5507.95 samples/sec Loss 3.5456 LearningRate 0.0073 Epoch: 14 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:10,387-Speed 5546.54 samples/sec Loss 3.7684 LearningRate 0.0073 Epoch: 14 Global Step: 73840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:12,240-Speed 5528.76 samples/sec Loss 3.7153 LearningRate 0.0073 Epoch: 14 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:14,106-Speed 5488.06 samples/sec Loss 3.6356 LearningRate 0.0073 Epoch: 14 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:15,971-Speed 5495.19 samples/sec Loss 3.6547 LearningRate 0.0073 Epoch: 14 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 15:00:17,842-Speed 5475.02 samples/sec Loss 3.6767 LearningRate 0.0073 Epoch: 14 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:19,708-Speed 5490.78 samples/sec Loss 3.5807 LearningRate 0.0073 Epoch: 14 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:21,553-Speed 5550.14 samples/sec Loss 3.7007 LearningRate 0.0073 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:23,417-Speed 5495.52 samples/sec Loss 3.5616 LearningRate 0.0073 Epoch: 14 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:25,277-Speed 5506.61 samples/sec Loss 3.6598 LearningRate 0.0073 Epoch: 14 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:27,134-Speed 5517.65 samples/sec Loss 3.7446 LearningRate 0.0072 Epoch: 14 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:28,984-Speed 5538.28 samples/sec Loss 3.6577 LearningRate 0.0072 Epoch: 14 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:00:30,827-Speed 5556.74 samples/sec Loss 3.6990 LearningRate 0.0072 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:32,687-Speed 5508.20 samples/sec Loss 3.6424 LearningRate 0.0072 Epoch: 14 Global Step: 73960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:34,559-Speed 5472.19 samples/sec Loss 3.6622 LearningRate 0.0072 Epoch: 14 Global Step: 73970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:36,453-Speed 5408.94 samples/sec Loss 3.6009 LearningRate 0.0072 Epoch: 14 Global Step: 73980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:38,333-Speed 5450.19 samples/sec Loss 3.7683 LearningRate 0.0072 Epoch: 14 Global Step: 73990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:00:40,185-Speed 5530.18 
samples/sec Loss 3.5959 LearningRate 0.0072 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:01:06,962-[lfw][74000]XNorm: 22.630857 Training: 2022-04-11 15:01:06,963-[lfw][74000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 15:01:06,963-[lfw][74000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:01:41,269-[cfp_fp][74000]XNorm: 21.012182 Training: 2022-04-11 15:01:41,270-[cfp_fp][74000]Accuracy-Flip: 0.98000+-0.00709 Training: 2022-04-11 15:01:41,270-[cfp_fp][74000]Accuracy-Highest: 0.98057 Training: 2022-04-11 15:02:10,625-[agedb_30][74000]XNorm: 22.659951 Training: 2022-04-11 15:02:10,625-[agedb_30][74000]Accuracy-Flip: 0.97900+-0.00754 Training: 2022-04-11 15:02:10,626-[agedb_30][74000]Accuracy-Highest: 0.98217 Training: 2022-04-11 15:02:12,509-Speed 110.91 samples/sec Loss 3.6705 LearningRate 0.0072 Epoch: 14 Global Step: 74010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:14,360-Speed 5535.16 samples/sec Loss 3.7274 LearningRate 0.0072 Epoch: 14 Global Step: 74020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:16,214-Speed 5524.07 samples/sec Loss 3.6500 LearningRate 0.0072 Epoch: 14 Global Step: 74030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:18,065-Speed 5533.69 samples/sec Loss 3.7218 LearningRate 0.0072 Epoch: 14 Global Step: 74040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:19,902-Speed 5578.13 samples/sec Loss 3.6610 LearningRate 0.0072 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:21,743-Speed 5563.70 samples/sec Loss 3.6391 LearningRate 0.0072 Epoch: 14 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:23,599-Speed 5519.02 samples/sec Loss 3.6772 LearningRate 0.0072 Epoch: 14 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:25,448-Speed 5539.49 samples/sec Loss 3.7759 LearningRate 
0.0072 Epoch: 14 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:27,299-Speed 5534.73 samples/sec Loss 3.5913 LearningRate 0.0072 Epoch: 14 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:29,146-Speed 5547.25 samples/sec Loss 3.7152 LearningRate 0.0072 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:30,987-Speed 5565.03 samples/sec Loss 3.6530 LearningRate 0.0072 Epoch: 14 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:32,832-Speed 5550.03 samples/sec Loss 3.6474 LearningRate 0.0071 Epoch: 14 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:34,706-Speed 5465.58 samples/sec Loss 3.6258 LearningRate 0.0071 Epoch: 14 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:36,561-Speed 5523.73 samples/sec Loss 3.6826 LearningRate 0.0071 Epoch: 14 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:38,409-Speed 5544.76 samples/sec Loss 3.6376 LearningRate 0.0071 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:40,301-Speed 5412.08 samples/sec Loss 3.6488 LearningRate 0.0071 Epoch: 14 Global Step: 74160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:42,165-Speed 5497.59 samples/sec Loss 3.7133 LearningRate 0.0071 Epoch: 14 Global Step: 74170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:44,026-Speed 5504.66 samples/sec Loss 3.5619 LearningRate 0.0071 Epoch: 14 Global Step: 74180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:45,869-Speed 5559.42 samples/sec Loss 3.6895 LearningRate 0.0071 Epoch: 14 Global Step: 74190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:47,713-Speed 5555.22 samples/sec Loss 3.7004 LearningRate 0.0071 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 
32768 Required: 2 hours Training: 2022-04-11 15:02:49,576-Speed 5499.64 samples/sec Loss 3.6902 LearningRate 0.0071 Epoch: 14 Global Step: 74210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:51,436-Speed 5508.95 samples/sec Loss 3.6299 LearningRate 0.0071 Epoch: 14 Global Step: 74220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:53,281-Speed 5549.22 samples/sec Loss 3.7100 LearningRate 0.0071 Epoch: 14 Global Step: 74230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:55,183-Speed 5387.40 samples/sec Loss 3.5673 LearningRate 0.0071 Epoch: 14 Global Step: 74240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:02:57,031-Speed 5543.03 samples/sec Loss 3.6730 LearningRate 0.0071 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:02:58,877-Speed 5548.31 samples/sec Loss 3.7107 LearningRate 0.0071 Epoch: 14 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:00,740-Speed 5502.00 samples/sec Loss 3.6596 LearningRate 0.0071 Epoch: 14 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:02,584-Speed 5553.53 samples/sec Loss 3.6972 LearningRate 0.0071 Epoch: 14 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:04,432-Speed 5542.45 samples/sec Loss 3.5839 LearningRate 0.0071 Epoch: 14 Global Step: 74290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:06,297-Speed 5492.87 samples/sec Loss 3.7307 LearningRate 0.0071 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:08,145-Speed 5543.88 samples/sec Loss 3.7998 LearningRate 0.0070 Epoch: 14 Global Step: 74310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:10,011-Speed 5488.54 samples/sec Loss 3.5776 LearningRate 0.0070 Epoch: 14 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
15:03:11,900-Speed 5425.23 samples/sec Loss 3.7098 LearningRate 0.0070 Epoch: 14 Global Step: 74330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:13,751-Speed 5531.84 samples/sec Loss 3.6387 LearningRate 0.0070 Epoch: 14 Global Step: 74340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:15,620-Speed 5480.68 samples/sec Loss 3.6414 LearningRate 0.0070 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:17,471-Speed 5533.57 samples/sec Loss 3.7055 LearningRate 0.0070 Epoch: 14 Global Step: 74360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:19,322-Speed 5534.64 samples/sec Loss 3.6837 LearningRate 0.0070 Epoch: 14 Global Step: 74370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:21,173-Speed 5537.92 samples/sec Loss 3.5588 LearningRate 0.0070 Epoch: 14 Global Step: 74380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:23,050-Speed 5457.26 samples/sec Loss 3.7275 LearningRate 0.0070 Epoch: 14 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:24,949-Speed 5391.66 samples/sec Loss 3.5881 LearningRate 0.0070 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:26,833-Speed 5440.03 samples/sec Loss 3.7644 LearningRate 0.0070 Epoch: 14 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:28,698-Speed 5490.27 samples/sec Loss 3.6222 LearningRate 0.0070 Epoch: 14 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:30,568-Speed 5478.43 samples/sec Loss 3.6063 LearningRate 0.0070 Epoch: 14 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:32,415-Speed 5548.10 samples/sec Loss 3.6055 LearningRate 0.0070 Epoch: 14 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:34,274-Speed 5507.88 samples/sec Loss 3.5655 
LearningRate 0.0070 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:36,155-Speed 5446.64 samples/sec Loss 3.6043 LearningRate 0.0070 Epoch: 14 Global Step: 74460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:38,006-Speed 5535.20 samples/sec Loss 3.6722 LearningRate 0.0070 Epoch: 14 Global Step: 74470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:39,847-Speed 5562.53 samples/sec Loss 3.7371 LearningRate 0.0070 Epoch: 14 Global Step: 74480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:41,710-Speed 5501.09 samples/sec Loss 3.6725 LearningRate 0.0070 Epoch: 14 Global Step: 74490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:43,561-Speed 5533.69 samples/sec Loss 3.5763 LearningRate 0.0069 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:45,411-Speed 5537.67 samples/sec Loss 3.5388 LearningRate 0.0069 Epoch: 14 Global Step: 74510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:47,304-Speed 5411.32 samples/sec Loss 3.6747 LearningRate 0.0069 Epoch: 14 Global Step: 74520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:49,214-Speed 5360.83 samples/sec Loss 3.6317 LearningRate 0.0069 Epoch: 14 Global Step: 74530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:51,097-Speed 5441.03 samples/sec Loss 3.6333 LearningRate 0.0069 Epoch: 14 Global Step: 74540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:03:52,943-Speed 5549.15 samples/sec Loss 3.5185 LearningRate 0.0069 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:54,816-Speed 5468.76 samples/sec Loss 3.7511 LearningRate 0.0069 Epoch: 14 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:56,664-Speed 5544.11 samples/sec Loss 3.6160 LearningRate 0.0069 Epoch: 14 Global Step: 74570 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:03:58,521-Speed 5516.67 samples/sec Loss 3.6239 LearningRate 0.0069 Epoch: 14 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:04:00,377-Speed 5520.59 samples/sec Loss 3.7595 LearningRate 0.0069 Epoch: 14 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:04:02,248-Speed 5474.18 samples/sec Loss 3.6736 LearningRate 0.0069 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:04,093-Speed 5551.90 samples/sec Loss 3.5967 LearningRate 0.0069 Epoch: 14 Global Step: 74610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:05,933-Speed 5567.55 samples/sec Loss 3.5867 LearningRate 0.0069 Epoch: 14 Global Step: 74620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:07,779-Speed 5549.22 samples/sec Loss 3.5234 LearningRate 0.0069 Epoch: 14 Global Step: 74630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:09,621-Speed 5559.94 samples/sec Loss 3.5342 LearningRate 0.0069 Epoch: 14 Global Step: 74640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:11,480-Speed 5512.23 samples/sec Loss 3.7197 LearningRate 0.0069 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:13,367-Speed 5429.51 samples/sec Loss 3.6469 LearningRate 0.0069 Epoch: 14 Global Step: 74660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:15,205-Speed 5570.50 samples/sec Loss 3.6753 LearningRate 0.0069 Epoch: 14 Global Step: 74670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:17,052-Speed 5549.62 samples/sec Loss 3.5670 LearningRate 0.0069 Epoch: 14 Global Step: 74680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:18,899-Speed 5544.82 samples/sec Loss 3.5516 LearningRate 0.0068 Epoch: 14 Global Step: 74690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-11 15:04:20,742-Speed 5559.68 samples/sec Loss 3.6037 LearningRate 0.0068 Epoch: 14 Global Step: 74700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:22,586-Speed 5554.05 samples/sec Loss 3.7028 LearningRate 0.0068 Epoch: 14 Global Step: 74710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:24,431-Speed 5552.86 samples/sec Loss 3.6780 LearningRate 0.0068 Epoch: 14 Global Step: 74720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:26,274-Speed 5558.12 samples/sec Loss 3.6420 LearningRate 0.0068 Epoch: 14 Global Step: 74730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:28,129-Speed 5523.07 samples/sec Loss 3.6332 LearningRate 0.0068 Epoch: 14 Global Step: 74740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:29,981-Speed 5530.65 samples/sec Loss 3.5187 LearningRate 0.0068 Epoch: 14 Global Step: 74750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:31,825-Speed 5555.73 samples/sec Loss 3.5355 LearningRate 0.0068 Epoch: 14 Global Step: 74760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:04:33,675-Speed 5537.29 samples/sec Loss 3.6414 LearningRate 0.0068 Epoch: 14 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:35,537-Speed 5499.01 samples/sec Loss 3.7400 LearningRate 0.0068 Epoch: 14 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:37,448-Speed 5361.40 samples/sec Loss 3.4779 LearningRate 0.0068 Epoch: 14 Global Step: 74790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:39,296-Speed 5542.81 samples/sec Loss 3.6877 LearningRate 0.0068 Epoch: 14 Global Step: 74800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:41,149-Speed 5528.36 samples/sec Loss 3.5901 LearningRate 0.0068 Epoch: 14 Global Step: 74810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:42,993-Speed 5555.47 samples/sec Loss 
3.7751 LearningRate 0.0068 Epoch: 14 Global Step: 74820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:44,840-Speed 5546.05 samples/sec Loss 3.5509 LearningRate 0.0068 Epoch: 14 Global Step: 74830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:46,684-Speed 5556.25 samples/sec Loss 3.5631 LearningRate 0.0068 Epoch: 14 Global Step: 74840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:48,532-Speed 5543.02 samples/sec Loss 3.5768 LearningRate 0.0068 Epoch: 14 Global Step: 74850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:50,383-Speed 5533.89 samples/sec Loss 3.6745 LearningRate 0.0068 Epoch: 14 Global Step: 74860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:52,231-Speed 5545.00 samples/sec Loss 3.6070 LearningRate 0.0068 Epoch: 14 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:04:54,084-Speed 5527.19 samples/sec Loss 3.6460 LearningRate 0.0067 Epoch: 14 Global Step: 74880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:55,934-Speed 5537.54 samples/sec Loss 3.4497 LearningRate 0.0067 Epoch: 14 Global Step: 74890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:57,796-Speed 5502.13 samples/sec Loss 3.6382 LearningRate 0.0067 Epoch: 14 Global Step: 74900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:04:59,661-Speed 5491.38 samples/sec Loss 3.5606 LearningRate 0.0067 Epoch: 14 Global Step: 74910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:01,539-Speed 5454.09 samples/sec Loss 3.5342 LearningRate 0.0067 Epoch: 14 Global Step: 74920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:03,406-Speed 5488.88 samples/sec Loss 3.6982 LearningRate 0.0067 Epoch: 14 Global Step: 74930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:05,293-Speed 5428.20 samples/sec Loss 3.8040 LearningRate 0.0067 Epoch: 14 Global Step: 74940 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:07,138-Speed 5553.48 samples/sec Loss 3.5263 LearningRate 0.0067 Epoch: 14 Global Step: 74950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:08,983-Speed 5551.43 samples/sec Loss 3.6107 LearningRate 0.0067 Epoch: 14 Global Step: 74960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:10,836-Speed 5527.35 samples/sec Loss 3.7097 LearningRate 0.0067 Epoch: 14 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:12,712-Speed 5461.09 samples/sec Loss 3.7351 LearningRate 0.0067 Epoch: 14 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:05:14,580-Speed 5483.58 samples/sec Loss 3.6796 LearningRate 0.0067 Epoch: 14 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:05:16,424-Speed 5556.51 samples/sec Loss 3.6476 LearningRate 0.0067 Epoch: 14 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:18,273-Speed 5537.02 samples/sec Loss 3.6127 LearningRate 0.0067 Epoch: 14 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:20,124-Speed 5534.91 samples/sec Loss 3.5070 LearningRate 0.0067 Epoch: 14 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:21,974-Speed 5538.31 samples/sec Loss 3.6512 LearningRate 0.0067 Epoch: 14 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:23,837-Speed 5497.10 samples/sec Loss 3.6153 LearningRate 0.0067 Epoch: 14 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:25,696-Speed 5511.51 samples/sec Loss 3.5214 LearningRate 0.0067 Epoch: 14 Global Step: 75050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:27,544-Speed 5543.95 samples/sec Loss 3.6184 LearningRate 0.0067 Epoch: 14 Global Step: 75060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 15:05:29,393-Speed 5541.38 samples/sec Loss 3.6079 LearningRate 0.0067 Epoch: 14 Global Step: 75070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:31,248-Speed 5520.95 samples/sec Loss 3.5736 LearningRate 0.0066 Epoch: 14 Global Step: 75080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:33,100-Speed 5531.79 samples/sec Loss 3.5436 LearningRate 0.0066 Epoch: 14 Global Step: 75090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:34,948-Speed 5541.94 samples/sec Loss 3.6450 LearningRate 0.0066 Epoch: 14 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:05:36,785-Speed 5578.36 samples/sec Loss 3.5911 LearningRate 0.0066 Epoch: 14 Global Step: 75110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:38,640-Speed 5522.18 samples/sec Loss 3.5967 LearningRate 0.0066 Epoch: 14 Global Step: 75120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:40,490-Speed 5533.96 samples/sec Loss 3.6473 LearningRate 0.0066 Epoch: 14 Global Step: 75130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:42,343-Speed 5530.11 samples/sec Loss 3.6364 LearningRate 0.0066 Epoch: 14 Global Step: 75140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:44,193-Speed 5536.59 samples/sec Loss 3.6324 LearningRate 0.0066 Epoch: 14 Global Step: 75150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:46,045-Speed 5532.45 samples/sec Loss 3.5739 LearningRate 0.0066 Epoch: 14 Global Step: 75160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:47,907-Speed 5500.83 samples/sec Loss 3.7522 LearningRate 0.0066 Epoch: 14 Global Step: 75170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:49,753-Speed 5551.27 samples/sec Loss 3.5975 LearningRate 0.0066 Epoch: 14 Global Step: 75180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:51,619-Speed 5490.67 samples/sec Loss 
3.6537 LearningRate 0.0066 Epoch: 14 Global Step: 75190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:53,507-Speed 5424.63 samples/sec Loss 3.5462 LearningRate 0.0066 Epoch: 14 Global Step: 75200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:55,351-Speed 5554.18 samples/sec Loss 3.5228 LearningRate 0.0066 Epoch: 14 Global Step: 75210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:57,191-Speed 5567.79 samples/sec Loss 3.5447 LearningRate 0.0066 Epoch: 14 Global Step: 75220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:05:59,034-Speed 5558.02 samples/sec Loss 3.6550 LearningRate 0.0066 Epoch: 14 Global Step: 75230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:00,887-Speed 5530.92 samples/sec Loss 3.5826 LearningRate 0.0066 Epoch: 14 Global Step: 75240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:02,733-Speed 5547.25 samples/sec Loss 3.5062 LearningRate 0.0066 Epoch: 14 Global Step: 75250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:04,588-Speed 5522.63 samples/sec Loss 3.7427 LearningRate 0.0066 Epoch: 14 Global Step: 75260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:06,431-Speed 5558.19 samples/sec Loss 3.6595 LearningRate 0.0066 Epoch: 14 Global Step: 75270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:08,276-Speed 5552.50 samples/sec Loss 3.6258 LearningRate 0.0065 Epoch: 14 Global Step: 75280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:10,131-Speed 5522.28 samples/sec Loss 3.6179 LearningRate 0.0065 Epoch: 14 Global Step: 75290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:12,012-Speed 5447.51 samples/sec Loss 3.6409 LearningRate 0.0065 Epoch: 14 Global Step: 75300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:13,885-Speed 5468.59 samples/sec Loss 3.7102 LearningRate 0.0065 Epoch: 14 Global Step: 75310 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:15,760-Speed 5462.09 samples/sec Loss 3.5869 LearningRate 0.0065 Epoch: 14 Global Step: 75320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:17,637-Speed 5459.06 samples/sec Loss 3.5801 LearningRate 0.0065 Epoch: 14 Global Step: 75330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:19,483-Speed 5549.18 samples/sec Loss 3.7480 LearningRate 0.0065 Epoch: 14 Global Step: 75340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:21,338-Speed 5522.11 samples/sec Loss 3.6146 LearningRate 0.0065 Epoch: 14 Global Step: 75350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:23,188-Speed 5535.40 samples/sec Loss 3.6508 LearningRate 0.0065 Epoch: 14 Global Step: 75360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:25,045-Speed 5517.40 samples/sec Loss 3.7381 LearningRate 0.0065 Epoch: 14 Global Step: 75370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:26,914-Speed 5479.50 samples/sec Loss 3.5495 LearningRate 0.0065 Epoch: 14 Global Step: 75380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:28,765-Speed 5535.14 samples/sec Loss 3.5165 LearningRate 0.0065 Epoch: 14 Global Step: 75390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:30,613-Speed 5544.54 samples/sec Loss 3.7108 LearningRate 0.0065 Epoch: 14 Global Step: 75400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:32,461-Speed 5542.11 samples/sec Loss 3.5956 LearningRate 0.0065 Epoch: 14 Global Step: 75410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:34,305-Speed 5555.00 samples/sec Loss 3.5271 LearningRate 0.0065 Epoch: 14 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:36,186-Speed 5446.23 samples/sec Loss 3.6712 LearningRate 0.0065 Epoch: 14 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 15:06:38,039-Speed 5529.60 samples/sec Loss 3.6191 LearningRate 0.0065 Epoch: 14 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:39,887-Speed 5541.87 samples/sec Loss 3.5672 LearningRate 0.0065 Epoch: 14 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:41,743-Speed 5521.07 samples/sec Loss 3.6359 LearningRate 0.0065 Epoch: 14 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:43,603-Speed 5506.29 samples/sec Loss 3.5120 LearningRate 0.0064 Epoch: 14 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:06:45,435-Speed 5590.28 samples/sec Loss 3.5421 LearningRate 0.0064 Epoch: 14 Global Step: 75480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:47,304-Speed 5483.05 samples/sec Loss 3.6068 LearningRate 0.0064 Epoch: 14 Global Step: 75490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:49,171-Speed 5484.25 samples/sec Loss 3.6725 LearningRate 0.0064 Epoch: 14 Global Step: 75500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:51,022-Speed 5535.77 samples/sec Loss 3.5530 LearningRate 0.0064 Epoch: 14 Global Step: 75510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:52,905-Speed 5441.70 samples/sec Loss 3.5945 LearningRate 0.0064 Epoch: 14 Global Step: 75520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:54,782-Speed 5455.63 samples/sec Loss 3.4826 LearningRate 0.0064 Epoch: 14 Global Step: 75530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:56,630-Speed 5546.08 samples/sec Loss 3.5719 LearningRate 0.0064 Epoch: 14 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:06:58,478-Speed 5542.05 samples/sec Loss 3.5037 LearningRate 0.0064 Epoch: 14 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:00,333-Speed 5522.33 samples/sec Loss 
3.5535 LearningRate 0.0064 Epoch: 14 Global Step: 75560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:02,194-Speed 5504.60 samples/sec Loss 3.5986 LearningRate 0.0064 Epoch: 14 Global Step: 75570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:04,069-Speed 5463.49 samples/sec Loss 3.6862 LearningRate 0.0064 Epoch: 14 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:05,940-Speed 5474.81 samples/sec Loss 3.4611 LearningRate 0.0064 Epoch: 14 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:07,803-Speed 5497.02 samples/sec Loss 3.6180 LearningRate 0.0064 Epoch: 14 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:09,650-Speed 5546.29 samples/sec Loss 3.5684 LearningRate 0.0064 Epoch: 14 Global Step: 75610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:11,546-Speed 5406.12 samples/sec Loss 3.5538 LearningRate 0.0064 Epoch: 14 Global Step: 75620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:13,400-Speed 5525.32 samples/sec Loss 3.6676 LearningRate 0.0064 Epoch: 14 Global Step: 75630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:15,270-Speed 5475.75 samples/sec Loss 3.5585 LearningRate 0.0064 Epoch: 14 Global Step: 75640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:17,117-Speed 5546.69 samples/sec Loss 3.5486 LearningRate 0.0064 Epoch: 14 Global Step: 75650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:18,971-Speed 5524.49 samples/sec Loss 3.4599 LearningRate 0.0064 Epoch: 14 Global Step: 75660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:20,820-Speed 5539.64 samples/sec Loss 3.6505 LearningRate 0.0063 Epoch: 14 Global Step: 75670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:22,706-Speed 5433.95 samples/sec Loss 3.6787 LearningRate 0.0063 Epoch: 14 Global Step: 75680 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:24,580-Speed 5463.70 samples/sec Loss 3.6664 LearningRate 0.0063 Epoch: 14 Global Step: 75690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:26,451-Speed 5475.85 samples/sec Loss 3.5857 LearningRate 0.0063 Epoch: 14 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:28,324-Speed 5468.83 samples/sec Loss 3.6008 LearningRate 0.0063 Epoch: 14 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:30,174-Speed 5539.17 samples/sec Loss 3.6090 LearningRate 0.0063 Epoch: 14 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:32,030-Speed 5519.00 samples/sec Loss 3.5566 LearningRate 0.0063 Epoch: 14 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:33,886-Speed 5520.73 samples/sec Loss 3.6699 LearningRate 0.0063 Epoch: 14 Global Step: 75740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:35,745-Speed 5510.82 samples/sec Loss 3.5547 LearningRate 0.0063 Epoch: 14 Global Step: 75750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:37,601-Speed 5517.43 samples/sec Loss 3.6076 LearningRate 0.0063 Epoch: 14 Global Step: 75760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:39,455-Speed 5526.97 samples/sec Loss 3.7757 LearningRate 0.0063 Epoch: 14 Global Step: 75770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:41,346-Speed 5415.54 samples/sec Loss 3.5572 LearningRate 0.0063 Epoch: 14 Global Step: 75780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:43,234-Speed 5425.93 samples/sec Loss 3.5546 LearningRate 0.0063 Epoch: 14 Global Step: 75790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:45,081-Speed 5547.60 samples/sec Loss 3.6777 LearningRate 0.0063 Epoch: 14 Global Step: 75800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 15:07:46,960-Speed 5451.89 samples/sec Loss 3.5845 LearningRate 0.0063 Epoch: 14 Global Step: 75810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:48,818-Speed 5512.85 samples/sec Loss 3.6246 LearningRate 0.0063 Epoch: 14 Global Step: 75820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:50,669-Speed 5532.62 samples/sec Loss 3.5842 LearningRate 0.0063 Epoch: 14 Global Step: 75830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:52,553-Speed 5440.44 samples/sec Loss 3.5097 LearningRate 0.0063 Epoch: 14 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:54,406-Speed 5527.83 samples/sec Loss 3.6059 LearningRate 0.0063 Epoch: 14 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:07:56,317-Speed 5361.45 samples/sec Loss 3.5102 LearningRate 0.0063 Epoch: 14 Global Step: 75860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:07:58,136-Speed 5630.07 samples/sec Loss 3.5701 LearningRate 0.0063 Epoch: 14 Global Step: 75870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:09,042-Speed 938.98 samples/sec Loss 2.7754 LearningRate 0.0062 Epoch: 15 Global Step: 75880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:10,921-Speed 5453.28 samples/sec Loss 2.7378 LearningRate 0.0062 Epoch: 15 Global Step: 75890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:12,927-Speed 5105.76 samples/sec Loss 2.7984 LearningRate 0.0062 Epoch: 15 Global Step: 75900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:14,790-Speed 5497.97 samples/sec Loss 2.8234 LearningRate 0.0062 Epoch: 15 Global Step: 75910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:16,789-Speed 5125.26 samples/sec Loss 2.7623 LearningRate 0.0062 Epoch: 15 Global Step: 75920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:18,637-Speed 5543.13 samples/sec Loss 
2.7584 LearningRate 0.0062 Epoch: 15 Global Step: 75930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:20,488-Speed 5533.47 samples/sec Loss 2.6982 LearningRate 0.0062 Epoch: 15 Global Step: 75940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:22,339-Speed 5533.29 samples/sec Loss 2.6679 LearningRate 0.0062 Epoch: 15 Global Step: 75950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:24,188-Speed 5546.37 samples/sec Loss 2.8115 LearningRate 0.0062 Epoch: 15 Global Step: 75960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:26,050-Speed 5501.02 samples/sec Loss 2.8745 LearningRate 0.0062 Epoch: 15 Global Step: 75970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:27,906-Speed 5517.40 samples/sec Loss 2.6047 LearningRate 0.0062 Epoch: 15 Global Step: 75980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:29,755-Speed 5539.35 samples/sec Loss 2.7713 LearningRate 0.0062 Epoch: 15 Global Step: 75990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:31,606-Speed 5535.57 samples/sec Loss 2.7014 LearningRate 0.0062 Epoch: 15 Global Step: 76000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:08:58,281-[lfw][76000]XNorm: 21.818461 Training: 2022-04-11 15:08:58,282-[lfw][76000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 15:08:58,282-[lfw][76000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:09:29,081-[cfp_fp][76000]XNorm: 20.213919 Training: 2022-04-11 15:09:29,081-[cfp_fp][76000]Accuracy-Flip: 0.97971+-0.00534 Training: 2022-04-11 15:09:29,082-[cfp_fp][76000]Accuracy-Highest: 0.98057 Training: 2022-04-11 15:09:55,638-[agedb_30][76000]XNorm: 21.908650 Training: 2022-04-11 15:09:55,638-[agedb_30][76000]Accuracy-Flip: 0.98267+-0.00684 Training: 2022-04-11 15:09:55,639-[agedb_30][76000]Accuracy-Highest: 0.98267 Training: 2022-04-11 15:09:57,500-Speed 119.22 samples/sec Loss 2.7129 LearningRate 0.0062 Epoch: 15 
Global Step: 76010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:09:59,343-Speed 5558.50 samples/sec Loss 2.8287 LearningRate 0.0062 Epoch: 15 Global Step: 76020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:01,182-Speed 5567.92 samples/sec Loss 2.7993 LearningRate 0.0062 Epoch: 15 Global Step: 76030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:03,041-Speed 5509.15 samples/sec Loss 2.7418 LearningRate 0.0062 Epoch: 15 Global Step: 76040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:04,883-Speed 5562.74 samples/sec Loss 2.8485 LearningRate 0.0062 Epoch: 15 Global Step: 76050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:06,723-Speed 5566.99 samples/sec Loss 2.9035 LearningRate 0.0062 Epoch: 15 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:10:08,565-Speed 5560.68 samples/sec Loss 2.7406 LearningRate 0.0062 Epoch: 15 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:10:10,404-Speed 5570.64 samples/sec Loss 2.8702 LearningRate 0.0061 Epoch: 15 Global Step: 76080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:12,255-Speed 5535.76 samples/sec Loss 2.8218 LearningRate 0.0061 Epoch: 15 Global Step: 76090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:14,097-Speed 5559.86 samples/sec Loss 2.9585 LearningRate 0.0061 Epoch: 15 Global Step: 76100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:15,948-Speed 5532.92 samples/sec Loss 2.9779 LearningRate 0.0061 Epoch: 15 Global Step: 76110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:17,814-Speed 5489.21 samples/sec Loss 2.8627 LearningRate 0.0061 Epoch: 15 Global Step: 76120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:19,659-Speed 5552.74 samples/sec Loss 2.7646 LearningRate 0.0061 Epoch: 15 Global Step: 76130 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 15:10:21,503-Speed 5556.06 samples/sec Loss 2.7678 LearningRate 0.0061 Epoch: 15 Global Step: 76140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:23,351-Speed 5543.12 samples/sec Loss 2.8261 LearningRate 0.0061 Epoch: 15 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:25,204-Speed 5529.62 samples/sec Loss 2.7687 LearningRate 0.0061 Epoch: 15 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:27,045-Speed 5564.14 samples/sec Loss 2.7912 LearningRate 0.0061 Epoch: 15 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:28,886-Speed 5563.98 samples/sec Loss 2.8709 LearningRate 0.0061 Epoch: 15 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:10:30,724-Speed 5573.84 samples/sec Loss 2.9218 LearningRate 0.0061 Epoch: 15 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:32,566-Speed 5560.93 samples/sec Loss 2.7945 LearningRate 0.0061 Epoch: 15 Global Step: 76200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:34,424-Speed 5514.55 samples/sec Loss 2.9343 LearningRate 0.0061 Epoch: 15 Global Step: 76210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:36,261-Speed 5576.04 samples/sec Loss 2.8321 LearningRate 0.0061 Epoch: 15 Global Step: 76220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:38,109-Speed 5541.99 samples/sec Loss 2.8423 LearningRate 0.0061 Epoch: 15 Global Step: 76230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:39,959-Speed 5537.28 samples/sec Loss 2.8763 LearningRate 0.0061 Epoch: 15 Global Step: 76240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:41,803-Speed 5555.28 samples/sec Loss 2.8840 LearningRate 0.0061 Epoch: 15 Global Step: 76250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:43,647-Speed 5554.32 
samples/sec Loss 2.8272 LearningRate 0.0061 Epoch: 15 Global Step: 76260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:45,495-Speed 5543.27 samples/sec Loss 2.7328 LearningRate 0.0061 Epoch: 15 Global Step: 76270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:47,352-Speed 5519.35 samples/sec Loss 2.8747 LearningRate 0.0060 Epoch: 15 Global Step: 76280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:49,194-Speed 5558.75 samples/sec Loss 2.8526 LearningRate 0.0060 Epoch: 15 Global Step: 76290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:51,055-Speed 5506.55 samples/sec Loss 2.9074 LearningRate 0.0060 Epoch: 15 Global Step: 76300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:52,918-Speed 5498.96 samples/sec Loss 2.7954 LearningRate 0.0060 Epoch: 15 Global Step: 76310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:10:54,761-Speed 5555.58 samples/sec Loss 2.9631 LearningRate 0.0060 Epoch: 15 Global Step: 76320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:56,606-Speed 5555.40 samples/sec Loss 2.9173 LearningRate 0.0060 Epoch: 15 Global Step: 76330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:10:58,474-Speed 5482.02 samples/sec Loss 2.8849 LearningRate 0.0060 Epoch: 15 Global Step: 76340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:00,335-Speed 5506.02 samples/sec Loss 2.8130 LearningRate 0.0060 Epoch: 15 Global Step: 76350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:02,198-Speed 5497.48 samples/sec Loss 2.9340 LearningRate 0.0060 Epoch: 15 Global Step: 76360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:04,074-Speed 5461.85 samples/sec Loss 2.7699 LearningRate 0.0060 Epoch: 15 Global Step: 76370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:05,923-Speed 5540.42 samples/sec Loss 2.9085 LearningRate 0.0060 Epoch: 15 
Global Step: 76380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:07,776-Speed 5527.79 samples/sec Loss 2.7912 LearningRate 0.0060 Epoch: 15 Global Step: 76390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:09,619-Speed 5555.83 samples/sec Loss 2.9010 LearningRate 0.0060 Epoch: 15 Global Step: 76400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:11,522-Speed 5384.45 samples/sec Loss 2.9896 LearningRate 0.0060 Epoch: 15 Global Step: 76410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:13,380-Speed 5513.96 samples/sec Loss 2.8610 LearningRate 0.0060 Epoch: 15 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:11:15,238-Speed 5513.37 samples/sec Loss 2.9954 LearningRate 0.0060 Epoch: 15 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:11:17,074-Speed 5580.61 samples/sec Loss 2.9093 LearningRate 0.0060 Epoch: 15 Global Step: 76440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:18,915-Speed 5562.70 samples/sec Loss 2.9848 LearningRate 0.0060 Epoch: 15 Global Step: 76450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:20,761-Speed 5549.87 samples/sec Loss 2.9188 LearningRate 0.0060 Epoch: 15 Global Step: 76460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:22,610-Speed 5539.92 samples/sec Loss 2.9369 LearningRate 0.0060 Epoch: 15 Global Step: 76470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:24,458-Speed 5544.29 samples/sec Loss 2.9652 LearningRate 0.0060 Epoch: 15 Global Step: 76480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:26,334-Speed 5459.12 samples/sec Loss 2.9418 LearningRate 0.0059 Epoch: 15 Global Step: 76490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:28,182-Speed 5544.32 samples/sec Loss 2.9531 LearningRate 0.0059 Epoch: 15 Global Step: 76500 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 15:11:30,028-Speed 5549.17 samples/sec Loss 2.9471 LearningRate 0.0059 Epoch: 15 Global Step: 76510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:31,878-Speed 5536.94 samples/sec Loss 2.9733 LearningRate 0.0059 Epoch: 15 Global Step: 76520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:33,726-Speed 5541.23 samples/sec Loss 3.0073 LearningRate 0.0059 Epoch: 15 Global Step: 76530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:35,589-Speed 5500.91 samples/sec Loss 2.9363 LearningRate 0.0059 Epoch: 15 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:11:37,438-Speed 5539.91 samples/sec Loss 2.9736 LearningRate 0.0059 Epoch: 15 Global Step: 76550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:39,286-Speed 5544.66 samples/sec Loss 2.9220 LearningRate 0.0059 Epoch: 15 Global Step: 76560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:41,131-Speed 5550.28 samples/sec Loss 2.8487 LearningRate 0.0059 Epoch: 15 Global Step: 76570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:42,979-Speed 5545.04 samples/sec Loss 2.9284 LearningRate 0.0059 Epoch: 15 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:44,822-Speed 5557.55 samples/sec Loss 2.9567 LearningRate 0.0059 Epoch: 15 Global Step: 76590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:46,671-Speed 5540.39 samples/sec Loss 2.9403 LearningRate 0.0059 Epoch: 15 Global Step: 76600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:48,520-Speed 5539.33 samples/sec Loss 2.9155 LearningRate 0.0059 Epoch: 15 Global Step: 76610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:50,377-Speed 5514.59 samples/sec Loss 3.0326 LearningRate 0.0059 Epoch: 15 Global Step: 76620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:52,305-Speed 5314.40 
samples/sec Loss 2.9421 LearningRate 0.0059 Epoch: 15 Global Step: 76630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:54,169-Speed 5496.24 samples/sec Loss 2.8912 LearningRate 0.0059 Epoch: 15 Global Step: 76640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:11:56,015-Speed 5548.31 samples/sec Loss 2.9475 LearningRate 0.0059 Epoch: 15 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:11:57,869-Speed 5525.53 samples/sec Loss 2.9749 LearningRate 0.0059 Epoch: 15 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:11:59,716-Speed 5546.76 samples/sec Loss 2.8457 LearningRate 0.0059 Epoch: 15 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:01,579-Speed 5499.00 samples/sec Loss 2.9540 LearningRate 0.0059 Epoch: 15 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:03,439-Speed 5508.63 samples/sec Loss 2.9591 LearningRate 0.0059 Epoch: 15 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:05,360-Speed 5331.56 samples/sec Loss 2.9974 LearningRate 0.0058 Epoch: 15 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:07,217-Speed 5516.54 samples/sec Loss 2.9372 LearningRate 0.0058 Epoch: 15 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:09,066-Speed 5539.87 samples/sec Loss 2.9851 LearningRate 0.0058 Epoch: 15 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:10,921-Speed 5522.18 samples/sec Loss 2.9604 LearningRate 0.0058 Epoch: 15 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:12,769-Speed 5541.47 samples/sec Loss 2.9912 LearningRate 0.0058 Epoch: 15 Global Step: 76740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:14,654-Speed 5433.59 samples/sec Loss 2.9410 LearningRate 0.0058 Epoch: 15 
Global Step: 76750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:16,521-Speed 5487.66 samples/sec Loss 3.0951 LearningRate 0.0058 Epoch: 15 Global Step: 76760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:18,382-Speed 5504.24 samples/sec Loss 2.9920 LearningRate 0.0058 Epoch: 15 Global Step: 76770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:20,229-Speed 5547.96 samples/sec Loss 2.9921 LearningRate 0.0058 Epoch: 15 Global Step: 76780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:22,081-Speed 5530.59 samples/sec Loss 2.9890 LearningRate 0.0058 Epoch: 15 Global Step: 76790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:23,929-Speed 5545.13 samples/sec Loss 3.0180 LearningRate 0.0058 Epoch: 15 Global Step: 76800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:25,786-Speed 5515.77 samples/sec Loss 2.9854 LearningRate 0.0058 Epoch: 15 Global Step: 76810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:27,646-Speed 5506.67 samples/sec Loss 3.0525 LearningRate 0.0058 Epoch: 15 Global Step: 76820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:29,492-Speed 5547.51 samples/sec Loss 2.8824 LearningRate 0.0058 Epoch: 15 Global Step: 76830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:31,387-Speed 5408.79 samples/sec Loss 2.9772 LearningRate 0.0058 Epoch: 15 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:33,234-Speed 5543.55 samples/sec Loss 2.9974 LearningRate 0.0058 Epoch: 15 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:35,081-Speed 5546.36 samples/sec Loss 3.0456 LearningRate 0.0058 Epoch: 15 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:36,938-Speed 5518.01 samples/sec Loss 2.9265 LearningRate 0.0058 Epoch: 15 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 15:12:38,799-Speed 5503.74 samples/sec Loss 3.0041 LearningRate 0.0058 Epoch: 15 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:40,667-Speed 5484.36 samples/sec Loss 2.9753 LearningRate 0.0058 Epoch: 15 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:42,524-Speed 5517.04 samples/sec Loss 3.0365 LearningRate 0.0058 Epoch: 15 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:44,387-Speed 5498.02 samples/sec Loss 2.9443 LearningRate 0.0057 Epoch: 15 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:46,240-Speed 5530.42 samples/sec Loss 3.0777 LearningRate 0.0057 Epoch: 15 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:12:48,078-Speed 5570.26 samples/sec Loss 2.9337 LearningRate 0.0057 Epoch: 15 Global Step: 76930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:49,929-Speed 5536.11 samples/sec Loss 3.0609 LearningRate 0.0057 Epoch: 15 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:51,798-Speed 5480.75 samples/sec Loss 2.9568 LearningRate 0.0057 Epoch: 15 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:53,658-Speed 5506.65 samples/sec Loss 2.9468 LearningRate 0.0057 Epoch: 15 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:55,517-Speed 5509.07 samples/sec Loss 2.9990 LearningRate 0.0057 Epoch: 15 Global Step: 76970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:57,363-Speed 5550.34 samples/sec Loss 2.9609 LearningRate 0.0057 Epoch: 15 Global Step: 76980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:12:59,212-Speed 5541.37 samples/sec Loss 3.0534 LearningRate 0.0057 Epoch: 15 Global Step: 76990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:01,119-Speed 5373.33 
samples/sec Loss 3.0563 LearningRate 0.0057 Epoch: 15 Global Step: 77000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:03,023-Speed 5377.81 samples/sec Loss 3.0476 LearningRate 0.0057 Epoch: 15 Global Step: 77010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:04,888-Speed 5494.08 samples/sec Loss 3.0185 LearningRate 0.0057 Epoch: 15 Global Step: 77020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:06,755-Speed 5487.24 samples/sec Loss 2.9983 LearningRate 0.0057 Epoch: 15 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:08,632-Speed 5455.61 samples/sec Loss 3.0698 LearningRate 0.0057 Epoch: 15 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:10,493-Speed 5505.63 samples/sec Loss 3.1937 LearningRate 0.0057 Epoch: 15 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:12,409-Speed 5345.53 samples/sec Loss 3.0202 LearningRate 0.0057 Epoch: 15 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:14,313-Speed 5380.37 samples/sec Loss 3.0660 LearningRate 0.0057 Epoch: 15 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:16,164-Speed 5536.22 samples/sec Loss 3.0120 LearningRate 0.0057 Epoch: 15 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 15:13:18,024-Speed 5505.63 samples/sec Loss 2.9757 LearningRate 0.0057 Epoch: 15 Global Step: 77090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:19,873-Speed 5540.38 samples/sec Loss 2.9721 LearningRate 0.0057 Epoch: 15 Global Step: 77100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:21,733-Speed 5509.88 samples/sec Loss 3.0155 LearningRate 0.0057 Epoch: 15 Global Step: 77110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:23,607-Speed 5466.60 samples/sec Loss 3.0626 LearningRate 0.0056 Epoch: 15 
Global Step: 77120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:25,474-Speed 5484.01 samples/sec Loss 3.0022 LearningRate 0.0056 Epoch: 15 Global Step: 77130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:27,386-Speed 5358.18 samples/sec Loss 2.9578 LearningRate 0.0056 Epoch: 15 Global Step: 77140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:29,251-Speed 5492.73 samples/sec Loss 3.0516 LearningRate 0.0056 Epoch: 15 Global Step: 77150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:31,136-Speed 5434.78 samples/sec Loss 3.0644 LearningRate 0.0056 Epoch: 15 Global Step: 77160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:32,996-Speed 5508.24 samples/sec Loss 3.0597 LearningRate 0.0056 Epoch: 15 Global Step: 77170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:34,845-Speed 5540.53 samples/sec Loss 3.0235 LearningRate 0.0056 Epoch: 15 Global Step: 77180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:36,735-Speed 5419.27 samples/sec Loss 2.9979 LearningRate 0.0056 Epoch: 15 Global Step: 77190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:38,580-Speed 5553.07 samples/sec Loss 3.0393 LearningRate 0.0056 Epoch: 15 Global Step: 77200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:40,456-Speed 5459.15 samples/sec Loss 3.1135 LearningRate 0.0056 Epoch: 15 Global Step: 77210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:42,336-Speed 5448.54 samples/sec Loss 3.0942 LearningRate 0.0056 Epoch: 15 Global Step: 77220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:44,184-Speed 5543.68 samples/sec Loss 3.0867 LearningRate 0.0056 Epoch: 15 Global Step: 77230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:13:46,037-Speed 5530.97 samples/sec Loss 3.0197 LearningRate 0.0056 Epoch: 15 Global Step: 77240 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-11 15:13:47,894-Speed 5514.86 samples/sec Loss 3.0582 LearningRate 0.0056 Epoch: 15 Global Step: 77250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:49,777-Speed 5440.41 samples/sec Loss 3.0947 LearningRate 0.0056 Epoch: 15 Global Step: 77260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:51,647-Speed 5476.65 samples/sec Loss 3.0380 LearningRate 0.0056 Epoch: 15 Global Step: 77270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:53,509-Speed 5501.54 samples/sec Loss 3.0031 LearningRate 0.0056 Epoch: 15 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:55,364-Speed 5522.92 samples/sec Loss 2.9841 LearningRate 0.0056 Epoch: 15 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:57,221-Speed 5515.55 samples/sec Loss 3.1834 LearningRate 0.0056 Epoch: 15 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:13:59,069-Speed 5543.49 samples/sec Loss 3.0020 LearningRate 0.0056 Epoch: 15 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:00,921-Speed 5530.98 samples/sec Loss 2.9481 LearningRate 0.0056 Epoch: 15 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:02,779-Speed 5515.24 samples/sec Loss 3.1077 LearningRate 0.0055 Epoch: 15 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:04,652-Speed 5469.47 samples/sec Loss 3.0633 LearningRate 0.0055 Epoch: 15 Global Step: 77340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:06,494-Speed 5562.32 samples/sec Loss 3.1393 LearningRate 0.0055 Epoch: 15 Global Step: 77350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:08,346-Speed 5530.65 samples/sec Loss 3.0731 LearningRate 0.0055 Epoch: 15 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:10,209-Speed 5497.15 
samples/sec Loss 2.9309 LearningRate 0.0055 Epoch: 15 Global Step: 77370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:12,107-Speed 5398.71 samples/sec Loss 3.0679 LearningRate 0.0055 Epoch: 15 Global Step: 77380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:13,972-Speed 5492.58 samples/sec Loss 3.0352 LearningRate 0.0055 Epoch: 15 Global Step: 77390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:15,832-Speed 5506.89 samples/sec Loss 2.9564 LearningRate 0.0055 Epoch: 15 Global Step: 77400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:17,691-Speed 5509.19 samples/sec Loss 3.0237 LearningRate 0.0055 Epoch: 15 Global Step: 77410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:19,547-Speed 5520.07 samples/sec Loss 2.9887 LearningRate 0.0055 Epoch: 15 Global Step: 77420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:21,413-Speed 5488.87 samples/sec Loss 3.0368 LearningRate 0.0055 Epoch: 15 Global Step: 77430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:23,275-Speed 5501.56 samples/sec Loss 3.1010 LearningRate 0.0055 Epoch: 15 Global Step: 77440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:25,129-Speed 5526.97 samples/sec Loss 3.0542 LearningRate 0.0055 Epoch: 15 Global Step: 77450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:27,016-Speed 5428.98 samples/sec Loss 3.0429 LearningRate 0.0055 Epoch: 15 Global Step: 77460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:28,861-Speed 5550.29 samples/sec Loss 2.9228 LearningRate 0.0055 Epoch: 15 Global Step: 77470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:30,714-Speed 5529.28 samples/sec Loss 3.1464 LearningRate 0.0055 Epoch: 15 Global Step: 77480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:32,567-Speed 5529.22 samples/sec Loss 3.0912 LearningRate 0.0055 Epoch: 15 
Global Step: 77490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:34,411-Speed 5553.96 samples/sec Loss 3.0366 LearningRate 0.0055 Epoch: 15 Global Step: 77500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:36,300-Speed 5424.85 samples/sec Loss 3.1299 LearningRate 0.0055 Epoch: 15 Global Step: 77510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:38,148-Speed 5542.57 samples/sec Loss 3.0720 LearningRate 0.0055 Epoch: 15 Global Step: 77520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:40,000-Speed 5530.58 samples/sec Loss 3.1120 LearningRate 0.0055 Epoch: 15 Global Step: 77530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:41,864-Speed 5495.43 samples/sec Loss 2.9806 LearningRate 0.0055 Epoch: 15 Global Step: 77540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:43,726-Speed 5500.38 samples/sec Loss 2.9533 LearningRate 0.0054 Epoch: 15 Global Step: 77550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:45,579-Speed 5527.97 samples/sec Loss 3.0176 LearningRate 0.0054 Epoch: 15 Global Step: 77560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:47,446-Speed 5489.25 samples/sec Loss 3.0688 LearningRate 0.0054 Epoch: 15 Global Step: 77570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 15:14:49,298-Speed 5531.26 samples/sec Loss 3.0472 LearningRate 0.0054 Epoch: 15 Global Step: 77580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 15:14:51,149-Speed 5533.00 samples/sec Loss 2.9988 LearningRate 0.0054 Epoch: 15 Global Step: 77590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:14:53,002-Speed 5527.35 samples/sec Loss 3.0412 LearningRate 0.0054 Epoch: 15 Global Step: 77600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:14:54,861-Speed 5511.72 samples/sec Loss 2.9801 LearningRate 0.0054 Epoch: 15 Global Step: 77610 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-11 15:14:56,733-Speed 5473.48 samples/sec Loss 3.1367 LearningRate 0.0054 Epoch: 15 Global Step: 77620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:14:58,600-Speed 5485.66 samples/sec Loss 3.0675 LearningRate 0.0054 Epoch: 15 Global Step: 77630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:00,486-Speed 5429.75 samples/sec Loss 3.0931 LearningRate 0.0054 Epoch: 15 Global Step: 77640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:02,360-Speed 5467.58 samples/sec Loss 3.0733 LearningRate 0.0054 Epoch: 15 Global Step: 77650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:04,220-Speed 5506.20 samples/sec Loss 3.0653 LearningRate 0.0054 Epoch: 15 Global Step: 77660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:06,076-Speed 5521.16 samples/sec Loss 3.0354 LearningRate 0.0054 Epoch: 15 Global Step: 77670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:07,929-Speed 5527.42 samples/sec Loss 3.0430 LearningRate 0.0054 Epoch: 15 Global Step: 77680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:09,785-Speed 5519.50 samples/sec Loss 2.9559 LearningRate 0.0054 Epoch: 15 Global Step: 77690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:11,645-Speed 5507.54 samples/sec Loss 2.9946 LearningRate 0.0054 Epoch: 15 Global Step: 77700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:13,541-Speed 5402.54 samples/sec Loss 3.0442 LearningRate 0.0054 Epoch: 15 Global Step: 77710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:15,397-Speed 5519.14 samples/sec Loss 3.0894 LearningRate 0.0054 Epoch: 15 Global Step: 77720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:17,246-Speed 5542.59 samples/sec Loss 3.0249 LearningRate 0.0054 Epoch: 15 Global Step: 77730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:19,103-Speed 5515.05 
samples/sec Loss 3.0811 LearningRate 0.0054 Epoch: 15 Global Step: 77740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:20,953-Speed 5537.20 samples/sec Loss 3.0832 LearningRate 0.0054 Epoch: 15 Global Step: 77750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:22,812-Speed 5509.58 samples/sec Loss 3.1343 LearningRate 0.0054 Epoch: 15 Global Step: 77760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:24,675-Speed 5497.75 samples/sec Loss 3.0979 LearningRate 0.0053 Epoch: 15 Global Step: 77770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:26,532-Speed 5518.08 samples/sec Loss 3.1314 LearningRate 0.0053 Epoch: 15 Global Step: 77780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:28,382-Speed 5537.77 samples/sec Loss 2.9793 LearningRate 0.0053 Epoch: 15 Global Step: 77790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:30,250-Speed 5484.07 samples/sec Loss 3.0428 LearningRate 0.0053 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:15:32,095-Speed 5553.98 samples/sec Loss 2.9435 LearningRate 0.0053 Epoch: 15 Global Step: 77810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:33,953-Speed 5511.88 samples/sec Loss 3.0064 LearningRate 0.0053 Epoch: 15 Global Step: 77820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:35,820-Speed 5486.44 samples/sec Loss 3.1422 LearningRate 0.0053 Epoch: 15 Global Step: 77830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:37,681-Speed 5505.19 samples/sec Loss 3.0719 LearningRate 0.0053 Epoch: 15 Global Step: 77840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:39,544-Speed 5498.75 samples/sec Loss 3.1168 LearningRate 0.0053 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:41,401-Speed 5515.17 samples/sec Loss 3.0785 LearningRate 0.0053 Epoch: 15 
Global Step: 77860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:43,263-Speed 5502.27 samples/sec Loss 3.0281 LearningRate 0.0053 Epoch: 15 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:45,111-Speed 5543.59 samples/sec Loss 2.9961 LearningRate 0.0053 Epoch: 15 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:46,964-Speed 5527.77 samples/sec Loss 2.9803 LearningRate 0.0053 Epoch: 15 Global Step: 77890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:48,823-Speed 5510.94 samples/sec Loss 3.0419 LearningRate 0.0053 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:50,660-Speed 5575.71 samples/sec Loss 3.0352 LearningRate 0.0053 Epoch: 15 Global Step: 77910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:52,518-Speed 5514.68 samples/sec Loss 3.0410 LearningRate 0.0053 Epoch: 15 Global Step: 77920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:54,377-Speed 5510.02 samples/sec Loss 3.0198 LearningRate 0.0053 Epoch: 15 Global Step: 77930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:56,229-Speed 5530.37 samples/sec Loss 3.0897 LearningRate 0.0053 Epoch: 15 Global Step: 77940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:58,077-Speed 5543.11 samples/sec Loss 3.1617 LearningRate 0.0053 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:15:59,933-Speed 5520.69 samples/sec Loss 3.1506 LearningRate 0.0053 Epoch: 15 Global Step: 77960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:16:01,784-Speed 5533.10 samples/sec Loss 3.1071 LearningRate 0.0053 Epoch: 15 Global Step: 77970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:16:03,659-Speed 5463.37 samples/sec Loss 3.0596 LearningRate 0.0053 Epoch: 15 Global Step: 77980 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-11 15:16:05,518-Speed 5509.23 samples/sec Loss 3.0366 LearningRate 0.0052 Epoch: 15 Global Step: 77990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:16:07,388-Speed 5478.90 samples/sec Loss 3.1170 LearningRate 0.0052 Epoch: 15 Global Step: 78000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:16:33,998-[lfw][78000]XNorm: 23.235217 Training: 2022-04-11 15:16:33,999-[lfw][78000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 15:16:34,000-[lfw][78000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:17:04,622-[cfp_fp][78000]XNorm: 21.598676 Training: 2022-04-11 15:17:04,623-[cfp_fp][78000]Accuracy-Flip: 0.98157+-0.00757 Training: 2022-04-11 15:17:04,623-[cfp_fp][78000]Accuracy-Highest: 0.98157 Training: 2022-04-11 15:17:31,096-[agedb_30][78000]XNorm: 23.207164 Training: 2022-04-11 15:17:31,097-[agedb_30][78000]Accuracy-Flip: 0.97950+-0.00723 Training: 2022-04-11 15:17:31,097-[agedb_30][78000]Accuracy-Highest: 0.98267 Training: 2022-04-11 15:17:32,953-Speed 119.68 samples/sec Loss 3.1293 LearningRate 0.0052 Epoch: 15 Global Step: 78010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:17:34,798-Speed 5552.12 samples/sec Loss 3.0485 LearningRate 0.0052 Epoch: 15 Global Step: 78020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:17:36,622-Speed 5616.18 samples/sec Loss 3.0384 LearningRate 0.0052 Epoch: 15 Global Step: 78030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:17:38,452-Speed 5597.79 samples/sec Loss 3.0227 LearningRate 0.0052 Epoch: 15 Global Step: 78040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:17:40,288-Speed 5579.40 samples/sec Loss 3.1949 LearningRate 0.0052 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:17:42,128-Speed 5567.50 samples/sec Loss 3.1347 LearningRate 0.0052 Epoch: 15 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
15:17:43,950-Speed 5620.86 samples/sec Loss 3.1305 LearningRate 0.0052 Epoch: 15 Global Step: 78070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:45,774-Speed 5617.11 samples/sec Loss 3.0919 LearningRate 0.0052 Epoch: 15 Global Step: 78080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:47,616-Speed 5561.57 samples/sec Loss 3.0798 LearningRate 0.0052 Epoch: 15 Global Step: 78090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:49,447-Speed 5595.40 samples/sec Loss 3.1009 LearningRate 0.0052 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:51,306-Speed 5511.35 samples/sec Loss 3.0278 LearningRate 0.0052 Epoch: 15 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:53,144-Speed 5573.77 samples/sec Loss 2.8683 LearningRate 0.0052 Epoch: 15 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:54,972-Speed 5604.01 samples/sec Loss 3.2062 LearningRate 0.0052 Epoch: 15 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:56,800-Speed 5603.07 samples/sec Loss 3.1088 LearningRate 0.0052 Epoch: 15 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:17:58,630-Speed 5597.25 samples/sec Loss 3.1370 LearningRate 0.0052 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:00,446-Speed 5641.60 samples/sec Loss 3.1085 LearningRate 0.0052 Epoch: 15 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:02,272-Speed 5609.20 samples/sec Loss 3.0761 LearningRate 0.0052 Epoch: 15 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:04,110-Speed 5575.02 samples/sec Loss 3.2011 LearningRate 0.0052 Epoch: 15 Global Step: 78180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:05,950-Speed 5565.87 samples/sec Loss 3.0625 
LearningRate 0.0052 Epoch: 15 Global Step: 78190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:07,777-Speed 5606.61 samples/sec Loss 3.0644 LearningRate 0.0052 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:09,607-Speed 5596.31 samples/sec Loss 3.1058 LearningRate 0.0051 Epoch: 15 Global Step: 78210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:11,452-Speed 5553.74 samples/sec Loss 3.0844 LearningRate 0.0051 Epoch: 15 Global Step: 78220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:13,290-Speed 5572.14 samples/sec Loss 3.2531 LearningRate 0.0051 Epoch: 15 Global Step: 78230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:15,128-Speed 5576.29 samples/sec Loss 3.1194 LearningRate 0.0051 Epoch: 15 Global Step: 78240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:16,955-Speed 5605.98 samples/sec Loss 3.0814 LearningRate 0.0051 Epoch: 15 Global Step: 78250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:18,801-Speed 5548.43 samples/sec Loss 3.0068 LearningRate 0.0051 Epoch: 15 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:18:20,623-Speed 5624.62 samples/sec Loss 3.1956 LearningRate 0.0051 Epoch: 15 Global Step: 78270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:22,451-Speed 5603.94 samples/sec Loss 3.0903 LearningRate 0.0051 Epoch: 15 Global Step: 78280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:24,286-Speed 5581.61 samples/sec Loss 3.0076 LearningRate 0.0051 Epoch: 15 Global Step: 78290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:26,126-Speed 5567.13 samples/sec Loss 3.1662 LearningRate 0.0051 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:27,959-Speed 5590.89 samples/sec Loss 3.1563 LearningRate 0.0051 Epoch: 15 Global Step: 78310 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:29,791-Speed 5588.79 samples/sec Loss 2.9984 LearningRate 0.0051 Epoch: 15 Global Step: 78320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:31,628-Speed 5577.11 samples/sec Loss 3.1548 LearningRate 0.0051 Epoch: 15 Global Step: 78330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:33,459-Speed 5595.09 samples/sec Loss 3.1559 LearningRate 0.0051 Epoch: 15 Global Step: 78340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:35,296-Speed 5577.29 samples/sec Loss 3.0980 LearningRate 0.0051 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:37,141-Speed 5552.82 samples/sec Loss 3.0897 LearningRate 0.0051 Epoch: 15 Global Step: 78360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:38,975-Speed 5585.53 samples/sec Loss 3.1337 LearningRate 0.0051 Epoch: 15 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:18:40,805-Speed 5595.46 samples/sec Loss 2.9497 LearningRate 0.0051 Epoch: 15 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:18:42,655-Speed 5538.89 samples/sec Loss 3.2754 LearningRate 0.0051 Epoch: 15 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:18:44,495-Speed 5567.72 samples/sec Loss 3.2126 LearningRate 0.0051 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:18:46,326-Speed 5594.18 samples/sec Loss 3.0949 LearningRate 0.0051 Epoch: 15 Global Step: 78410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:48,157-Speed 5595.75 samples/sec Loss 3.0379 LearningRate 0.0051 Epoch: 15 Global Step: 78420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:49,991-Speed 5584.74 samples/sec Loss 2.9798 LearningRate 0.0050 Epoch: 15 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:18:51,883-Speed 5414.64 samples/sec Loss 3.0962 LearningRate 0.0050 Epoch: 15 Global Step: 78440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:53,726-Speed 5560.01 samples/sec Loss 3.0795 LearningRate 0.0050 Epoch: 15 Global Step: 78450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:55,557-Speed 5593.32 samples/sec Loss 3.0475 LearningRate 0.0050 Epoch: 15 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:57,404-Speed 5544.71 samples/sec Loss 3.0510 LearningRate 0.0050 Epoch: 15 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:18:59,241-Speed 5576.99 samples/sec Loss 3.1677 LearningRate 0.0050 Epoch: 15 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:01,076-Speed 5583.83 samples/sec Loss 3.0492 LearningRate 0.0050 Epoch: 15 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:02,908-Speed 5592.23 samples/sec Loss 3.0786 LearningRate 0.0050 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:04,742-Speed 5584.73 samples/sec Loss 2.9712 LearningRate 0.0050 Epoch: 15 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:06,597-Speed 5523.90 samples/sec Loss 3.0766 LearningRate 0.0050 Epoch: 15 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:08,427-Speed 5598.28 samples/sec Loss 3.0676 LearningRate 0.0050 Epoch: 15 Global Step: 78530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:10,271-Speed 5554.56 samples/sec Loss 3.1089 LearningRate 0.0050 Epoch: 15 Global Step: 78540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:12,111-Speed 5566.97 samples/sec Loss 2.9658 LearningRate 0.0050 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:13,960-Speed 5540.66 samples/sec Loss 
3.0896 LearningRate 0.0050 Epoch: 15 Global Step: 78560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:15,865-Speed 5375.40 samples/sec Loss 3.1389 LearningRate 0.0050 Epoch: 15 Global Step: 78570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:17,709-Speed 5556.28 samples/sec Loss 2.9380 LearningRate 0.0050 Epoch: 15 Global Step: 78580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:19,540-Speed 5594.10 samples/sec Loss 2.9932 LearningRate 0.0050 Epoch: 15 Global Step: 78590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:21,386-Speed 5551.08 samples/sec Loss 2.9898 LearningRate 0.0050 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:23,220-Speed 5585.54 samples/sec Loss 2.9889 LearningRate 0.0050 Epoch: 15 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:19:25,041-Speed 5625.35 samples/sec Loss 3.0494 LearningRate 0.0050 Epoch: 15 Global Step: 78620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:26,880-Speed 5571.25 samples/sec Loss 3.0703 LearningRate 0.0050 Epoch: 15 Global Step: 78630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:28,715-Speed 5581.89 samples/sec Loss 2.9931 LearningRate 0.0050 Epoch: 15 Global Step: 78640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:30,544-Speed 5600.21 samples/sec Loss 3.0727 LearningRate 0.0050 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:32,393-Speed 5539.04 samples/sec Loss 3.1425 LearningRate 0.0049 Epoch: 15 Global Step: 78660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:34,226-Speed 5590.42 samples/sec Loss 3.0148 LearningRate 0.0049 Epoch: 15 Global Step: 78670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:36,083-Speed 5513.85 samples/sec Loss 3.1366 LearningRate 0.0049 Epoch: 15 Global Step: 78680 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:37,940-Speed 5517.69 samples/sec Loss 3.0248 LearningRate 0.0049 Epoch: 15 Global Step: 78690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:39,770-Speed 5598.25 samples/sec Loss 3.1400 LearningRate 0.0049 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:41,607-Speed 5577.45 samples/sec Loss 3.1139 LearningRate 0.0049 Epoch: 15 Global Step: 78710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:43,443-Speed 5579.16 samples/sec Loss 3.1004 LearningRate 0.0049 Epoch: 15 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:19:45,262-Speed 5630.22 samples/sec Loss 3.1300 LearningRate 0.0049 Epoch: 15 Global Step: 78730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:47,108-Speed 5549.22 samples/sec Loss 3.0853 LearningRate 0.0049 Epoch: 15 Global Step: 78740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:48,938-Speed 5599.33 samples/sec Loss 2.9865 LearningRate 0.0049 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:50,798-Speed 5505.47 samples/sec Loss 3.0204 LearningRate 0.0049 Epoch: 15 Global Step: 78760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:52,637-Speed 5570.54 samples/sec Loss 3.0103 LearningRate 0.0049 Epoch: 15 Global Step: 78770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:54,472-Speed 5582.29 samples/sec Loss 3.1004 LearningRate 0.0049 Epoch: 15 Global Step: 78780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:56,302-Speed 5600.71 samples/sec Loss 3.0969 LearningRate 0.0049 Epoch: 15 Global Step: 78790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:19:58,131-Speed 5598.99 samples/sec Loss 3.1596 LearningRate 0.0049 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:19:59,964-Speed 5588.18 samples/sec Loss 3.0313 LearningRate 0.0049 Epoch: 15 Global Step: 78810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:01,807-Speed 5560.46 samples/sec Loss 3.0651 LearningRate 0.0049 Epoch: 15 Global Step: 78820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:03,639-Speed 5592.20 samples/sec Loss 3.0514 LearningRate 0.0049 Epoch: 15 Global Step: 78830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:05,470-Speed 5594.88 samples/sec Loss 3.0636 LearningRate 0.0049 Epoch: 15 Global Step: 78840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:07,301-Speed 5592.64 samples/sec Loss 2.9551 LearningRate 0.0049 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:09,130-Speed 5600.56 samples/sec Loss 3.0725 LearningRate 0.0049 Epoch: 15 Global Step: 78860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:10,971-Speed 5563.53 samples/sec Loss 3.1512 LearningRate 0.0049 Epoch: 15 Global Step: 78870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:12,806-Speed 5585.19 samples/sec Loss 3.1189 LearningRate 0.0049 Epoch: 15 Global Step: 78880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:14,640-Speed 5585.22 samples/sec Loss 3.1240 LearningRate 0.0048 Epoch: 15 Global Step: 78890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:16,485-Speed 5552.51 samples/sec Loss 3.0332 LearningRate 0.0048 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:18,336-Speed 5533.52 samples/sec Loss 3.0293 LearningRate 0.0048 Epoch: 15 Global Step: 78910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:20,158-Speed 5622.47 samples/sec Loss 3.1434 LearningRate 0.0048 Epoch: 15 Global Step: 78920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:21,989-Speed 5595.10 samples/sec Loss 
3.1669 LearningRate 0.0048 Epoch: 15 Global Step: 78930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:23,834-Speed 5553.81 samples/sec Loss 3.0995 LearningRate 0.0048 Epoch: 15 Global Step: 78940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:25,665-Speed 5593.42 samples/sec Loss 3.0535 LearningRate 0.0048 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:27,493-Speed 5604.88 samples/sec Loss 3.1318 LearningRate 0.0048 Epoch: 15 Global Step: 78960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:29,327-Speed 5585.60 samples/sec Loss 3.1999 LearningRate 0.0048 Epoch: 15 Global Step: 78970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:31,163-Speed 5576.11 samples/sec Loss 3.1674 LearningRate 0.0048 Epoch: 15 Global Step: 78980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:33,010-Speed 5547.74 samples/sec Loss 3.1469 LearningRate 0.0048 Epoch: 15 Global Step: 78990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:34,877-Speed 5485.68 samples/sec Loss 2.9757 LearningRate 0.0048 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:36,723-Speed 5550.33 samples/sec Loss 3.1917 LearningRate 0.0048 Epoch: 15 Global Step: 79010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:20:38,608-Speed 5435.47 samples/sec Loss 3.0288 LearningRate 0.0048 Epoch: 15 Global Step: 79020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:40,454-Speed 5548.29 samples/sec Loss 3.0938 LearningRate 0.0048 Epoch: 15 Global Step: 79030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:42,291-Speed 5576.88 samples/sec Loss 3.0522 LearningRate 0.0048 Epoch: 15 Global Step: 79040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:44,132-Speed 5565.21 samples/sec Loss 3.0750 LearningRate 0.0048 Epoch: 15 Global Step: 79050 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:45,968-Speed 5578.82 samples/sec Loss 2.9221 LearningRate 0.0048 Epoch: 15 Global Step: 79060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:47,811-Speed 5560.40 samples/sec Loss 3.1093 LearningRate 0.0048 Epoch: 15 Global Step: 79070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:49,653-Speed 5558.40 samples/sec Loss 3.0658 LearningRate 0.0048 Epoch: 15 Global Step: 79080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:51,484-Speed 5596.46 samples/sec Loss 3.1300 LearningRate 0.0048 Epoch: 15 Global Step: 79090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:53,326-Speed 5560.34 samples/sec Loss 3.0554 LearningRate 0.0048 Epoch: 15 Global Step: 79100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:55,157-Speed 5594.04 samples/sec Loss 3.1118 LearningRate 0.0048 Epoch: 15 Global Step: 79110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:56,983-Speed 5613.27 samples/sec Loss 3.1605 LearningRate 0.0047 Epoch: 15 Global Step: 79120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:20:58,870-Speed 5426.93 samples/sec Loss 3.3090 LearningRate 0.0047 Epoch: 15 Global Step: 79130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:00,706-Speed 5581.68 samples/sec Loss 3.1913 LearningRate 0.0047 Epoch: 15 Global Step: 79140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:02,562-Speed 5517.78 samples/sec Loss 3.0481 LearningRate 0.0047 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:04,396-Speed 5588.26 samples/sec Loss 3.0166 LearningRate 0.0047 Epoch: 15 Global Step: 79160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:06,233-Speed 5574.63 samples/sec Loss 3.1591 LearningRate 0.0047 Epoch: 15 Global Step: 79170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:21:08,073-Speed 5569.21 samples/sec Loss 3.1557 LearningRate 0.0047 Epoch: 15 Global Step: 79180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:09,905-Speed 5589.29 samples/sec Loss 3.0788 LearningRate 0.0047 Epoch: 15 Global Step: 79190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:11,768-Speed 5512.09 samples/sec Loss 3.0777 LearningRate 0.0047 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:13,612-Speed 5556.56 samples/sec Loss 3.2802 LearningRate 0.0047 Epoch: 15 Global Step: 79210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:15,454-Speed 5560.90 samples/sec Loss 3.0789 LearningRate 0.0047 Epoch: 15 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:21:17,292-Speed 5572.63 samples/sec Loss 3.0801 LearningRate 0.0047 Epoch: 15 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:21:19,135-Speed 5558.79 samples/sec Loss 3.1002 LearningRate 0.0047 Epoch: 15 Global Step: 79240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:20,973-Speed 5573.50 samples/sec Loss 3.0997 LearningRate 0.0047 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:22,801-Speed 5605.37 samples/sec Loss 2.9216 LearningRate 0.0047 Epoch: 15 Global Step: 79260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:24,635-Speed 5583.39 samples/sec Loss 3.0732 LearningRate 0.0047 Epoch: 15 Global Step: 79270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:26,506-Speed 5475.07 samples/sec Loss 3.0479 LearningRate 0.0047 Epoch: 15 Global Step: 79280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:28,337-Speed 5594.89 samples/sec Loss 3.0850 LearningRate 0.0047 Epoch: 15 Global Step: 79290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:30,166-Speed 5599.94 samples/sec Loss 
3.0838 LearningRate 0.0047 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:32,000-Speed 5588.21 samples/sec Loss 3.1351 LearningRate 0.0047 Epoch: 15 Global Step: 79310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:33,830-Speed 5597.32 samples/sec Loss 3.2219 LearningRate 0.0047 Epoch: 15 Global Step: 79320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:35,668-Speed 5570.40 samples/sec Loss 3.0926 LearningRate 0.0047 Epoch: 15 Global Step: 79330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:37,496-Speed 5604.25 samples/sec Loss 3.1859 LearningRate 0.0047 Epoch: 15 Global Step: 79340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:39,337-Speed 5566.28 samples/sec Loss 3.1233 LearningRate 0.0046 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:41,166-Speed 5598.87 samples/sec Loss 2.9555 LearningRate 0.0046 Epoch: 15 Global Step: 79360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:42,994-Speed 5605.24 samples/sec Loss 3.2222 LearningRate 0.0046 Epoch: 15 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:44,828-Speed 5585.25 samples/sec Loss 3.1194 LearningRate 0.0046 Epoch: 15 Global Step: 79380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:46,657-Speed 5602.17 samples/sec Loss 2.9793 LearningRate 0.0046 Epoch: 15 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:48,502-Speed 5551.58 samples/sec Loss 3.0309 LearningRate 0.0046 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:50,335-Speed 5588.12 samples/sec Loss 3.0369 LearningRate 0.0046 Epoch: 15 Global Step: 79410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:52,172-Speed 5576.24 samples/sec Loss 3.0285 LearningRate 0.0046 Epoch: 15 Global Step: 79420 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:54,002-Speed 5598.83 samples/sec Loss 3.1162 LearningRate 0.0046 Epoch: 15 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:21:55,833-Speed 5594.81 samples/sec Loss 3.1633 LearningRate 0.0046 Epoch: 15 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:21:57,672-Speed 5571.20 samples/sec Loss 3.1024 LearningRate 0.0046 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:21:59,514-Speed 5559.15 samples/sec Loss 3.0345 LearningRate 0.0046 Epoch: 15 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:22:01,369-Speed 5522.11 samples/sec Loss 3.1284 LearningRate 0.0046 Epoch: 15 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:22:03,198-Speed 5603.52 samples/sec Loss 3.0245 LearningRate 0.0046 Epoch: 15 Global Step: 79480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:05,038-Speed 5566.94 samples/sec Loss 3.0701 LearningRate 0.0046 Epoch: 15 Global Step: 79490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:06,892-Speed 5526.64 samples/sec Loss 3.0704 LearningRate 0.0046 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:08,722-Speed 5596.49 samples/sec Loss 3.1270 LearningRate 0.0046 Epoch: 15 Global Step: 79510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:10,559-Speed 5576.63 samples/sec Loss 3.0290 LearningRate 0.0046 Epoch: 15 Global Step: 79520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:12,394-Speed 5583.88 samples/sec Loss 3.1848 LearningRate 0.0046 Epoch: 15 Global Step: 79530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:14,229-Speed 5579.88 samples/sec Loss 3.0873 LearningRate 0.0046 Epoch: 15 Global Step: 79540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:22:16,083-Speed 5527.26 samples/sec Loss 3.1832 LearningRate 0.0046 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:17,918-Speed 5581.67 samples/sec Loss 3.1593 LearningRate 0.0046 Epoch: 15 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:19,749-Speed 5594.92 samples/sec Loss 3.1269 LearningRate 0.0046 Epoch: 15 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:21,574-Speed 5615.28 samples/sec Loss 3.0358 LearningRate 0.0046 Epoch: 15 Global Step: 79580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:23,412-Speed 5571.90 samples/sec Loss 2.9896 LearningRate 0.0045 Epoch: 15 Global Step: 79590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:25,265-Speed 5527.59 samples/sec Loss 3.0420 LearningRate 0.0045 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:27,105-Speed 5568.04 samples/sec Loss 3.1072 LearningRate 0.0045 Epoch: 15 Global Step: 79610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:28,961-Speed 5518.64 samples/sec Loss 3.0894 LearningRate 0.0045 Epoch: 15 Global Step: 79620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:30,800-Speed 5568.86 samples/sec Loss 3.0001 LearningRate 0.0045 Epoch: 15 Global Step: 79630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:32,631-Speed 5598.53 samples/sec Loss 3.0837 LearningRate 0.0045 Epoch: 15 Global Step: 79640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:34,463-Speed 5589.25 samples/sec Loss 3.1040 LearningRate 0.0045 Epoch: 15 Global Step: 79650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:36,302-Speed 5569.03 samples/sec Loss 3.1423 LearningRate 0.0045 Epoch: 15 Global Step: 79660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:38,136-Speed 5586.98 samples/sec Loss 
3.1401 LearningRate 0.0045 Epoch: 15 Global Step: 79670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:22:39,980-Speed 5556.73 samples/sec Loss 3.2089 LearningRate 0.0045 Epoch: 15 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:41,813-Speed 5590.03 samples/sec Loss 3.1374 LearningRate 0.0045 Epoch: 15 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:43,646-Speed 5586.62 samples/sec Loss 3.1576 LearningRate 0.0045 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:45,478-Speed 5592.17 samples/sec Loss 3.1834 LearningRate 0.0045 Epoch: 15 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:47,310-Speed 5591.98 samples/sec Loss 2.9937 LearningRate 0.0045 Epoch: 15 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:49,144-Speed 5586.33 samples/sec Loss 3.1155 LearningRate 0.0045 Epoch: 15 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:50,978-Speed 5583.01 samples/sec Loss 3.1524 LearningRate 0.0045 Epoch: 15 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:52,811-Speed 5588.93 samples/sec Loss 3.1086 LearningRate 0.0045 Epoch: 15 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:54,640-Speed 5601.34 samples/sec Loss 3.1983 LearningRate 0.0045 Epoch: 15 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:56,471-Speed 5595.68 samples/sec Loss 3.0329 LearningRate 0.0045 Epoch: 15 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:22:58,310-Speed 5570.06 samples/sec Loss 3.0675 LearningRate 0.0045 Epoch: 15 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:23:00,164-Speed 5525.25 samples/sec Loss 3.1024 LearningRate 0.0045 Epoch: 15 Global Step: 79790 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:02,034-Speed 5480.06 samples/sec Loss 3.0906 LearningRate 0.0045 Epoch: 15 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:03,869-Speed 5580.24 samples/sec Loss 3.1775 LearningRate 0.0045 Epoch: 15 Global Step: 79810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:05,709-Speed 5569.30 samples/sec Loss 3.1072 LearningRate 0.0045 Epoch: 15 Global Step: 79820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:07,551-Speed 5560.91 samples/sec Loss 3.0548 LearningRate 0.0044 Epoch: 15 Global Step: 79830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:09,383-Speed 5590.67 samples/sec Loss 2.9856 LearningRate 0.0044 Epoch: 15 Global Step: 79840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:11,214-Speed 5594.19 samples/sec Loss 3.1550 LearningRate 0.0044 Epoch: 15 Global Step: 79850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:13,075-Speed 5506.89 samples/sec Loss 3.1083 LearningRate 0.0044 Epoch: 15 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:14,908-Speed 5585.98 samples/sec Loss 2.9291 LearningRate 0.0044 Epoch: 15 Global Step: 79870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:16,847-Speed 5285.29 samples/sec Loss 3.0759 LearningRate 0.0044 Epoch: 15 Global Step: 79880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:18,688-Speed 5561.78 samples/sec Loss 3.0639 LearningRate 0.0044 Epoch: 15 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:23:20,518-Speed 5598.44 samples/sec Loss 3.0020 LearningRate 0.0044 Epoch: 15 Global Step: 79900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:23:22,347-Speed 5602.10 samples/sec Loss 2.9572 LearningRate 0.0044 Epoch: 15 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:23:24,189-Speed 5559.52 samples/sec Loss 3.2200 LearningRate 0.0044 Epoch: 15 Global Step: 79920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:26,025-Speed 5580.45 samples/sec Loss 3.0910 LearningRate 0.0044 Epoch: 15 Global Step: 79930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:27,858-Speed 5591.04 samples/sec Loss 3.1216 LearningRate 0.0044 Epoch: 15 Global Step: 79940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:29,695-Speed 5574.28 samples/sec Loss 3.1511 LearningRate 0.0044 Epoch: 15 Global Step: 79950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:31,533-Speed 5573.94 samples/sec Loss 3.0854 LearningRate 0.0044 Epoch: 15 Global Step: 79960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:33,366-Speed 5590.02 samples/sec Loss 2.9992 LearningRate 0.0044 Epoch: 15 Global Step: 79970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:35,199-Speed 5586.53 samples/sec Loss 3.2457 LearningRate 0.0044 Epoch: 15 Global Step: 79980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:37,035-Speed 5581.05 samples/sec Loss 3.1564 LearningRate 0.0044 Epoch: 15 Global Step: 79990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:23:38,868-Speed 5587.43 samples/sec Loss 3.1343 LearningRate 0.0044 Epoch: 15 Global Step: 80000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:24:05,393-[lfw][80000]XNorm: 22.622122 Training: 2022-04-11 15:24:05,394-[lfw][80000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-11 15:24:05,394-[lfw][80000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:24:36,103-[cfp_fp][80000]XNorm: 21.158114 Training: 2022-04-11 15:24:36,104-[cfp_fp][80000]Accuracy-Flip: 0.98086+-0.00683 Training: 2022-04-11 15:24:36,104-[cfp_fp][80000]Accuracy-Highest: 0.98157 Training: 2022-04-11 15:25:02,642-[agedb_30][80000]XNorm: 22.624514 Training: 2022-04-11 
15:25:02,643-[agedb_30][80000]Accuracy-Flip: 0.98350+-0.00693 Training: 2022-04-11 15:25:02,643-[agedb_30][80000]Accuracy-Highest: 0.98350 Training: 2022-04-11 15:25:04,526-Speed 119.55 samples/sec Loss 3.1165 LearningRate 0.0044 Epoch: 15 Global Step: 80010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:06,382-Speed 5518.12 samples/sec Loss 3.1257 LearningRate 0.0044 Epoch: 15 Global Step: 80020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:08,228-Speed 5550.10 samples/sec Loss 3.1735 LearningRate 0.0044 Epoch: 15 Global Step: 80030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:10,041-Speed 5648.93 samples/sec Loss 3.0355 LearningRate 0.0044 Epoch: 15 Global Step: 80040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:11,879-Speed 5571.65 samples/sec Loss 3.0633 LearningRate 0.0044 Epoch: 15 Global Step: 80050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:13,741-Speed 5504.14 samples/sec Loss 3.0736 LearningRate 0.0044 Epoch: 15 Global Step: 80060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:15,621-Speed 5448.66 samples/sec Loss 3.0122 LearningRate 0.0043 Epoch: 15 Global Step: 80070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:17,468-Speed 5544.42 samples/sec Loss 2.9589 LearningRate 0.0043 Epoch: 15 Global Step: 80080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:19,322-Speed 5528.45 samples/sec Loss 3.0902 LearningRate 0.0043 Epoch: 15 Global Step: 80090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:21,146-Speed 5615.92 samples/sec Loss 3.1001 LearningRate 0.0043 Epoch: 15 Global Step: 80100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:22,979-Speed 5587.46 samples/sec Loss 3.1114 LearningRate 0.0043 Epoch: 15 Global Step: 80110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:24,808-Speed 5601.00 samples/sec Loss 3.1618 
LearningRate 0.0043 Epoch: 15 Global Step: 80120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:26,638-Speed 5596.98 samples/sec Loss 3.1711 LearningRate 0.0043 Epoch: 15 Global Step: 80130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:28,467-Speed 5600.22 samples/sec Loss 3.0840 LearningRate 0.0043 Epoch: 15 Global Step: 80140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:30,296-Speed 5602.55 samples/sec Loss 3.0022 LearningRate 0.0043 Epoch: 15 Global Step: 80150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:32,124-Speed 5603.70 samples/sec Loss 2.9964 LearningRate 0.0043 Epoch: 15 Global Step: 80160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:33,965-Speed 5562.84 samples/sec Loss 3.1237 LearningRate 0.0043 Epoch: 15 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:35,799-Speed 5584.70 samples/sec Loss 3.1032 LearningRate 0.0043 Epoch: 15 Global Step: 80180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:37,627-Speed 5608.06 samples/sec Loss 3.0226 LearningRate 0.0043 Epoch: 15 Global Step: 80190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:39,467-Speed 5566.74 samples/sec Loss 3.1666 LearningRate 0.0043 Epoch: 15 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:41,298-Speed 5594.45 samples/sec Loss 3.1076 LearningRate 0.0043 Epoch: 15 Global Step: 80210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:43,132-Speed 5583.73 samples/sec Loss 3.0631 LearningRate 0.0043 Epoch: 15 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:44,967-Speed 5583.14 samples/sec Loss 3.1475 LearningRate 0.0043 Epoch: 15 Global Step: 80230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:46,788-Speed 5623.85 samples/sec Loss 3.1369 LearningRate 0.0043 Epoch: 15 Global Step: 80240 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:48,622-Speed 5586.36 samples/sec Loss 3.0572 LearningRate 0.0043 Epoch: 15 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:25:50,453-Speed 5594.57 samples/sec Loss 3.1197 LearningRate 0.0043 Epoch: 15 Global Step: 80260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:52,289-Speed 5579.47 samples/sec Loss 3.0364 LearningRate 0.0043 Epoch: 15 Global Step: 80270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:54,116-Speed 5607.50 samples/sec Loss 3.0882 LearningRate 0.0043 Epoch: 15 Global Step: 80280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:55,950-Speed 5585.49 samples/sec Loss 3.0875 LearningRate 0.0043 Epoch: 15 Global Step: 80290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:57,787-Speed 5576.62 samples/sec Loss 3.0345 LearningRate 0.0043 Epoch: 15 Global Step: 80300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:25:59,618-Speed 5596.32 samples/sec Loss 3.1267 LearningRate 0.0042 Epoch: 15 Global Step: 80310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:01,461-Speed 5556.94 samples/sec Loss 3.1355 LearningRate 0.0042 Epoch: 15 Global Step: 80320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:03,306-Speed 5555.65 samples/sec Loss 3.1199 LearningRate 0.0042 Epoch: 15 Global Step: 80330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:05,144-Speed 5572.53 samples/sec Loss 3.0435 LearningRate 0.0042 Epoch: 15 Global Step: 80340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:06,977-Speed 5587.51 samples/sec Loss 3.0703 LearningRate 0.0042 Epoch: 15 Global Step: 80350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:08,803-Speed 5610.64 samples/sec Loss 3.0932 LearningRate 0.0042 Epoch: 15 Global Step: 80360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:26:10,630-Speed 5606.95 samples/sec Loss 3.0170 LearningRate 0.0042 Epoch: 15 Global Step: 80370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:12,471-Speed 5563.27 samples/sec Loss 3.0134 LearningRate 0.0042 Epoch: 15 Global Step: 80380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:14,298-Speed 5606.64 samples/sec Loss 3.0217 LearningRate 0.0042 Epoch: 15 Global Step: 80390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:16,117-Speed 5630.43 samples/sec Loss 3.1269 LearningRate 0.0042 Epoch: 15 Global Step: 80400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:17,958-Speed 5565.74 samples/sec Loss 3.0846 LearningRate 0.0042 Epoch: 15 Global Step: 80410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:19,788-Speed 5597.51 samples/sec Loss 3.1587 LearningRate 0.0042 Epoch: 15 Global Step: 80420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:21,621-Speed 5589.88 samples/sec Loss 3.0869 LearningRate 0.0042 Epoch: 15 Global Step: 80430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:23,451-Speed 5596.79 samples/sec Loss 2.9992 LearningRate 0.0042 Epoch: 15 Global Step: 80440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:25,285-Speed 5587.18 samples/sec Loss 2.9340 LearningRate 0.0042 Epoch: 15 Global Step: 80450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:27,113-Speed 5602.25 samples/sec Loss 3.1375 LearningRate 0.0042 Epoch: 15 Global Step: 80460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:28,945-Speed 5592.60 samples/sec Loss 3.1117 LearningRate 0.0042 Epoch: 15 Global Step: 80470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:30,774-Speed 5601.51 samples/sec Loss 3.0604 LearningRate 0.0042 Epoch: 15 Global Step: 80480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:32,604-Speed 5596.38 samples/sec Loss 
3.0598 LearningRate 0.0042 Epoch: 15 Global Step: 80490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:26:34,442-Speed 5573.63 samples/sec Loss 3.0085 LearningRate 0.0042 Epoch: 15 Global Step: 80500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:36,289-Speed 5544.27 samples/sec Loss 3.1089 LearningRate 0.0042 Epoch: 15 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:38,194-Speed 5379.79 samples/sec Loss 3.2519 LearningRate 0.0042 Epoch: 15 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:40,046-Speed 5529.96 samples/sec Loss 3.0749 LearningRate 0.0042 Epoch: 15 Global Step: 80530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:41,878-Speed 5592.78 samples/sec Loss 3.1517 LearningRate 0.0042 Epoch: 15 Global Step: 80540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:43,713-Speed 5583.66 samples/sec Loss 3.1444 LearningRate 0.0042 Epoch: 15 Global Step: 80550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:45,544-Speed 5592.62 samples/sec Loss 3.0856 LearningRate 0.0041 Epoch: 15 Global Step: 80560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:47,377-Speed 5587.87 samples/sec Loss 3.1112 LearningRate 0.0041 Epoch: 15 Global Step: 80570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:49,210-Speed 5589.54 samples/sec Loss 3.0089 LearningRate 0.0041 Epoch: 15 Global Step: 80580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:51,043-Speed 5589.39 samples/sec Loss 3.0668 LearningRate 0.0041 Epoch: 15 Global Step: 80590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:52,886-Speed 5558.16 samples/sec Loss 3.0791 LearningRate 0.0041 Epoch: 15 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:26:54,728-Speed 5561.11 samples/sec Loss 3.0244 LearningRate 0.0041 Epoch: 15 Global Step: 80610 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:56,559-Speed 5592.97 samples/sec Loss 3.0969 LearningRate 0.0041 Epoch: 15 Global Step: 80620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:26:58,394-Speed 5583.20 samples/sec Loss 2.9973 LearningRate 0.0041 Epoch: 15 Global Step: 80630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:00,226-Speed 5593.83 samples/sec Loss 3.0314 LearningRate 0.0041 Epoch: 15 Global Step: 80640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:02,065-Speed 5567.63 samples/sec Loss 3.0279 LearningRate 0.0041 Epoch: 15 Global Step: 80650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:03,906-Speed 5566.51 samples/sec Loss 3.0987 LearningRate 0.0041 Epoch: 15 Global Step: 80660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:05,750-Speed 5554.24 samples/sec Loss 3.0707 LearningRate 0.0041 Epoch: 15 Global Step: 80670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:07,588-Speed 5573.96 samples/sec Loss 3.0934 LearningRate 0.0041 Epoch: 15 Global Step: 80680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:09,421-Speed 5590.12 samples/sec Loss 3.0132 LearningRate 0.0041 Epoch: 15 Global Step: 80690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:11,262-Speed 5562.09 samples/sec Loss 2.9785 LearningRate 0.0041 Epoch: 15 Global Step: 80700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:13,114-Speed 5532.54 samples/sec Loss 2.9707 LearningRate 0.0041 Epoch: 15 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:27:14,958-Speed 5554.42 samples/sec Loss 3.1036 LearningRate 0.0041 Epoch: 15 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:27:16,795-Speed 5576.44 samples/sec Loss 3.1207 LearningRate 0.0041 Epoch: 15 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 15:27:18,636-Speed 5565.79 samples/sec Loss 3.0764 LearningRate 0.0041 Epoch: 15 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:27:20,461-Speed 5612.89 samples/sec Loss 3.1131 LearningRate 0.0041 Epoch: 15 Global Step: 80750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:22,295-Speed 5583.82 samples/sec Loss 3.1241 LearningRate 0.0041 Epoch: 15 Global Step: 80760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:24,127-Speed 5593.46 samples/sec Loss 3.1311 LearningRate 0.0041 Epoch: 15 Global Step: 80770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:25,961-Speed 5586.40 samples/sec Loss 3.0769 LearningRate 0.0041 Epoch: 15 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:27,793-Speed 5591.27 samples/sec Loss 2.9936 LearningRate 0.0041 Epoch: 15 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:29,623-Speed 5596.54 samples/sec Loss 3.0319 LearningRate 0.0041 Epoch: 15 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:31,452-Speed 5599.70 samples/sec Loss 3.1505 LearningRate 0.0040 Epoch: 15 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:33,283-Speed 5596.23 samples/sec Loss 2.9910 LearningRate 0.0040 Epoch: 15 Global Step: 80820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:35,113-Speed 5596.34 samples/sec Loss 2.9993 LearningRate 0.0040 Epoch: 15 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:36,943-Speed 5600.16 samples/sec Loss 3.0736 LearningRate 0.0040 Epoch: 15 Global Step: 80840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:27:38,764-Speed 5623.13 samples/sec Loss 3.0596 LearningRate 0.0040 Epoch: 15 Global Step: 80850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:40,613-Speed 5542.47 samples/sec Loss 
3.0256 LearningRate 0.0040 Epoch: 15 Global Step: 80860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:42,449-Speed 5576.42 samples/sec Loss 3.0210 LearningRate 0.0040 Epoch: 15 Global Step: 80870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:44,285-Speed 5584.05 samples/sec Loss 3.1047 LearningRate 0.0040 Epoch: 15 Global Step: 80880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:46,131-Speed 5549.27 samples/sec Loss 2.9614 LearningRate 0.0040 Epoch: 15 Global Step: 80890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:47,979-Speed 5543.61 samples/sec Loss 3.0774 LearningRate 0.0040 Epoch: 15 Global Step: 80900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:49,811-Speed 5591.62 samples/sec Loss 3.0625 LearningRate 0.0040 Epoch: 15 Global Step: 80910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:27:51,788-Speed 5179.84 samples/sec Loss 3.0419 LearningRate 0.0040 Epoch: 15 Global Step: 80920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:02,598-Speed 947.40 samples/sec Loss 2.8851 LearningRate 0.0040 Epoch: 16 Global Step: 80930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:04,465-Speed 5488.34 samples/sec Loss 2.2752 LearningRate 0.0040 Epoch: 16 Global Step: 80940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:06,302-Speed 5578.93 samples/sec Loss 2.4113 LearningRate 0.0040 Epoch: 16 Global Step: 80950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:08,148-Speed 5546.24 samples/sec Loss 2.2226 LearningRate 0.0040 Epoch: 16 Global Step: 80960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:09,993-Speed 5553.58 samples/sec Loss 2.3458 LearningRate 0.0040 Epoch: 16 Global Step: 80970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:11,839-Speed 5552.45 samples/sec Loss 2.2910 LearningRate 0.0040 Epoch: 16 Global Step: 80980 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:13,674-Speed 5580.60 samples/sec Loss 2.3544 LearningRate 0.0040 Epoch: 16 Global Step: 80990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:15,521-Speed 5545.94 samples/sec Loss 2.2961 LearningRate 0.0040 Epoch: 16 Global Step: 81000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:17,366-Speed 5550.73 samples/sec Loss 2.3816 LearningRate 0.0040 Epoch: 16 Global Step: 81010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:19,209-Speed 5558.75 samples/sec Loss 2.3259 LearningRate 0.0040 Epoch: 16 Global Step: 81020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:21,046-Speed 5576.97 samples/sec Loss 2.3083 LearningRate 0.0040 Epoch: 16 Global Step: 81030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:22,877-Speed 5595.61 samples/sec Loss 2.2758 LearningRate 0.0040 Epoch: 16 Global Step: 81040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:24,724-Speed 5547.66 samples/sec Loss 2.3025 LearningRate 0.0040 Epoch: 16 Global Step: 81050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:26,567-Speed 5556.26 samples/sec Loss 2.3394 LearningRate 0.0039 Epoch: 16 Global Step: 81060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:28,412-Speed 5554.17 samples/sec Loss 2.3535 LearningRate 0.0039 Epoch: 16 Global Step: 81070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:30,255-Speed 5556.78 samples/sec Loss 2.3430 LearningRate 0.0039 Epoch: 16 Global Step: 81080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:32,110-Speed 5520.73 samples/sec Loss 2.2524 LearningRate 0.0039 Epoch: 16 Global Step: 81090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:33,942-Speed 5592.15 samples/sec Loss 2.3326 LearningRate 0.0039 Epoch: 16 Global Step: 81100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-11 15:28:35,777-Speed 5582.15 samples/sec Loss 2.4022 LearningRate 0.0039 Epoch: 16 Global Step: 81110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:37,619-Speed 5562.22 samples/sec Loss 2.3414 LearningRate 0.0039 Epoch: 16 Global Step: 81120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:39,455-Speed 5581.08 samples/sec Loss 2.2703 LearningRate 0.0039 Epoch: 16 Global Step: 81130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:28:41,305-Speed 5538.16 samples/sec Loss 2.3345 LearningRate 0.0039 Epoch: 16 Global Step: 81140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:43,140-Speed 5581.86 samples/sec Loss 2.2742 LearningRate 0.0039 Epoch: 16 Global Step: 81150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:44,978-Speed 5573.56 samples/sec Loss 2.2190 LearningRate 0.0039 Epoch: 16 Global Step: 81160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:46,811-Speed 5586.53 samples/sec Loss 2.3384 LearningRate 0.0039 Epoch: 16 Global Step: 81170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:48,669-Speed 5515.53 samples/sec Loss 2.3614 LearningRate 0.0039 Epoch: 16 Global Step: 81180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:50,507-Speed 5571.83 samples/sec Loss 2.3388 LearningRate 0.0039 Epoch: 16 Global Step: 81190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:52,358-Speed 5534.48 samples/sec Loss 2.3010 LearningRate 0.0039 Epoch: 16 Global Step: 81200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:54,208-Speed 5538.54 samples/sec Loss 2.3869 LearningRate 0.0039 Epoch: 16 Global Step: 81210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:56,041-Speed 5586.12 samples/sec Loss 2.3426 LearningRate 0.0039 Epoch: 16 Global Step: 81220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:57,877-Speed 5580.81 samples/sec Loss 
2.3058 LearningRate 0.0039 Epoch: 16 Global Step: 81230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:28:59,727-Speed 5537.65 samples/sec Loss 2.2785 LearningRate 0.0039 Epoch: 16 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:29:01,580-Speed 5528.77 samples/sec Loss 2.3999 LearningRate 0.0039 Epoch: 16 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:29:03,403-Speed 5620.29 samples/sec Loss 2.3146 LearningRate 0.0039 Epoch: 16 Global Step: 81260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:05,260-Speed 5514.44 samples/sec Loss 2.3592 LearningRate 0.0039 Epoch: 16 Global Step: 81270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:07,101-Speed 5565.56 samples/sec Loss 2.3088 LearningRate 0.0039 Epoch: 16 Global Step: 81280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:08,946-Speed 5553.37 samples/sec Loss 2.4392 LearningRate 0.0039 Epoch: 16 Global Step: 81290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:10,786-Speed 5565.21 samples/sec Loss 2.3811 LearningRate 0.0039 Epoch: 16 Global Step: 81300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:12,638-Speed 5533.07 samples/sec Loss 2.2940 LearningRate 0.0039 Epoch: 16 Global Step: 81310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:14,513-Speed 5461.72 samples/sec Loss 2.4837 LearningRate 0.0038 Epoch: 16 Global Step: 81320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:16,376-Speed 5499.42 samples/sec Loss 2.3816 LearningRate 0.0038 Epoch: 16 Global Step: 81330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:18,212-Speed 5577.75 samples/sec Loss 2.3262 LearningRate 0.0038 Epoch: 16 Global Step: 81340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:20,037-Speed 5614.85 samples/sec Loss 2.3949 LearningRate 0.0038 Epoch: 16 Global Step: 81350 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:21,885-Speed 5543.84 samples/sec Loss 2.5112 LearningRate 0.0038 Epoch: 16 Global Step: 81360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:23,760-Speed 5462.31 samples/sec Loss 2.4339 LearningRate 0.0038 Epoch: 16 Global Step: 81370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:25,620-Speed 5507.68 samples/sec Loss 2.3894 LearningRate 0.0038 Epoch: 16 Global Step: 81380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:27,466-Speed 5550.54 samples/sec Loss 2.3240 LearningRate 0.0038 Epoch: 16 Global Step: 81390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:29,313-Speed 5545.37 samples/sec Loss 2.3745 LearningRate 0.0038 Epoch: 16 Global Step: 81400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:31,165-Speed 5532.76 samples/sec Loss 2.2947 LearningRate 0.0038 Epoch: 16 Global Step: 81410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:32,995-Speed 5597.55 samples/sec Loss 2.4295 LearningRate 0.0038 Epoch: 16 Global Step: 81420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:34,831-Speed 5577.93 samples/sec Loss 2.3534 LearningRate 0.0038 Epoch: 16 Global Step: 81430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:36,682-Speed 5535.24 samples/sec Loss 2.3951 LearningRate 0.0038 Epoch: 16 Global Step: 81440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:29:38,529-Speed 5546.32 samples/sec Loss 2.4257 LearningRate 0.0038 Epoch: 16 Global Step: 81450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:40,384-Speed 5523.10 samples/sec Loss 2.3327 LearningRate 0.0038 Epoch: 16 Global Step: 81460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:42,235-Speed 5534.83 samples/sec Loss 2.4514 LearningRate 0.0038 Epoch: 16 Global Step: 81470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:29:44,068-Speed 5587.55 samples/sec Loss 2.2961 LearningRate 0.0038 Epoch: 16 Global Step: 81480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:45,903-Speed 5581.96 samples/sec Loss 2.4489 LearningRate 0.0038 Epoch: 16 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:47,748-Speed 5552.89 samples/sec Loss 2.4476 LearningRate 0.0038 Epoch: 16 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:49,586-Speed 5574.67 samples/sec Loss 2.3704 LearningRate 0.0038 Epoch: 16 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:51,425-Speed 5568.99 samples/sec Loss 2.3596 LearningRate 0.0038 Epoch: 16 Global Step: 81520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:53,272-Speed 5545.12 samples/sec Loss 2.3239 LearningRate 0.0038 Epoch: 16 Global Step: 81530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:55,108-Speed 5582.48 samples/sec Loss 2.3821 LearningRate 0.0038 Epoch: 16 Global Step: 81540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:29:56,943-Speed 5580.32 samples/sec Loss 2.4447 LearningRate 0.0038 Epoch: 16 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:29:58,767-Speed 5616.80 samples/sec Loss 2.4730 LearningRate 0.0038 Epoch: 16 Global Step: 81560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:00,603-Speed 5578.06 samples/sec Loss 2.4248 LearningRate 0.0038 Epoch: 16 Global Step: 81570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:02,460-Speed 5515.86 samples/sec Loss 2.4042 LearningRate 0.0037 Epoch: 16 Global Step: 81580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:04,322-Speed 5504.45 samples/sec Loss 2.4067 LearningRate 0.0037 Epoch: 16 Global Step: 81590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:06,160-Speed 5573.51 samples/sec Loss 
2.4499 LearningRate 0.0037 Epoch: 16 Global Step: 81600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:07,996-Speed 5580.06 samples/sec Loss 2.4218 LearningRate 0.0037 Epoch: 16 Global Step: 81610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:09,830-Speed 5585.84 samples/sec Loss 2.3721 LearningRate 0.0037 Epoch: 16 Global Step: 81620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:11,677-Speed 5546.24 samples/sec Loss 2.3281 LearningRate 0.0037 Epoch: 16 Global Step: 81630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:13,516-Speed 5568.80 samples/sec Loss 2.4703 LearningRate 0.0037 Epoch: 16 Global Step: 81640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:15,356-Speed 5568.92 samples/sec Loss 2.4279 LearningRate 0.0037 Epoch: 16 Global Step: 81650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:17,190-Speed 5584.02 samples/sec Loss 2.4236 LearningRate 0.0037 Epoch: 16 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:19,030-Speed 5567.43 samples/sec Loss 2.4120 LearningRate 0.0037 Epoch: 16 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:20,864-Speed 5585.42 samples/sec Loss 2.5338 LearningRate 0.0037 Epoch: 16 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:22,699-Speed 5582.44 samples/sec Loss 2.5101 LearningRate 0.0037 Epoch: 16 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:24,532-Speed 5588.69 samples/sec Loss 2.4490 LearningRate 0.0037 Epoch: 16 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:26,362-Speed 5597.35 samples/sec Loss 2.3371 LearningRate 0.0037 Epoch: 16 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:28,201-Speed 5571.32 samples/sec Loss 2.5223 LearningRate 0.0037 Epoch: 16 Global Step: 81720 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:30,036-Speed 5584.36 samples/sec Loss 2.4370 LearningRate 0.0037 Epoch: 16 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:31,870-Speed 5582.57 samples/sec Loss 2.5405 LearningRate 0.0037 Epoch: 16 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:33,704-Speed 5586.24 samples/sec Loss 2.4623 LearningRate 0.0037 Epoch: 16 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:35,547-Speed 5560.39 samples/sec Loss 2.5358 LearningRate 0.0037 Epoch: 16 Global Step: 81760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:37,387-Speed 5566.50 samples/sec Loss 2.5200 LearningRate 0.0037 Epoch: 16 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:39,224-Speed 5575.51 samples/sec Loss 2.4261 LearningRate 0.0037 Epoch: 16 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:41,069-Speed 5553.42 samples/sec Loss 2.4388 LearningRate 0.0037 Epoch: 16 Global Step: 81790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:42,917-Speed 5542.02 samples/sec Loss 2.4349 LearningRate 0.0037 Epoch: 16 Global Step: 81800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:44,751-Speed 5584.97 samples/sec Loss 2.3445 LearningRate 0.0037 Epoch: 16 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:30:46,584-Speed 5589.25 samples/sec Loss 2.4997 LearningRate 0.0037 Epoch: 16 Global Step: 81820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:48,421-Speed 5577.11 samples/sec Loss 2.4641 LearningRate 0.0037 Epoch: 16 Global Step: 81830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:50,267-Speed 5548.62 samples/sec Loss 2.5001 LearningRate 0.0036 Epoch: 16 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:30:52,106-Speed 5571.54 samples/sec Loss 2.4678 LearningRate 0.0036 Epoch: 16 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:53,963-Speed 5517.96 samples/sec Loss 2.4271 LearningRate 0.0036 Epoch: 16 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:55,799-Speed 5576.39 samples/sec Loss 2.4668 LearningRate 0.0036 Epoch: 16 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:57,641-Speed 5561.74 samples/sec Loss 2.4610 LearningRate 0.0036 Epoch: 16 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:30:59,478-Speed 5576.07 samples/sec Loss 2.4737 LearningRate 0.0036 Epoch: 16 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:31:01,323-Speed 5554.36 samples/sec Loss 2.4766 LearningRate 0.0036 Epoch: 16 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:31:03,161-Speed 5572.12 samples/sec Loss 2.4550 LearningRate 0.0036 Epoch: 16 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:31:05,004-Speed 5558.79 samples/sec Loss 2.4926 LearningRate 0.0036 Epoch: 16 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:31:06,845-Speed 5564.14 samples/sec Loss 2.4870 LearningRate 0.0036 Epoch: 16 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:31:08,677-Speed 5590.74 samples/sec Loss 2.4258 LearningRate 0.0036 Epoch: 16 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:31:10,519-Speed 5562.39 samples/sec Loss 2.4587 LearningRate 0.0036 Epoch: 16 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:31:12,357-Speed 5574.50 samples/sec Loss 2.5216 LearningRate 0.0036 Epoch: 16 Global Step: 81960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:31:14,185-Speed 5603.09 samples/sec Loss 
2.5543 LearningRate 0.0036 Epoch: 16 Global Step: 81970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:31:16,032-Speed 5546.57 samples/sec Loss 2.4627 LearningRate 0.0036 Epoch: 16 Global Step: 81980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:31:17,883-Speed 5534.53 samples/sec Loss 2.4687 LearningRate 0.0036 Epoch: 16 Global Step: 81990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:31:19,715-Speed 5591.05 samples/sec Loss 2.5109 LearningRate 0.0036 Epoch: 16 Global Step: 82000 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:31:46,180-[lfw][82000]XNorm: 23.026910 Training: 2022-04-11 15:31:46,181-[lfw][82000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 15:31:46,181-[lfw][82000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:32:16,837-[cfp_fp][82000]XNorm: 21.433771 Training: 2022-04-11 15:32:16,838-[cfp_fp][82000]Accuracy-Flip: 0.98243+-0.00661 Training: 2022-04-11 15:32:16,838-[cfp_fp][82000]Accuracy-Highest: 0.98243 Training: 2022-04-11 15:32:43,158-[agedb_30][82000]XNorm: 22.909013 Training: 2022-04-11 15:32:43,159-[agedb_30][82000]Accuracy-Flip: 0.98183+-0.00747 Training: 2022-04-11 15:32:43,159-[agedb_30][82000]Accuracy-Highest: 0.98350 Training: 2022-04-11 15:32:45,002-Speed 120.07 samples/sec Loss 2.4726 LearningRate 0.0036 Epoch: 16 Global Step: 82010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:46,829-Speed 5608.58 samples/sec Loss 2.4404 LearningRate 0.0036 Epoch: 16 Global Step: 82020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:48,665-Speed 5577.76 samples/sec Loss 2.5184 LearningRate 0.0036 Epoch: 16 Global Step: 82030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:50,503-Speed 5572.15 samples/sec Loss 2.5487 LearningRate 0.0036 Epoch: 16 Global Step: 82040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:52,340-Speed 5578.47 samples/sec Loss 2.3272 LearningRate 0.0036 Epoch: 16 Global 
Step: 82050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:54,178-Speed 5571.84 samples/sec Loss 2.4269 LearningRate 0.0036 Epoch: 16 Global Step: 82060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:56,013-Speed 5581.82 samples/sec Loss 2.4715 LearningRate 0.0036 Epoch: 16 Global Step: 82070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:57,837-Speed 5616.54 samples/sec Loss 2.6585 LearningRate 0.0036 Epoch: 16 Global Step: 82080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:32:59,670-Speed 5590.94 samples/sec Loss 2.4951 LearningRate 0.0036 Epoch: 16 Global Step: 82090 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-11 15:33:01,520-Speed 5534.93 samples/sec Loss 2.4669 LearningRate 0.0035 Epoch: 16 Global Step: 82100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:03,460-Speed 5282.62 samples/sec Loss 2.4787 LearningRate 0.0035 Epoch: 16 Global Step: 82110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:05,309-Speed 5537.91 samples/sec Loss 2.3645 LearningRate 0.0035 Epoch: 16 Global Step: 82120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:07,151-Speed 5560.86 samples/sec Loss 2.5229 LearningRate 0.0035 Epoch: 16 Global Step: 82130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:08,980-Speed 5603.40 samples/sec Loss 2.4566 LearningRate 0.0035 Epoch: 16 Global Step: 82140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:10,823-Speed 5559.09 samples/sec Loss 2.5155 LearningRate 0.0035 Epoch: 16 Global Step: 82150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:12,676-Speed 5527.67 samples/sec Loss 2.5272 LearningRate 0.0035 Epoch: 16 Global Step: 82160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:14,504-Speed 5604.01 samples/sec Loss 2.4941 LearningRate 0.0035 Epoch: 16 Global Step: 82170 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-11 15:33:16,336-Speed 5592.14 samples/sec Loss 2.5504 LearningRate 0.0035 Epoch: 16 Global Step: 82180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:18,168-Speed 5589.95 samples/sec Loss 2.5173 LearningRate 0.0035 Epoch: 16 Global Step: 82190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:33:20,010-Speed 5560.88 samples/sec Loss 2.5551 LearningRate 0.0035 Epoch: 16 Global Step: 82200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:21,848-Speed 5573.65 samples/sec Loss 2.4290 LearningRate 0.0035 Epoch: 16 Global Step: 82210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:23,683-Speed 5583.57 samples/sec Loss 2.5304 LearningRate 0.0035 Epoch: 16 Global Step: 82220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:25,517-Speed 5585.19 samples/sec Loss 2.3725 LearningRate 0.0035 Epoch: 16 Global Step: 82230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:27,359-Speed 5559.37 samples/sec Loss 2.5328 LearningRate 0.0035 Epoch: 16 Global Step: 82240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:29,217-Speed 5516.42 samples/sec Loss 2.4994 LearningRate 0.0035 Epoch: 16 Global Step: 82250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:31,053-Speed 5579.65 samples/sec Loss 2.5892 LearningRate 0.0035 Epoch: 16 Global Step: 82260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:32,887-Speed 5583.03 samples/sec Loss 2.5834 LearningRate 0.0035 Epoch: 16 Global Step: 82270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:34,726-Speed 5573.52 samples/sec Loss 2.4939 LearningRate 0.0035 Epoch: 16 Global Step: 82280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:36,557-Speed 5593.16 samples/sec Loss 2.4735 LearningRate 0.0035 Epoch: 16 Global Step: 82290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:38,401-Speed 5557.30 
samples/sec Loss 2.5423 LearningRate 0.0035 Epoch: 16 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:33:40,239-Speed 5570.94 samples/sec Loss 2.5121 LearningRate 0.0035 Epoch: 16 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:33:42,068-Speed 5601.68 samples/sec Loss 2.6535 LearningRate 0.0035 Epoch: 16 Global Step: 82320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:43,901-Speed 5590.20 samples/sec Loss 2.6011 LearningRate 0.0035 Epoch: 16 Global Step: 82330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:45,734-Speed 5586.24 samples/sec Loss 2.4630 LearningRate 0.0035 Epoch: 16 Global Step: 82340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:47,584-Speed 5536.60 samples/sec Loss 2.4056 LearningRate 0.0035 Epoch: 16 Global Step: 82350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:49,436-Speed 5533.56 samples/sec Loss 2.4550 LearningRate 0.0035 Epoch: 16 Global Step: 82360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:51,281-Speed 5549.78 samples/sec Loss 2.4639 LearningRate 0.0035 Epoch: 16 Global Step: 82370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:53,149-Speed 5484.53 samples/sec Loss 2.5409 LearningRate 0.0034 Epoch: 16 Global Step: 82380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:54,994-Speed 5554.91 samples/sec Loss 2.4913 LearningRate 0.0034 Epoch: 16 Global Step: 82390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:56,827-Speed 5588.06 samples/sec Loss 2.4742 LearningRate 0.0034 Epoch: 16 Global Step: 82400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:33:58,665-Speed 5574.08 samples/sec Loss 2.4536 LearningRate 0.0034 Epoch: 16 Global Step: 82410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:00,489-Speed 5615.83 samples/sec Loss 2.4789 LearningRate 0.0034 Epoch: 16 
Global Step: 82420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:02,328-Speed 5567.98 samples/sec Loss 2.3849 LearningRate 0.0034 Epoch: 16 Global Step: 82430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:04,166-Speed 5576.42 samples/sec Loss 2.5408 LearningRate 0.0034 Epoch: 16 Global Step: 82440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:06,000-Speed 5583.43 samples/sec Loss 2.4467 LearningRate 0.0034 Epoch: 16 Global Step: 82450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:07,831-Speed 5595.62 samples/sec Loss 2.4318 LearningRate 0.0034 Epoch: 16 Global Step: 82460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:09,676-Speed 5550.71 samples/sec Loss 2.4449 LearningRate 0.0034 Epoch: 16 Global Step: 82470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:11,513-Speed 5577.51 samples/sec Loss 2.4858 LearningRate 0.0034 Epoch: 16 Global Step: 82480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:13,358-Speed 5552.35 samples/sec Loss 2.5064 LearningRate 0.0034 Epoch: 16 Global Step: 82490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:15,196-Speed 5573.10 samples/sec Loss 2.5588 LearningRate 0.0034 Epoch: 16 Global Step: 82500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:17,038-Speed 5563.48 samples/sec Loss 2.4440 LearningRate 0.0034 Epoch: 16 Global Step: 82510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:18,870-Speed 5591.79 samples/sec Loss 2.5730 LearningRate 0.0034 Epoch: 16 Global Step: 82520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:20,704-Speed 5583.16 samples/sec Loss 2.5229 LearningRate 0.0034 Epoch: 16 Global Step: 82530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:22,552-Speed 5542.57 samples/sec Loss 2.5694 LearningRate 0.0034 Epoch: 16 Global Step: 82540 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-11 15:34:24,394-Speed 5563.95 samples/sec Loss 2.5057 LearningRate 0.0034 Epoch: 16 Global Step: 82550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:34:26,224-Speed 5597.76 samples/sec Loss 2.5628 LearningRate 0.0034 Epoch: 16 Global Step: 82560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:28,057-Speed 5585.76 samples/sec Loss 2.4979 LearningRate 0.0034 Epoch: 16 Global Step: 82570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:29,894-Speed 5578.12 samples/sec Loss 2.4954 LearningRate 0.0034 Epoch: 16 Global Step: 82580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:31,737-Speed 5558.34 samples/sec Loss 2.5539 LearningRate 0.0034 Epoch: 16 Global Step: 82590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:33,572-Speed 5581.14 samples/sec Loss 2.4884 LearningRate 0.0034 Epoch: 16 Global Step: 82600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:35,420-Speed 5545.81 samples/sec Loss 2.6136 LearningRate 0.0034 Epoch: 16 Global Step: 82610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:37,276-Speed 5519.83 samples/sec Loss 2.4419 LearningRate 0.0034 Epoch: 16 Global Step: 82620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:39,113-Speed 5574.78 samples/sec Loss 2.5546 LearningRate 0.0034 Epoch: 16 Global Step: 82630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:40,954-Speed 5564.57 samples/sec Loss 2.4588 LearningRate 0.0034 Epoch: 16 Global Step: 82640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:42,792-Speed 5572.53 samples/sec Loss 2.5684 LearningRate 0.0033 Epoch: 16 Global Step: 82650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:44,625-Speed 5590.79 samples/sec Loss 2.5423 LearningRate 0.0033 Epoch: 16 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:34:46,455-Speed 5597.66 
samples/sec Loss 2.5593 LearningRate 0.0033 Epoch: 16 Global Step: 82670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:48,284-Speed 5601.30 samples/sec Loss 2.3767 LearningRate 0.0033 Epoch: 16 Global Step: 82680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:50,113-Speed 5598.62 samples/sec Loss 2.5342 LearningRate 0.0033 Epoch: 16 Global Step: 82690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:51,962-Speed 5539.26 samples/sec Loss 2.5413 LearningRate 0.0033 Epoch: 16 Global Step: 82700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:53,805-Speed 5559.02 samples/sec Loss 2.5046 LearningRate 0.0033 Epoch: 16 Global Step: 82710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:55,647-Speed 5562.12 samples/sec Loss 2.5380 LearningRate 0.0033 Epoch: 16 Global Step: 82720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:57,487-Speed 5567.32 samples/sec Loss 2.5713 LearningRate 0.0033 Epoch: 16 Global Step: 82730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:34:59,326-Speed 5570.57 samples/sec Loss 2.5544 LearningRate 0.0033 Epoch: 16 Global Step: 82740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:01,184-Speed 5514.95 samples/sec Loss 2.4000 LearningRate 0.0033 Epoch: 16 Global Step: 82750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:03,028-Speed 5552.75 samples/sec Loss 2.4246 LearningRate 0.0033 Epoch: 16 Global Step: 82760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:04,878-Speed 5539.07 samples/sec Loss 2.5565 LearningRate 0.0033 Epoch: 16 Global Step: 82770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:06,732-Speed 5524.09 samples/sec Loss 2.5228 LearningRate 0.0033 Epoch: 16 Global Step: 82780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:08,564-Speed 5592.09 samples/sec Loss 2.5267 LearningRate 0.0033 Epoch: 16 
Global Step: 82790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:10,427-Speed 5498.05 samples/sec Loss 2.4869 LearningRate 0.0033 Epoch: 16 Global Step: 82800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:12,262-Speed 5584.54 samples/sec Loss 2.4514 LearningRate 0.0033 Epoch: 16 Global Step: 82810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:14,125-Speed 5496.81 samples/sec Loss 2.5166 LearningRate 0.0033 Epoch: 16 Global Step: 82820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:15,965-Speed 5568.17 samples/sec Loss 2.4902 LearningRate 0.0033 Epoch: 16 Global Step: 82830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:17,797-Speed 5593.71 samples/sec Loss 2.5899 LearningRate 0.0033 Epoch: 16 Global Step: 82840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:19,631-Speed 5583.83 samples/sec Loss 2.5122 LearningRate 0.0033 Epoch: 16 Global Step: 82850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:21,469-Speed 5574.69 samples/sec Loss 2.5677 LearningRate 0.0033 Epoch: 16 Global Step: 82860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:23,328-Speed 5509.75 samples/sec Loss 2.5945 LearningRate 0.0033 Epoch: 16 Global Step: 82870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:25,175-Speed 5546.97 samples/sec Loss 2.5890 LearningRate 0.0033 Epoch: 16 Global Step: 82880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:27,008-Speed 5586.62 samples/sec Loss 2.5391 LearningRate 0.0033 Epoch: 16 Global Step: 82890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:35:28,839-Speed 5596.11 samples/sec Loss 2.5043 LearningRate 0.0033 Epoch: 16 Global Step: 82900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:30,675-Speed 5579.86 samples/sec Loss 2.5338 LearningRate 0.0033 Epoch: 16 Global Step: 82910 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:35:32,508-Speed 5588.55 samples/sec Loss 2.5693 LearningRate 0.0033 Epoch: 16 Global Step: 82920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:34,349-Speed 5565.78 samples/sec Loss 2.4171 LearningRate 0.0032 Epoch: 16 Global Step: 82930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:36,185-Speed 5577.27 samples/sec Loss 2.5561 LearningRate 0.0032 Epoch: 16 Global Step: 82940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:38,022-Speed 5578.36 samples/sec Loss 2.5073 LearningRate 0.0032 Epoch: 16 Global Step: 82950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:39,853-Speed 5594.91 samples/sec Loss 2.4888 LearningRate 0.0032 Epoch: 16 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:41,692-Speed 5569.39 samples/sec Loss 2.5899 LearningRate 0.0032 Epoch: 16 Global Step: 82970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:43,520-Speed 5603.83 samples/sec Loss 2.6715 LearningRate 0.0032 Epoch: 16 Global Step: 82980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:45,360-Speed 5567.35 samples/sec Loss 2.6035 LearningRate 0.0032 Epoch: 16 Global Step: 82990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:47,195-Speed 5581.51 samples/sec Loss 2.4804 LearningRate 0.0032 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:35:49,031-Speed 5582.19 samples/sec Loss 2.4699 LearningRate 0.0032 Epoch: 16 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:35:50,891-Speed 5505.15 samples/sec Loss 2.6646 LearningRate 0.0032 Epoch: 16 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:35:52,735-Speed 5554.38 samples/sec Loss 2.5560 LearningRate 0.0032 Epoch: 16 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:54,578-Speed 5559.12 
samples/sec Loss 2.5374 LearningRate 0.0032 Epoch: 16 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:56,424-Speed 5550.76 samples/sec Loss 2.5602 LearningRate 0.0032 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:35:58,259-Speed 5582.30 samples/sec Loss 2.5600 LearningRate 0.0032 Epoch: 16 Global Step: 83060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:00,091-Speed 5592.44 samples/sec Loss 2.4932 LearningRate 0.0032 Epoch: 16 Global Step: 83070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:01,926-Speed 5580.74 samples/sec Loss 2.5281 LearningRate 0.0032 Epoch: 16 Global Step: 83080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:03,766-Speed 5570.54 samples/sec Loss 2.5335 LearningRate 0.0032 Epoch: 16 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:05,600-Speed 5584.42 samples/sec Loss 2.5946 LearningRate 0.0032 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:07,431-Speed 5594.10 samples/sec Loss 2.5151 LearningRate 0.0032 Epoch: 16 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:09,266-Speed 5583.62 samples/sec Loss 2.5214 LearningRate 0.0032 Epoch: 16 Global Step: 83120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:11,102-Speed 5577.50 samples/sec Loss 2.4819 LearningRate 0.0032 Epoch: 16 Global Step: 83130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:12,942-Speed 5566.32 samples/sec Loss 2.4919 LearningRate 0.0032 Epoch: 16 Global Step: 83140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:14,783-Speed 5565.70 samples/sec Loss 2.4987 LearningRate 0.0032 Epoch: 16 Global Step: 83150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:16,641-Speed 5512.88 samples/sec Loss 2.5314 LearningRate 0.0032 Epoch: 16 
Global Step: 83160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:18,477-Speed 5580.68 samples/sec Loss 2.5077 LearningRate 0.0032 Epoch: 16 Global Step: 83170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:20,310-Speed 5589.72 samples/sec Loss 2.5284 LearningRate 0.0032 Epoch: 16 Global Step: 83180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:22,143-Speed 5587.73 samples/sec Loss 2.5926 LearningRate 0.0032 Epoch: 16 Global Step: 83190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:23,990-Speed 5546.33 samples/sec Loss 2.5021 LearningRate 0.0032 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:25,832-Speed 5561.67 samples/sec Loss 2.5583 LearningRate 0.0031 Epoch: 16 Global Step: 83210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:36:27,667-Speed 5581.55 samples/sec Loss 2.6502 LearningRate 0.0031 Epoch: 16 Global Step: 83220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:29,499-Speed 5592.71 samples/sec Loss 2.5383 LearningRate 0.0031 Epoch: 16 Global Step: 83230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:31,331-Speed 5590.32 samples/sec Loss 2.4953 LearningRate 0.0031 Epoch: 16 Global Step: 83240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:33,177-Speed 5547.30 samples/sec Loss 2.3903 LearningRate 0.0031 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:35,014-Speed 5579.46 samples/sec Loss 2.5497 LearningRate 0.0031 Epoch: 16 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:36,846-Speed 5591.04 samples/sec Loss 2.5464 LearningRate 0.0031 Epoch: 16 Global Step: 83270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:38,683-Speed 5577.37 samples/sec Loss 2.5297 LearningRate 0.0031 Epoch: 16 Global Step: 83280 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:36:40,531-Speed 5542.55 samples/sec Loss 2.5205 LearningRate 0.0031 Epoch: 16 Global Step: 83290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:42,365-Speed 5583.42 samples/sec Loss 2.5452 LearningRate 0.0031 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:44,205-Speed 5568.37 samples/sec Loss 2.6124 LearningRate 0.0031 Epoch: 16 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:46,035-Speed 5597.03 samples/sec Loss 2.5144 LearningRate 0.0031 Epoch: 16 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:36:47,868-Speed 5589.87 samples/sec Loss 2.4885 LearningRate 0.0031 Epoch: 16 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:36:49,705-Speed 5574.92 samples/sec Loss 2.5468 LearningRate 0.0031 Epoch: 16 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:36:51,541-Speed 5581.04 samples/sec Loss 2.5819 LearningRate 0.0031 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:53,378-Speed 5575.34 samples/sec Loss 2.4722 LearningRate 0.0031 Epoch: 16 Global Step: 83360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:55,212-Speed 5586.23 samples/sec Loss 2.5878 LearningRate 0.0031 Epoch: 16 Global Step: 83370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:57,044-Speed 5591.98 samples/sec Loss 2.4785 LearningRate 0.0031 Epoch: 16 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:36:58,889-Speed 5554.46 samples/sec Loss 2.4129 LearningRate 0.0031 Epoch: 16 Global Step: 83390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:00,756-Speed 5486.42 samples/sec Loss 2.5878 LearningRate 0.0031 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:02,634-Speed 5451.53 
samples/sec Loss 2.5539 LearningRate 0.0031 Epoch: 16 Global Step: 83410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:04,469-Speed 5585.87 samples/sec Loss 2.4881 LearningRate 0.0031 Epoch: 16 Global Step: 83420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:06,309-Speed 5566.79 samples/sec Loss 2.5925 LearningRate 0.0031 Epoch: 16 Global Step: 83430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:08,145-Speed 5576.92 samples/sec Loss 2.6171 LearningRate 0.0031 Epoch: 16 Global Step: 83440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:09,983-Speed 5576.17 samples/sec Loss 2.4207 LearningRate 0.0031 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:11,831-Speed 5541.77 samples/sec Loss 2.5586 LearningRate 0.0031 Epoch: 16 Global Step: 83460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:13,694-Speed 5498.20 samples/sec Loss 2.5597 LearningRate 0.0031 Epoch: 16 Global Step: 83470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:15,534-Speed 5569.60 samples/sec Loss 2.5899 LearningRate 0.0031 Epoch: 16 Global Step: 83480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:17,378-Speed 5553.88 samples/sec Loss 2.4934 LearningRate 0.0031 Epoch: 16 Global Step: 83490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:19,216-Speed 5575.94 samples/sec Loss 2.5469 LearningRate 0.0030 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:21,048-Speed 5590.27 samples/sec Loss 2.5742 LearningRate 0.0030 Epoch: 16 Global Step: 83510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:22,882-Speed 5585.76 samples/sec Loss 2.4577 LearningRate 0.0030 Epoch: 16 Global Step: 83520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:24,721-Speed 5569.67 samples/sec Loss 2.4514 LearningRate 0.0030 Epoch: 16 
Global Step: 83530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:26,556-Speed 5581.70 samples/sec Loss 2.4947 LearningRate 0.0030 Epoch: 16 Global Step: 83540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:28,401-Speed 5551.03 samples/sec Loss 2.4609 LearningRate 0.0030 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:30,238-Speed 5578.12 samples/sec Loss 2.6306 LearningRate 0.0030 Epoch: 16 Global Step: 83560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:32,072-Speed 5586.51 samples/sec Loss 2.5633 LearningRate 0.0030 Epoch: 16 Global Step: 83570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:33,910-Speed 5571.98 samples/sec Loss 2.6453 LearningRate 0.0030 Epoch: 16 Global Step: 83580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:35,742-Speed 5593.15 samples/sec Loss 2.5962 LearningRate 0.0030 Epoch: 16 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:37:37,601-Speed 5509.54 samples/sec Loss 2.6716 LearningRate 0.0030 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:37:39,439-Speed 5575.84 samples/sec Loss 2.4481 LearningRate 0.0030 Epoch: 16 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:37:41,277-Speed 5573.09 samples/sec Loss 2.5061 LearningRate 0.0030 Epoch: 16 Global Step: 83620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:43,119-Speed 5560.90 samples/sec Loss 2.5007 LearningRate 0.0030 Epoch: 16 Global Step: 83630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:44,955-Speed 5580.64 samples/sec Loss 2.5590 LearningRate 0.0030 Epoch: 16 Global Step: 83640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:46,812-Speed 5513.90 samples/sec Loss 2.5580 LearningRate 0.0030 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:37:48,647-Speed 5583.93 samples/sec Loss 2.5885 LearningRate 0.0030 Epoch: 16 Global Step: 83660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:50,484-Speed 5574.35 samples/sec Loss 2.5697 LearningRate 0.0030 Epoch: 16 Global Step: 83670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:52,324-Speed 5568.30 samples/sec Loss 2.4997 LearningRate 0.0030 Epoch: 16 Global Step: 83680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:54,159-Speed 5583.98 samples/sec Loss 2.4648 LearningRate 0.0030 Epoch: 16 Global Step: 83690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:55,996-Speed 5574.28 samples/sec Loss 2.3730 LearningRate 0.0030 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:37:57,829-Speed 5587.86 samples/sec Loss 2.5645 LearningRate 0.0030 Epoch: 16 Global Step: 83710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:37:59,668-Speed 5571.54 samples/sec Loss 2.5599 LearningRate 0.0030 Epoch: 16 Global Step: 83720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:01,499-Speed 5595.93 samples/sec Loss 2.4715 LearningRate 0.0030 Epoch: 16 Global Step: 83730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:03,337-Speed 5574.13 samples/sec Loss 2.5393 LearningRate 0.0030 Epoch: 16 Global Step: 83740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:05,174-Speed 5575.63 samples/sec Loss 2.5842 LearningRate 0.0030 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:07,012-Speed 5574.46 samples/sec Loss 2.4920 LearningRate 0.0030 Epoch: 16 Global Step: 83760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:08,846-Speed 5585.12 samples/sec Loss 2.5493 LearningRate 0.0030 Epoch: 16 Global Step: 83770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:10,678-Speed 5591.25 
samples/sec Loss 2.6101 LearningRate 0.0030 Epoch: 16 Global Step: 83780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:12,529-Speed 5532.69 samples/sec Loss 2.5850 LearningRate 0.0029 Epoch: 16 Global Step: 83790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:14,372-Speed 5560.50 samples/sec Loss 2.5812 LearningRate 0.0029 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:16,212-Speed 5566.70 samples/sec Loss 2.4754 LearningRate 0.0029 Epoch: 16 Global Step: 83810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:38:18,052-Speed 5568.08 samples/sec Loss 2.4815 LearningRate 0.0029 Epoch: 16 Global Step: 83820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:38:19,876-Speed 5617.85 samples/sec Loss 2.6301 LearningRate 0.0029 Epoch: 16 Global Step: 83830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:21,713-Speed 5576.43 samples/sec Loss 2.4872 LearningRate 0.0029 Epoch: 16 Global Step: 83840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:23,546-Speed 5589.19 samples/sec Loss 2.5060 LearningRate 0.0029 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:25,414-Speed 5482.60 samples/sec Loss 2.4942 LearningRate 0.0029 Epoch: 16 Global Step: 83860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:27,250-Speed 5578.55 samples/sec Loss 2.4900 LearningRate 0.0029 Epoch: 16 Global Step: 83870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:29,093-Speed 5558.80 samples/sec Loss 2.6070 LearningRate 0.0029 Epoch: 16 Global Step: 83880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:30,945-Speed 5530.66 samples/sec Loss 2.6585 LearningRate 0.0029 Epoch: 16 Global Step: 83890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:32,785-Speed 5566.76 samples/sec Loss 2.5679 LearningRate 0.0029 Epoch: 16 
Global Step: 83900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:34,624-Speed 5569.58 samples/sec Loss 2.5026 LearningRate 0.0029 Epoch: 16 Global Step: 83910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:36,470-Speed 5548.71 samples/sec Loss 2.5938 LearningRate 0.0029 Epoch: 16 Global Step: 83920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:38,311-Speed 5565.04 samples/sec Loss 2.5839 LearningRate 0.0029 Epoch: 16 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:38:40,153-Speed 5564.80 samples/sec Loss 2.5503 LearningRate 0.0029 Epoch: 16 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:38:41,978-Speed 5612.19 samples/sec Loss 2.5337 LearningRate 0.0029 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:43,810-Speed 5591.91 samples/sec Loss 2.5422 LearningRate 0.0029 Epoch: 16 Global Step: 83960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:45,646-Speed 5580.39 samples/sec Loss 2.5460 LearningRate 0.0029 Epoch: 16 Global Step: 83970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:47,502-Speed 5516.74 samples/sec Loss 2.5877 LearningRate 0.0029 Epoch: 16 Global Step: 83980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:49,362-Speed 5510.10 samples/sec Loss 2.5625 LearningRate 0.0029 Epoch: 16 Global Step: 83990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:38:51,226-Speed 5495.48 samples/sec Loss 2.6409 LearningRate 0.0029 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:39:17,696-[lfw][84000]XNorm: 22.333789 Training: 2022-04-11 15:39:17,697-[lfw][84000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 15:39:17,697-[lfw][84000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:39:48,300-[cfp_fp][84000]XNorm: 20.998311 Training: 2022-04-11 
15:39:48,301-[cfp_fp][84000]Accuracy-Flip: 0.98186+-0.00723 Training: 2022-04-11 15:39:48,302-[cfp_fp][84000]Accuracy-Highest: 0.98243 Training: 2022-04-11 15:40:14,643-[agedb_30][84000]XNorm: 22.086878 Training: 2022-04-11 15:40:14,644-[agedb_30][84000]Accuracy-Flip: 0.98100+-0.00723 Training: 2022-04-11 15:40:14,644-[agedb_30][84000]Accuracy-Highest: 0.98350 Training: 2022-04-11 15:40:16,504-Speed 120.08 samples/sec Loss 2.7031 LearningRate 0.0029 Epoch: 16 Global Step: 84010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:40:18,337-Speed 5589.82 samples/sec Loss 2.5498 LearningRate 0.0029 Epoch: 16 Global Step: 84020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:40:20,163-Speed 5609.97 samples/sec Loss 2.5650 LearningRate 0.0029 Epoch: 16 Global Step: 84030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:40:22,002-Speed 5568.19 samples/sec Loss 2.5536 LearningRate 0.0029 Epoch: 16 Global Step: 84040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:40:23,851-Speed 5539.83 samples/sec Loss 2.6660 LearningRate 0.0029 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:25,681-Speed 5599.29 samples/sec Loss 2.6107 LearningRate 0.0029 Epoch: 16 Global Step: 84060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:27,523-Speed 5559.62 samples/sec Loss 2.5476 LearningRate 0.0029 Epoch: 16 Global Step: 84070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:29,355-Speed 5592.22 samples/sec Loss 2.4689 LearningRate 0.0029 Epoch: 16 Global Step: 84080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:31,195-Speed 5566.19 samples/sec Loss 2.4982 LearningRate 0.0028 Epoch: 16 Global Step: 84090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:33,031-Speed 5581.27 samples/sec Loss 2.6658 LearningRate 0.0028 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:40:34,870-Speed 5569.90 samples/sec Loss 2.6137 LearningRate 0.0028 Epoch: 16 Global Step: 84110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:36,704-Speed 5583.90 samples/sec Loss 2.4902 LearningRate 0.0028 Epoch: 16 Global Step: 84120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:38,531-Speed 5605.96 samples/sec Loss 2.4538 LearningRate 0.0028 Epoch: 16 Global Step: 84130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:40,362-Speed 5597.48 samples/sec Loss 2.5876 LearningRate 0.0028 Epoch: 16 Global Step: 84140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:42,201-Speed 5569.12 samples/sec Loss 2.5593 LearningRate 0.0028 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:40:44,023-Speed 5623.35 samples/sec Loss 2.5635 LearningRate 0.0028 Epoch: 16 Global Step: 84160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:45,855-Speed 5590.64 samples/sec Loss 2.5527 LearningRate 0.0028 Epoch: 16 Global Step: 84170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:47,702-Speed 5547.51 samples/sec Loss 2.4517 LearningRate 0.0028 Epoch: 16 Global Step: 84180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:49,536-Speed 5584.29 samples/sec Loss 2.6163 LearningRate 0.0028 Epoch: 16 Global Step: 84190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:51,369-Speed 5588.95 samples/sec Loss 2.6010 LearningRate 0.0028 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:53,213-Speed 5555.23 samples/sec Loss 2.5462 LearningRate 0.0028 Epoch: 16 Global Step: 84210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:55,060-Speed 5545.71 samples/sec Loss 2.4956 LearningRate 0.0028 Epoch: 16 Global Step: 84220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:56,893-Speed 5590.10 samples/sec Loss 
2.5864 LearningRate 0.0028 Epoch: 16 Global Step: 84230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:40:58,722-Speed 5598.07 samples/sec Loss 2.5964 LearningRate 0.0028 Epoch: 16 Global Step: 84240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:00,570-Speed 5546.46 samples/sec Loss 2.4879 LearningRate 0.0028 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:02,407-Speed 5576.67 samples/sec Loss 2.5913 LearningRate 0.0028 Epoch: 16 Global Step: 84260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:04,249-Speed 5559.45 samples/sec Loss 2.5325 LearningRate 0.0028 Epoch: 16 Global Step: 84270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:06,084-Speed 5582.45 samples/sec Loss 2.5053 LearningRate 0.0028 Epoch: 16 Global Step: 84280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:07,934-Speed 5538.37 samples/sec Loss 2.4832 LearningRate 0.0028 Epoch: 16 Global Step: 84290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:09,777-Speed 5559.47 samples/sec Loss 2.5318 LearningRate 0.0028 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:11,629-Speed 5530.97 samples/sec Loss 2.5075 LearningRate 0.0028 Epoch: 16 Global Step: 84310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:13,469-Speed 5566.70 samples/sec Loss 2.5582 LearningRate 0.0028 Epoch: 16 Global Step: 84320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:15,310-Speed 5563.07 samples/sec Loss 2.6294 LearningRate 0.0028 Epoch: 16 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:17,146-Speed 5580.31 samples/sec Loss 2.5982 LearningRate 0.0028 Epoch: 16 Global Step: 84340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:18,993-Speed 5545.43 samples/sec Loss 2.5141 LearningRate 0.0028 Epoch: 16 Global Step: 84350 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:20,827-Speed 5586.81 samples/sec Loss 2.5169 LearningRate 0.0028 Epoch: 16 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:41:22,653-Speed 5611.82 samples/sec Loss 2.5265 LearningRate 0.0028 Epoch: 16 Global Step: 84370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:24,502-Speed 5538.39 samples/sec Loss 2.5668 LearningRate 0.0028 Epoch: 16 Global Step: 84380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:26,345-Speed 5559.54 samples/sec Loss 2.5507 LearningRate 0.0027 Epoch: 16 Global Step: 84390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:28,191-Speed 5548.87 samples/sec Loss 2.4747 LearningRate 0.0027 Epoch: 16 Global Step: 84400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:30,023-Speed 5592.84 samples/sec Loss 2.5384 LearningRate 0.0027 Epoch: 16 Global Step: 84410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:31,856-Speed 5585.76 samples/sec Loss 2.5948 LearningRate 0.0027 Epoch: 16 Global Step: 84420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:33,694-Speed 5574.91 samples/sec Loss 2.5978 LearningRate 0.0027 Epoch: 16 Global Step: 84430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:35,530-Speed 5579.93 samples/sec Loss 2.5228 LearningRate 0.0027 Epoch: 16 Global Step: 84440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:37,370-Speed 5565.32 samples/sec Loss 2.5778 LearningRate 0.0027 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:39,203-Speed 5590.44 samples/sec Loss 2.6101 LearningRate 0.0027 Epoch: 16 Global Step: 84460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:41,039-Speed 5579.26 samples/sec Loss 2.5228 LearningRate 0.0027 Epoch: 16 Global Step: 84470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:41:42,874-Speed 5583.32 samples/sec Loss 2.5550 LearningRate 0.0027 Epoch: 16 Global Step: 84480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:44,706-Speed 5590.48 samples/sec Loss 2.4496 LearningRate 0.0027 Epoch: 16 Global Step: 84490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:46,544-Speed 5573.88 samples/sec Loss 2.5564 LearningRate 0.0027 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:48,408-Speed 5496.29 samples/sec Loss 2.6324 LearningRate 0.0027 Epoch: 16 Global Step: 84510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:50,262-Speed 5525.36 samples/sec Loss 2.4521 LearningRate 0.0027 Epoch: 16 Global Step: 84520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:52,129-Speed 5485.00 samples/sec Loss 2.5780 LearningRate 0.0027 Epoch: 16 Global Step: 84530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:53,999-Speed 5479.67 samples/sec Loss 2.6432 LearningRate 0.0027 Epoch: 16 Global Step: 84540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:55,842-Speed 5558.60 samples/sec Loss 2.5009 LearningRate 0.0027 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:57,675-Speed 5587.41 samples/sec Loss 2.5983 LearningRate 0.0027 Epoch: 16 Global Step: 84560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:41:59,517-Speed 5561.11 samples/sec Loss 2.5115 LearningRate 0.0027 Epoch: 16 Global Step: 84570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:01,376-Speed 5512.45 samples/sec Loss 2.5376 LearningRate 0.0027 Epoch: 16 Global Step: 84580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:03,255-Speed 5451.08 samples/sec Loss 2.5051 LearningRate 0.0027 Epoch: 16 Global Step: 84590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:05,119-Speed 5494.77 samples/sec Loss 
2.6524 LearningRate 0.0027 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:06,961-Speed 5562.31 samples/sec Loss 2.4692 LearningRate 0.0027 Epoch: 16 Global Step: 84610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:08,793-Speed 5591.01 samples/sec Loss 2.6177 LearningRate 0.0027 Epoch: 16 Global Step: 84620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:10,631-Speed 5573.79 samples/sec Loss 2.5899 LearningRate 0.0027 Epoch: 16 Global Step: 84630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:12,471-Speed 5565.49 samples/sec Loss 2.6619 LearningRate 0.0027 Epoch: 16 Global Step: 84640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:14,320-Speed 5540.35 samples/sec Loss 2.5340 LearningRate 0.0027 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:16,209-Speed 5422.71 samples/sec Loss 2.5323 LearningRate 0.0027 Epoch: 16 Global Step: 84660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:18,052-Speed 5559.20 samples/sec Loss 2.5673 LearningRate 0.0027 Epoch: 16 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:42:19,879-Speed 5607.22 samples/sec Loss 2.5017 LearningRate 0.0027 Epoch: 16 Global Step: 84680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:21,716-Speed 5578.18 samples/sec Loss 2.4994 LearningRate 0.0027 Epoch: 16 Global Step: 84690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:23,548-Speed 5591.58 samples/sec Loss 2.4984 LearningRate 0.0026 Epoch: 16 Global Step: 84700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:25,381-Speed 5588.14 samples/sec Loss 2.5212 LearningRate 0.0026 Epoch: 16 Global Step: 84710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:27,218-Speed 5576.32 samples/sec Loss 2.5089 LearningRate 0.0026 Epoch: 16 Global Step: 84720 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:29,053-Speed 5582.07 samples/sec Loss 2.5307 LearningRate 0.0026 Epoch: 16 Global Step: 84730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:30,884-Speed 5594.18 samples/sec Loss 2.4716 LearningRate 0.0026 Epoch: 16 Global Step: 84740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:32,723-Speed 5570.98 samples/sec Loss 2.4939 LearningRate 0.0026 Epoch: 16 Global Step: 84750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:34,559-Speed 5579.52 samples/sec Loss 2.5125 LearningRate 0.0026 Epoch: 16 Global Step: 84760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:36,406-Speed 5544.94 samples/sec Loss 2.5883 LearningRate 0.0026 Epoch: 16 Global Step: 84770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:38,250-Speed 5556.35 samples/sec Loss 2.5442 LearningRate 0.0026 Epoch: 16 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:42:40,099-Speed 5539.43 samples/sec Loss 2.5098 LearningRate 0.0026 Epoch: 16 Global Step: 84790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:41,943-Speed 5556.24 samples/sec Loss 2.4843 LearningRate 0.0026 Epoch: 16 Global Step: 84800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:43,783-Speed 5566.53 samples/sec Loss 2.6409 LearningRate 0.0026 Epoch: 16 Global Step: 84810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:45,623-Speed 5569.43 samples/sec Loss 2.5265 LearningRate 0.0026 Epoch: 16 Global Step: 84820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:47,455-Speed 5590.15 samples/sec Loss 2.6081 LearningRate 0.0026 Epoch: 16 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:49,295-Speed 5567.50 samples/sec Loss 2.4996 LearningRate 0.0026 Epoch: 16 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:42:51,130-Speed 5582.81 samples/sec Loss 2.4659 LearningRate 0.0026 Epoch: 16 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:52,968-Speed 5573.38 samples/sec Loss 2.6147 LearningRate 0.0026 Epoch: 16 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:54,807-Speed 5570.92 samples/sec Loss 2.5995 LearningRate 0.0026 Epoch: 16 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:56,641-Speed 5583.61 samples/sec Loss 2.5863 LearningRate 0.0026 Epoch: 16 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:42:58,500-Speed 5511.70 samples/sec Loss 2.4705 LearningRate 0.0026 Epoch: 16 Global Step: 84890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:00,341-Speed 5564.13 samples/sec Loss 2.5381 LearningRate 0.0026 Epoch: 16 Global Step: 84900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:02,199-Speed 5514.76 samples/sec Loss 2.5285 LearningRate 0.0026 Epoch: 16 Global Step: 84910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:04,029-Speed 5595.32 samples/sec Loss 2.6482 LearningRate 0.0026 Epoch: 16 Global Step: 84920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:05,874-Speed 5553.27 samples/sec Loss 2.7424 LearningRate 0.0026 Epoch: 16 Global Step: 84930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:07,711-Speed 5577.67 samples/sec Loss 2.4594 LearningRate 0.0026 Epoch: 16 Global Step: 84940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:09,545-Speed 5584.88 samples/sec Loss 2.4991 LearningRate 0.0026 Epoch: 16 Global Step: 84950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:11,404-Speed 5509.18 samples/sec Loss 2.5684 LearningRate 0.0026 Epoch: 16 Global Step: 84960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:13,244-Speed 5566.21 samples/sec Loss 
2.5109 LearningRate 0.0026 Epoch: 16 Global Step: 84970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:15,086-Speed 5562.10 samples/sec Loss 2.6311 LearningRate 0.0026 Epoch: 16 Global Step: 84980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:16,926-Speed 5569.16 samples/sec Loss 2.5062 LearningRate 0.0026 Epoch: 16 Global Step: 84990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:18,762-Speed 5578.29 samples/sec Loss 2.5671 LearningRate 0.0026 Epoch: 16 Global Step: 85000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:20,603-Speed 5564.09 samples/sec Loss 2.5523 LearningRate 0.0025 Epoch: 16 Global Step: 85010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:22,437-Speed 5587.58 samples/sec Loss 2.5850 LearningRate 0.0025 Epoch: 16 Global Step: 85020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:24,282-Speed 5549.94 samples/sec Loss 2.4859 LearningRate 0.0025 Epoch: 16 Global Step: 85030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:26,128-Speed 5550.93 samples/sec Loss 2.4840 LearningRate 0.0025 Epoch: 16 Global Step: 85040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:27,972-Speed 5555.93 samples/sec Loss 2.4995 LearningRate 0.0025 Epoch: 16 Global Step: 85050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:29,810-Speed 5572.39 samples/sec Loss 2.4252 LearningRate 0.0025 Epoch: 16 Global Step: 85060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:31,641-Speed 5596.10 samples/sec Loss 2.4588 LearningRate 0.0025 Epoch: 16 Global Step: 85070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:33,475-Speed 5584.58 samples/sec Loss 2.5588 LearningRate 0.0025 Epoch: 16 Global Step: 85080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:35,317-Speed 5561.29 samples/sec Loss 2.5719 LearningRate 0.0025 Epoch: 16 Global Step: 85090 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:43:37,153-Speed 5578.01 samples/sec Loss 2.5228 LearningRate 0.0025 Epoch: 16 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:43:38,986-Speed 5590.13 samples/sec Loss 2.6051 LearningRate 0.0025 Epoch: 16 Global Step: 85110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:40,833-Speed 5547.62 samples/sec Loss 2.4514 LearningRate 0.0025 Epoch: 16 Global Step: 85120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:42,681-Speed 5540.80 samples/sec Loss 2.6758 LearningRate 0.0025 Epoch: 16 Global Step: 85130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:44,514-Speed 5590.68 samples/sec Loss 2.4748 LearningRate 0.0025 Epoch: 16 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:46,348-Speed 5586.36 samples/sec Loss 2.5163 LearningRate 0.0025 Epoch: 16 Global Step: 85150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:48,181-Speed 5588.20 samples/sec Loss 2.6435 LearningRate 0.0025 Epoch: 16 Global Step: 85160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:50,038-Speed 5517.39 samples/sec Loss 2.5973 LearningRate 0.0025 Epoch: 16 Global Step: 85170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:43:51,872-Speed 5583.29 samples/sec Loss 2.6147 LearningRate 0.0025 Epoch: 16 Global Step: 85180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:53,711-Speed 5569.65 samples/sec Loss 2.4618 LearningRate 0.0025 Epoch: 16 Global Step: 85190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:55,555-Speed 5557.08 samples/sec Loss 2.5057 LearningRate 0.0025 Epoch: 16 Global Step: 85200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:43:57,389-Speed 5586.23 samples/sec Loss 2.3526 LearningRate 0.0025 Epoch: 16 Global Step: 85210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-11 15:43:59,221-Speed 5591.16 samples/sec Loss 2.5596 LearningRate 0.0025 Epoch: 16 Global Step: 85220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:01,070-Speed 5540.25 samples/sec Loss 2.6003 LearningRate 0.0025 Epoch: 16 Global Step: 85230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:02,914-Speed 5556.42 samples/sec Loss 2.4876 LearningRate 0.0025 Epoch: 16 Global Step: 85240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:04,752-Speed 5572.82 samples/sec Loss 2.4983 LearningRate 0.0025 Epoch: 16 Global Step: 85250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:06,589-Speed 5575.99 samples/sec Loss 2.5824 LearningRate 0.0025 Epoch: 16 Global Step: 85260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:08,423-Speed 5584.75 samples/sec Loss 2.5465 LearningRate 0.0025 Epoch: 16 Global Step: 85270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:10,258-Speed 5583.58 samples/sec Loss 2.4987 LearningRate 0.0025 Epoch: 16 Global Step: 85280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:12,098-Speed 5568.49 samples/sec Loss 2.4917 LearningRate 0.0025 Epoch: 16 Global Step: 85290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:13,944-Speed 5548.68 samples/sec Loss 2.5253 LearningRate 0.0025 Epoch: 16 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:15,804-Speed 5508.06 samples/sec Loss 2.5362 LearningRate 0.0025 Epoch: 16 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:17,635-Speed 5593.19 samples/sec Loss 2.4407 LearningRate 0.0025 Epoch: 16 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:19,470-Speed 5581.00 samples/sec Loss 2.4504 LearningRate 0.0024 Epoch: 16 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:21,308-Speed 5574.35 samples/sec Loss 
2.5011 LearningRate 0.0024 Epoch: 16 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:23,141-Speed 5587.80 samples/sec Loss 2.5899 LearningRate 0.0024 Epoch: 16 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:24,976-Speed 5584.39 samples/sec Loss 2.4285 LearningRate 0.0024 Epoch: 16 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:26,829-Speed 5529.80 samples/sec Loss 2.5460 LearningRate 0.0024 Epoch: 16 Global Step: 85370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:28,680-Speed 5533.06 samples/sec Loss 2.5377 LearningRate 0.0024 Epoch: 16 Global Step: 85380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:30,519-Speed 5570.61 samples/sec Loss 2.5866 LearningRate 0.0024 Epoch: 16 Global Step: 85390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:32,355-Speed 5579.86 samples/sec Loss 2.5742 LearningRate 0.0024 Epoch: 16 Global Step: 85400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:34,195-Speed 5566.44 samples/sec Loss 2.4734 LearningRate 0.0024 Epoch: 16 Global Step: 85410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:36,031-Speed 5580.65 samples/sec Loss 2.5683 LearningRate 0.0024 Epoch: 16 Global Step: 85420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:37,876-Speed 5551.61 samples/sec Loss 2.5468 LearningRate 0.0024 Epoch: 16 Global Step: 85430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:39,714-Speed 5571.59 samples/sec Loss 2.6313 LearningRate 0.0024 Epoch: 16 Global Step: 85440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:41,577-Speed 5498.78 samples/sec Loss 2.5775 LearningRate 0.0024 Epoch: 16 Global Step: 85450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:43,411-Speed 5585.11 samples/sec Loss 2.5703 LearningRate 0.0024 Epoch: 16 Global Step: 85460 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:45,245-Speed 5588.47 samples/sec Loss 2.5714 LearningRate 0.0024 Epoch: 16 Global Step: 85470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:47,094-Speed 5539.08 samples/sec Loss 2.6359 LearningRate 0.0024 Epoch: 16 Global Step: 85480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:48,936-Speed 5562.76 samples/sec Loss 2.6382 LearningRate 0.0024 Epoch: 16 Global Step: 85490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:50,780-Speed 5553.69 samples/sec Loss 2.6018 LearningRate 0.0024 Epoch: 16 Global Step: 85500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:52,618-Speed 5575.67 samples/sec Loss 2.5330 LearningRate 0.0024 Epoch: 16 Global Step: 85510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:54,453-Speed 5580.56 samples/sec Loss 2.5956 LearningRate 0.0024 Epoch: 16 Global Step: 85520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:56,291-Speed 5575.59 samples/sec Loss 2.5207 LearningRate 0.0024 Epoch: 16 Global Step: 85530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:44:58,119-Speed 5603.33 samples/sec Loss 2.5720 LearningRate 0.0024 Epoch: 16 Global Step: 85540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:44:59,953-Speed 5585.99 samples/sec Loss 2.5084 LearningRate 0.0024 Epoch: 16 Global Step: 85550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:01,789-Speed 5579.18 samples/sec Loss 2.6042 LearningRate 0.0024 Epoch: 16 Global Step: 85560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:03,641-Speed 5530.57 samples/sec Loss 2.4402 LearningRate 0.0024 Epoch: 16 Global Step: 85570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:05,501-Speed 5508.47 samples/sec Loss 2.5837 LearningRate 0.0024 Epoch: 16 Global Step: 85580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-11 15:45:07,369-Speed 5480.82 samples/sec Loss 2.4607 LearningRate 0.0024 Epoch: 16 Global Step: 85590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:09,215-Speed 5552.77 samples/sec Loss 2.5046 LearningRate 0.0024 Epoch: 16 Global Step: 85600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:11,052-Speed 5576.92 samples/sec Loss 2.4605 LearningRate 0.0024 Epoch: 16 Global Step: 85610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:12,896-Speed 5554.86 samples/sec Loss 2.5489 LearningRate 0.0024 Epoch: 16 Global Step: 85620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:14,732-Speed 5577.28 samples/sec Loss 2.6414 LearningRate 0.0024 Epoch: 16 Global Step: 85630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:16,573-Speed 5564.66 samples/sec Loss 2.4809 LearningRate 0.0024 Epoch: 16 Global Step: 85640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:18,410-Speed 5578.07 samples/sec Loss 2.5548 LearningRate 0.0024 Epoch: 16 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:20,243-Speed 5586.74 samples/sec Loss 2.5824 LearningRate 0.0023 Epoch: 16 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:22,081-Speed 5574.88 samples/sec Loss 2.5504 LearningRate 0.0023 Epoch: 16 Global Step: 85670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:23,969-Speed 5424.86 samples/sec Loss 2.3725 LearningRate 0.0023 Epoch: 16 Global Step: 85680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:25,819-Speed 5536.42 samples/sec Loss 2.5180 LearningRate 0.0023 Epoch: 16 Global Step: 85690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:27,662-Speed 5559.68 samples/sec Loss 2.3729 LearningRate 0.0023 Epoch: 16 Global Step: 85700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:29,503-Speed 5562.06 samples/sec Loss 
2.5120 LearningRate 0.0023 Epoch: 16 Global Step: 85710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:31,355-Speed 5535.11 samples/sec Loss 2.5966 LearningRate 0.0023 Epoch: 16 Global Step: 85720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:33,195-Speed 5567.05 samples/sec Loss 2.5092 LearningRate 0.0023 Epoch: 16 Global Step: 85730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:35,028-Speed 5587.15 samples/sec Loss 2.4933 LearningRate 0.0023 Epoch: 16 Global Step: 85740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:36,864-Speed 5580.82 samples/sec Loss 2.4255 LearningRate 0.0023 Epoch: 16 Global Step: 85750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:38,702-Speed 5572.74 samples/sec Loss 2.5397 LearningRate 0.0023 Epoch: 16 Global Step: 85760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:40,534-Speed 5590.29 samples/sec Loss 2.5110 LearningRate 0.0023 Epoch: 16 Global Step: 85770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:42,384-Speed 5538.18 samples/sec Loss 2.3998 LearningRate 0.0023 Epoch: 16 Global Step: 85780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:44,223-Speed 5569.84 samples/sec Loss 2.4810 LearningRate 0.0023 Epoch: 16 Global Step: 85790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:45:46,063-Speed 5568.47 samples/sec Loss 2.5840 LearningRate 0.0023 Epoch: 16 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:47,910-Speed 5543.79 samples/sec Loss 2.4736 LearningRate 0.0023 Epoch: 16 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:49,752-Speed 5561.84 samples/sec Loss 2.5520 LearningRate 0.0023 Epoch: 16 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:51,599-Speed 5546.72 samples/sec Loss 2.5879 LearningRate 0.0023 Epoch: 16 Global Step: 85830 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:53,455-Speed 5519.11 samples/sec Loss 2.6130 LearningRate 0.0023 Epoch: 16 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:55,292-Speed 5578.61 samples/sec Loss 2.5543 LearningRate 0.0023 Epoch: 16 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:57,128-Speed 5577.68 samples/sec Loss 2.6232 LearningRate 0.0023 Epoch: 16 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:45:58,970-Speed 5562.58 samples/sec Loss 2.4845 LearningRate 0.0023 Epoch: 16 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:00,813-Speed 5558.79 samples/sec Loss 2.5017 LearningRate 0.0023 Epoch: 16 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:02,651-Speed 5570.61 samples/sec Loss 2.5123 LearningRate 0.0023 Epoch: 16 Global Step: 85890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:04,492-Speed 5565.23 samples/sec Loss 2.4952 LearningRate 0.0023 Epoch: 16 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:46:06,327-Speed 5584.42 samples/sec Loss 2.6726 LearningRate 0.0023 Epoch: 16 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:08,164-Speed 5574.81 samples/sec Loss 2.5375 LearningRate 0.0023 Epoch: 16 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:10,005-Speed 5565.15 samples/sec Loss 2.5456 LearningRate 0.0023 Epoch: 16 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:11,867-Speed 5500.57 samples/sec Loss 2.4701 LearningRate 0.0023 Epoch: 16 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:46:13,761-Speed 5408.78 samples/sec Loss 2.5492 LearningRate 0.0023 Epoch: 16 Global Step: 85950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:46:15,611-Speed 5539.16 samples/sec Loss 2.5191 LearningRate 0.0023 Epoch: 16 Global Step: 85960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:46:17,469-Speed 5511.97 samples/sec Loss 2.4539 LearningRate 0.0023 Epoch: 16 Global Step: 85970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:46:19,397-Speed 5312.93 samples/sec Loss 2.7140 LearningRate 0.0023 Epoch: 16 Global Step: 85980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:46:30,032-Speed 962.97 samples/sec Loss 2.2654 LearningRate 0.0022 Epoch: 17 Global Step: 85990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:46:31,892-Speed 5508.66 samples/sec Loss 1.9763 LearningRate 0.0022 Epoch: 17 Global Step: 86000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:46:58,452-[lfw][86000]XNorm: 22.249283
Training: 2022-04-11 15:46:58,453-[lfw][86000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 15:46:58,453-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:47:28,991-[cfp_fp][86000]XNorm: 20.879434
Training: 2022-04-11 15:47:28,992-[cfp_fp][86000]Accuracy-Flip: 0.98386+-0.00586
Training: 2022-04-11 15:47:28,992-[cfp_fp][86000]Accuracy-Highest: 0.98386
Training: 2022-04-11 15:47:55,324-[agedb_30][86000]XNorm: 22.412029
Training: 2022-04-11 15:47:55,325-[agedb_30][86000]Accuracy-Flip: 0.98283+-0.00654
Training: 2022-04-11 15:47:55,326-[agedb_30][86000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:47:57,181-Speed 120.06 samples/sec Loss 1.9425 LearningRate 0.0022 Epoch: 17 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 15:47:58,999-Speed 5632.54 samples/sec Loss 1.8503 LearningRate 0.0022 Epoch: 17 Global Step: 86020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:48:00,838-Speed 5570.26 samples/sec Loss 1.9287 LearningRate 0.0022 Epoch: 17 Global Step: 86030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 15:48:02,685-Speed 5545.32
samples/sec Loss 1.8377 LearningRate 0.0022 Epoch: 17 Global Step: 86040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:04,518-Speed 5587.81 samples/sec Loss 2.0067 LearningRate 0.0022 Epoch: 17 Global Step: 86050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:06,390-Speed 5473.12 samples/sec Loss 1.9368 LearningRate 0.0022 Epoch: 17 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:08,223-Speed 5590.32 samples/sec Loss 1.9248 LearningRate 0.0022 Epoch: 17 Global Step: 86070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:10,063-Speed 5564.88 samples/sec Loss 1.9789 LearningRate 0.0022 Epoch: 17 Global Step: 86080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:11,928-Speed 5494.85 samples/sec Loss 1.9257 LearningRate 0.0022 Epoch: 17 Global Step: 86090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:13,847-Speed 5337.35 samples/sec Loss 1.9140 LearningRate 0.0022 Epoch: 17 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:15,752-Speed 5379.25 samples/sec Loss 1.8696 LearningRate 0.0022 Epoch: 17 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:17,718-Speed 5210.93 samples/sec Loss 1.9394 LearningRate 0.0022 Epoch: 17 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:19,556-Speed 5571.20 samples/sec Loss 1.9802 LearningRate 0.0022 Epoch: 17 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:21,399-Speed 5560.91 samples/sec Loss 1.9666 LearningRate 0.0022 Epoch: 17 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:23,237-Speed 5571.05 samples/sec Loss 1.9388 LearningRate 0.0022 Epoch: 17 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:25,068-Speed 5596.73 samples/sec Loss 1.9588 LearningRate 0.0022 Epoch: 17 
Global Step: 86160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:26,891-Speed 5617.95 samples/sec Loss 1.8306 LearningRate 0.0022 Epoch: 17 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:28,734-Speed 5559.60 samples/sec Loss 1.9011 LearningRate 0.0022 Epoch: 17 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:30,579-Speed 5551.81 samples/sec Loss 1.9532 LearningRate 0.0022 Epoch: 17 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:32,426-Speed 5545.34 samples/sec Loss 1.9534 LearningRate 0.0022 Epoch: 17 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:34,266-Speed 5566.45 samples/sec Loss 1.8496 LearningRate 0.0022 Epoch: 17 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:36,114-Speed 5545.54 samples/sec Loss 1.8912 LearningRate 0.0022 Epoch: 17 Global Step: 86220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:37,957-Speed 5557.61 samples/sec Loss 1.9362 LearningRate 0.0022 Epoch: 17 Global Step: 86230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:39,840-Speed 5441.01 samples/sec Loss 1.9801 LearningRate 0.0022 Epoch: 17 Global Step: 86240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:41,692-Speed 5530.12 samples/sec Loss 2.0438 LearningRate 0.0022 Epoch: 17 Global Step: 86250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:43,526-Speed 5586.05 samples/sec Loss 1.9539 LearningRate 0.0022 Epoch: 17 Global Step: 86260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:45,359-Speed 5587.93 samples/sec Loss 1.8820 LearningRate 0.0022 Epoch: 17 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:47,201-Speed 5562.54 samples/sec Loss 1.9101 LearningRate 0.0022 Epoch: 17 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 15:48:49,035-Speed 5584.22 samples/sec Loss 1.9638 LearningRate 0.0022 Epoch: 17 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:50,871-Speed 5580.47 samples/sec Loss 1.9651 LearningRate 0.0022 Epoch: 17 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:48:52,703-Speed 5591.19 samples/sec Loss 1.9664 LearningRate 0.0022 Epoch: 17 Global Step: 86310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:54,560-Speed 5516.42 samples/sec Loss 1.9880 LearningRate 0.0022 Epoch: 17 Global Step: 86320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:56,405-Speed 5552.24 samples/sec Loss 1.9270 LearningRate 0.0021 Epoch: 17 Global Step: 86330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:48:58,246-Speed 5564.66 samples/sec Loss 1.9542 LearningRate 0.0021 Epoch: 17 Global Step: 86340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:00,087-Speed 5564.14 samples/sec Loss 1.9500 LearningRate 0.0021 Epoch: 17 Global Step: 86350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:01,944-Speed 5516.52 samples/sec Loss 1.9598 LearningRate 0.0021 Epoch: 17 Global Step: 86360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:03,788-Speed 5555.85 samples/sec Loss 1.9304 LearningRate 0.0021 Epoch: 17 Global Step: 86370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:05,630-Speed 5560.44 samples/sec Loss 1.8729 LearningRate 0.0021 Epoch: 17 Global Step: 86380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:07,478-Speed 5543.11 samples/sec Loss 1.9249 LearningRate 0.0021 Epoch: 17 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:09,315-Speed 5578.24 samples/sec Loss 1.9415 LearningRate 0.0021 Epoch: 17 Global Step: 86400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:11,156-Speed 5564.17 
samples/sec Loss 1.8970 LearningRate 0.0021 Epoch: 17 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:49:13,018-Speed 5500.49 samples/sec Loss 1.9568 LearningRate 0.0021 Epoch: 17 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:14,886-Speed 5483.54 samples/sec Loss 1.9527 LearningRate 0.0021 Epoch: 17 Global Step: 86430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:16,738-Speed 5533.63 samples/sec Loss 1.8708 LearningRate 0.0021 Epoch: 17 Global Step: 86440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:18,580-Speed 5560.52 samples/sec Loss 1.9852 LearningRate 0.0021 Epoch: 17 Global Step: 86450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:20,422-Speed 5562.87 samples/sec Loss 2.0335 LearningRate 0.0021 Epoch: 17 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:22,280-Speed 5512.02 samples/sec Loss 1.9244 LearningRate 0.0021 Epoch: 17 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:24,165-Speed 5433.93 samples/sec Loss 1.9815 LearningRate 0.0021 Epoch: 17 Global Step: 86480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:26,044-Speed 5453.26 samples/sec Loss 2.0127 LearningRate 0.0021 Epoch: 17 Global Step: 86490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:27,885-Speed 5563.69 samples/sec Loss 2.0204 LearningRate 0.0021 Epoch: 17 Global Step: 86500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:29,718-Speed 5588.60 samples/sec Loss 1.9595 LearningRate 0.0021 Epoch: 17 Global Step: 86510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:31,555-Speed 5576.88 samples/sec Loss 2.0657 LearningRate 0.0021 Epoch: 17 Global Step: 86520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:33,398-Speed 5557.08 samples/sec Loss 2.0456 LearningRate 0.0021 Epoch: 17 
Global Step: 86530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:35,242-Speed 5557.28 samples/sec Loss 1.9543 LearningRate 0.0021 Epoch: 17 Global Step: 86540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:37,083-Speed 5565.33 samples/sec Loss 1.9926 LearningRate 0.0021 Epoch: 17 Global Step: 86550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:38,918-Speed 5581.20 samples/sec Loss 2.0175 LearningRate 0.0021 Epoch: 17 Global Step: 86560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:40,751-Speed 5587.25 samples/sec Loss 2.0676 LearningRate 0.0021 Epoch: 17 Global Step: 86570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:42,584-Speed 5588.19 samples/sec Loss 2.0721 LearningRate 0.0021 Epoch: 17 Global Step: 86580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:44,426-Speed 5561.07 samples/sec Loss 1.9226 LearningRate 0.0021 Epoch: 17 Global Step: 86590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:46,263-Speed 5576.74 samples/sec Loss 2.0191 LearningRate 0.0021 Epoch: 17 Global Step: 86600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:48,110-Speed 5545.75 samples/sec Loss 1.9567 LearningRate 0.0021 Epoch: 17 Global Step: 86610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:49:49,987-Speed 5458.16 samples/sec Loss 2.0406 LearningRate 0.0021 Epoch: 17 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:51,859-Speed 5472.36 samples/sec Loss 2.0058 LearningRate 0.0021 Epoch: 17 Global Step: 86630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:53,704-Speed 5554.45 samples/sec Loss 1.8994 LearningRate 0.0021 Epoch: 17 Global Step: 86640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:55,538-Speed 5586.61 samples/sec Loss 1.9513 LearningRate 0.0021 Epoch: 17 Global Step: 86650 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:49:57,370-Speed 5590.31 samples/sec Loss 1.9769 LearningRate 0.0021 Epoch: 17 Global Step: 86660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:49:59,206-Speed 5578.99 samples/sec Loss 2.0602 LearningRate 0.0021 Epoch: 17 Global Step: 86670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:01,048-Speed 5561.63 samples/sec Loss 2.1086 LearningRate 0.0020 Epoch: 17 Global Step: 86680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:02,936-Speed 5425.76 samples/sec Loss 2.0209 LearningRate 0.0020 Epoch: 17 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:04,769-Speed 5590.76 samples/sec Loss 2.0277 LearningRate 0.0020 Epoch: 17 Global Step: 86700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:06,615-Speed 5548.19 samples/sec Loss 2.0138 LearningRate 0.0020 Epoch: 17 Global Step: 86710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:08,454-Speed 5570.44 samples/sec Loss 2.0359 LearningRate 0.0020 Epoch: 17 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:50:10,294-Speed 5568.19 samples/sec Loss 2.0821 LearningRate 0.0020 Epoch: 17 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:50:12,139-Speed 5551.50 samples/sec Loss 1.9597 LearningRate 0.0020 Epoch: 17 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:50:14,007-Speed 5485.14 samples/sec Loss 1.9560 LearningRate 0.0020 Epoch: 17 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:50:15,887-Speed 5448.17 samples/sec Loss 2.0882 LearningRate 0.0020 Epoch: 17 Global Step: 86760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:17,730-Speed 5560.78 samples/sec Loss 2.0326 LearningRate 0.0020 Epoch: 17 Global Step: 86770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:19,563-Speed 5585.91 
samples/sec Loss 2.0158 LearningRate 0.0020 Epoch: 17 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:21,409-Speed 5551.07 samples/sec Loss 1.9777 LearningRate 0.0020 Epoch: 17 Global Step: 86790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:23,246-Speed 5576.49 samples/sec Loss 2.0183 LearningRate 0.0020 Epoch: 17 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:25,090-Speed 5555.00 samples/sec Loss 1.9987 LearningRate 0.0020 Epoch: 17 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:26,937-Speed 5545.69 samples/sec Loss 1.9686 LearningRate 0.0020 Epoch: 17 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:28,773-Speed 5578.13 samples/sec Loss 2.0531 LearningRate 0.0020 Epoch: 17 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:30,624-Speed 5536.18 samples/sec Loss 1.9922 LearningRate 0.0020 Epoch: 17 Global Step: 86840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:32,471-Speed 5546.14 samples/sec Loss 2.0636 LearningRate 0.0020 Epoch: 17 Global Step: 86850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:34,298-Speed 5607.06 samples/sec Loss 1.9937 LearningRate 0.0020 Epoch: 17 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:36,136-Speed 5573.74 samples/sec Loss 2.0008 LearningRate 0.0020 Epoch: 17 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:37,974-Speed 5573.78 samples/sec Loss 2.0439 LearningRate 0.0020 Epoch: 17 Global Step: 86880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:50:39,800-Speed 5607.47 samples/sec Loss 1.9787 LearningRate 0.0020 Epoch: 17 Global Step: 86890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:41,637-Speed 5577.87 samples/sec Loss 1.8836 LearningRate 0.0020 Epoch: 17 
Global Step: 86900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:43,472-Speed 5580.93 samples/sec Loss 1.9305 LearningRate 0.0020 Epoch: 17 Global Step: 86910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:45,311-Speed 5570.53 samples/sec Loss 1.9598 LearningRate 0.0020 Epoch: 17 Global Step: 86920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:47,154-Speed 5560.29 samples/sec Loss 2.0308 LearningRate 0.0020 Epoch: 17 Global Step: 86930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:48,997-Speed 5557.57 samples/sec Loss 1.9635 LearningRate 0.0020 Epoch: 17 Global Step: 86940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:50,829-Speed 5591.20 samples/sec Loss 1.9877 LearningRate 0.0020 Epoch: 17 Global Step: 86950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:52,689-Speed 5508.30 samples/sec Loss 2.0144 LearningRate 0.0020 Epoch: 17 Global Step: 86960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:54,525-Speed 5578.66 samples/sec Loss 1.9840 LearningRate 0.0020 Epoch: 17 Global Step: 86970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:56,359-Speed 5587.03 samples/sec Loss 2.0803 LearningRate 0.0020 Epoch: 17 Global Step: 86980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:50:58,207-Speed 5541.32 samples/sec Loss 2.0171 LearningRate 0.0020 Epoch: 17 Global Step: 86990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:00,042-Speed 5583.98 samples/sec Loss 1.9678 LearningRate 0.0020 Epoch: 17 Global Step: 87000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:01,884-Speed 5562.91 samples/sec Loss 2.0570 LearningRate 0.0020 Epoch: 17 Global Step: 87010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:03,745-Speed 5503.32 samples/sec Loss 2.0419 LearningRate 0.0020 Epoch: 17 Global Step: 87020 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:51:05,612-Speed 5485.88 samples/sec Loss 1.9503 LearningRate 0.0020 Epoch: 17 Global Step: 87030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:07,456-Speed 5556.99 samples/sec Loss 2.0014 LearningRate 0.0019 Epoch: 17 Global Step: 87040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:09,307-Speed 5535.08 samples/sec Loss 2.0703 LearningRate 0.0019 Epoch: 17 Global Step: 87050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:11,167-Speed 5506.30 samples/sec Loss 2.0284 LearningRate 0.0019 Epoch: 17 Global Step: 87060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:13,014-Speed 5546.71 samples/sec Loss 1.9929 LearningRate 0.0019 Epoch: 17 Global Step: 87070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:14,873-Speed 5509.29 samples/sec Loss 2.0024 LearningRate 0.0019 Epoch: 17 Global Step: 87080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:16,715-Speed 5561.84 samples/sec Loss 1.9900 LearningRate 0.0019 Epoch: 17 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:51:18,549-Speed 5583.83 samples/sec Loss 1.9504 LearningRate 0.0019 Epoch: 17 Global Step: 87100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:20,386-Speed 5576.09 samples/sec Loss 1.9611 LearningRate 0.0019 Epoch: 17 Global Step: 87110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:22,229-Speed 5558.90 samples/sec Loss 2.0282 LearningRate 0.0019 Epoch: 17 Global Step: 87120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:24,079-Speed 5538.71 samples/sec Loss 2.1339 LearningRate 0.0019 Epoch: 17 Global Step: 87130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:25,950-Speed 5474.07 samples/sec Loss 1.9668 LearningRate 0.0019 Epoch: 17 Global Step: 87140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:27,793-Speed 5558.22 
samples/sec Loss 1.9889 LearningRate 0.0019 Epoch: 17 Global Step: 87150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:29,660-Speed 5488.93 samples/sec Loss 2.0223 LearningRate 0.0019 Epoch: 17 Global Step: 87160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:31,501-Speed 5565.53 samples/sec Loss 2.0141 LearningRate 0.0019 Epoch: 17 Global Step: 87170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:33,335-Speed 5583.92 samples/sec Loss 2.1029 LearningRate 0.0019 Epoch: 17 Global Step: 87180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:35,171-Speed 5578.21 samples/sec Loss 1.9143 LearningRate 0.0019 Epoch: 17 Global Step: 87190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:37,015-Speed 5565.12 samples/sec Loss 2.0239 LearningRate 0.0019 Epoch: 17 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:51:38,860-Speed 5553.00 samples/sec Loss 1.9916 LearningRate 0.0019 Epoch: 17 Global Step: 87210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:40,711-Speed 5535.00 samples/sec Loss 2.0319 LearningRate 0.0019 Epoch: 17 Global Step: 87220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:42,545-Speed 5583.01 samples/sec Loss 2.0156 LearningRate 0.0019 Epoch: 17 Global Step: 87230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:44,392-Speed 5546.37 samples/sec Loss 1.9810 LearningRate 0.0019 Epoch: 17 Global Step: 87240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:46,232-Speed 5569.42 samples/sec Loss 1.9339 LearningRate 0.0019 Epoch: 17 Global Step: 87250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:48,074-Speed 5562.29 samples/sec Loss 2.0913 LearningRate 0.0019 Epoch: 17 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:49,914-Speed 5565.39 samples/sec Loss 2.0018 LearningRate 0.0019 Epoch: 17 
Global Step: 87270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:51,748-Speed 5585.71 samples/sec Loss 2.0305 LearningRate 0.0019 Epoch: 17 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:53,593-Speed 5552.32 samples/sec Loss 1.9756 LearningRate 0.0019 Epoch: 17 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:55,440-Speed 5545.98 samples/sec Loss 2.1968 LearningRate 0.0019 Epoch: 17 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:51:57,279-Speed 5570.28 samples/sec Loss 1.9723 LearningRate 0.0019 Epoch: 17 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:51:59,127-Speed 5544.16 samples/sec Loss 2.0176 LearningRate 0.0019 Epoch: 17 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:00,964-Speed 5579.50 samples/sec Loss 2.0293 LearningRate 0.0019 Epoch: 17 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:02,809-Speed 5550.60 samples/sec Loss 2.0422 LearningRate 0.0019 Epoch: 17 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:04,655-Speed 5547.22 samples/sec Loss 2.0222 LearningRate 0.0019 Epoch: 17 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:06,517-Speed 5504.30 samples/sec Loss 1.9513 LearningRate 0.0019 Epoch: 17 Global Step: 87360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:08,356-Speed 5570.14 samples/sec Loss 2.0213 LearningRate 0.0019 Epoch: 17 Global Step: 87370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:10,192-Speed 5578.01 samples/sec Loss 2.0255 LearningRate 0.0019 Epoch: 17 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:12,036-Speed 5557.80 samples/sec Loss 2.0775 LearningRate 0.0019 Epoch: 17 Global Step: 87390 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:52:13,910-Speed 5465.63 samples/sec Loss 1.9305 LearningRate 0.0019 Epoch: 17 Global Step: 87400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:15,785-Speed 5463.41 samples/sec Loss 1.9961 LearningRate 0.0018 Epoch: 17 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:17,625-Speed 5568.08 samples/sec Loss 2.1164 LearningRate 0.0018 Epoch: 17 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:19,466-Speed 5562.69 samples/sec Loss 2.0539 LearningRate 0.0018 Epoch: 17 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:21,302-Speed 5580.46 samples/sec Loss 2.0071 LearningRate 0.0018 Epoch: 17 Global Step: 87440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:23,139-Speed 5577.79 samples/sec Loss 1.9409 LearningRate 0.0018 Epoch: 17 Global Step: 87450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:24,977-Speed 5572.46 samples/sec Loss 2.1278 LearningRate 0.0018 Epoch: 17 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:26,799-Speed 5621.51 samples/sec Loss 2.0513 LearningRate 0.0018 Epoch: 17 Global Step: 87470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:28,642-Speed 5558.23 samples/sec Loss 2.1483 LearningRate 0.0018 Epoch: 17 Global Step: 87480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:30,477-Speed 5582.74 samples/sec Loss 2.0782 LearningRate 0.0018 Epoch: 17 Global Step: 87490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:32,324-Speed 5548.22 samples/sec Loss 1.9568 LearningRate 0.0018 Epoch: 17 Global Step: 87500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:34,165-Speed 5564.47 samples/sec Loss 1.9524 LearningRate 0.0018 Epoch: 17 Global Step: 87510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:36,014-Speed 5539.78 
samples/sec Loss 2.0441 LearningRate 0.0018 Epoch: 17 Global Step: 87520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:37,868-Speed 5525.28 samples/sec Loss 2.0639 LearningRate 0.0018 Epoch: 17 Global Step: 87530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:39,720-Speed 5530.21 samples/sec Loss 2.1373 LearningRate 0.0018 Epoch: 17 Global Step: 87540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:41,565-Speed 5553.82 samples/sec Loss 2.1038 LearningRate 0.0018 Epoch: 17 Global Step: 87550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:43,398-Speed 5589.79 samples/sec Loss 2.0182 LearningRate 0.0018 Epoch: 17 Global Step: 87560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:45,254-Speed 5518.35 samples/sec Loss 2.0562 LearningRate 0.0018 Epoch: 17 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:47,092-Speed 5574.30 samples/sec Loss 2.0638 LearningRate 0.0018 Epoch: 17 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:48,934-Speed 5561.72 samples/sec Loss 2.1783 LearningRate 0.0018 Epoch: 17 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:50,775-Speed 5563.85 samples/sec Loss 2.0165 LearningRate 0.0018 Epoch: 17 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:52:52,623-Speed 5544.84 samples/sec Loss 2.0130 LearningRate 0.0018 Epoch: 17 Global Step: 87610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:54,461-Speed 5574.06 samples/sec Loss 2.0800 LearningRate 0.0018 Epoch: 17 Global Step: 87620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:56,314-Speed 5527.41 samples/sec Loss 2.0441 LearningRate 0.0018 Epoch: 17 Global Step: 87630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:58,152-Speed 5574.48 samples/sec Loss 2.1337 LearningRate 0.0018 Epoch: 17 
Global Step: 87640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:52:59,984-Speed 5590.50 samples/sec Loss 2.0721 LearningRate 0.0018 Epoch: 17 Global Step: 87650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:01,834-Speed 5536.15 samples/sec Loss 2.0544 LearningRate 0.0018 Epoch: 17 Global Step: 87660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:03,700-Speed 5491.84 samples/sec Loss 2.0772 LearningRate 0.0018 Epoch: 17 Global Step: 87670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:05,557-Speed 5515.40 samples/sec Loss 2.0457 LearningRate 0.0018 Epoch: 17 Global Step: 87680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:07,398-Speed 5563.56 samples/sec Loss 2.1629 LearningRate 0.0018 Epoch: 17 Global Step: 87690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:09,240-Speed 5562.26 samples/sec Loss 2.0402 LearningRate 0.0018 Epoch: 17 Global Step: 87700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:11,090-Speed 5537.76 samples/sec Loss 2.0941 LearningRate 0.0018 Epoch: 17 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:53:12,934-Speed 5556.37 samples/sec Loss 2.1187 LearningRate 0.0018 Epoch: 17 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:53:14,779-Speed 5552.72 samples/sec Loss 2.1944 LearningRate 0.0018 Epoch: 17 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:53:16,620-Speed 5564.64 samples/sec Loss 2.0016 LearningRate 0.0018 Epoch: 17 Global Step: 87740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:18,455-Speed 5581.88 samples/sec Loss 2.1822 LearningRate 0.0018 Epoch: 17 Global Step: 87750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:20,300-Speed 5549.75 samples/sec Loss 2.0692 LearningRate 0.0018 Epoch: 17 Global Step: 87760 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 15:53:22,142-Speed 5564.12 samples/sec Loss 1.9279 LearningRate 0.0018 Epoch: 17 Global Step: 87770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:23,977-Speed 5581.39 samples/sec Loss 2.0101 LearningRate 0.0017 Epoch: 17 Global Step: 87780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:25,829-Speed 5532.44 samples/sec Loss 2.0444 LearningRate 0.0017 Epoch: 17 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:27,678-Speed 5539.35 samples/sec Loss 2.0174 LearningRate 0.0017 Epoch: 17 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:29,539-Speed 5504.21 samples/sec Loss 2.0215 LearningRate 0.0017 Epoch: 17 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:31,390-Speed 5534.74 samples/sec Loss 2.0328 LearningRate 0.0017 Epoch: 17 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:33,226-Speed 5579.68 samples/sec Loss 1.9925 LearningRate 0.0017 Epoch: 17 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:35,060-Speed 5583.76 samples/sec Loss 2.1055 LearningRate 0.0017 Epoch: 17 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:36,901-Speed 5566.38 samples/sec Loss 2.0869 LearningRate 0.0017 Epoch: 17 Global Step: 87850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:38,740-Speed 5568.12 samples/sec Loss 2.0941 LearningRate 0.0017 Epoch: 17 Global Step: 87860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:40,579-Speed 5571.09 samples/sec Loss 2.0801 LearningRate 0.0017 Epoch: 17 Global Step: 87870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:42,417-Speed 5573.90 samples/sec Loss 2.1144 LearningRate 0.0017 Epoch: 17 Global Step: 87880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:44,253-Speed 5580.28 
samples/sec Loss 2.0925 LearningRate 0.0017 Epoch: 17 Global Step: 87890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:46,097-Speed 5552.95 samples/sec Loss 2.0737 LearningRate 0.0017 Epoch: 17 Global Step: 87900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:47,953-Speed 5520.16 samples/sec Loss 2.0716 LearningRate 0.0017 Epoch: 17 Global Step: 87910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:49,789-Speed 5579.05 samples/sec Loss 2.0029 LearningRate 0.0017 Epoch: 17 Global Step: 87920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:53:51,619-Speed 5600.07 samples/sec Loss 2.0781 LearningRate 0.0017 Epoch: 17 Global Step: 87930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:53:53,466-Speed 5544.28 samples/sec Loss 2.0202 LearningRate 0.0017 Epoch: 17 Global Step: 87940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:53:55,322-Speed 5519.43 samples/sec Loss 2.0362 LearningRate 0.0017 Epoch: 17 Global Step: 87950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:53:57,161-Speed 5570.46 samples/sec Loss 2.0202 LearningRate 0.0017 Epoch: 17 Global Step: 87960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:53:59,009-Speed 5542.80 samples/sec Loss 2.0364 LearningRate 0.0017 Epoch: 17 Global Step: 87970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:54:00,849-Speed 5569.64 samples/sec Loss 2.0722 LearningRate 0.0017 Epoch: 17 Global Step: 87980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:54:02,688-Speed 5568.81 samples/sec Loss 2.0318 LearningRate 0.0017 Epoch: 17 Global Step: 87990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:54:04,528-Speed 5568.93 samples/sec Loss 1.9986 LearningRate 0.0017 Epoch: 17 Global Step: 88000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:54:31,115-[lfw][88000]XNorm: 22.533454 Training: 2022-04-11 
15:54:31,115-[lfw][88000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 15:54:31,116-[lfw][88000]Accuracy-Highest: 0.99817 Training: 2022-04-11 15:55:01,718-[cfp_fp][88000]XNorm: 21.454129 Training: 2022-04-11 15:55:01,719-[cfp_fp][88000]Accuracy-Flip: 0.98329+-0.00515 Training: 2022-04-11 15:55:01,720-[cfp_fp][88000]Accuracy-Highest: 0.98386 Training: 2022-04-11 15:55:28,147-[agedb_30][88000]XNorm: 22.700543 Training: 2022-04-11 15:55:28,148-[agedb_30][88000]Accuracy-Flip: 0.98350+-0.00754 Training: 2022-04-11 15:55:28,148-[agedb_30][88000]Accuracy-Highest: 0.98350 Training: 2022-04-11 15:55:30,061-Speed 119.72 samples/sec Loss 2.0418 LearningRate 0.0017 Epoch: 17 Global Step: 88010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:55:31,943-Speed 5442.28 samples/sec Loss 2.0396 LearningRate 0.0017 Epoch: 17 Global Step: 88020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:55:33,772-Speed 5599.41 samples/sec Loss 1.9940 LearningRate 0.0017 Epoch: 17 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:35,630-Speed 5512.63 samples/sec Loss 2.1146 LearningRate 0.0017 Epoch: 17 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:37,489-Speed 5513.97 samples/sec Loss 2.0368 LearningRate 0.0017 Epoch: 17 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:39,334-Speed 5551.77 samples/sec Loss 2.0364 LearningRate 0.0017 Epoch: 17 Global Step: 88060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:41,176-Speed 5560.12 samples/sec Loss 2.0637 LearningRate 0.0017 Epoch: 17 Global Step: 88070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:43,041-Speed 5495.94 samples/sec Loss 2.0656 LearningRate 0.0017 Epoch: 17 Global Step: 88080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:44,874-Speed 5587.30 samples/sec Loss 2.0790 LearningRate 0.0017 Epoch: 17 Global Step: 88090 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:46,704-Speed 5595.83 samples/sec Loss 2.0473 LearningRate 0.0017 Epoch: 17 Global Step: 88100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:48,549-Speed 5554.45 samples/sec Loss 1.9448 LearningRate 0.0017 Epoch: 17 Global Step: 88110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:50,394-Speed 5551.01 samples/sec Loss 2.0300 LearningRate 0.0017 Epoch: 17 Global Step: 88120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:52,231-Speed 5578.51 samples/sec Loss 2.0622 LearningRate 0.0017 Epoch: 17 Global Step: 88130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:54,076-Speed 5552.85 samples/sec Loss 2.0675 LearningRate 0.0017 Epoch: 17 Global Step: 88140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:55,918-Speed 5561.52 samples/sec Loss 2.0044 LearningRate 0.0017 Epoch: 17 Global Step: 88150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:57,747-Speed 5600.21 samples/sec Loss 1.9557 LearningRate 0.0017 Epoch: 17 Global Step: 88160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:55:59,581-Speed 5585.96 samples/sec Loss 2.0503 LearningRate 0.0016 Epoch: 17 Global Step: 88170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:01,412-Speed 5592.17 samples/sec Loss 2.1044 LearningRate 0.0016 Epoch: 17 Global Step: 88180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:03,245-Speed 5589.64 samples/sec Loss 2.0794 LearningRate 0.0016 Epoch: 17 Global Step: 88190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:05,075-Speed 5598.89 samples/sec Loss 2.0052 LearningRate 0.0016 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:06,932-Speed 5516.43 samples/sec Loss 1.9613 LearningRate 0.0016 Epoch: 17 Global Step: 88210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-11 15:56:08,763-Speed 5595.57 samples/sec Loss 2.1317 LearningRate 0.0016 Epoch: 17 Global Step: 88220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:10,605-Speed 5561.79 samples/sec Loss 1.9555 LearningRate 0.0016 Epoch: 17 Global Step: 88230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:12,445-Speed 5567.46 samples/sec Loss 2.0971 LearningRate 0.0016 Epoch: 17 Global Step: 88240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:14,297-Speed 5531.15 samples/sec Loss 2.0067 LearningRate 0.0016 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:16,137-Speed 5567.68 samples/sec Loss 2.2034 LearningRate 0.0016 Epoch: 17 Global Step: 88260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:17,971-Speed 5586.22 samples/sec Loss 2.0466 LearningRate 0.0016 Epoch: 17 Global Step: 88270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:56:19,824-Speed 5527.49 samples/sec Loss 2.0637 LearningRate 0.0016 Epoch: 17 Global Step: 88280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:21,666-Speed 5562.78 samples/sec Loss 2.0735 LearningRate 0.0016 Epoch: 17 Global Step: 88290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:23,503-Speed 5576.52 samples/sec Loss 2.0688 LearningRate 0.0016 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:25,341-Speed 5573.15 samples/sec Loss 2.0403 LearningRate 0.0016 Epoch: 17 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:27,186-Speed 5552.47 samples/sec Loss 2.0215 LearningRate 0.0016 Epoch: 17 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:29,037-Speed 5537.12 samples/sec Loss 2.0177 LearningRate 0.0016 Epoch: 17 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:30,870-Speed 5586.89 samples/sec Loss 
2.0002 LearningRate 0.0016 Epoch: 17 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:32,700-Speed 5597.65 samples/sec Loss 2.0494 LearningRate 0.0016 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:34,535-Speed 5584.45 samples/sec Loss 2.0827 LearningRate 0.0016 Epoch: 17 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:36,385-Speed 5537.34 samples/sec Loss 1.9968 LearningRate 0.0016 Epoch: 17 Global Step: 88370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:38,229-Speed 5553.98 samples/sec Loss 2.0579 LearningRate 0.0016 Epoch: 17 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:40,077-Speed 5544.95 samples/sec Loss 2.0812 LearningRate 0.0016 Epoch: 17 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:41,918-Speed 5562.37 samples/sec Loss 2.1291 LearningRate 0.0016 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:43,759-Speed 5565.16 samples/sec Loss 2.0606 LearningRate 0.0016 Epoch: 17 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:45,596-Speed 5577.07 samples/sec Loss 2.0549 LearningRate 0.0016 Epoch: 17 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:47,443-Speed 5547.66 samples/sec Loss 2.0293 LearningRate 0.0016 Epoch: 17 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:56:49,273-Speed 5596.15 samples/sec Loss 2.0681 LearningRate 0.0016 Epoch: 17 Global Step: 88440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:51,108-Speed 5582.35 samples/sec Loss 2.0639 LearningRate 0.0016 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:52,954-Speed 5550.36 samples/sec Loss 2.0577 LearningRate 0.0016 Epoch: 17 Global Step: 88460 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:54,797-Speed 5558.38 samples/sec Loss 2.0771 LearningRate 0.0016 Epoch: 17 Global Step: 88470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:56,628-Speed 5594.88 samples/sec Loss 2.1385 LearningRate 0.0016 Epoch: 17 Global Step: 88480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:56:58,459-Speed 5595.77 samples/sec Loss 2.0415 LearningRate 0.0016 Epoch: 17 Global Step: 88490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:00,289-Speed 5597.61 samples/sec Loss 2.1105 LearningRate 0.0016 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:02,131-Speed 5561.10 samples/sec Loss 2.0222 LearningRate 0.0016 Epoch: 17 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:03,973-Speed 5562.10 samples/sec Loss 2.0209 LearningRate 0.0016 Epoch: 17 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:05,824-Speed 5535.46 samples/sec Loss 2.0750 LearningRate 0.0016 Epoch: 17 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:07,657-Speed 5586.02 samples/sec Loss 1.9557 LearningRate 0.0016 Epoch: 17 Global Step: 88540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:09,490-Speed 5591.16 samples/sec Loss 2.1287 LearningRate 0.0016 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:11,336-Speed 5548.09 samples/sec Loss 2.0222 LearningRate 0.0016 Epoch: 17 Global Step: 88560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:13,186-Speed 5539.53 samples/sec Loss 2.0485 LearningRate 0.0015 Epoch: 17 Global Step: 88570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:15,030-Speed 5554.03 samples/sec Loss 2.0609 LearningRate 0.0015 Epoch: 17 Global Step: 88580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:57:16,881-Speed 5534.98 samples/sec Loss 2.0347 LearningRate 0.0015 Epoch: 17 Global Step: 88590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:18,722-Speed 5564.09 samples/sec Loss 1.9930 LearningRate 0.0015 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:20,556-Speed 5584.13 samples/sec Loss 2.0300 LearningRate 0.0015 Epoch: 17 Global Step: 88610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:22,395-Speed 5571.80 samples/sec Loss 2.1028 LearningRate 0.0015 Epoch: 17 Global Step: 88620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:24,247-Speed 5531.61 samples/sec Loss 2.1492 LearningRate 0.0015 Epoch: 17 Global Step: 88630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:26,092-Speed 5553.49 samples/sec Loss 2.0089 LearningRate 0.0015 Epoch: 17 Global Step: 88640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:27,935-Speed 5558.43 samples/sec Loss 2.1357 LearningRate 0.0015 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:29,771-Speed 5578.04 samples/sec Loss 1.9635 LearningRate 0.0015 Epoch: 17 Global Step: 88660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:31,607-Speed 5580.31 samples/sec Loss 2.0031 LearningRate 0.0015 Epoch: 17 Global Step: 88670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:33,443-Speed 5578.77 samples/sec Loss 1.9981 LearningRate 0.0015 Epoch: 17 Global Step: 88680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:35,275-Speed 5591.52 samples/sec Loss 2.0595 LearningRate 0.0015 Epoch: 17 Global Step: 88690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:37,120-Speed 5551.15 samples/sec Loss 1.9716 LearningRate 0.0015 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:38,967-Speed 5547.01 samples/sec Loss 
2.1026 LearningRate 0.0015 Epoch: 17 Global Step: 88710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:40,809-Speed 5560.53 samples/sec Loss 2.0758 LearningRate 0.0015 Epoch: 17 Global Step: 88720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:42,668-Speed 5511.47 samples/sec Loss 2.0764 LearningRate 0.0015 Epoch: 17 Global Step: 88730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:44,505-Speed 5578.40 samples/sec Loss 2.1442 LearningRate 0.0015 Epoch: 17 Global Step: 88740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:46,377-Speed 5471.75 samples/sec Loss 2.0599 LearningRate 0.0015 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:48,216-Speed 5569.22 samples/sec Loss 2.0245 LearningRate 0.0015 Epoch: 17 Global Step: 88760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:50,065-Speed 5541.38 samples/sec Loss 2.0984 LearningRate 0.0015 Epoch: 17 Global Step: 88770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:51,912-Speed 5545.99 samples/sec Loss 1.9629 LearningRate 0.0015 Epoch: 17 Global Step: 88780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 15:57:53,762-Speed 5537.96 samples/sec Loss 2.0501 LearningRate 0.0015 Epoch: 17 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:55,603-Speed 5562.84 samples/sec Loss 2.0600 LearningRate 0.0015 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:57,438-Speed 5581.18 samples/sec Loss 2.0664 LearningRate 0.0015 Epoch: 17 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:57:59,284-Speed 5549.17 samples/sec Loss 2.0784 LearningRate 0.0015 Epoch: 17 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:01,131-Speed 5546.39 samples/sec Loss 2.0854 LearningRate 0.0015 Epoch: 17 Global Step: 88830 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:02,985-Speed 5526.28 samples/sec Loss 2.1572 LearningRate 0.0015 Epoch: 17 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:04,836-Speed 5535.61 samples/sec Loss 2.0103 LearningRate 0.0015 Epoch: 17 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:06,688-Speed 5529.65 samples/sec Loss 2.0349 LearningRate 0.0015 Epoch: 17 Global Step: 88860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:08,524-Speed 5580.99 samples/sec Loss 2.1418 LearningRate 0.0015 Epoch: 17 Global Step: 88870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:10,353-Speed 5598.92 samples/sec Loss 2.0716 LearningRate 0.0015 Epoch: 17 Global Step: 88880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:12,200-Speed 5547.19 samples/sec Loss 1.9863 LearningRate 0.0015 Epoch: 17 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:58:14,055-Speed 5522.87 samples/sec Loss 1.9661 LearningRate 0.0015 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:15,901-Speed 5547.29 samples/sec Loss 2.0625 LearningRate 0.0015 Epoch: 17 Global Step: 88910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:17,757-Speed 5519.31 samples/sec Loss 2.1479 LearningRate 0.0015 Epoch: 17 Global Step: 88920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:19,598-Speed 5564.40 samples/sec Loss 2.0656 LearningRate 0.0015 Epoch: 17 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:21,434-Speed 5581.85 samples/sec Loss 2.0363 LearningRate 0.0015 Epoch: 17 Global Step: 88940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:23,276-Speed 5561.09 samples/sec Loss 2.1603 LearningRate 0.0015 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 15:58:25,129-Speed 5529.58 samples/sec Loss 2.0895 LearningRate 0.0015 Epoch: 17 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:26,977-Speed 5540.82 samples/sec Loss 2.0845 LearningRate 0.0015 Epoch: 17 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:28,816-Speed 5572.16 samples/sec Loss 2.1913 LearningRate 0.0014 Epoch: 17 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:30,655-Speed 5571.04 samples/sec Loss 2.1233 LearningRate 0.0014 Epoch: 17 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:32,490-Speed 5584.18 samples/sec Loss 1.9575 LearningRate 0.0014 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:34,336-Speed 5547.60 samples/sec Loss 2.0817 LearningRate 0.0014 Epoch: 17 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:36,185-Speed 5542.89 samples/sec Loss 2.0744 LearningRate 0.0014 Epoch: 17 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:38,033-Speed 5542.60 samples/sec Loss 2.0417 LearningRate 0.0014 Epoch: 17 Global Step: 89030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:39,879-Speed 5548.30 samples/sec Loss 2.1218 LearningRate 0.0014 Epoch: 17 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:41,745-Speed 5489.20 samples/sec Loss 2.1072 LearningRate 0.0014 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:43,601-Speed 5520.30 samples/sec Loss 2.0484 LearningRate 0.0014 Epoch: 17 Global Step: 89060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:45,433-Speed 5591.94 samples/sec Loss 2.0079 LearningRate 0.0014 Epoch: 17 Global Step: 89070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:47,269-Speed 5580.99 samples/sec Loss 
2.1371 LearningRate 0.0014 Epoch: 17 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:49,105-Speed 5578.64 samples/sec Loss 2.1780 LearningRate 0.0014 Epoch: 17 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:50,944-Speed 5569.98 samples/sec Loss 2.1143 LearningRate 0.0014 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:58:52,781-Speed 5576.72 samples/sec Loss 2.0696 LearningRate 0.0014 Epoch: 17 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:54,637-Speed 5518.99 samples/sec Loss 2.0442 LearningRate 0.0014 Epoch: 17 Global Step: 89120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:56,479-Speed 5562.51 samples/sec Loss 2.0395 LearningRate 0.0014 Epoch: 17 Global Step: 89130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:58:58,319-Speed 5567.02 samples/sec Loss 2.0954 LearningRate 0.0014 Epoch: 17 Global Step: 89140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:00,158-Speed 5568.31 samples/sec Loss 1.8729 LearningRate 0.0014 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:01,997-Speed 5571.10 samples/sec Loss 1.9924 LearningRate 0.0014 Epoch: 17 Global Step: 89160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:03,842-Speed 5552.82 samples/sec Loss 2.0912 LearningRate 0.0014 Epoch: 17 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:05,691-Speed 5541.33 samples/sec Loss 2.1536 LearningRate 0.0014 Epoch: 17 Global Step: 89180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:07,541-Speed 5536.57 samples/sec Loss 2.1062 LearningRate 0.0014 Epoch: 17 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:09,378-Speed 5577.33 samples/sec Loss 2.0716 LearningRate 0.0014 Epoch: 17 Global Step: 89200 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:11,209-Speed 5592.81 samples/sec Loss 2.0910 LearningRate 0.0014 Epoch: 17 Global Step: 89210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:13,064-Speed 5521.90 samples/sec Loss 2.0778 LearningRate 0.0014 Epoch: 17 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:14,900-Speed 5581.08 samples/sec Loss 1.9929 LearningRate 0.0014 Epoch: 17 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:16,744-Speed 5554.59 samples/sec Loss 1.9486 LearningRate 0.0014 Epoch: 17 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:18,579-Speed 5583.11 samples/sec Loss 2.1218 LearningRate 0.0014 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:20,422-Speed 5559.38 samples/sec Loss 2.0342 LearningRate 0.0014 Epoch: 17 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:22,256-Speed 5586.00 samples/sec Loss 2.1136 LearningRate 0.0014 Epoch: 17 Global Step: 89270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:24,093-Speed 5575.36 samples/sec Loss 2.1474 LearningRate 0.0014 Epoch: 17 Global Step: 89280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:25,938-Speed 5551.32 samples/sec Loss 1.9590 LearningRate 0.0014 Epoch: 17 Global Step: 89290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:27,787-Speed 5541.08 samples/sec Loss 2.0293 LearningRate 0.0014 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:29,630-Speed 5559.95 samples/sec Loss 2.0029 LearningRate 0.0014 Epoch: 17 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 15:59:31,465-Speed 5580.88 samples/sec Loss 2.1510 LearningRate 0.0014 Epoch: 17 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 15:59:33,302-Speed 5576.58 samples/sec Loss 2.0369 LearningRate 0.0014 Epoch: 17 Global Step: 89330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:35,141-Speed 5572.42 samples/sec Loss 2.0454 LearningRate 0.0014 Epoch: 17 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:36,981-Speed 5567.13 samples/sec Loss 2.1119 LearningRate 0.0014 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:38,826-Speed 5551.51 samples/sec Loss 2.0299 LearningRate 0.0014 Epoch: 17 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:40,667-Speed 5566.63 samples/sec Loss 2.0405 LearningRate 0.0014 Epoch: 17 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:42,508-Speed 5563.54 samples/sec Loss 2.1150 LearningRate 0.0014 Epoch: 17 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:44,348-Speed 5567.41 samples/sec Loss 2.0466 LearningRate 0.0014 Epoch: 17 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:46,180-Speed 5592.02 samples/sec Loss 2.0641 LearningRate 0.0014 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:48,019-Speed 5569.79 samples/sec Loss 2.0640 LearningRate 0.0013 Epoch: 17 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:49,854-Speed 5584.06 samples/sec Loss 1.9415 LearningRate 0.0013 Epoch: 17 Global Step: 89420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:51,680-Speed 5608.31 samples/sec Loss 2.0341 LearningRate 0.0013 Epoch: 17 Global Step: 89430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:53,517-Speed 5578.91 samples/sec Loss 1.9672 LearningRate 0.0013 Epoch: 17 Global Step: 89440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:55,357-Speed 5566.79 samples/sec Loss 
2.0190 LearningRate 0.0013 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:57,191-Speed 5583.98 samples/sec Loss 2.1210 LearningRate 0.0013 Epoch: 17 Global Step: 89460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 15:59:59,031-Speed 5569.16 samples/sec Loss 2.0861 LearningRate 0.0013 Epoch: 17 Global Step: 89470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:00,867-Speed 5579.46 samples/sec Loss 2.0474 LearningRate 0.0013 Epoch: 17 Global Step: 89480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:02,708-Speed 5562.53 samples/sec Loss 2.1337 LearningRate 0.0013 Epoch: 17 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:04,552-Speed 5558.06 samples/sec Loss 1.9682 LearningRate 0.0013 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:06,390-Speed 5570.73 samples/sec Loss 2.0472 LearningRate 0.0013 Epoch: 17 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:08,222-Speed 5590.98 samples/sec Loss 2.0437 LearningRate 0.0013 Epoch: 17 Global Step: 89520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:10,059-Speed 5578.93 samples/sec Loss 2.0459 LearningRate 0.0013 Epoch: 17 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:00:11,920-Speed 5504.19 samples/sec Loss 2.0595 LearningRate 0.0013 Epoch: 17 Global Step: 89540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:13,765-Speed 5553.45 samples/sec Loss 1.9595 LearningRate 0.0013 Epoch: 17 Global Step: 89550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:15,628-Speed 5497.02 samples/sec Loss 2.0633 LearningRate 0.0013 Epoch: 17 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:17,474-Speed 5550.39 samples/sec Loss 2.0572 LearningRate 0.0013 Epoch: 17 Global Step: 89570 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:19,315-Speed 5565.19 samples/sec Loss 2.0870 LearningRate 0.0013 Epoch: 17 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:21,148-Speed 5586.99 samples/sec Loss 2.0537 LearningRate 0.0013 Epoch: 17 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:22,998-Speed 5536.43 samples/sec Loss 1.9786 LearningRate 0.0013 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:24,835-Speed 5579.08 samples/sec Loss 2.0921 LearningRate 0.0013 Epoch: 17 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:26,673-Speed 5572.25 samples/sec Loss 2.0510 LearningRate 0.0013 Epoch: 17 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:28,515-Speed 5562.92 samples/sec Loss 2.0453 LearningRate 0.0013 Epoch: 17 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:30,349-Speed 5583.98 samples/sec Loss 2.0343 LearningRate 0.0013 Epoch: 17 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:00:32,180-Speed 5593.92 samples/sec Loss 2.0609 LearningRate 0.0013 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:34,017-Speed 5578.25 samples/sec Loss 1.9558 LearningRate 0.0013 Epoch: 17 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:35,861-Speed 5554.27 samples/sec Loss 2.1058 LearningRate 0.0013 Epoch: 17 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:37,705-Speed 5554.91 samples/sec Loss 2.0910 LearningRate 0.0013 Epoch: 17 Global Step: 89680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:39,568-Speed 5501.05 samples/sec Loss 2.0938 LearningRate 0.0013 Epoch: 17 Global Step: 89690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 16:00:41,428-Speed 5506.97 samples/sec Loss 2.1437 LearningRate 0.0013 Epoch: 17 Global Step: 89700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:43,272-Speed 5555.24 samples/sec Loss 2.1150 LearningRate 0.0013 Epoch: 17 Global Step: 89710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:45,110-Speed 5571.74 samples/sec Loss 2.0630 LearningRate 0.0013 Epoch: 17 Global Step: 89720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:46,958-Speed 5543.92 samples/sec Loss 2.1514 LearningRate 0.0013 Epoch: 17 Global Step: 89730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:48,807-Speed 5542.12 samples/sec Loss 2.0494 LearningRate 0.0013 Epoch: 17 Global Step: 89740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:50,640-Speed 5585.96 samples/sec Loss 2.0198 LearningRate 0.0013 Epoch: 17 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:00:52,497-Speed 5517.44 samples/sec Loss 2.1382 LearningRate 0.0013 Epoch: 17 Global Step: 89760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:00:54,374-Speed 5459.06 samples/sec Loss 1.9906 LearningRate 0.0013 Epoch: 17 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:56,226-Speed 5530.40 samples/sec Loss 2.1148 LearningRate 0.0013 Epoch: 17 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:58,073-Speed 5545.44 samples/sec Loss 2.0922 LearningRate 0.0013 Epoch: 17 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:00:59,916-Speed 5560.73 samples/sec Loss 2.0498 LearningRate 0.0013 Epoch: 17 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:01,754-Speed 5570.28 samples/sec Loss 2.0693 LearningRate 0.0013 Epoch: 17 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:03,591-Speed 5578.62 samples/sec Loss 
2.0507 LearningRate 0.0013 Epoch: 17 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:05,432-Speed 5563.42 samples/sec Loss 2.1143 LearningRate 0.0013 Epoch: 17 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:07,292-Speed 5507.76 samples/sec Loss 2.1187 LearningRate 0.0013 Epoch: 17 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:09,126-Speed 5584.43 samples/sec Loss 2.0367 LearningRate 0.0012 Epoch: 17 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:10,968-Speed 5563.36 samples/sec Loss 2.0756 LearningRate 0.0012 Epoch: 17 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:12,820-Speed 5530.21 samples/sec Loss 2.0803 LearningRate 0.0012 Epoch: 17 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:14,672-Speed 5533.50 samples/sec Loss 2.0416 LearningRate 0.0012 Epoch: 17 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:16,514-Speed 5560.73 samples/sec Loss 2.0376 LearningRate 0.0012 Epoch: 17 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:18,359-Speed 5550.04 samples/sec Loss 2.0818 LearningRate 0.0012 Epoch: 17 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:20,195-Speed 5580.49 samples/sec Loss 2.1450 LearningRate 0.0012 Epoch: 17 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:01:22,024-Speed 5600.82 samples/sec Loss 2.1407 LearningRate 0.0012 Epoch: 17 Global Step: 89920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:23,864-Speed 5566.28 samples/sec Loss 2.0301 LearningRate 0.0012 Epoch: 17 Global Step: 89930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:25,713-Speed 5540.76 samples/sec Loss 2.1012 LearningRate 0.0012 Epoch: 17 Global Step: 89940 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:27,569-Speed 5520.63 samples/sec Loss 2.0577 LearningRate 0.0012 Epoch: 17 Global Step: 89950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:29,421-Speed 5530.10 samples/sec Loss 2.1568 LearningRate 0.0012 Epoch: 17 Global Step: 89960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:31,260-Speed 5569.86 samples/sec Loss 2.0078 LearningRate 0.0012 Epoch: 17 Global Step: 89970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:33,118-Speed 5515.48 samples/sec Loss 1.9957 LearningRate 0.0012 Epoch: 17 Global Step: 89980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:34,964-Speed 5549.13 samples/sec Loss 2.1145 LearningRate 0.0012 Epoch: 17 Global Step: 89990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:01:36,804-Speed 5568.72 samples/sec Loss 2.0924 LearningRate 0.0012 Epoch: 17 Global Step: 90000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:02:03,514-[lfw][90000]XNorm: 22.050924 Training: 2022-04-11 16:02:03,515-[lfw][90000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 16:02:03,515-[lfw][90000]Accuracy-Highest: 0.99817 Training: 2022-04-11 16:02:34,328-[cfp_fp][90000]XNorm: 21.039067 Training: 2022-04-11 16:02:34,329-[cfp_fp][90000]Accuracy-Flip: 0.98414+-0.00548 Training: 2022-04-11 16:02:34,329-[cfp_fp][90000]Accuracy-Highest: 0.98414 Training: 2022-04-11 16:03:00,895-[agedb_30][90000]XNorm: 22.194027 Training: 2022-04-11 16:03:00,896-[agedb_30][90000]Accuracy-Flip: 0.98217+-0.00610 Training: 2022-04-11 16:03:00,896-[agedb_30][90000]Accuracy-Highest: 0.98350 Training: 2022-04-11 16:03:02,755-Speed 119.14 samples/sec Loss 2.1028 LearningRate 0.0012 Epoch: 17 Global Step: 90010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:03:04,605-Speed 5536.14 samples/sec Loss 2.1079 LearningRate 0.0012 Epoch: 17 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 16:03:06,449-Speed 5554.97 samples/sec Loss 1.9305 LearningRate 0.0012 Epoch: 17 Global Step: 90030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:08,280-Speed 5593.20 samples/sec Loss 2.0863 LearningRate 0.0012 Epoch: 17 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:10,123-Speed 5559.48 samples/sec Loss 2.0027 LearningRate 0.0012 Epoch: 17 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:11,974-Speed 5534.68 samples/sec Loss 2.0735 LearningRate 0.0012 Epoch: 17 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:13,819-Speed 5551.27 samples/sec Loss 2.0249 LearningRate 0.0012 Epoch: 17 Global Step: 90070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:15,676-Speed 5515.85 samples/sec Loss 2.0807 LearningRate 0.0012 Epoch: 17 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:17,535-Speed 5512.17 samples/sec Loss 2.2197 LearningRate 0.0012 Epoch: 17 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:19,368-Speed 5587.00 samples/sec Loss 2.0466 LearningRate 0.0012 Epoch: 17 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:21,210-Speed 5563.67 samples/sec Loss 2.1572 LearningRate 0.0012 Epoch: 17 Global Step: 90110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:23,057-Speed 5544.33 samples/sec Loss 2.0435 LearningRate 0.0012 Epoch: 17 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:24,934-Speed 5459.02 samples/sec Loss 2.0965 LearningRate 0.0012 Epoch: 17 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:26,763-Speed 5601.99 samples/sec Loss 2.0115 LearningRate 0.0012 Epoch: 17 Global Step: 90140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:28,602-Speed 5567.51 
samples/sec Loss 2.0995 LearningRate 0.0012 Epoch: 17 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:30,438-Speed 5580.75 samples/sec Loss 2.0002 LearningRate 0.0012 Epoch: 17 Global Step: 90160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:32,279-Speed 5564.80 samples/sec Loss 1.9809 LearningRate 0.0012 Epoch: 17 Global Step: 90170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:34,112-Speed 5587.22 samples/sec Loss 2.0350 LearningRate 0.0012 Epoch: 17 Global Step: 90180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:35,950-Speed 5576.28 samples/sec Loss 2.1370 LearningRate 0.0012 Epoch: 17 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:37,789-Speed 5567.90 samples/sec Loss 1.9961 LearningRate 0.0012 Epoch: 17 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:39,621-Speed 5593.15 samples/sec Loss 2.0411 LearningRate 0.0012 Epoch: 17 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:41,462-Speed 5566.00 samples/sec Loss 2.1257 LearningRate 0.0012 Epoch: 17 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:43,337-Speed 5462.45 samples/sec Loss 2.1026 LearningRate 0.0012 Epoch: 17 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:03:45,172-Speed 5580.34 samples/sec Loss 2.0829 LearningRate 0.0012 Epoch: 17 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:47,026-Speed 5526.47 samples/sec Loss 2.0733 LearningRate 0.0012 Epoch: 17 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:48,861-Speed 5583.96 samples/sec Loss 2.0955 LearningRate 0.0012 Epoch: 17 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:50,720-Speed 5510.03 samples/sec Loss 2.0445 LearningRate 0.0012 Epoch: 17 
Global Step: 90270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:52,616-Speed 5401.69 samples/sec Loss 2.0916 LearningRate 0.0012 Epoch: 17 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:54,506-Speed 5420.57 samples/sec Loss 2.0349 LearningRate 0.0012 Epoch: 17 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:56,349-Speed 5560.28 samples/sec Loss 2.0290 LearningRate 0.0012 Epoch: 17 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:03:58,186-Speed 5576.89 samples/sec Loss 2.1238 LearningRate 0.0012 Epoch: 17 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:00,031-Speed 5550.83 samples/sec Loss 2.0886 LearningRate 0.0011 Epoch: 17 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:01,873-Speed 5562.24 samples/sec Loss 2.0811 LearningRate 0.0011 Epoch: 17 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:03,716-Speed 5560.68 samples/sec Loss 2.0669 LearningRate 0.0011 Epoch: 17 Global Step: 90340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:05,568-Speed 5529.51 samples/sec Loss 2.0756 LearningRate 0.0011 Epoch: 17 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:07,421-Speed 5528.54 samples/sec Loss 2.0542 LearningRate 0.0011 Epoch: 17 Global Step: 90360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:09,262-Speed 5565.02 samples/sec Loss 2.0797 LearningRate 0.0011 Epoch: 17 Global Step: 90370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:11,106-Speed 5556.24 samples/sec Loss 2.1446 LearningRate 0.0011 Epoch: 17 Global Step: 90380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:12,944-Speed 5571.56 samples/sec Loss 2.0412 LearningRate 0.0011 Epoch: 17 Global Step: 90390 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 16:04:14,782-Speed 5574.45 samples/sec Loss 2.0215 LearningRate 0.0011 Epoch: 17 Global Step: 90400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:16,637-Speed 5523.04 samples/sec Loss 2.0889 LearningRate 0.0011 Epoch: 17 Global Step: 90410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:18,492-Speed 5520.10 samples/sec Loss 2.1176 LearningRate 0.0011 Epoch: 17 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:20,325-Speed 5588.72 samples/sec Loss 2.0848 LearningRate 0.0011 Epoch: 17 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:22,155-Speed 5599.27 samples/sec Loss 1.9652 LearningRate 0.0011 Epoch: 17 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:23,994-Speed 5569.81 samples/sec Loss 2.0823 LearningRate 0.0011 Epoch: 17 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:25,837-Speed 5559.04 samples/sec Loss 2.0604 LearningRate 0.0011 Epoch: 17 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:27,676-Speed 5570.90 samples/sec Loss 2.1839 LearningRate 0.0011 Epoch: 17 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:29,514-Speed 5572.09 samples/sec Loss 2.0914 LearningRate 0.0011 Epoch: 17 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:31,367-Speed 5530.47 samples/sec Loss 2.1434 LearningRate 0.0011 Epoch: 17 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:33,203-Speed 5577.62 samples/sec Loss 2.1909 LearningRate 0.0011 Epoch: 17 Global Step: 90500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:35,051-Speed 5544.04 samples/sec Loss 2.0429 LearningRate 0.0011 Epoch: 17 Global Step: 90510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:36,891-Speed 5566.60 
samples/sec Loss 2.1893 LearningRate 0.0011 Epoch: 17 Global Step: 90520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:38,751-Speed 5506.04 samples/sec Loss 1.9920 LearningRate 0.0011 Epoch: 17 Global Step: 90530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:40,589-Speed 5573.30 samples/sec Loss 2.0865 LearningRate 0.0011 Epoch: 17 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:42,435-Speed 5551.16 samples/sec Loss 2.1367 LearningRate 0.0011 Epoch: 17 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:44,266-Speed 5594.19 samples/sec Loss 2.1208 LearningRate 0.0011 Epoch: 17 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:04:46,096-Speed 5599.46 samples/sec Loss 2.0462 LearningRate 0.0011 Epoch: 17 Global Step: 90570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:47,949-Speed 5528.23 samples/sec Loss 2.0489 LearningRate 0.0011 Epoch: 17 Global Step: 90580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:49,781-Speed 5591.64 samples/sec Loss 2.1251 LearningRate 0.0011 Epoch: 17 Global Step: 90590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:51,618-Speed 5575.78 samples/sec Loss 2.1614 LearningRate 0.0011 Epoch: 17 Global Step: 90600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:53,460-Speed 5563.02 samples/sec Loss 2.0894 LearningRate 0.0011 Epoch: 17 Global Step: 90610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:55,297-Speed 5574.97 samples/sec Loss 2.1995 LearningRate 0.0011 Epoch: 17 Global Step: 90620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:57,140-Speed 5559.42 samples/sec Loss 2.0447 LearningRate 0.0011 Epoch: 17 Global Step: 90630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:04:58,978-Speed 5571.25 samples/sec Loss 2.0681 LearningRate 0.0011 Epoch: 17 
Global Step: 90640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:00,813-Speed 5582.97 samples/sec Loss 2.0584 LearningRate 0.0011 Epoch: 17 Global Step: 90650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:02,658-Speed 5552.47 samples/sec Loss 1.9644 LearningRate 0.0011 Epoch: 17 Global Step: 90660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:04,494-Speed 5581.60 samples/sec Loss 2.0504 LearningRate 0.0011 Epoch: 17 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:06,319-Speed 5613.04 samples/sec Loss 2.1271 LearningRate 0.0011 Epoch: 17 Global Step: 90680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:08,170-Speed 5534.99 samples/sec Loss 2.1778 LearningRate 0.0011 Epoch: 17 Global Step: 90690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:10,012-Speed 5558.35 samples/sec Loss 2.0875 LearningRate 0.0011 Epoch: 17 Global Step: 90700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:11,866-Speed 5527.50 samples/sec Loss 2.0215 LearningRate 0.0011 Epoch: 17 Global Step: 90710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:13,736-Speed 5476.86 samples/sec Loss 2.0801 LearningRate 0.0011 Epoch: 17 Global Step: 90720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:15,580-Speed 5554.27 samples/sec Loss 2.0869 LearningRate 0.0011 Epoch: 17 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:17,426-Speed 5552.25 samples/sec Loss 2.0449 LearningRate 0.0011 Epoch: 17 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:19,268-Speed 5560.55 samples/sec Loss 2.0387 LearningRate 0.0011 Epoch: 17 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:21,113-Speed 5551.27 samples/sec Loss 2.0385 LearningRate 0.0011 Epoch: 17 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 16:05:22,963-Speed 5538.31 samples/sec Loss 2.0037 LearningRate 0.0011 Epoch: 17 Global Step: 90770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:24,801-Speed 5572.48 samples/sec Loss 1.9750 LearningRate 0.0011 Epoch: 17 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:26,643-Speed 5562.69 samples/sec Loss 2.1519 LearningRate 0.0011 Epoch: 17 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:28,479-Speed 5578.91 samples/sec Loss 2.1404 LearningRate 0.0010 Epoch: 17 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:30,329-Speed 5537.45 samples/sec Loss 2.0856 LearningRate 0.0010 Epoch: 17 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:32,169-Speed 5569.36 samples/sec Loss 2.0373 LearningRate 0.0010 Epoch: 17 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:34,003-Speed 5582.81 samples/sec Loss 2.0654 LearningRate 0.0010 Epoch: 17 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:35,863-Speed 5509.48 samples/sec Loss 2.0286 LearningRate 0.0010 Epoch: 17 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:37,689-Speed 5610.42 samples/sec Loss 2.1191 LearningRate 0.0010 Epoch: 17 Global Step: 90850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:39,536-Speed 5545.09 samples/sec Loss 2.0857 LearningRate 0.0010 Epoch: 17 Global Step: 90860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:41,378-Speed 5562.73 samples/sec Loss 2.1456 LearningRate 0.0010 Epoch: 17 Global Step: 90870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:43,209-Speed 5594.67 samples/sec Loss 2.0429 LearningRate 0.0010 Epoch: 17 Global Step: 90880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:45,053-Speed 5553.87 
samples/sec Loss 2.0569 LearningRate 0.0010 Epoch: 17 Global Step: 90890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:46,900-Speed 5548.92 samples/sec Loss 2.0013 LearningRate 0.0010 Epoch: 17 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:48,741-Speed 5562.68 samples/sec Loss 2.0271 LearningRate 0.0010 Epoch: 17 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:50,580-Speed 5569.51 samples/sec Loss 2.0079 LearningRate 0.0010 Epoch: 17 Global Step: 90920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:52,430-Speed 5539.47 samples/sec Loss 2.0974 LearningRate 0.0010 Epoch: 17 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:54,275-Speed 5551.36 samples/sec Loss 2.1140 LearningRate 0.0010 Epoch: 17 Global Step: 90940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:05:56,114-Speed 5570.26 samples/sec Loss 2.0172 LearningRate 0.0010 Epoch: 17 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:57,958-Speed 5554.08 samples/sec Loss 2.0504 LearningRate 0.0010 Epoch: 17 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:05:59,853-Speed 5406.42 samples/sec Loss 1.9566 LearningRate 0.0010 Epoch: 17 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:01,727-Speed 5467.53 samples/sec Loss 2.0424 LearningRate 0.0010 Epoch: 17 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:03,554-Speed 5608.02 samples/sec Loss 2.0075 LearningRate 0.0010 Epoch: 17 Global Step: 90990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:05,392-Speed 5571.39 samples/sec Loss 2.1756 LearningRate 0.0010 Epoch: 17 Global Step: 91000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:07,232-Speed 5567.68 samples/sec Loss 2.1747 LearningRate 0.0010 Epoch: 17 
Global Step: 91010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:09,078-Speed 5550.04 samples/sec Loss 2.1073 LearningRate 0.0010 Epoch: 17 Global Step: 91020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:10,924-Speed 5549.94 samples/sec Loss 2.0791 LearningRate 0.0010 Epoch: 17 Global Step: 91030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:12,821-Speed 5399.76 samples/sec Loss 2.0105 LearningRate 0.0010 Epoch: 17 Global Step: 91040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:24,176-Speed 901.85 samples/sec Loss 1.7908 LearningRate 0.0010 Epoch: 18 Global Step: 91050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:26,048-Speed 5475.35 samples/sec Loss 1.5925 LearningRate 0.0010 Epoch: 18 Global Step: 91060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:28,043-Speed 5134.62 samples/sec Loss 1.6665 LearningRate 0.0010 Epoch: 18 Global Step: 91070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:29,911-Speed 5483.57 samples/sec Loss 1.6445 LearningRate 0.0010 Epoch: 18 Global Step: 91080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:31,768-Speed 5514.21 samples/sec Loss 1.7378 LearningRate 0.0010 Epoch: 18 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:33,603-Speed 5584.00 samples/sec Loss 1.5850 LearningRate 0.0010 Epoch: 18 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:35,461-Speed 5511.72 samples/sec Loss 1.7458 LearningRate 0.0010 Epoch: 18 Global Step: 91110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:37,318-Speed 5518.02 samples/sec Loss 1.5713 LearningRate 0.0010 Epoch: 18 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:39,169-Speed 5532.71 samples/sec Loss 1.6983 LearningRate 0.0010 Epoch: 18 Global Step: 91130 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 16:06:41,012-Speed 5557.53 samples/sec Loss 1.6516 LearningRate 0.0010 Epoch: 18 Global Step: 91140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:42,864-Speed 5532.49 samples/sec Loss 1.6835 LearningRate 0.0010 Epoch: 18 Global Step: 91150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:44,704-Speed 5566.62 samples/sec Loss 1.6414 LearningRate 0.0010 Epoch: 18 Global Step: 91160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:46,569-Speed 5491.72 samples/sec Loss 1.6604 LearningRate 0.0010 Epoch: 18 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:48,423-Speed 5525.30 samples/sec Loss 1.7600 LearningRate 0.0010 Epoch: 18 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:50,273-Speed 5540.52 samples/sec Loss 1.6710 LearningRate 0.0010 Epoch: 18 Global Step: 91190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:06:52,125-Speed 5532.21 samples/sec Loss 1.6525 LearningRate 0.0010 Epoch: 18 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:53,971-Speed 5547.06 samples/sec Loss 1.6703 LearningRate 0.0010 Epoch: 18 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:55,808-Speed 5578.06 samples/sec Loss 1.5907 LearningRate 0.0010 Epoch: 18 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:57,650-Speed 5562.07 samples/sec Loss 1.6547 LearningRate 0.0010 Epoch: 18 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:06:59,478-Speed 5601.29 samples/sec Loss 1.7225 LearningRate 0.0010 Epoch: 18 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:01,326-Speed 5544.05 samples/sec Loss 1.6379 LearningRate 0.0010 Epoch: 18 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:03,177-Speed 5535.63 
samples/sec Loss 1.6503 LearningRate 0.0010 Epoch: 18 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:05,034-Speed 5514.68 samples/sec Loss 1.7440 LearningRate 0.0010 Epoch: 18 Global Step: 91270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:06,911-Speed 5459.28 samples/sec Loss 1.6765 LearningRate 0.0010 Epoch: 18 Global Step: 91280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:08,756-Speed 5550.13 samples/sec Loss 1.6201 LearningRate 0.0010 Epoch: 18 Global Step: 91290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:10,600-Speed 5556.98 samples/sec Loss 1.6741 LearningRate 0.0010 Epoch: 18 Global Step: 91300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:12,454-Speed 5524.82 samples/sec Loss 1.7272 LearningRate 0.0009 Epoch: 18 Global Step: 91310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:14,309-Speed 5522.49 samples/sec Loss 1.7218 LearningRate 0.0009 Epoch: 18 Global Step: 91320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:16,155-Speed 5547.99 samples/sec Loss 1.7620 LearningRate 0.0009 Epoch: 18 Global Step: 91330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:18,013-Speed 5513.90 samples/sec Loss 1.7044 LearningRate 0.0009 Epoch: 18 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:07:19,843-Speed 5598.32 samples/sec Loss 1.7311 LearningRate 0.0009 Epoch: 18 Global Step: 91350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:21,710-Speed 5485.33 samples/sec Loss 1.7198 LearningRate 0.0009 Epoch: 18 Global Step: 91360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:23,568-Speed 5514.83 samples/sec Loss 1.6058 LearningRate 0.0009 Epoch: 18 Global Step: 91370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:25,411-Speed 5557.45 samples/sec Loss 1.6709 LearningRate 0.0009 Epoch: 18 
Global Step: 91380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:27,275-Speed 5496.27 samples/sec Loss 1.6697 LearningRate 0.0009 Epoch: 18 Global Step: 91390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:29,143-Speed 5482.86 samples/sec Loss 1.7078 LearningRate 0.0009 Epoch: 18 Global Step: 91400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:30,989-Speed 5550.40 samples/sec Loss 1.6930 LearningRate 0.0009 Epoch: 18 Global Step: 91410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:32,841-Speed 5530.88 samples/sec Loss 1.7601 LearningRate 0.0009 Epoch: 18 Global Step: 91420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:34,683-Speed 5560.23 samples/sec Loss 1.5586 LearningRate 0.0009 Epoch: 18 Global Step: 91430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:36,523-Speed 5569.05 samples/sec Loss 1.7070 LearningRate 0.0009 Epoch: 18 Global Step: 91440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:38,371-Speed 5543.49 samples/sec Loss 1.6770 LearningRate 0.0009 Epoch: 18 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:07:40,215-Speed 5555.85 samples/sec Loss 1.6820 LearningRate 0.0009 Epoch: 18 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:07:42,051-Speed 5577.67 samples/sec Loss 1.7545 LearningRate 0.0009 Epoch: 18 Global Step: 91470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:43,901-Speed 5537.41 samples/sec Loss 1.6303 LearningRate 0.0009 Epoch: 18 Global Step: 91480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:45,744-Speed 5556.56 samples/sec Loss 1.6447 LearningRate 0.0009 Epoch: 18 Global Step: 91490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:47,605-Speed 5506.30 samples/sec Loss 1.6687 LearningRate 0.0009 Epoch: 18 Global Step: 91500 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 16:07:49,470-Speed 5493.00 samples/sec Loss 1.5954 LearningRate 0.0009 Epoch: 18 Global Step: 91510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:51,315-Speed 5551.76 samples/sec Loss 1.5900 LearningRate 0.0009 Epoch: 18 Global Step: 91520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:53,154-Speed 5572.62 samples/sec Loss 1.7395 LearningRate 0.0009 Epoch: 18 Global Step: 91530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:54,998-Speed 5552.45 samples/sec Loss 1.7522 LearningRate 0.0009 Epoch: 18 Global Step: 91540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:56,846-Speed 5543.98 samples/sec Loss 1.6881 LearningRate 0.0009 Epoch: 18 Global Step: 91550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:07:58,690-Speed 5563.22 samples/sec Loss 1.6661 LearningRate 0.0009 Epoch: 18 Global Step: 91560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:00,523-Speed 5587.42 samples/sec Loss 1.7333 LearningRate 0.0009 Epoch: 18 Global Step: 91570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:02,367-Speed 5557.03 samples/sec Loss 1.6646 LearningRate 0.0009 Epoch: 18 Global Step: 91580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:04,209-Speed 5561.82 samples/sec Loss 1.6073 LearningRate 0.0009 Epoch: 18 Global Step: 91590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:06,055-Speed 5547.29 samples/sec Loss 1.6327 LearningRate 0.0009 Epoch: 18 Global Step: 91600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:07,896-Speed 5564.81 samples/sec Loss 1.6823 LearningRate 0.0009 Epoch: 18 Global Step: 91610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:09,734-Speed 5573.27 samples/sec Loss 1.6937 LearningRate 0.0009 Epoch: 18 Global Step: 91620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:11,578-Speed 5558.21 
samples/sec Loss 1.6562 LearningRate 0.0009 Epoch: 18 Global Step: 91630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:13,429-Speed 5533.66 samples/sec Loss 1.6501 LearningRate 0.0009 Epoch: 18 Global Step: 91640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:15,317-Speed 5423.77 samples/sec Loss 1.7789 LearningRate 0.0009 Epoch: 18 Global Step: 91650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:17,160-Speed 5561.57 samples/sec Loss 1.7022 LearningRate 0.0009 Epoch: 18 Global Step: 91660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:19,004-Speed 5553.02 samples/sec Loss 1.6459 LearningRate 0.0009 Epoch: 18 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:08:20,837-Speed 5589.42 samples/sec Loss 1.6844 LearningRate 0.0009 Epoch: 18 Global Step: 91680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:22,695-Speed 5513.41 samples/sec Loss 1.7707 LearningRate 0.0009 Epoch: 18 Global Step: 91690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:24,534-Speed 5570.92 samples/sec Loss 1.7086 LearningRate 0.0009 Epoch: 18 Global Step: 91700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:26,390-Speed 5518.66 samples/sec Loss 1.7397 LearningRate 0.0009 Epoch: 18 Global Step: 91710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:28,234-Speed 5554.25 samples/sec Loss 1.6245 LearningRate 0.0009 Epoch: 18 Global Step: 91720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:30,081-Speed 5548.36 samples/sec Loss 1.6807 LearningRate 0.0009 Epoch: 18 Global Step: 91730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:31,924-Speed 5559.10 samples/sec Loss 1.7155 LearningRate 0.0009 Epoch: 18 Global Step: 91740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:33,762-Speed 5571.03 samples/sec Loss 1.6340 LearningRate 0.0009 Epoch: 18 
Global Step: 91750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:35,600-Speed 5573.54 samples/sec Loss 1.6111 LearningRate 0.0009 Epoch: 18 Global Step: 91760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:37,451-Speed 5535.67 samples/sec Loss 1.7359 LearningRate 0.0009 Epoch: 18 Global Step: 91770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:39,301-Speed 5537.22 samples/sec Loss 1.7212 LearningRate 0.0009 Epoch: 18 Global Step: 91780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:41,157-Speed 5519.07 samples/sec Loss 1.6616 LearningRate 0.0009 Epoch: 18 Global Step: 91790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:43,006-Speed 5540.02 samples/sec Loss 1.6319 LearningRate 0.0009 Epoch: 18 Global Step: 91800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:44,843-Speed 5575.47 samples/sec Loss 1.6774 LearningRate 0.0009 Epoch: 18 Global Step: 91810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:46,685-Speed 5562.54 samples/sec Loss 1.7858 LearningRate 0.0009 Epoch: 18 Global Step: 91820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:48,526-Speed 5562.56 samples/sec Loss 1.7315 LearningRate 0.0009 Epoch: 18 Global Step: 91830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:08:50,384-Speed 5513.17 samples/sec Loss 1.6792 LearningRate 0.0008 Epoch: 18 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:52,231-Speed 5548.15 samples/sec Loss 1.7607 LearningRate 0.0008 Epoch: 18 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:54,090-Speed 5509.01 samples/sec Loss 1.5944 LearningRate 0.0008 Epoch: 18 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:55,952-Speed 5501.84 samples/sec Loss 1.6399 LearningRate 0.0008 Epoch: 18 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 16:08:57,800-Speed 5545.63 samples/sec Loss 1.7281 LearningRate 0.0008 Epoch: 18 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:08:59,651-Speed 5531.83 samples/sec Loss 1.6754 LearningRate 0.0008 Epoch: 18 Global Step: 91890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:01,566-Speed 5348.90 samples/sec Loss 1.6271 LearningRate 0.0008 Epoch: 18 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:03,412-Speed 5550.80 samples/sec Loss 1.7073 LearningRate 0.0008 Epoch: 18 Global Step: 91910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:05,256-Speed 5553.65 samples/sec Loss 1.6675 LearningRate 0.0008 Epoch: 18 Global Step: 91920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:07,116-Speed 5507.54 samples/sec Loss 1.6220 LearningRate 0.0008 Epoch: 18 Global Step: 91930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:08,949-Speed 5587.73 samples/sec Loss 1.7275 LearningRate 0.0008 Epoch: 18 Global Step: 91940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:10,809-Speed 5508.96 samples/sec Loss 1.6270 LearningRate 0.0008 Epoch: 18 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:12,684-Speed 5463.21 samples/sec Loss 1.7448 LearningRate 0.0008 Epoch: 18 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:14,532-Speed 5545.15 samples/sec Loss 1.7538 LearningRate 0.0008 Epoch: 18 Global Step: 91970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:16,371-Speed 5569.16 samples/sec Loss 1.6840 LearningRate 0.0008 Epoch: 18 Global Step: 91980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:18,215-Speed 5555.71 samples/sec Loss 1.7528 LearningRate 0.0008 Epoch: 18 Global Step: 91990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:09:20,056-Speed 5562.56 
samples/sec Loss 1.6257 LearningRate 0.0008 Epoch: 18 Global Step: 92000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:09:46,705-[lfw][92000]XNorm: 22.401583
Training: 2022-04-11 16:09:46,706-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 16:09:46,706-[lfw][92000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:10:17,588-[cfp_fp][92000]XNorm: 21.515875
Training: 2022-04-11 16:10:17,589-[cfp_fp][92000]Accuracy-Flip: 0.98471+-0.00546
Training: 2022-04-11 16:10:17,589-[cfp_fp][92000]Accuracy-Highest: 0.98471
Training: 2022-04-11 16:10:44,264-[agedb_30][92000]XNorm: 22.596551
Training: 2022-04-11 16:10:44,265-[agedb_30][92000]Accuracy-Flip: 0.98267+-0.00775
Training: 2022-04-11 16:10:44,265-[agedb_30][92000]Accuracy-Highest: 0.98350
Training: 2022-04-11 16:10:46,110-Speed 119.00 samples/sec Loss 1.7382 LearningRate 0.0008 Epoch: 18 Global Step: 92010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:47,947-Speed 5576.70 samples/sec Loss 1.7208 LearningRate 0.0008 Epoch: 18 Global Step: 92020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:49,794-Speed 5546.14 samples/sec Loss 1.6546 LearningRate 0.0008 Epoch: 18 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:51,638-Speed 5554.66 samples/sec Loss 1.7049 LearningRate 0.0008 Epoch: 18 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 16:10:53,469-Speed 5594.66 samples/sec Loss 1.7332 LearningRate 0.0008 Epoch: 18 Global Step: 92050 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:55,311-Speed 5561.18 samples/sec Loss 1.7266 LearningRate 0.0008 Epoch: 18 Global Step: 92060 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:57,152-Speed 5563.64 samples/sec Loss 1.7365 LearningRate 0.0008 Epoch: 18 Global Step: 92070 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 16:10:58,984-Speed 5591.59 samples/sec Loss 1.6848 LearningRate
0.0008 Epoch: 18 Global Step: 92080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:00,824-Speed 5567.64 samples/sec Loss 1.6494 LearningRate 0.0008 Epoch: 18 Global Step: 92090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:02,666-Speed 5561.89 samples/sec Loss 1.6625 LearningRate 0.0008 Epoch: 18 Global Step: 92100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:04,505-Speed 5570.05 samples/sec Loss 1.7070 LearningRate 0.0008 Epoch: 18 Global Step: 92110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:06,344-Speed 5570.24 samples/sec Loss 1.6246 LearningRate 0.0008 Epoch: 18 Global Step: 92120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:08,179-Speed 5580.78 samples/sec Loss 1.6678 LearningRate 0.0008 Epoch: 18 Global Step: 92130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:10,008-Speed 5601.36 samples/sec Loss 1.7348 LearningRate 0.0008 Epoch: 18 Global Step: 92140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:11,851-Speed 5558.33 samples/sec Loss 1.6382 LearningRate 0.0008 Epoch: 18 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:13,723-Speed 5474.05 samples/sec Loss 1.6664 LearningRate 0.0008 Epoch: 18 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:15,573-Speed 5538.62 samples/sec Loss 1.6875 LearningRate 0.0008 Epoch: 18 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:17,409-Speed 5577.81 samples/sec Loss 1.7372 LearningRate 0.0008 Epoch: 18 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:19,244-Speed 5583.91 samples/sec Loss 1.7369 LearningRate 0.0008 Epoch: 18 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:21,083-Speed 5570.64 samples/sec Loss 1.7234 LearningRate 0.0008 Epoch: 18 Global Step: 92200 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 16:11:22,919-Speed 5576.80 samples/sec Loss 1.7259 LearningRate 0.0008 Epoch: 18 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:24,772-Speed 5531.14 samples/sec Loss 1.7372 LearningRate 0.0008 Epoch: 18 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:26,600-Speed 5603.23 samples/sec Loss 1.7254 LearningRate 0.0008 Epoch: 18 Global Step: 92230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:28,463-Speed 5497.85 samples/sec Loss 1.6892 LearningRate 0.0008 Epoch: 18 Global Step: 92240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:30,310-Speed 5545.07 samples/sec Loss 1.6972 LearningRate 0.0008 Epoch: 18 Global Step: 92250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:32,152-Speed 5562.90 samples/sec Loss 1.7258 LearningRate 0.0008 Epoch: 18 Global Step: 92260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:33,987-Speed 5583.55 samples/sec Loss 1.7316 LearningRate 0.0008 Epoch: 18 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:35,829-Speed 5560.14 samples/sec Loss 1.6969 LearningRate 0.0008 Epoch: 18 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:37,670-Speed 5566.36 samples/sec Loss 1.6493 LearningRate 0.0008 Epoch: 18 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:39,510-Speed 5567.98 samples/sec Loss 1.7174 LearningRate 0.0008 Epoch: 18 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:41,343-Speed 5586.00 samples/sec Loss 1.7226 LearningRate 0.0008 Epoch: 18 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:43,187-Speed 5554.44 samples/sec Loss 1.6214 LearningRate 0.0008 Epoch: 18 Global Step: 92320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
16:11:45,025-Speed 5574.62 samples/sec Loss 1.7710 LearningRate 0.0008 Epoch: 18 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:46,871-Speed 5548.56 samples/sec Loss 1.7483 LearningRate 0.0008 Epoch: 18 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:48,709-Speed 5574.57 samples/sec Loss 1.6360 LearningRate 0.0008 Epoch: 18 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:11:50,544-Speed 5582.52 samples/sec Loss 1.7182 LearningRate 0.0008 Epoch: 18 Global Step: 92360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:52,383-Speed 5568.25 samples/sec Loss 1.7561 LearningRate 0.0008 Epoch: 18 Global Step: 92370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:54,223-Speed 5570.54 samples/sec Loss 1.7047 LearningRate 0.0008 Epoch: 18 Global Step: 92380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:56,056-Speed 5586.93 samples/sec Loss 1.6260 LearningRate 0.0008 Epoch: 18 Global Step: 92390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:57,903-Speed 5547.83 samples/sec Loss 1.7763 LearningRate 0.0007 Epoch: 18 Global Step: 92400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:11:59,746-Speed 5556.10 samples/sec Loss 1.7240 LearningRate 0.0007 Epoch: 18 Global Step: 92410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:01,607-Speed 5506.93 samples/sec Loss 1.8217 LearningRate 0.0007 Epoch: 18 Global Step: 92420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:03,454-Speed 5544.00 samples/sec Loss 1.7048 LearningRate 0.0007 Epoch: 18 Global Step: 92430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:05,293-Speed 5571.55 samples/sec Loss 1.6341 LearningRate 0.0007 Epoch: 18 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:07,136-Speed 5558.36 samples/sec Loss 1.7892 
LearningRate 0.0007 Epoch: 18 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:08,980-Speed 5554.84 samples/sec Loss 1.7282 LearningRate 0.0007 Epoch: 18 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:10,817-Speed 5575.67 samples/sec Loss 1.7449 LearningRate 0.0007 Epoch: 18 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:12,667-Speed 5538.08 samples/sec Loss 1.7168 LearningRate 0.0007 Epoch: 18 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:14,538-Speed 5476.49 samples/sec Loss 1.7445 LearningRate 0.0007 Epoch: 18 Global Step: 92490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:16,384-Speed 5549.76 samples/sec Loss 1.7580 LearningRate 0.0007 Epoch: 18 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:18,225-Speed 5562.65 samples/sec Loss 1.7265 LearningRate 0.0007 Epoch: 18 Global Step: 92510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:20,060-Speed 5583.93 samples/sec Loss 1.7155 LearningRate 0.0007 Epoch: 18 Global Step: 92520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:21,907-Speed 5546.24 samples/sec Loss 1.7239 LearningRate 0.0007 Epoch: 18 Global Step: 92530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:23,736-Speed 5599.67 samples/sec Loss 1.7294 LearningRate 0.0007 Epoch: 18 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:25,578-Speed 5560.45 samples/sec Loss 1.6378 LearningRate 0.0007 Epoch: 18 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:27,435-Speed 5518.17 samples/sec Loss 1.6693 LearningRate 0.0007 Epoch: 18 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:29,274-Speed 5570.91 samples/sec Loss 1.6936 LearningRate 0.0007 Epoch: 18 Global Step: 92570 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:31,127-Speed 5526.52 samples/sec Loss 1.6877 LearningRate 0.0007 Epoch: 18 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:32,967-Speed 5567.21 samples/sec Loss 1.7657 LearningRate 0.0007 Epoch: 18 Global Step: 92590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:34,810-Speed 5559.00 samples/sec Loss 1.8042 LearningRate 0.0007 Epoch: 18 Global Step: 92600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:36,660-Speed 5537.46 samples/sec Loss 1.7771 LearningRate 0.0007 Epoch: 18 Global Step: 92610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:38,509-Speed 5539.89 samples/sec Loss 1.6487 LearningRate 0.0007 Epoch: 18 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:40,345-Speed 5581.64 samples/sec Loss 1.6766 LearningRate 0.0007 Epoch: 18 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:42,215-Speed 5475.89 samples/sec Loss 1.7296 LearningRate 0.0007 Epoch: 18 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:44,054-Speed 5570.94 samples/sec Loss 1.6446 LearningRate 0.0007 Epoch: 18 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:12:45,886-Speed 5592.51 samples/sec Loss 1.6652 LearningRate 0.0007 Epoch: 18 Global Step: 92660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:47,726-Speed 5565.05 samples/sec Loss 1.8083 LearningRate 0.0007 Epoch: 18 Global Step: 92670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:49,562-Speed 5582.20 samples/sec Loss 1.7546 LearningRate 0.0007 Epoch: 18 Global Step: 92680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:51,401-Speed 5567.63 samples/sec Loss 1.7160 LearningRate 0.0007 Epoch: 18 Global Step: 92690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 16:12:53,243-Speed 5562.63 samples/sec Loss 1.6299 LearningRate 0.0007 Epoch: 18 Global Step: 92700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:55,093-Speed 5536.16 samples/sec Loss 1.7143 LearningRate 0.0007 Epoch: 18 Global Step: 92710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:56,929-Speed 5581.19 samples/sec Loss 1.6892 LearningRate 0.0007 Epoch: 18 Global Step: 92720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:12:58,780-Speed 5533.83 samples/sec Loss 1.8093 LearningRate 0.0007 Epoch: 18 Global Step: 92730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:00,641-Speed 5505.39 samples/sec Loss 1.7441 LearningRate 0.0007 Epoch: 18 Global Step: 92740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:02,503-Speed 5499.87 samples/sec Loss 1.6457 LearningRate 0.0007 Epoch: 18 Global Step: 92750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:04,372-Speed 5483.00 samples/sec Loss 1.6851 LearningRate 0.0007 Epoch: 18 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:13:06,215-Speed 5558.39 samples/sec Loss 1.7104 LearningRate 0.0007 Epoch: 18 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:13:08,058-Speed 5556.88 samples/sec Loss 1.7813 LearningRate 0.0007 Epoch: 18 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:13:09,885-Speed 5608.35 samples/sec Loss 1.8169 LearningRate 0.0007 Epoch: 18 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:11,734-Speed 5539.48 samples/sec Loss 1.6818 LearningRate 0.0007 Epoch: 18 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:13,629-Speed 5403.21 samples/sec Loss 1.7631 LearningRate 0.0007 Epoch: 18 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:15,482-Speed 5528.84 samples/sec Loss 
1.7516 LearningRate 0.0007 Epoch: 18 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:17,337-Speed 5522.50 samples/sec Loss 1.7357 LearningRate 0.0007 Epoch: 18 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:19,184-Speed 5548.70 samples/sec Loss 1.8795 LearningRate 0.0007 Epoch: 18 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:21,023-Speed 5570.73 samples/sec Loss 1.7866 LearningRate 0.0007 Epoch: 18 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:22,860-Speed 5578.57 samples/sec Loss 1.6383 LearningRate 0.0007 Epoch: 18 Global Step: 92860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:24,709-Speed 5542.11 samples/sec Loss 1.7514 LearningRate 0.0007 Epoch: 18 Global Step: 92870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:26,573-Speed 5493.49 samples/sec Loss 1.7378 LearningRate 0.0007 Epoch: 18 Global Step: 92880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:28,433-Speed 5507.35 samples/sec Loss 1.6441 LearningRate 0.0007 Epoch: 18 Global Step: 92890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:30,274-Speed 5565.21 samples/sec Loss 1.7721 LearningRate 0.0007 Epoch: 18 Global Step: 92900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:32,113-Speed 5571.41 samples/sec Loss 1.7813 LearningRate 0.0007 Epoch: 18 Global Step: 92910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:33,964-Speed 5531.76 samples/sec Loss 1.6250 LearningRate 0.0007 Epoch: 18 Global Step: 92920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:35,803-Speed 5571.50 samples/sec Loss 1.6958 LearningRate 0.0007 Epoch: 18 Global Step: 92930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:37,653-Speed 5536.09 samples/sec Loss 1.7531 LearningRate 0.0007 Epoch: 18 Global Step: 92940 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:39,507-Speed 5526.60 samples/sec Loss 1.7055 LearningRate 0.0007 Epoch: 18 Global Step: 92950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:13:41,358-Speed 5533.85 samples/sec Loss 1.6304 LearningRate 0.0007 Epoch: 18 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:43,209-Speed 5536.47 samples/sec Loss 1.6870 LearningRate 0.0007 Epoch: 18 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:45,053-Speed 5555.49 samples/sec Loss 1.7398 LearningRate 0.0007 Epoch: 18 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:46,888-Speed 5579.95 samples/sec Loss 1.6988 LearningRate 0.0007 Epoch: 18 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:48,746-Speed 5514.91 samples/sec Loss 1.6509 LearningRate 0.0007 Epoch: 18 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:50,602-Speed 5518.53 samples/sec Loss 1.7744 LearningRate 0.0006 Epoch: 18 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:52,460-Speed 5513.50 samples/sec Loss 1.6538 LearningRate 0.0006 Epoch: 18 Global Step: 93020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:54,313-Speed 5525.99 samples/sec Loss 1.7187 LearningRate 0.0006 Epoch: 18 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:56,166-Speed 5530.24 samples/sec Loss 1.8288 LearningRate 0.0006 Epoch: 18 Global Step: 93040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:58,010-Speed 5554.03 samples/sec Loss 1.7205 LearningRate 0.0006 Epoch: 18 Global Step: 93050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:13:59,940-Speed 5308.14 samples/sec Loss 1.7449 LearningRate 0.0006 Epoch: 18 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 16:14:01,782-Speed 5562.62 samples/sec Loss 1.7232 LearningRate 0.0006 Epoch: 18 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:14:03,627-Speed 5552.53 samples/sec Loss 1.8108 LearningRate 0.0006 Epoch: 18 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:14:05,467-Speed 5567.20 samples/sec Loss 1.7099 LearningRate 0.0006 Epoch: 18 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 16:14:07,306-Speed 5571.25 samples/sec Loss 1.6513 LearningRate 0.0006 Epoch: 18 Global Step: 93100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:09,146-Speed 5565.09 samples/sec Loss 1.6666 LearningRate 0.0006 Epoch: 18 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:10,982-Speed 5580.14 samples/sec Loss 1.6691 LearningRate 0.0006 Epoch: 18 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:12,862-Speed 5447.92 samples/sec Loss 1.6931 LearningRate 0.0006 Epoch: 18 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:14,727-Speed 5495.68 samples/sec Loss 1.7056 LearningRate 0.0006 Epoch: 18 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:16,581-Speed 5525.17 samples/sec Loss 1.7821 LearningRate 0.0006 Epoch: 18 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:18,460-Speed 5451.11 samples/sec Loss 1.6632 LearningRate 0.0006 Epoch: 18 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:20,297-Speed 5575.89 samples/sec Loss 1.7245 LearningRate 0.0006 Epoch: 18 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:22,126-Speed 5601.65 samples/sec Loss 1.6881 LearningRate 0.0006 Epoch: 18 Global Step: 93180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:23,967-Speed 5563.18 samples/sec Loss 
1.7013 LearningRate 0.0006 Epoch: 18 Global Step: 93190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:25,808-Speed 5564.70 samples/sec Loss 1.6584 LearningRate 0.0006 Epoch: 18 Global Step: 93200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:27,655-Speed 5547.17 samples/sec Loss 1.7128 LearningRate 0.0006 Epoch: 18 Global Step: 93210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:29,502-Speed 5545.93 samples/sec Loss 1.8186 LearningRate 0.0006 Epoch: 18 Global Step: 93220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:31,339-Speed 5575.46 samples/sec Loss 1.6999 LearningRate 0.0006 Epoch: 18 Global Step: 93230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:33,176-Speed 5577.62 samples/sec Loss 1.6641 LearningRate 0.0006 Epoch: 18 Global Step: 93240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:35,038-Speed 5501.53 samples/sec Loss 1.6468 LearningRate 0.0006 Epoch: 18 Global Step: 93250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:36,892-Speed 5525.13 samples/sec Loss 1.7021 LearningRate 0.0006 Epoch: 18 Global Step: 93260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:38,757-Speed 5492.28 samples/sec Loss 1.7779 LearningRate 0.0006 Epoch: 18 Global Step: 93270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 16:14:40,610-Speed 5529.30 samples/sec Loss 1.8076 LearningRate 0.0006 Epoch: 18 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:42,484-Speed 5465.15 samples/sec Loss 1.7489 LearningRate 0.0006 Epoch: 18 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 16:14:44,327-Speed 5557.42 samples/sec Loss 1.6522 LearningRate 0.0006 Epoch: 18 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:46,167-Speed 5570.55 samples/sec Loss 1.7361 LearningRate 0.0006 Epoch: 18 Global Step: 93310 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:48,011-Speed 5552.35 samples/sec Loss 1.7230 LearningRate 0.0006 Epoch: 18 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:49,853-Speed 5562.18 samples/sec Loss 1.7427 LearningRate 0.0006 Epoch: 18 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:51,697-Speed 5555.98 samples/sec Loss 1.7416 LearningRate 0.0006 Epoch: 18 Global Step: 93340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:53,538-Speed 5562.39 samples/sec Loss 1.7122 LearningRate 0.0006 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:55,378-Speed 5568.39 samples/sec Loss 1.7689 LearningRate 0.0006 Epoch: 18 Global Step: 93360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:57,223-Speed 5551.85 samples/sec Loss 1.7658 LearningRate 0.0006 Epoch: 18 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:14:59,054-Speed 5596.31 samples/sec Loss 1.7162 LearningRate 0.0006 Epoch: 18 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:00,894-Speed 5569.02 samples/sec Loss 1.6111 LearningRate 0.0006 Epoch: 18 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:02,750-Speed 5517.22 samples/sec Loss 1.7072 LearningRate 0.0006 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:04,600-Speed 5540.02 samples/sec Loss 1.7096 LearningRate 0.0006 Epoch: 18 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:06,459-Speed 5509.23 samples/sec Loss 1.8028 LearningRate 0.0006 Epoch: 18 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:08,307-Speed 5540.74 samples/sec Loss 1.7496 LearningRate 0.0006 Epoch: 18 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:15:10,151-Speed 5557.16 samples/sec Loss 1.7125 LearningRate 0.0006 Epoch: 18 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:12,005-Speed 5524.99 samples/sec Loss 1.6890 LearningRate 0.0006 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:13,874-Speed 5480.33 samples/sec Loss 1.7420 LearningRate 0.0006 Epoch: 18 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:15,725-Speed 5535.34 samples/sec Loss 1.6191 LearningRate 0.0006 Epoch: 18 Global Step: 93470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:17,572-Speed 5545.59 samples/sec Loss 1.6934 LearningRate 0.0006 Epoch: 18 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:15:19,411-Speed 5569.26 samples/sec Loss 1.7544 LearningRate 0.0006 Epoch: 18 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:15:21,241-Speed 5599.15 samples/sec Loss 1.6464 LearningRate 0.0006 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:23,083-Speed 5560.97 samples/sec Loss 1.7080 LearningRate 0.0006 Epoch: 18 Global Step: 93510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:24,940-Speed 5518.06 samples/sec Loss 1.8150 LearningRate 0.0006 Epoch: 18 Global Step: 93520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:26,788-Speed 5542.00 samples/sec Loss 1.7431 LearningRate 0.0006 Epoch: 18 Global Step: 93530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:28,683-Speed 5407.06 samples/sec Loss 1.7794 LearningRate 0.0006 Epoch: 18 Global Step: 93540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:30,517-Speed 5586.12 samples/sec Loss 1.6913 LearningRate 0.0006 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:32,366-Speed 5538.12 samples/sec Loss 
1.7743 LearningRate 0.0006 Epoch: 18 Global Step: 93560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:34,210-Speed 5555.93 samples/sec Loss 1.7678 LearningRate 0.0006 Epoch: 18 Global Step: 93570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:36,047-Speed 5576.84 samples/sec Loss 1.7573 LearningRate 0.0006 Epoch: 18 Global Step: 93580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:37,890-Speed 5557.38 samples/sec Loss 1.6117 LearningRate 0.0006 Epoch: 18 Global Step: 93590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:39,727-Speed 5578.21 samples/sec Loss 1.7719 LearningRate 0.0006 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:41,566-Speed 5568.94 samples/sec Loss 1.7256 LearningRate 0.0006 Epoch: 18 Global Step: 93610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:43,408-Speed 5562.75 samples/sec Loss 1.6831 LearningRate 0.0006 Epoch: 18 Global Step: 93620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:45,243-Speed 5581.73 samples/sec Loss 1.7555 LearningRate 0.0006 Epoch: 18 Global Step: 93630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:47,090-Speed 5546.94 samples/sec Loss 1.7197 LearningRate 0.0006 Epoch: 18 Global Step: 93640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:15:48,938-Speed 5542.05 samples/sec Loss 1.7401 LearningRate 0.0006 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:50,787-Speed 5541.83 samples/sec Loss 1.7511 LearningRate 0.0005 Epoch: 18 Global Step: 93660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:52,630-Speed 5558.29 samples/sec Loss 1.7014 LearningRate 0.0005 Epoch: 18 Global Step: 93670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:54,519-Speed 5421.57 samples/sec Loss 1.6876 LearningRate 0.0005 Epoch: 18 Global Step: 93680 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:56,372-Speed 5528.56 samples/sec Loss 1.7101 LearningRate 0.0005 Epoch: 18 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:15:58,215-Speed 5557.89 samples/sec Loss 1.6504 LearningRate 0.0005 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:00,053-Speed 5572.89 samples/sec Loss 1.8078 LearningRate 0.0005 Epoch: 18 Global Step: 93710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:01,893-Speed 5568.20 samples/sec Loss 1.8174 LearningRate 0.0005 Epoch: 18 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:03,742-Speed 5541.21 samples/sec Loss 1.7780 LearningRate 0.0005 Epoch: 18 Global Step: 93730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:05,586-Speed 5556.42 samples/sec Loss 1.7615 LearningRate 0.0005 Epoch: 18 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:07,422-Speed 5577.91 samples/sec Loss 1.7325 LearningRate 0.0005 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:09,262-Speed 5567.01 samples/sec Loss 1.6885 LearningRate 0.0005 Epoch: 18 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:11,107-Speed 5553.25 samples/sec Loss 1.8094 LearningRate 0.0005 Epoch: 18 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:12,955-Speed 5540.94 samples/sec Loss 1.7195 LearningRate 0.0005 Epoch: 18 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:14,797-Speed 5562.64 samples/sec Loss 1.6832 LearningRate 0.0005 Epoch: 18 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:16,653-Speed 5520.19 samples/sec Loss 1.6898 LearningRate 0.0005 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:16:18,503-Speed 5535.09 samples/sec Loss 1.7484 LearningRate 0.0005 Epoch: 18 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:20,344-Speed 5564.50 samples/sec Loss 1.8185 LearningRate 0.0005 Epoch: 18 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:22,188-Speed 5554.82 samples/sec Loss 1.7601 LearningRate 0.0005 Epoch: 18 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:24,071-Speed 5440.86 samples/sec Loss 1.7417 LearningRate 0.0005 Epoch: 18 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:25,918-Speed 5547.89 samples/sec Loss 1.7140 LearningRate 0.0005 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:27,843-Speed 5320.17 samples/sec Loss 1.6503 LearningRate 0.0005 Epoch: 18 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:29,746-Speed 5384.12 samples/sec Loss 1.7630 LearningRate 0.0005 Epoch: 18 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:31,588-Speed 5562.42 samples/sec Loss 1.7125 LearningRate 0.0005 Epoch: 18 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:33,437-Speed 5538.87 samples/sec Loss 1.7453 LearningRate 0.0005 Epoch: 18 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:35,279-Speed 5560.44 samples/sec Loss 1.6688 LearningRate 0.0005 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:37,126-Speed 5547.29 samples/sec Loss 1.7454 LearningRate 0.0005 Epoch: 18 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:16:38,979-Speed 5526.74 samples/sec Loss 1.7905 LearningRate 0.0005 Epoch: 18 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:40,834-Speed 5524.21 samples/sec Loss 
1.7214 LearningRate 0.0005 Epoch: 18 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:42,701-Speed 5484.97 samples/sec Loss 1.6771 LearningRate 0.0005 Epoch: 18 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:44,542-Speed 5564.81 samples/sec Loss 1.6248 LearningRate 0.0005 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:46,382-Speed 5567.75 samples/sec Loss 1.8158 LearningRate 0.0005 Epoch: 18 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:48,236-Speed 5525.65 samples/sec Loss 1.7124 LearningRate 0.0005 Epoch: 18 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:50,082-Speed 5551.38 samples/sec Loss 1.8401 LearningRate 0.0005 Epoch: 18 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:51,930-Speed 5541.26 samples/sec Loss 1.7337 LearningRate 0.0005 Epoch: 18 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:16:53,778-Speed 5542.62 samples/sec Loss 1.7487 LearningRate 0.0005 Epoch: 18 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:17:20,185-[lfw][94000]XNorm: 22.356079 Training: 2022-04-11 16:17:20,186-[lfw][94000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 16:17:20,186-[lfw][94000]Accuracy-Highest: 0.99817 Training: 2022-04-11 16:17:50,830-[cfp_fp][94000]XNorm: 21.361798 Training: 2022-04-11 16:17:50,830-[cfp_fp][94000]Accuracy-Flip: 0.98500+-0.00488 Training: 2022-04-11 16:17:50,831-[cfp_fp][94000]Accuracy-Highest: 0.98500 Training: 2022-04-11 16:18:17,527-[agedb_30][94000]XNorm: 22.452351 Training: 2022-04-11 16:18:17,528-[agedb_30][94000]Accuracy-Flip: 0.98150+-0.00621 Training: 2022-04-11 16:18:17,528-[agedb_30][94000]Accuracy-Highest: 0.98350 Training: 2022-04-11 16:18:19,378-Speed 119.63 samples/sec Loss 1.6988 LearningRate 0.0005 Epoch: 18 
Global Step: 94010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:21,194-Speed 5639.84 samples/sec Loss 1.7415 LearningRate 0.0005 Epoch: 18 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:23,027-Speed 5587.25 samples/sec Loss 1.7727 LearningRate 0.0005 Epoch: 18 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:24,889-Speed 5500.98 samples/sec Loss 1.7324 LearningRate 0.0005 Epoch: 18 Global Step: 94040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:26,745-Speed 5520.01 samples/sec Loss 1.7772 LearningRate 0.0005 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:28,574-Speed 5601.70 samples/sec Loss 1.7370 LearningRate 0.0005 Epoch: 18 Global Step: 94060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:30,413-Speed 5570.31 samples/sec Loss 1.6955 LearningRate 0.0005 Epoch: 18 Global Step: 94070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:32,250-Speed 5573.21 samples/sec Loss 1.7015 LearningRate 0.0005 Epoch: 18 Global Step: 94080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:34,079-Speed 5601.81 samples/sec Loss 1.7874 LearningRate 0.0005 Epoch: 18 Global Step: 94090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:35,907-Speed 5603.35 samples/sec Loss 1.7043 LearningRate 0.0005 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:37,738-Speed 5595.54 samples/sec Loss 1.8611 LearningRate 0.0005 Epoch: 18 Global Step: 94110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:39,584-Speed 5548.71 samples/sec Loss 1.7610 LearningRate 0.0005 Epoch: 18 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:18:41,420-Speed 5580.04 samples/sec Loss 1.7794 LearningRate 0.0005 Epoch: 18 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 16:18:43,260-Speed 5567.54 samples/sec Loss 1.6832 LearningRate 0.0005 Epoch: 18 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:45,097-Speed 5575.45 samples/sec Loss 1.7333 LearningRate 0.0005 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:46,934-Speed 5577.44 samples/sec Loss 1.6918 LearningRate 0.0005 Epoch: 18 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:48,768-Speed 5587.34 samples/sec Loss 1.7229 LearningRate 0.0005 Epoch: 18 Global Step: 94170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:50,600-Speed 5590.76 samples/sec Loss 1.7508 LearningRate 0.0005 Epoch: 18 Global Step: 94180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:52,436-Speed 5579.72 samples/sec Loss 1.6791 LearningRate 0.0005 Epoch: 18 Global Step: 94190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:54,289-Speed 5528.69 samples/sec Loss 1.7028 LearningRate 0.0005 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:56,134-Speed 5551.45 samples/sec Loss 1.7343 LearningRate 0.0005 Epoch: 18 Global Step: 94210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:57,970-Speed 5579.89 samples/sec Loss 1.7830 LearningRate 0.0005 Epoch: 18 Global Step: 94220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:18:59,825-Speed 5522.46 samples/sec Loss 1.7176 LearningRate 0.0005 Epoch: 18 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:19:01,680-Speed 5522.83 samples/sec Loss 1.6571 LearningRate 0.0005 Epoch: 18 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:03,533-Speed 5527.55 samples/sec Loss 1.6836 LearningRate 0.0005 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:05,381-Speed 5543.46 
samples/sec Loss 1.7327 LearningRate 0.0005 Epoch: 18 Global Step: 94260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:07,221-Speed 5566.97 samples/sec Loss 1.8212 LearningRate 0.0005 Epoch: 18 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:09,064-Speed 5559.81 samples/sec Loss 1.7194 LearningRate 0.0005 Epoch: 18 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:10,918-Speed 5525.57 samples/sec Loss 1.8078 LearningRate 0.0005 Epoch: 18 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:12,773-Speed 5520.69 samples/sec Loss 1.6501 LearningRate 0.0005 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:14,650-Speed 5459.01 samples/sec Loss 1.6408 LearningRate 0.0005 Epoch: 18 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:16,558-Speed 5369.08 samples/sec Loss 1.6951 LearningRate 0.0005 Epoch: 18 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:18,410-Speed 5531.39 samples/sec Loss 1.6827 LearningRate 0.0005 Epoch: 18 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:20,256-Speed 5546.48 samples/sec Loss 1.6938 LearningRate 0.0005 Epoch: 18 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:19:22,103-Speed 5547.99 samples/sec Loss 1.7884 LearningRate 0.0005 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:19:23,933-Speed 5598.75 samples/sec Loss 1.7948 LearningRate 0.0005 Epoch: 18 Global Step: 94360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:25,784-Speed 5532.99 samples/sec Loss 1.7238 LearningRate 0.0005 Epoch: 18 Global Step: 94370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:27,667-Speed 5441.66 samples/sec Loss 1.7015 LearningRate 0.0004 Epoch: 18 
Global Step: 94380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:29,512-Speed 5551.61 samples/sec Loss 1.7921 LearningRate 0.0004 Epoch: 18 Global Step: 94390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:31,353-Speed 5564.29 samples/sec Loss 1.6675 LearningRate 0.0004 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:33,199-Speed 5549.21 samples/sec Loss 1.6353 LearningRate 0.0004 Epoch: 18 Global Step: 94410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:35,041-Speed 5561.66 samples/sec Loss 1.7226 LearningRate 0.0004 Epoch: 18 Global Step: 94420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:36,888-Speed 5543.87 samples/sec Loss 1.7278 LearningRate 0.0004 Epoch: 18 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:38,751-Speed 5501.92 samples/sec Loss 1.7202 LearningRate 0.0004 Epoch: 18 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:40,598-Speed 5544.67 samples/sec Loss 1.6014 LearningRate 0.0004 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:42,441-Speed 5557.48 samples/sec Loss 1.7249 LearningRate 0.0004 Epoch: 18 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:19:44,281-Speed 5568.81 samples/sec Loss 1.6782 LearningRate 0.0004 Epoch: 18 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:19:46,121-Speed 5568.57 samples/sec Loss 1.6951 LearningRate 0.0004 Epoch: 18 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:47,965-Speed 5554.99 samples/sec Loss 1.7873 LearningRate 0.0004 Epoch: 18 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:49,831-Speed 5488.73 samples/sec Loss 1.6216 LearningRate 0.0004 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 16:19:51,688-Speed 5516.91 samples/sec Loss 1.6980 LearningRate 0.0004 Epoch: 18 Global Step: 94510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:53,535-Speed 5545.97 samples/sec Loss 1.7935 LearningRate 0.0004 Epoch: 18 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:55,375-Speed 5565.88 samples/sec Loss 1.7529 LearningRate 0.0004 Epoch: 18 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:57,220-Speed 5554.09 samples/sec Loss 1.6298 LearningRate 0.0004 Epoch: 18 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:19:59,056-Speed 5580.11 samples/sec Loss 1.7852 LearningRate 0.0004 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:00,905-Speed 5539.72 samples/sec Loss 1.7238 LearningRate 0.0004 Epoch: 18 Global Step: 94560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:02,741-Speed 5578.70 samples/sec Loss 1.6833 LearningRate 0.0004 Epoch: 18 Global Step: 94570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:04,576-Speed 5583.54 samples/sec Loss 1.7033 LearningRate 0.0004 Epoch: 18 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:20:06,404-Speed 5604.79 samples/sec Loss 1.6958 LearningRate 0.0004 Epoch: 18 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:08,239-Speed 5581.46 samples/sec Loss 1.7923 LearningRate 0.0004 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:10,075-Speed 5580.76 samples/sec Loss 1.7388 LearningRate 0.0004 Epoch: 18 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:11,933-Speed 5511.48 samples/sec Loss 1.7252 LearningRate 0.0004 Epoch: 18 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:13,779-Speed 5551.15 
samples/sec Loss 1.6816 LearningRate 0.0004 Epoch: 18 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:15,613-Speed 5583.72 samples/sec Loss 1.7293 LearningRate 0.0004 Epoch: 18 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:17,461-Speed 5544.90 samples/sec Loss 1.7094 LearningRate 0.0004 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:19,301-Speed 5566.84 samples/sec Loss 1.6402 LearningRate 0.0004 Epoch: 18 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:21,153-Speed 5531.93 samples/sec Loss 1.7001 LearningRate 0.0004 Epoch: 18 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:22,993-Speed 5569.10 samples/sec Loss 1.7385 LearningRate 0.0004 Epoch: 18 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:24,836-Speed 5556.52 samples/sec Loss 1.7435 LearningRate 0.0004 Epoch: 18 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:20:26,661-Speed 5613.70 samples/sec Loss 1.6379 LearningRate 0.0004 Epoch: 18 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:28,492-Speed 5595.90 samples/sec Loss 1.6982 LearningRate 0.0004 Epoch: 18 Global Step: 94710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:30,349-Speed 5515.02 samples/sec Loss 1.7244 LearningRate 0.0004 Epoch: 18 Global Step: 94720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:32,188-Speed 5568.84 samples/sec Loss 1.7365 LearningRate 0.0004 Epoch: 18 Global Step: 94730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:34,031-Speed 5560.92 samples/sec Loss 1.7122 LearningRate 0.0004 Epoch: 18 Global Step: 94740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:35,877-Speed 5548.00 samples/sec Loss 1.6970 LearningRate 0.0004 Epoch: 18 
Global Step: 94750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:37,726-Speed 5540.03 samples/sec Loss 1.6495 LearningRate 0.0004 Epoch: 18 Global Step: 94760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:39,563-Speed 5576.03 samples/sec Loss 1.6980 LearningRate 0.0004 Epoch: 18 Global Step: 94770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:41,420-Speed 5517.09 samples/sec Loss 1.7530 LearningRate 0.0004 Epoch: 18 Global Step: 94780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:43,260-Speed 5566.56 samples/sec Loss 1.6593 LearningRate 0.0004 Epoch: 18 Global Step: 94790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:45,106-Speed 5549.06 samples/sec Loss 1.6704 LearningRate 0.0004 Epoch: 18 Global Step: 94800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:20:46,941-Speed 5582.02 samples/sec Loss 1.7047 LearningRate 0.0004 Epoch: 18 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:48,774-Speed 5591.12 samples/sec Loss 1.8218 LearningRate 0.0004 Epoch: 18 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:50,610-Speed 5577.39 samples/sec Loss 1.7318 LearningRate 0.0004 Epoch: 18 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:52,451-Speed 5563.95 samples/sec Loss 1.6882 LearningRate 0.0004 Epoch: 18 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:54,298-Speed 5546.97 samples/sec Loss 1.6572 LearningRate 0.0004 Epoch: 18 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:56,132-Speed 5586.27 samples/sec Loss 1.6071 LearningRate 0.0004 Epoch: 18 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:20:57,969-Speed 5574.79 samples/sec Loss 1.8373 LearningRate 0.0004 Epoch: 18 Global Step: 94870 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 16:20:59,805-Speed 5581.19 samples/sec Loss 1.6901 LearningRate 0.0004 Epoch: 18 Global Step: 94880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:01,654-Speed 5539.52 samples/sec Loss 1.6510 LearningRate 0.0004 Epoch: 18 Global Step: 94890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:03,511-Speed 5517.76 samples/sec Loss 1.7170 LearningRate 0.0004 Epoch: 18 Global Step: 94900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:05,358-Speed 5546.29 samples/sec Loss 1.7176 LearningRate 0.0004 Epoch: 18 Global Step: 94910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:07,200-Speed 5563.54 samples/sec Loss 1.7559 LearningRate 0.0004 Epoch: 18 Global Step: 94920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:09,036-Speed 5578.76 samples/sec Loss 1.6673 LearningRate 0.0004 Epoch: 18 Global Step: 94930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:10,874-Speed 5573.44 samples/sec Loss 1.6362 LearningRate 0.0004 Epoch: 18 Global Step: 94940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:12,722-Speed 5542.28 samples/sec Loss 1.6828 LearningRate 0.0004 Epoch: 18 Global Step: 94950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:14,613-Speed 5418.48 samples/sec Loss 1.7246 LearningRate 0.0004 Epoch: 18 Global Step: 94960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:16,453-Speed 5566.91 samples/sec Loss 1.6546 LearningRate 0.0004 Epoch: 18 Global Step: 94970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:18,322-Speed 5480.11 samples/sec Loss 1.6897 LearningRate 0.0004 Epoch: 18 Global Step: 94980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:21:20,182-Speed 5508.42 samples/sec Loss 1.7376 LearningRate 0.0004 Epoch: 18 Global Step: 94990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:22,021-Speed 5568.21 
samples/sec Loss 1.6309 LearningRate 0.0004 Epoch: 18 Global Step: 95000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:23,856-Speed 5584.25 samples/sec Loss 1.7191 LearningRate 0.0004 Epoch: 18 Global Step: 95010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:25,735-Speed 5452.24 samples/sec Loss 1.6312 LearningRate 0.0004 Epoch: 18 Global Step: 95020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:27,576-Speed 5563.67 samples/sec Loss 1.7187 LearningRate 0.0004 Epoch: 18 Global Step: 95030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:29,410-Speed 5586.06 samples/sec Loss 1.7195 LearningRate 0.0004 Epoch: 18 Global Step: 95040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:31,244-Speed 5583.91 samples/sec Loss 1.6956 LearningRate 0.0004 Epoch: 18 Global Step: 95050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:33,081-Speed 5578.10 samples/sec Loss 1.7373 LearningRate 0.0004 Epoch: 18 Global Step: 95060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:34,918-Speed 5575.11 samples/sec Loss 1.6628 LearningRate 0.0004 Epoch: 18 Global Step: 95070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:36,773-Speed 5523.03 samples/sec Loss 1.8141 LearningRate 0.0004 Epoch: 18 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:38,610-Speed 5578.10 samples/sec Loss 1.7567 LearningRate 0.0004 Epoch: 18 Global Step: 95090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:40,452-Speed 5559.23 samples/sec Loss 1.6539 LearningRate 0.0004 Epoch: 18 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:42,304-Speed 5530.44 samples/sec Loss 1.8315 LearningRate 0.0004 Epoch: 18 Global Step: 95110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:44,150-Speed 5550.90 samples/sec Loss 1.7952 LearningRate 0.0004 Epoch: 18 
Global Step: 95120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:45,982-Speed 5591.59 samples/sec Loss 1.7584 LearningRate 0.0004 Epoch: 18 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:47,831-Speed 5542.54 samples/sec Loss 1.7148 LearningRate 0.0004 Epoch: 18 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:49,704-Speed 5466.41 samples/sec Loss 1.6926 LearningRate 0.0004 Epoch: 18 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:51,567-Speed 5499.72 samples/sec Loss 1.8588 LearningRate 0.0004 Epoch: 18 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:53,406-Speed 5570.28 samples/sec Loss 1.8265 LearningRate 0.0004 Epoch: 18 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:55,243-Speed 5577.85 samples/sec Loss 1.7059 LearningRate 0.0003 Epoch: 18 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:21:57,077-Speed 5583.13 samples/sec Loss 1.7451 LearningRate 0.0003 Epoch: 18 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:21:58,920-Speed 5559.09 samples/sec Loss 1.7815 LearningRate 0.0003 Epoch: 18 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:22:00,751-Speed 5594.96 samples/sec Loss 1.8254 LearningRate 0.0003 Epoch: 18 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:02,608-Speed 5514.65 samples/sec Loss 1.6443 LearningRate 0.0003 Epoch: 18 Global Step: 95220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:04,453-Speed 5552.99 samples/sec Loss 1.7911 LearningRate 0.0003 Epoch: 18 Global Step: 95230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:06,295-Speed 5563.82 samples/sec Loss 1.7031 LearningRate 0.0003 Epoch: 18 Global Step: 95240 Fp16 Grad Scale: 16384 Required: 0 
hours Training: 2022-04-11 16:22:08,132-Speed 5576.34 samples/sec Loss 1.6915 LearningRate 0.0003 Epoch: 18 Global Step: 95250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:09,967-Speed 5582.50 samples/sec Loss 1.6708 LearningRate 0.0003 Epoch: 18 Global Step: 95260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:11,806-Speed 5568.69 samples/sec Loss 1.7522 LearningRate 0.0003 Epoch: 18 Global Step: 95270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:13,686-Speed 5449.84 samples/sec Loss 1.6990 LearningRate 0.0003 Epoch: 18 Global Step: 95280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:15,547-Speed 5503.34 samples/sec Loss 1.7476 LearningRate 0.0003 Epoch: 18 Global Step: 95290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:17,420-Speed 5470.51 samples/sec Loss 1.6889 LearningRate 0.0003 Epoch: 18 Global Step: 95300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:19,254-Speed 5585.40 samples/sec Loss 1.6613 LearningRate 0.0003 Epoch: 18 Global Step: 95310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:21,088-Speed 5584.23 samples/sec Loss 1.6118 LearningRate 0.0003 Epoch: 18 Global Step: 95320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:22,935-Speed 5547.97 samples/sec Loss 1.6892 LearningRate 0.0003 Epoch: 18 Global Step: 95330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:22:24,777-Speed 5560.31 samples/sec Loss 1.8240 LearningRate 0.0003 Epoch: 18 Global Step: 95340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:26,617-Speed 5566.67 samples/sec Loss 1.6743 LearningRate 0.0003 Epoch: 18 Global Step: 95350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:28,455-Speed 5574.03 samples/sec Loss 1.7499 LearningRate 0.0003 Epoch: 18 Global Step: 95360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:30,296-Speed 5565.51 
samples/sec Loss 1.6288 LearningRate 0.0003 Epoch: 18 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:32,142-Speed 5547.65 samples/sec Loss 1.7236 LearningRate 0.0003 Epoch: 18 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:33,983-Speed 5574.40 samples/sec Loss 1.6928 LearningRate 0.0003 Epoch: 18 Global Step: 95390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:35,815-Speed 5591.51 samples/sec Loss 1.6367 LearningRate 0.0003 Epoch: 18 Global Step: 95400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:37,666-Speed 5533.00 samples/sec Loss 1.7567 LearningRate 0.0003 Epoch: 18 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:39,510-Speed 5554.40 samples/sec Loss 1.7021 LearningRate 0.0003 Epoch: 18 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:41,351-Speed 5564.81 samples/sec Loss 1.7064 LearningRate 0.0003 Epoch: 18 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:43,209-Speed 5514.12 samples/sec Loss 1.7556 LearningRate 0.0003 Epoch: 18 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:22:45,037-Speed 5603.23 samples/sec Loss 1.8782 LearningRate 0.0003 Epoch: 18 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:46,875-Speed 5573.04 samples/sec Loss 1.7410 LearningRate 0.0003 Epoch: 18 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:48,724-Speed 5542.10 samples/sec Loss 1.7313 LearningRate 0.0003 Epoch: 18 Global Step: 95470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:50,558-Speed 5584.61 samples/sec Loss 1.7172 LearningRate 0.0003 Epoch: 18 Global Step: 95480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:52,398-Speed 5569.27 samples/sec Loss 1.7580 LearningRate 0.0003 Epoch: 18 
Global Step: 95490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:54,239-Speed 5562.72 samples/sec Loss 1.7766 LearningRate 0.0003 Epoch: 18 Global Step: 95500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:56,089-Speed 5538.07 samples/sec Loss 1.7145 LearningRate 0.0003 Epoch: 18 Global Step: 95510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:57,927-Speed 5573.16 samples/sec Loss 1.7146 LearningRate 0.0003 Epoch: 18 Global Step: 95520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:22:59,774-Speed 5545.38 samples/sec Loss 1.7226 LearningRate 0.0003 Epoch: 18 Global Step: 95530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:01,625-Speed 5536.02 samples/sec Loss 1.8193 LearningRate 0.0003 Epoch: 18 Global Step: 95540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:03,495-Speed 5478.17 samples/sec Loss 1.5902 LearningRate 0.0003 Epoch: 18 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:23:05,347-Speed 5531.66 samples/sec Loss 1.6633 LearningRate 0.0003 Epoch: 18 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:23:07,184-Speed 5575.31 samples/sec Loss 1.7138 LearningRate 0.0003 Epoch: 18 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:09,022-Speed 5573.37 samples/sec Loss 1.6554 LearningRate 0.0003 Epoch: 18 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:10,869-Speed 5546.84 samples/sec Loss 1.7846 LearningRate 0.0003 Epoch: 18 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:12,738-Speed 5482.16 samples/sec Loss 1.6664 LearningRate 0.0003 Epoch: 18 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:14,598-Speed 5507.72 samples/sec Loss 1.7221 LearningRate 0.0003 Epoch: 18 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 16:23:16,439-Speed 5563.03 samples/sec Loss 1.6306 LearningRate 0.0003 Epoch: 18 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:18,280-Speed 5564.11 samples/sec Loss 1.5820 LearningRate 0.0003 Epoch: 18 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:20,115-Speed 5582.99 samples/sec Loss 1.7859 LearningRate 0.0003 Epoch: 18 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:21,956-Speed 5563.98 samples/sec Loss 1.7794 LearningRate 0.0003 Epoch: 18 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:23,803-Speed 5546.57 samples/sec Loss 1.6530 LearningRate 0.0003 Epoch: 18 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:25,694-Speed 5419.11 samples/sec Loss 1.6828 LearningRate 0.0003 Epoch: 18 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:27,537-Speed 5555.11 samples/sec Loss 1.7090 LearningRate 0.0003 Epoch: 18 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:29,383-Speed 5550.28 samples/sec Loss 1.7249 LearningRate 0.0003 Epoch: 18 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:31,246-Speed 5497.81 samples/sec Loss 1.6951 LearningRate 0.0003 Epoch: 18 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:33,097-Speed 5534.72 samples/sec Loss 1.8077 LearningRate 0.0003 Epoch: 18 Global Step: 95710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:34,943-Speed 5550.00 samples/sec Loss 1.6792 LearningRate 0.0003 Epoch: 18 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:36,784-Speed 5563.78 samples/sec Loss 1.6317 LearningRate 0.0003 Epoch: 18 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:38,629-Speed 5554.76 
samples/sec Loss 1.7649 LearningRate 0.0003 Epoch: 18 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:40,470-Speed 5563.91 samples/sec Loss 1.8371 LearningRate 0.0003 Epoch: 18 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:42,326-Speed 5518.46 samples/sec Loss 1.6847 LearningRate 0.0003 Epoch: 18 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:23:44,161-Speed 5583.59 samples/sec Loss 1.7723 LearningRate 0.0003 Epoch: 18 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:23:45,997-Speed 5580.31 samples/sec Loss 1.6993 LearningRate 0.0003 Epoch: 18 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:23:47,837-Speed 5566.65 samples/sec Loss 1.7067 LearningRate 0.0003 Epoch: 18 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:23:49,663-Speed 5608.12 samples/sec Loss 1.7339 LearningRate 0.0003 Epoch: 18 Global Step: 95800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:23:51,522-Speed 5511.21 samples/sec Loss 1.7682 LearningRate 0.0003 Epoch: 18 Global Step: 95810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:23:53,374-Speed 5531.92 samples/sec Loss 1.7474 LearningRate 0.0003 Epoch: 18 Global Step: 95820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:23:55,227-Speed 5527.51 samples/sec Loss 1.7121 LearningRate 0.0003 Epoch: 18 Global Step: 95830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:23:57,063-Speed 5580.78 samples/sec Loss 1.6532 LearningRate 0.0003 Epoch: 18 Global Step: 95840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:23:58,902-Speed 5568.68 samples/sec Loss 1.6766 LearningRate 0.0003 Epoch: 18 Global Step: 95850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:00,742-Speed 5566.96 samples/sec Loss 1.7047 LearningRate 0.0003 Epoch: 18 
Global Step: 95860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:02,591-Speed 5540.23 samples/sec Loss 1.7445 LearningRate 0.0003 Epoch: 18 Global Step: 95870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:04,435-Speed 5558.44 samples/sec Loss 1.8043 LearningRate 0.0003 Epoch: 18 Global Step: 95880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:06,271-Speed 5577.80 samples/sec Loss 1.7631 LearningRate 0.0003 Epoch: 18 Global Step: 95890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:08,105-Speed 5587.85 samples/sec Loss 1.7372 LearningRate 0.0003 Epoch: 18 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:09,945-Speed 5564.48 samples/sec Loss 1.7694 LearningRate 0.0003 Epoch: 18 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:11,817-Speed 5471.69 samples/sec Loss 1.7495 LearningRate 0.0003 Epoch: 18 Global Step: 95920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:13,689-Speed 5472.67 samples/sec Loss 1.6386 LearningRate 0.0003 Epoch: 18 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:15,556-Speed 5487.66 samples/sec Loss 1.8113 LearningRate 0.0003 Epoch: 18 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:17,422-Speed 5488.19 samples/sec Loss 1.7306 LearningRate 0.0003 Epoch: 18 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:19,272-Speed 5539.63 samples/sec Loss 1.7387 LearningRate 0.0003 Epoch: 18 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:21,113-Speed 5562.49 samples/sec Loss 1.7172 LearningRate 0.0003 Epoch: 18 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:24:22,967-Speed 5528.12 samples/sec Loss 1.7220 LearningRate 0.0003 Epoch: 18 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-11 16:24:24,816-Speed 5539.02 samples/sec Loss 1.7026 LearningRate 0.0003 Epoch: 18 Global Step: 95990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:26,660-Speed 5555.51 samples/sec Loss 1.6638 LearningRate 0.0003 Epoch: 18 Global Step: 96000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:24:53,347-[lfw][96000]XNorm: 22.436321 Training: 2022-04-11 16:24:53,347-[lfw][96000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 16:24:53,348-[lfw][96000]Accuracy-Highest: 0.99817 Training: 2022-04-11 16:25:24,198-[cfp_fp][96000]XNorm: 21.560779 Training: 2022-04-11 16:25:24,199-[cfp_fp][96000]Accuracy-Flip: 0.98386+-0.00593 Training: 2022-04-11 16:25:24,199-[cfp_fp][96000]Accuracy-Highest: 0.98500 Training: 2022-04-11 16:25:50,762-[agedb_30][96000]XNorm: 22.637707 Training: 2022-04-11 16:25:50,763-[agedb_30][96000]Accuracy-Flip: 0.98417+-0.00647 Training: 2022-04-11 16:25:50,763-[agedb_30][96000]Accuracy-Highest: 0.98417 Training: 2022-04-11 16:25:52,630-Speed 119.11 samples/sec Loss 1.7052 LearningRate 0.0003 Epoch: 18 Global Step: 96010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:25:54,535-Speed 5376.59 samples/sec Loss 1.6769 LearningRate 0.0003 Epoch: 18 Global Step: 96020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:25:56,366-Speed 5596.60 samples/sec Loss 1.6879 LearningRate 0.0003 Epoch: 18 Global Step: 96030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:25:58,199-Speed 5588.01 samples/sec Loss 1.7293 LearningRate 0.0003 Epoch: 18 Global Step: 96040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:26:00,037-Speed 5574.61 samples/sec Loss 1.7952 LearningRate 0.0003 Epoch: 18 Global Step: 96050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:26:01,873-Speed 5577.50 samples/sec Loss 1.7215 LearningRate 0.0003 Epoch: 18 Global Step: 96060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 
16:26:03,706-Speed 5590.05 samples/sec Loss 1.6452 LearningRate 0.0003 Epoch: 18 Global Step: 96070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:26:05,544-Speed 5572.37 samples/sec Loss 1.6685 LearningRate 0.0003 Epoch: 18 Global Step: 96080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:26:07,381-Speed 5576.70 samples/sec Loss 1.7190 LearningRate 0.0003 Epoch: 18 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:09,272-Speed 5418.12 samples/sec Loss 1.6523 LearningRate 0.0003 Epoch: 18 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:19,924-Speed 961.39 samples/sec Loss 1.6288 LearningRate 0.0002 Epoch: 19 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:21,779-Speed 5523.72 samples/sec Loss 1.5500 LearningRate 0.0002 Epoch: 19 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:23,614-Speed 5581.62 samples/sec Loss 1.6171 LearningRate 0.0002 Epoch: 19 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:25,451-Speed 5575.15 samples/sec Loss 1.5062 LearningRate 0.0002 Epoch: 19 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:27,548-Speed 4885.49 samples/sec Loss 1.4894 LearningRate 0.0002 Epoch: 19 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:29,392-Speed 5554.56 samples/sec Loss 1.5033 LearningRate 0.0002 Epoch: 19 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:31,233-Speed 5564.29 samples/sec Loss 1.5522 LearningRate 0.0002 Epoch: 19 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:33,083-Speed 5536.52 samples/sec Loss 1.5919 LearningRate 0.0002 Epoch: 19 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:34,931-Speed 5543.26 samples/sec Loss 1.5657 
LearningRate 0.0002 Epoch: 19 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:26:36,766-Speed 5583.79 samples/sec Loss 1.4484 LearningRate 0.0002 Epoch: 19 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:26:38,617-Speed 5533.38 samples/sec Loss 1.4633 LearningRate 0.0002 Epoch: 19 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:26:40,481-Speed 5495.59 samples/sec Loss 1.5142 LearningRate 0.0002 Epoch: 19 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:42,329-Speed 5542.81 samples/sec Loss 1.4413 LearningRate 0.0002 Epoch: 19 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:44,183-Speed 5525.56 samples/sec Loss 1.4122 LearningRate 0.0002 Epoch: 19 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:46,024-Speed 5564.43 samples/sec Loss 1.5488 LearningRate 0.0002 Epoch: 19 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:47,858-Speed 5585.82 samples/sec Loss 1.5331 LearningRate 0.0002 Epoch: 19 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:49,696-Speed 5573.34 samples/sec Loss 1.5835 LearningRate 0.0002 Epoch: 19 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:51,556-Speed 5506.05 samples/sec Loss 1.5863 LearningRate 0.0002 Epoch: 19 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:53,400-Speed 5556.70 samples/sec Loss 1.4578 LearningRate 0.0002 Epoch: 19 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:55,241-Speed 5561.91 samples/sec Loss 1.5130 LearningRate 0.0002 Epoch: 19 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:57,074-Speed 5592.56 samples/sec Loss 1.5679 LearningRate 0.0002 Epoch: 19 Global Step: 96310 Fp16 
Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:26:58,911-Speed 5574.78 samples/sec Loss 1.5646 LearningRate 0.0002 Epoch: 19 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:00,763-Speed 5532.87 samples/sec Loss 1.5487 LearningRate 0.0002 Epoch: 19 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:02,600-Speed 5574.17 samples/sec Loss 1.5059 LearningRate 0.0002 Epoch: 19 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:04,442-Speed 5561.73 samples/sec Loss 1.5336 LearningRate 0.0002 Epoch: 19 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:06,304-Speed 5503.68 samples/sec Loss 1.5271 LearningRate 0.0002 Epoch: 19 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:08,148-Speed 5553.99 samples/sec Loss 1.5063 LearningRate 0.0002 Epoch: 19 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:10,011-Speed 5499.16 samples/sec Loss 1.4885 LearningRate 0.0002 Epoch: 19 Global Step: 96380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:11,898-Speed 5428.25 samples/sec Loss 1.4566 LearningRate 0.0002 Epoch: 19 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:13,762-Speed 5495.23 samples/sec Loss 1.6043 LearningRate 0.0002 Epoch: 19 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:15,600-Speed 5572.04 samples/sec Loss 1.5409 LearningRate 0.0002 Epoch: 19 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:17,471-Speed 5477.54 samples/sec Loss 1.5653 LearningRate 0.0002 Epoch: 19 Global Step: 96420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:19,303-Speed 5591.64 samples/sec Loss 1.5728 LearningRate 0.0002 Epoch: 19 Global Step: 96430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:27:21,140-Speed 5576.39 samples/sec Loss 1.4928 LearningRate 0.0002 Epoch: 19 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:22,968-Speed 5603.79 samples/sec Loss 1.4695 LearningRate 0.0002 Epoch: 19 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:24,804-Speed 5577.68 samples/sec Loss 1.4448 LearningRate 0.0002 Epoch: 19 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:26,660-Speed 5521.83 samples/sec Loss 1.5919 LearningRate 0.0002 Epoch: 19 Global Step: 96470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:28,499-Speed 5568.24 samples/sec Loss 1.5885 LearningRate 0.0002 Epoch: 19 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:30,332-Speed 5590.23 samples/sec Loss 1.4721 LearningRate 0.0002 Epoch: 19 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:32,172-Speed 5567.46 samples/sec Loss 1.5424 LearningRate 0.0002 Epoch: 19 Global Step: 96500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:34,007-Speed 5580.64 samples/sec Loss 1.5276 LearningRate 0.0002 Epoch: 19 Global Step: 96510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:35,853-Speed 5550.44 samples/sec Loss 1.4780 LearningRate 0.0002 Epoch: 19 Global Step: 96520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:37,709-Speed 5518.19 samples/sec Loss 1.6754 LearningRate 0.0002 Epoch: 19 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:39,542-Speed 5591.03 samples/sec Loss 1.4864 LearningRate 0.0002 Epoch: 19 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:41,374-Speed 5590.50 samples/sec Loss 1.6298 LearningRate 0.0002 Epoch: 19 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:43,206-Speed 5591.75 samples/sec Loss 
1.6660 LearningRate 0.0002 Epoch: 19 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:45,038-Speed 5590.64 samples/sec Loss 1.5445 LearningRate 0.0002 Epoch: 19 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:46,879-Speed 5565.47 samples/sec Loss 1.5003 LearningRate 0.0002 Epoch: 19 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:27:48,709-Speed 5597.29 samples/sec Loss 1.5549 LearningRate 0.0002 Epoch: 19 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:50,544-Speed 5582.13 samples/sec Loss 1.4732 LearningRate 0.0002 Epoch: 19 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:52,401-Speed 5518.93 samples/sec Loss 1.6774 LearningRate 0.0002 Epoch: 19 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:54,238-Speed 5574.88 samples/sec Loss 1.5135 LearningRate 0.0002 Epoch: 19 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:56,068-Speed 5599.29 samples/sec Loss 1.4696 LearningRate 0.0002 Epoch: 19 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:57,905-Speed 5575.91 samples/sec Loss 1.5830 LearningRate 0.0002 Epoch: 19 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:27:59,742-Speed 5574.37 samples/sec Loss 1.5306 LearningRate 0.0002 Epoch: 19 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:01,583-Speed 5564.90 samples/sec Loss 1.5565 LearningRate 0.0002 Epoch: 19 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:03,431-Speed 5545.17 samples/sec Loss 1.5637 LearningRate 0.0002 Epoch: 19 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:05,293-Speed 5501.90 samples/sec Loss 1.5401 LearningRate 0.0002 Epoch: 19 Global Step: 96680 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:07,125-Speed 5591.06 samples/sec Loss 1.5558 LearningRate 0.0002 Epoch: 19 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:08,952-Speed 5606.48 samples/sec Loss 1.4871 LearningRate 0.0002 Epoch: 19 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:10,793-Speed 5562.98 samples/sec Loss 1.6032 LearningRate 0.0002 Epoch: 19 Global Step: 96710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:12,631-Speed 5574.98 samples/sec Loss 1.5365 LearningRate 0.0002 Epoch: 19 Global Step: 96720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:14,467-Speed 5577.68 samples/sec Loss 1.6064 LearningRate 0.0002 Epoch: 19 Global Step: 96730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:16,335-Speed 5485.33 samples/sec Loss 1.5063 LearningRate 0.0002 Epoch: 19 Global Step: 96740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:18,166-Speed 5593.85 samples/sec Loss 1.4849 LearningRate 0.0002 Epoch: 19 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:19,999-Speed 5587.84 samples/sec Loss 1.6050 LearningRate 0.0002 Epoch: 19 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:21,842-Speed 5559.56 samples/sec Loss 1.4584 LearningRate 0.0002 Epoch: 19 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:23,695-Speed 5530.28 samples/sec Loss 1.6288 LearningRate 0.0002 Epoch: 19 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:25,529-Speed 5584.40 samples/sec Loss 1.5922 LearningRate 0.0002 Epoch: 19 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:27,369-Speed 5567.25 samples/sec Loss 1.5093 LearningRate 0.0002 Epoch: 19 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:28:29,200-Speed 5595.12 samples/sec Loss 1.5577 LearningRate 0.0002 Epoch: 19 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:31,034-Speed 5586.77 samples/sec Loss 1.5981 LearningRate 0.0002 Epoch: 19 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:32,888-Speed 5525.32 samples/sec Loss 1.4584 LearningRate 0.0002 Epoch: 19 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:34,739-Speed 5531.46 samples/sec Loss 1.5282 LearningRate 0.0002 Epoch: 19 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:36,577-Speed 5574.84 samples/sec Loss 1.6049 LearningRate 0.0002 Epoch: 19 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:38,437-Speed 5507.26 samples/sec Loss 1.5967 LearningRate 0.0002 Epoch: 19 Global Step: 96860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:40,277-Speed 5567.79 samples/sec Loss 1.5644 LearningRate 0.0002 Epoch: 19 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:42,121-Speed 5553.59 samples/sec Loss 1.4490 LearningRate 0.0002 Epoch: 19 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:43,964-Speed 5560.18 samples/sec Loss 1.5956 LearningRate 0.0002 Epoch: 19 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:28:45,810-Speed 5547.50 samples/sec Loss 1.5547 LearningRate 0.0002 Epoch: 19 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:47,646-Speed 5582.96 samples/sec Loss 1.6425 LearningRate 0.0002 Epoch: 19 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:49,483-Speed 5575.51 samples/sec Loss 1.5897 LearningRate 0.0002 Epoch: 19 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:51,328-Speed 5551.92 samples/sec Loss 
1.6153 LearningRate 0.0002 Epoch: 19 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:53,163-Speed 5581.04 samples/sec Loss 1.5716 LearningRate 0.0002 Epoch: 19 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:54,997-Speed 5585.52 samples/sec Loss 1.4954 LearningRate 0.0002 Epoch: 19 Global Step: 96950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:56,831-Speed 5586.14 samples/sec Loss 1.5248 LearningRate 0.0002 Epoch: 19 Global Step: 96960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:28:58,681-Speed 5538.56 samples/sec Loss 1.5471 LearningRate 0.0002 Epoch: 19 Global Step: 96970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:00,530-Speed 5539.67 samples/sec Loss 1.4920 LearningRate 0.0002 Epoch: 19 Global Step: 96980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:02,371-Speed 5562.77 samples/sec Loss 1.5504 LearningRate 0.0002 Epoch: 19 Global Step: 96990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:04,214-Speed 5558.43 samples/sec Loss 1.5899 LearningRate 0.0002 Epoch: 19 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:06,055-Speed 5564.75 samples/sec Loss 1.6381 LearningRate 0.0002 Epoch: 19 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:07,889-Speed 5586.22 samples/sec Loss 1.5267 LearningRate 0.0002 Epoch: 19 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:29:09,714-Speed 5614.57 samples/sec Loss 1.6197 LearningRate 0.0002 Epoch: 19 Global Step: 97030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:11,551-Speed 5577.39 samples/sec Loss 1.5780 LearningRate 0.0002 Epoch: 19 Global Step: 97040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:13,389-Speed 5570.54 samples/sec Loss 1.5218 LearningRate 0.0002 Epoch: 19 Global Step: 97050 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:15,229-Speed 5567.35 samples/sec Loss 1.4878 LearningRate 0.0002 Epoch: 19 Global Step: 97060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:17,070-Speed 5565.96 samples/sec Loss 1.6445 LearningRate 0.0002 Epoch: 19 Global Step: 97070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:18,904-Speed 5585.23 samples/sec Loss 1.5782 LearningRate 0.0002 Epoch: 19 Global Step: 97080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:20,736-Speed 5589.33 samples/sec Loss 1.6008 LearningRate 0.0002 Epoch: 19 Global Step: 97090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:22,567-Speed 5595.08 samples/sec Loss 1.5073 LearningRate 0.0002 Epoch: 19 Global Step: 97100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:24,416-Speed 5542.29 samples/sec Loss 1.6279 LearningRate 0.0002 Epoch: 19 Global Step: 97110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:26,253-Speed 5575.44 samples/sec Loss 1.6264 LearningRate 0.0002 Epoch: 19 Global Step: 97120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:28,102-Speed 5540.93 samples/sec Loss 1.6026 LearningRate 0.0002 Epoch: 19 Global Step: 97130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:29,940-Speed 5574.06 samples/sec Loss 1.5924 LearningRate 0.0002 Epoch: 19 Global Step: 97140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:31,775-Speed 5587.57 samples/sec Loss 1.5293 LearningRate 0.0002 Epoch: 19 Global Step: 97150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:33,614-Speed 5571.94 samples/sec Loss 1.5793 LearningRate 0.0002 Epoch: 19 Global Step: 97160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:35,451-Speed 5576.34 samples/sec Loss 1.5088 LearningRate 0.0002 Epoch: 19 Global Step: 97170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-04-11 16:29:37,285-Speed 5585.33 samples/sec Loss 1.5292 LearningRate 0.0002 Epoch: 19 Global Step: 97180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:39,116-Speed 5593.00 samples/sec Loss 1.6175 LearningRate 0.0002 Epoch: 19 Global Step: 97190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:40,957-Speed 5565.23 samples/sec Loss 1.5385 LearningRate 0.0002 Epoch: 19 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:42,793-Speed 5578.80 samples/sec Loss 1.5468 LearningRate 0.0002 Epoch: 19 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:44,636-Speed 5558.07 samples/sec Loss 1.5630 LearningRate 0.0002 Epoch: 19 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:46,483-Speed 5547.12 samples/sec Loss 1.5253 LearningRate 0.0002 Epoch: 19 Global Step: 97230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:29:48,339-Speed 5519.30 samples/sec Loss 1.5211 LearningRate 0.0002 Epoch: 19 Global Step: 97240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:50,191-Speed 5533.09 samples/sec Loss 1.4622 LearningRate 0.0001 Epoch: 19 Global Step: 97250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:52,092-Speed 5388.71 samples/sec Loss 1.5588 LearningRate 0.0001 Epoch: 19 Global Step: 97260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:53,953-Speed 5502.59 samples/sec Loss 1.5096 LearningRate 0.0001 Epoch: 19 Global Step: 97270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:55,793-Speed 5568.28 samples/sec Loss 1.4815 LearningRate 0.0001 Epoch: 19 Global Step: 97280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:57,633-Speed 5568.74 samples/sec Loss 1.5897 LearningRate 0.0001 Epoch: 19 Global Step: 97290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:29:59,490-Speed 5514.26 samples/sec Loss 
1.5691 LearningRate 0.0001 Epoch: 19 Global Step: 97300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:30:01,351-Speed 5506.82 samples/sec Loss 1.5362 LearningRate 0.0001 Epoch: 19 Global Step: 97310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:30:03,205-Speed 5523.64 samples/sec Loss 1.5433 LearningRate 0.0001 Epoch: 19 Global Step: 97320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:30:05,038-Speed 5588.48 samples/sec Loss 1.6234 LearningRate 0.0001 Epoch: 19 Global Step: 97330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:30:06,873-Speed 5585.02 samples/sec Loss 1.4902 LearningRate 0.0001 Epoch: 19 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:08,709-Speed 5579.51 samples/sec Loss 1.6154 LearningRate 0.0001 Epoch: 19 Global Step: 97350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:10,549-Speed 5564.96 samples/sec Loss 1.5567 LearningRate 0.0001 Epoch: 19 Global Step: 97360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:12,387-Speed 5574.92 samples/sec Loss 1.5868 LearningRate 0.0001 Epoch: 19 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:14,228-Speed 5562.34 samples/sec Loss 1.5264 LearningRate 0.0001 Epoch: 19 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:16,064-Speed 5581.15 samples/sec Loss 1.4759 LearningRate 0.0001 Epoch: 19 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:17,904-Speed 5566.51 samples/sec Loss 1.5689 LearningRate 0.0001 Epoch: 19 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:19,738-Speed 5586.84 samples/sec Loss 1.5749 LearningRate 0.0001 Epoch: 19 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:21,573-Speed 5581.96 samples/sec Loss 1.5145 LearningRate 0.0001 Epoch: 19 Global Step: 97420 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:23,407-Speed 5586.72 samples/sec Loss 1.5260 LearningRate 0.0001 Epoch: 19 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:25,254-Speed 5546.01 samples/sec Loss 1.5888 LearningRate 0.0001 Epoch: 19 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:30:27,089-Speed 5584.14 samples/sec Loss 1.5807 LearningRate 0.0001 Epoch: 19 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:28,923-Speed 5584.83 samples/sec Loss 1.5078 LearningRate 0.0001 Epoch: 19 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:30,758-Speed 5580.28 samples/sec Loss 1.5475 LearningRate 0.0001 Epoch: 19 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:32,593-Speed 5583.19 samples/sec Loss 1.5042 LearningRate 0.0001 Epoch: 19 Global Step: 97480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:34,435-Speed 5562.22 samples/sec Loss 1.4937 LearningRate 0.0001 Epoch: 19 Global Step: 97490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:36,278-Speed 5559.00 samples/sec Loss 1.5787 LearningRate 0.0001 Epoch: 19 Global Step: 97500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:38,134-Speed 5517.26 samples/sec Loss 1.4576 LearningRate 0.0001 Epoch: 19 Global Step: 97510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:39,971-Speed 5576.68 samples/sec Loss 1.6353 LearningRate 0.0001 Epoch: 19 Global Step: 97520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:41,821-Speed 5537.55 samples/sec Loss 1.5839 LearningRate 0.0001 Epoch: 19 Global Step: 97530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:43,657-Speed 5580.41 samples/sec Loss 1.3945 LearningRate 0.0001 Epoch: 19 Global Step: 97540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:30:45,478-Speed 5623.94 samples/sec Loss 1.5067 LearningRate 0.0001 Epoch: 19 Global Step: 97550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:47,314-Speed 5579.99 samples/sec Loss 1.5166 LearningRate 0.0001 Epoch: 19 Global Step: 97560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:49,153-Speed 5572.18 samples/sec Loss 1.5265 LearningRate 0.0001 Epoch: 19 Global Step: 97570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:50,990-Speed 5573.70 samples/sec Loss 1.5782 LearningRate 0.0001 Epoch: 19 Global Step: 97580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:52,827-Speed 5578.19 samples/sec Loss 1.5006 LearningRate 0.0001 Epoch: 19 Global Step: 97590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:54,666-Speed 5568.76 samples/sec Loss 1.5237 LearningRate 0.0001 Epoch: 19 Global Step: 97600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:56,510-Speed 5558.30 samples/sec Loss 1.5298 LearningRate 0.0001 Epoch: 19 Global Step: 97610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:30:58,342-Speed 5590.37 samples/sec Loss 1.4784 LearningRate 0.0001 Epoch: 19 Global Step: 97620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:00,178-Speed 5578.17 samples/sec Loss 1.5498 LearningRate 0.0001 Epoch: 19 Global Step: 97630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:02,045-Speed 5486.87 samples/sec Loss 1.5968 LearningRate 0.0001 Epoch: 19 Global Step: 97640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:03,924-Speed 5451.71 samples/sec Loss 1.5192 LearningRate 0.0001 Epoch: 19 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:31:05,764-Speed 5569.90 samples/sec Loss 1.5549 LearningRate 0.0001 Epoch: 19 Global Step: 97660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:07,606-Speed 5561.72 samples/sec Loss 
1.5322 LearningRate 0.0001 Epoch: 19 Global Step: 97670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:09,439-Speed 5589.17 samples/sec Loss 1.4695 LearningRate 0.0001 Epoch: 19 Global Step: 97680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:11,281-Speed 5559.92 samples/sec Loss 1.5265 LearningRate 0.0001 Epoch: 19 Global Step: 97690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:13,143-Speed 5501.40 samples/sec Loss 1.5530 LearningRate 0.0001 Epoch: 19 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:14,984-Speed 5562.90 samples/sec Loss 1.5590 LearningRate 0.0001 Epoch: 19 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:16,830-Speed 5551.51 samples/sec Loss 1.5534 LearningRate 0.0001 Epoch: 19 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:18,674-Speed 5554.60 samples/sec Loss 1.5098 LearningRate 0.0001 Epoch: 19 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:20,507-Speed 5588.21 samples/sec Loss 1.5702 LearningRate 0.0001 Epoch: 19 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:22,353-Speed 5548.39 samples/sec Loss 1.5424 LearningRate 0.0001 Epoch: 19 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:24,192-Speed 5570.84 samples/sec Loss 1.5917 LearningRate 0.0001 Epoch: 19 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:31:26,053-Speed 5503.95 samples/sec Loss 1.6055 LearningRate 0.0001 Epoch: 19 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:27,882-Speed 5602.83 samples/sec Loss 1.5538 LearningRate 0.0001 Epoch: 19 Global Step: 97780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:29,721-Speed 5568.43 samples/sec Loss 1.5859 LearningRate 0.0001 Epoch: 19 Global Step: 97790 
Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:31,556-Speed 5584.61 samples/sec Loss 1.5049 LearningRate 0.0001 Epoch: 19 Global Step: 97800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:33,394-Speed 5572.68 samples/sec Loss 1.4840 LearningRate 0.0001 Epoch: 19 Global Step: 97810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:35,231-Speed 5576.78 samples/sec Loss 1.5672 LearningRate 0.0001 Epoch: 19 Global Step: 97820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:37,073-Speed 5562.64 samples/sec Loss 1.5252 LearningRate 0.0001 Epoch: 19 Global Step: 97830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:38,914-Speed 5562.65 samples/sec Loss 1.6160 LearningRate 0.0001 Epoch: 19 Global Step: 97840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:40,748-Speed 5586.10 samples/sec Loss 1.6245 LearningRate 0.0001 Epoch: 19 Global Step: 97850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:42,591-Speed 5557.70 samples/sec Loss 1.5544 LearningRate 0.0001 Epoch: 19 Global Step: 97860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:44,429-Speed 5573.71 samples/sec Loss 1.5147 LearningRate 0.0001 Epoch: 19 Global Step: 97870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:31:46,287-Speed 5514.43 samples/sec Loss 1.5191 LearningRate 0.0001 Epoch: 19 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:48,134-Speed 5543.93 samples/sec Loss 1.5384 LearningRate 0.0001 Epoch: 19 Global Step: 97890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:50,036-Speed 5386.55 samples/sec Loss 1.5537 LearningRate 0.0001 Epoch: 19 Global Step: 97900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:51,888-Speed 5533.48 samples/sec Loss 1.4853 LearningRate 0.0001 Epoch: 19 Global Step: 97910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:31:53,733-Speed 5552.84 samples/sec Loss 1.5143 LearningRate 0.0001 Epoch: 19 Global Step: 97920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:55,571-Speed 5572.31 samples/sec Loss 1.5523 LearningRate 0.0001 Epoch: 19 Global Step: 97930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:57,416-Speed 5552.61 samples/sec Loss 1.5639 LearningRate 0.0001 Epoch: 19 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:31:59,251-Speed 5580.81 samples/sec Loss 1.5381 LearningRate 0.0001 Epoch: 19 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:32:01,093-Speed 5562.36 samples/sec Loss 1.5348 LearningRate 0.0001 Epoch: 19 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:32:02,929-Speed 5578.45 samples/sec Loss 1.5863 LearningRate 0.0001 Epoch: 19 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:32:04,795-Speed 5489.16 samples/sec Loss 1.5208 LearningRate 0.0001 Epoch: 19 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:32:06,640-Speed 5554.18 samples/sec Loss 1.5683 LearningRate 0.0001 Epoch: 19 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:32:08,479-Speed 5570.41 samples/sec Loss 1.5615 LearningRate 0.0001 Epoch: 19 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:32:34,903-[lfw][98000]XNorm: 22.373969 Training: 2022-04-11 16:32:34,904-[lfw][98000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 16:32:34,904-[lfw][98000]Accuracy-Highest: 0.99817 Training: 2022-04-11 16:33:05,524-[cfp_fp][98000]XNorm: 21.498835 Training: 2022-04-11 16:33:05,525-[cfp_fp][98000]Accuracy-Flip: 0.98529+-0.00527 Training: 2022-04-11 16:33:05,525-[cfp_fp][98000]Accuracy-Highest: 0.98529 Training: 2022-04-11 16:33:31,867-[agedb_30][98000]XNorm: 22.535583 Training: 2022-04-11 
16:33:31,867-[agedb_30][98000]Accuracy-Flip: 0.98450+-0.00654 Training: 2022-04-11 16:33:31,868-[agedb_30][98000]Accuracy-Highest: 0.98450 Training: 2022-04-11 16:33:33,702-Speed 120.16 samples/sec Loss 1.5491 LearningRate 0.0001 Epoch: 19 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:35,529-Speed 5608.09 samples/sec Loss 1.5762 LearningRate 0.0001 Epoch: 19 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:37,357-Speed 5603.72 samples/sec Loss 1.5689 LearningRate 0.0001 Epoch: 19 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:39,186-Speed 5598.55 samples/sec Loss 1.5942 LearningRate 0.0001 Epoch: 19 Global Step: 98040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:41,017-Speed 5593.99 samples/sec Loss 1.4326 LearningRate 0.0001 Epoch: 19 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:42,848-Speed 5596.31 samples/sec Loss 1.5950 LearningRate 0.0001 Epoch: 19 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:44,683-Speed 5582.73 samples/sec Loss 1.4537 LearningRate 0.0001 Epoch: 19 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:46,525-Speed 5561.45 samples/sec Loss 1.4576 LearningRate 0.0001 Epoch: 19 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:48,365-Speed 5566.67 samples/sec Loss 1.5298 LearningRate 0.0001 Epoch: 19 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:50,197-Speed 5592.07 samples/sec Loss 1.5708 LearningRate 0.0001 Epoch: 19 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:52,031-Speed 5584.39 samples/sec Loss 1.5289 LearningRate 0.0001 Epoch: 19 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:33:53,869-Speed 5574.65 samples/sec Loss 1.5788 
LearningRate 0.0001 Epoch: 19 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:55,707-Speed 5571.70 samples/sec Loss 1.5150 LearningRate 0.0001 Epoch: 19 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:57,540-Speed 5590.95 samples/sec Loss 1.5046 LearningRate 0.0001 Epoch: 19 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:33:59,381-Speed 5564.62 samples/sec Loss 1.5839 LearningRate 0.0001 Epoch: 19 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:01,232-Speed 5532.97 samples/sec Loss 1.5878 LearningRate 0.0001 Epoch: 19 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:03,066-Speed 5585.03 samples/sec Loss 1.5751 LearningRate 0.0001 Epoch: 19 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:04,902-Speed 5578.71 samples/sec Loss 1.5056 LearningRate 0.0001 Epoch: 19 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:06,738-Speed 5581.55 samples/sec Loss 1.5489 LearningRate 0.0001 Epoch: 19 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:08,570-Speed 5589.88 samples/sec Loss 1.5553 LearningRate 0.0001 Epoch: 19 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:10,409-Speed 5570.04 samples/sec Loss 1.5398 LearningRate 0.0001 Epoch: 19 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:12,244-Speed 5584.64 samples/sec Loss 1.6236 LearningRate 0.0001 Epoch: 19 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:34:14,083-Speed 5568.70 samples/sec Loss 1.4875 LearningRate 0.0001 Epoch: 19 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:34:15,911-Speed 5606.13 samples/sec Loss 1.5319 LearningRate 0.0001 Epoch: 19 Global Step: 98240 Fp16 
Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:17,756-Speed 5552.58 samples/sec Loss 1.5957 LearningRate 0.0001 Epoch: 19 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:19,590-Speed 5585.45 samples/sec Loss 1.5612 LearningRate 0.0001 Epoch: 19 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:21,430-Speed 5565.15 samples/sec Loss 1.5654 LearningRate 0.0001 Epoch: 19 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:23,266-Speed 5580.15 samples/sec Loss 1.5554 LearningRate 0.0001 Epoch: 19 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:25,124-Speed 5513.35 samples/sec Loss 1.5819 LearningRate 0.0001 Epoch: 19 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:26,962-Speed 5575.78 samples/sec Loss 1.5464 LearningRate 0.0001 Epoch: 19 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:28,789-Speed 5604.79 samples/sec Loss 1.4836 LearningRate 0.0001 Epoch: 19 Global Step: 98310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:30,633-Speed 5556.37 samples/sec Loss 1.5671 LearningRate 0.0001 Epoch: 19 Global Step: 98320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:32,469-Speed 5580.76 samples/sec Loss 1.5847 LearningRate 0.0001 Epoch: 19 Global Step: 98330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:34,327-Speed 5512.84 samples/sec Loss 1.5092 LearningRate 0.0001 Epoch: 19 Global Step: 98340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:36,177-Speed 5537.00 samples/sec Loss 1.4904 LearningRate 0.0001 Epoch: 19 Global Step: 98350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:38,041-Speed 5496.09 samples/sec Loss 1.4984 LearningRate 0.0001 Epoch: 19 Global Step: 98360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-04-11 16:34:39,905-Speed 5496.95 samples/sec Loss 1.4611 LearningRate 0.0001 Epoch: 19 Global Step: 98370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:41,752-Speed 5544.82 samples/sec Loss 1.6575 LearningRate 0.0001 Epoch: 19 Global Step: 98380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:43,592-Speed 5568.99 samples/sec Loss 1.5046 LearningRate 0.0001 Epoch: 19 Global Step: 98390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:45,432-Speed 5565.26 samples/sec Loss 1.6495 LearningRate 0.0001 Epoch: 19 Global Step: 98400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:34:47,280-Speed 5543.59 samples/sec Loss 1.5218 LearningRate 0.0001 Epoch: 19 Global Step: 98410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:49,120-Speed 5567.13 samples/sec Loss 1.5482 LearningRate 0.0001 Epoch: 19 Global Step: 98420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:50,962-Speed 5561.63 samples/sec Loss 1.5098 LearningRate 0.0001 Epoch: 19 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:52,802-Speed 5568.69 samples/sec Loss 1.5826 LearningRate 0.0001 Epoch: 19 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:54,642-Speed 5565.03 samples/sec Loss 1.5873 LearningRate 0.0001 Epoch: 19 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:56,493-Speed 5536.36 samples/sec Loss 1.6411 LearningRate 0.0001 Epoch: 19 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:34:58,336-Speed 5559.02 samples/sec Loss 1.6501 LearningRate 0.0001 Epoch: 19 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:00,174-Speed 5572.93 samples/sec Loss 1.5320 LearningRate 0.0001 Epoch: 19 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:02,037-Speed 5497.01 samples/sec Loss 
1.4210 LearningRate 0.0001 Epoch: 19 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:03,903-Speed 5492.03 samples/sec Loss 1.5486 LearningRate 0.0001 Epoch: 19 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:05,788-Speed 5432.91 samples/sec Loss 1.5599 LearningRate 0.0001 Epoch: 19 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:07,639-Speed 5535.54 samples/sec Loss 1.5070 LearningRate 0.0001 Epoch: 19 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:09,467-Speed 5601.46 samples/sec Loss 1.6505 LearningRate 0.0001 Epoch: 19 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:11,308-Speed 5566.12 samples/sec Loss 1.5561 LearningRate 0.0001 Epoch: 19 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:13,160-Speed 5532.54 samples/sec Loss 1.5003 LearningRate 0.0001 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:15,017-Speed 5513.90 samples/sec Loss 1.5340 LearningRate 0.0001 Epoch: 19 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:16,872-Speed 5523.22 samples/sec Loss 1.4834 LearningRate 0.0001 Epoch: 19 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:18,713-Speed 5565.91 samples/sec Loss 1.6588 LearningRate 0.0001 Epoch: 19 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:20,546-Speed 5587.12 samples/sec Loss 1.5161 LearningRate 0.0001 Epoch: 19 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:22,380-Speed 5587.15 samples/sec Loss 1.6025 LearningRate 0.0001 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:24,229-Speed 5539.81 samples/sec Loss 1.5274 LearningRate 0.0001 Epoch: 19 Global Step: 98610 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:26,064-Speed 5580.71 samples/sec Loss 1.5029 LearningRate 0.0001 Epoch: 19 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:27,901-Speed 5578.37 samples/sec Loss 1.5332 LearningRate 0.0001 Epoch: 19 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:29,741-Speed 5566.52 samples/sec Loss 1.6062 LearningRate 0.0001 Epoch: 19 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:31,571-Speed 5595.69 samples/sec Loss 1.6063 LearningRate 0.0001 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:33,405-Speed 5587.50 samples/sec Loss 1.4816 LearningRate 0.0001 Epoch: 19 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:35,247-Speed 5560.95 samples/sec Loss 1.5309 LearningRate 0.0001 Epoch: 19 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:37,079-Speed 5593.98 samples/sec Loss 1.6012 LearningRate 0.0001 Epoch: 19 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:38,923-Speed 5553.73 samples/sec Loss 1.5664 LearningRate 0.0001 Epoch: 19 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:40,777-Speed 5525.44 samples/sec Loss 1.5748 LearningRate 0.0001 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:42,615-Speed 5573.85 samples/sec Loss 1.5483 LearningRate 0.0001 Epoch: 19 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:44,456-Speed 5565.53 samples/sec Loss 1.5462 LearningRate 0.0001 Epoch: 19 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:46,315-Speed 5507.54 samples/sec Loss 1.6601 LearningRate 0.0001 Epoch: 19 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:35:48,150-Speed 5586.03 samples/sec Loss 1.4587 LearningRate 0.0001 Epoch: 19 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:49,986-Speed 5576.48 samples/sec Loss 1.6492 LearningRate 0.0001 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:51,833-Speed 5548.09 samples/sec Loss 1.5210 LearningRate 0.0001 Epoch: 19 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:35:53,657-Speed 5614.32 samples/sec Loss 1.5600 LearningRate 0.0001 Epoch: 19 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:55,494-Speed 5577.87 samples/sec Loss 1.5662 LearningRate 0.0001 Epoch: 19 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:57,334-Speed 5566.74 samples/sec Loss 1.5276 LearningRate 0.0001 Epoch: 19 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:35:59,169-Speed 5584.89 samples/sec Loss 1.5091 LearningRate 0.0001 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:01,003-Speed 5585.25 samples/sec Loss 1.5553 LearningRate 0.0001 Epoch: 19 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:02,849-Speed 5549.30 samples/sec Loss 1.5425 LearningRate 0.0001 Epoch: 19 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:04,687-Speed 5573.97 samples/sec Loss 1.5734 LearningRate 0.0001 Epoch: 19 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:06,530-Speed 5555.57 samples/sec Loss 1.4738 LearningRate 0.0001 Epoch: 19 Global Step: 98840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:08,374-Speed 5558.69 samples/sec Loss 1.5950 LearningRate 0.0001 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:10,205-Speed 5591.77 samples/sec Loss 
1.5419 LearningRate 0.0001 Epoch: 19 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:12,048-Speed 5561.06 samples/sec Loss 1.6811 LearningRate 0.0001 Epoch: 19 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:36:13,889-Speed 5563.14 samples/sec Loss 1.5677 LearningRate 0.0001 Epoch: 19 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:36:15,718-Speed 5598.55 samples/sec Loss 1.5148 LearningRate 0.0001 Epoch: 19 Global Step: 98890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:17,559-Speed 5567.64 samples/sec Loss 1.5549 LearningRate 0.0000 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:19,421-Speed 5502.03 samples/sec Loss 1.5636 LearningRate 0.0000 Epoch: 19 Global Step: 98910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:21,255-Speed 5582.83 samples/sec Loss 1.5323 LearningRate 0.0000 Epoch: 19 Global Step: 98920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:23,087-Speed 5594.11 samples/sec Loss 1.5042 LearningRate 0.0000 Epoch: 19 Global Step: 98930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:24,938-Speed 5534.51 samples/sec Loss 1.5433 LearningRate 0.0000 Epoch: 19 Global Step: 98940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:26,771-Speed 5586.74 samples/sec Loss 1.5973 LearningRate 0.0000 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:28,608-Speed 5577.36 samples/sec Loss 1.5377 LearningRate 0.0000 Epoch: 19 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:30,446-Speed 5573.69 samples/sec Loss 1.5308 LearningRate 0.0000 Epoch: 19 Global Step: 98970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:32,282-Speed 5577.88 samples/sec Loss 1.5760 LearningRate 0.0000 Epoch: 19 Global Step: 98980 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:34,112-Speed 5598.52 samples/sec Loss 1.5666 LearningRate 0.0000 Epoch: 19 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:35,943-Speed 5595.36 samples/sec Loss 1.6505 LearningRate 0.0000 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:37,789-Speed 5546.83 samples/sec Loss 1.5794 LearningRate 0.0000 Epoch: 19 Global Step: 99010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:39,630-Speed 5565.64 samples/sec Loss 1.5711 LearningRate 0.0000 Epoch: 19 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:41,467-Speed 5576.37 samples/sec Loss 1.5869 LearningRate 0.0000 Epoch: 19 Global Step: 99030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:43,305-Speed 5575.03 samples/sec Loss 1.5488 LearningRate 0.0000 Epoch: 19 Global Step: 99040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:45,142-Speed 5576.23 samples/sec Loss 1.4894 LearningRate 0.0000 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:46,981-Speed 5568.22 samples/sec Loss 1.6079 LearningRate 0.0000 Epoch: 19 Global Step: 99060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:48,842-Speed 5507.08 samples/sec Loss 1.5433 LearningRate 0.0000 Epoch: 19 Global Step: 99070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:50,680-Speed 5570.38 samples/sec Loss 1.4866 LearningRate 0.0000 Epoch: 19 Global Step: 99080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:52,516-Speed 5581.58 samples/sec Loss 1.6010 LearningRate 0.0000 Epoch: 19 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:36:54,351-Speed 5582.56 samples/sec Loss 1.5711 LearningRate 0.0000 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:36:56,201-Speed 5535.07 samples/sec Loss 1.5629 LearningRate 0.0000 Epoch: 19 Global Step: 99110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:58,037-Speed 5581.84 samples/sec Loss 1.4897 LearningRate 0.0000 Epoch: 19 Global Step: 99120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:36:59,893-Speed 5519.59 samples/sec Loss 1.5854 LearningRate 0.0000 Epoch: 19 Global Step: 99130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:01,730-Speed 5576.40 samples/sec Loss 1.4849 LearningRate 0.0000 Epoch: 19 Global Step: 99140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:03,569-Speed 5570.04 samples/sec Loss 1.4680 LearningRate 0.0000 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:05,413-Speed 5555.16 samples/sec Loss 1.4905 LearningRate 0.0000 Epoch: 19 Global Step: 99160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:07,248-Speed 5584.35 samples/sec Loss 1.5755 LearningRate 0.0000 Epoch: 19 Global Step: 99170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:09,083-Speed 5580.13 samples/sec Loss 1.6214 LearningRate 0.0000 Epoch: 19 Global Step: 99180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:10,937-Speed 5526.49 samples/sec Loss 1.5123 LearningRate 0.0000 Epoch: 19 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:12,793-Speed 5518.72 samples/sec Loss 1.4790 LearningRate 0.0000 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:37:14,670-Speed 5457.80 samples/sec Loss 1.5239 LearningRate 0.0000 Epoch: 19 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:37:16,546-Speed 5458.18 samples/sec Loss 1.5783 LearningRate 0.0000 Epoch: 19 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:37:18,370-Speed 5623.88 samples/sec Loss 
1.5448 LearningRate 0.0000 Epoch: 19 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:20,196-Speed 5609.98 samples/sec Loss 1.4737 LearningRate 0.0000 Epoch: 19 Global Step: 99240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:22,069-Speed 5468.37 samples/sec Loss 1.5059 LearningRate 0.0000 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:23,928-Speed 5510.62 samples/sec Loss 1.5371 LearningRate 0.0000 Epoch: 19 Global Step: 99260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:25,777-Speed 5541.96 samples/sec Loss 1.6095 LearningRate 0.0000 Epoch: 19 Global Step: 99270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:27,633-Speed 5518.12 samples/sec Loss 1.4999 LearningRate 0.0000 Epoch: 19 Global Step: 99280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:29,500-Speed 5487.39 samples/sec Loss 1.5028 LearningRate 0.0000 Epoch: 19 Global Step: 99290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:31,356-Speed 5520.42 samples/sec Loss 1.5398 LearningRate 0.0000 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:33,194-Speed 5571.17 samples/sec Loss 1.4506 LearningRate 0.0000 Epoch: 19 Global Step: 99310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:35,052-Speed 5514.65 samples/sec Loss 1.6072 LearningRate 0.0000 Epoch: 19 Global Step: 99320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:36,901-Speed 5539.79 samples/sec Loss 1.5243 LearningRate 0.0000 Epoch: 19 Global Step: 99330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:37:38,749-Speed 5544.35 samples/sec Loss 1.5526 LearningRate 0.0000 Epoch: 19 Global Step: 99340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:40,588-Speed 5570.15 samples/sec Loss 1.5271 LearningRate 0.0000 Epoch: 19 Global Step: 99350 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:42,420-Speed 5591.83 samples/sec Loss 1.4992 LearningRate 0.0000 Epoch: 19 Global Step: 99360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:44,254-Speed 5586.23 samples/sec Loss 1.6003 LearningRate 0.0000 Epoch: 19 Global Step: 99370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:46,099-Speed 5551.47 samples/sec Loss 1.5242 LearningRate 0.0000 Epoch: 19 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:47,946-Speed 5546.84 samples/sec Loss 1.5389 LearningRate 0.0000 Epoch: 19 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:49,782-Speed 5576.98 samples/sec Loss 1.5509 LearningRate 0.0000 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:51,625-Speed 5559.51 samples/sec Loss 1.5213 LearningRate 0.0000 Epoch: 19 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:53,463-Speed 5573.22 samples/sec Loss 1.5248 LearningRate 0.0000 Epoch: 19 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:55,307-Speed 5556.17 samples/sec Loss 1.4947 LearningRate 0.0000 Epoch: 19 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:37:57,145-Speed 5572.32 samples/sec Loss 1.5268 LearningRate 0.0000 Epoch: 19 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:37:58,987-Speed 5562.34 samples/sec Loss 1.4729 LearningRate 0.0000 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:00,823-Speed 5579.90 samples/sec Loss 1.5950 LearningRate 0.0000 Epoch: 19 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:02,667-Speed 5554.26 samples/sec Loss 1.5958 LearningRate 0.0000 Epoch: 19 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 16:38:04,501-Speed 5586.69 samples/sec Loss 1.5574 LearningRate 0.0000 Epoch: 19 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:06,338-Speed 5575.65 samples/sec Loss 1.5536 LearningRate 0.0000 Epoch: 19 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:08,171-Speed 5588.79 samples/sec Loss 1.4935 LearningRate 0.0000 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:10,003-Speed 5591.11 samples/sec Loss 1.5435 LearningRate 0.0000 Epoch: 19 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:11,863-Speed 5507.54 samples/sec Loss 1.6333 LearningRate 0.0000 Epoch: 19 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:13,699-Speed 5578.45 samples/sec Loss 1.6476 LearningRate 0.0000 Epoch: 19 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:15,543-Speed 5556.63 samples/sec Loss 1.5138 LearningRate 0.0000 Epoch: 19 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:17,390-Speed 5547.51 samples/sec Loss 1.5398 LearningRate 0.0000 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:19,240-Speed 5534.80 samples/sec Loss 1.5649 LearningRate 0.0000 Epoch: 19 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:21,075-Speed 5583.58 samples/sec Loss 1.5797 LearningRate 0.0000 Epoch: 19 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:22,915-Speed 5568.57 samples/sec Loss 1.5230 LearningRate 0.0000 Epoch: 19 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:24,762-Speed 5544.60 samples/sec Loss 1.5404 LearningRate 0.0000 Epoch: 19 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:26,606-Speed 5553.99 samples/sec Loss 
1.5551 LearningRate 0.0000 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:28,433-Speed 5609.93 samples/sec Loss 1.5295 LearningRate 0.0000 Epoch: 19 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:30,265-Speed 5589.53 samples/sec Loss 1.6056 LearningRate 0.0000 Epoch: 19 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:32,107-Speed 5560.63 samples/sec Loss 1.5698 LearningRate 0.0000 Epoch: 19 Global Step: 99630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:33,943-Speed 5581.30 samples/sec Loss 1.5406 LearningRate 0.0000 Epoch: 19 Global Step: 99640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:35,784-Speed 5564.58 samples/sec Loss 1.4891 LearningRate 0.0000 Epoch: 19 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:37,622-Speed 5575.20 samples/sec Loss 1.5352 LearningRate 0.0000 Epoch: 19 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:39,462-Speed 5565.52 samples/sec Loss 1.6449 LearningRate 0.0000 Epoch: 19 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:41,305-Speed 5558.30 samples/sec Loss 1.5306 LearningRate 0.0000 Epoch: 19 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:43,161-Speed 5520.85 samples/sec Loss 1.5322 LearningRate 0.0000 Epoch: 19 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:44,997-Speed 5579.52 samples/sec Loss 1.5114 LearningRate 0.0000 Epoch: 19 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:46,832-Speed 5581.96 samples/sec Loss 1.5835 LearningRate 0.0000 Epoch: 19 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:38:48,659-Speed 5605.96 samples/sec Loss 1.5428 LearningRate 0.0000 Epoch: 19 Global Step: 99720 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:50,499-Speed 5568.38 samples/sec Loss 1.6402 LearningRate 0.0000 Epoch: 19 Global Step: 99730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:52,336-Speed 5577.12 samples/sec Loss 1.5448 LearningRate 0.0000 Epoch: 19 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:54,215-Speed 5451.04 samples/sec Loss 1.4859 LearningRate 0.0000 Epoch: 19 Global Step: 99750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:56,050-Speed 5581.88 samples/sec Loss 1.4122 LearningRate 0.0000 Epoch: 19 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:57,885-Speed 5582.99 samples/sec Loss 1.5136 LearningRate 0.0000 Epoch: 19 Global Step: 99770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:38:59,721-Speed 5580.42 samples/sec Loss 1.5696 LearningRate 0.0000 Epoch: 19 Global Step: 99780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:01,601-Speed 5447.22 samples/sec Loss 1.6049 LearningRate 0.0000 Epoch: 19 Global Step: 99790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:03,458-Speed 5516.80 samples/sec Loss 1.5307 LearningRate 0.0000 Epoch: 19 Global Step: 99800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:05,298-Speed 5567.31 samples/sec Loss 1.5574 LearningRate 0.0000 Epoch: 19 Global Step: 99810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:07,137-Speed 5568.70 samples/sec Loss 1.5579 LearningRate 0.0000 Epoch: 19 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:39:08,971-Speed 5586.92 samples/sec Loss 1.5193 LearningRate 0.0000 Epoch: 19 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:39:10,798-Speed 5609.54 samples/sec Loss 1.5862 LearningRate 0.0000 Epoch: 19 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:39:12,646-Speed 5543.63 samples/sec Loss 1.4372 LearningRate 0.0000 Epoch: 19 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:14,500-Speed 5523.23 samples/sec Loss 1.6092 LearningRate 0.0000 Epoch: 19 Global Step: 99860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:16,344-Speed 5555.73 samples/sec Loss 1.5828 LearningRate 0.0000 Epoch: 19 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:18,199-Speed 5522.31 samples/sec Loss 1.4493 LearningRate 0.0000 Epoch: 19 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:20,038-Speed 5571.18 samples/sec Loss 1.5304 LearningRate 0.0000 Epoch: 19 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:21,883-Speed 5552.17 samples/sec Loss 1.5508 LearningRate 0.0000 Epoch: 19 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:23,717-Speed 5584.39 samples/sec Loss 1.5884 LearningRate 0.0000 Epoch: 19 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:25,550-Speed 5589.12 samples/sec Loss 1.6476 LearningRate 0.0000 Epoch: 19 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:27,402-Speed 5531.65 samples/sec Loss 1.5528 LearningRate 0.0000 Epoch: 19 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:29,263-Speed 5505.87 samples/sec Loss 1.5267 LearningRate 0.0000 Epoch: 19 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:39:31,092-Speed 5598.94 samples/sec Loss 1.5511 LearningRate 0.0000 Epoch: 19 Global Step: 99950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:32,932-Speed 5567.67 samples/sec Loss 1.4862 LearningRate 0.0000 Epoch: 19 Global Step: 99960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:34,778-Speed 5550.21 samples/sec Loss 
1.6034 LearningRate 0.0000 Epoch: 19 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:36,626-Speed 5543.80 samples/sec Loss 1.5801 LearningRate 0.0000 Epoch: 19 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:38,461-Speed 5581.24 samples/sec Loss 1.4670 LearningRate 0.0000 Epoch: 19 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:39:40,305-Speed 5556.82 samples/sec Loss 1.6267 LearningRate 0.0000 Epoch: 19 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:40:06,735-[lfw][100000]XNorm: 22.374677 Training: 2022-04-11 16:40:06,736-[lfw][100000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-04-11 16:40:06,736-[lfw][100000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:40:37,201-[cfp_fp][100000]XNorm: 21.519639 Training: 2022-04-11 16:40:37,202-[cfp_fp][100000]Accuracy-Flip: 0.98543+-0.00545 Training: 2022-04-11 16:40:37,202-[cfp_fp][100000]Accuracy-Highest: 0.98543 Training: 2022-04-11 16:41:03,659-[agedb_30][100000]XNorm: 22.519361 Training: 2022-04-11 16:41:03,659-[agedb_30][100000]Accuracy-Flip: 0.98167+-0.00767 Training: 2022-04-11 16:41:03,660-[agedb_30][100000]Accuracy-Highest: 0.98450 Training: 2022-04-11 16:41:05,512-Speed 120.18 samples/sec Loss 1.5613 LearningRate 0.0000 Epoch: 19 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:07,363-Speed 5534.74 samples/sec Loss 1.5364 LearningRate 0.0000 Epoch: 19 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:09,204-Speed 5563.70 samples/sec Loss 1.5536 LearningRate 0.0000 Epoch: 19 Global Step: 100030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:11,085-Speed 5445.87 samples/sec Loss 1.6470 LearningRate 0.0000 Epoch: 19 Global Step: 100040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:12,932-Speed 5545.32 samples/sec Loss 1.5129 LearningRate 0.0000 
Epoch: 19 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:41:14,767-Speed 5583.49 samples/sec Loss 1.4756 LearningRate 0.0000 Epoch: 19 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:41:16,605-Speed 5570.85 samples/sec Loss 1.5512 LearningRate 0.0000 Epoch: 19 Global Step: 100070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:18,446-Speed 5563.76 samples/sec Loss 1.5128 LearningRate 0.0000 Epoch: 19 Global Step: 100080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:20,272-Speed 5610.41 samples/sec Loss 1.5129 LearningRate 0.0000 Epoch: 19 Global Step: 100090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:22,104-Speed 5591.65 samples/sec Loss 1.5173 LearningRate 0.0000 Epoch: 19 Global Step: 100100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:23,943-Speed 5572.18 samples/sec Loss 1.5575 LearningRate 0.0000 Epoch: 19 Global Step: 100110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:25,780-Speed 5576.86 samples/sec Loss 1.6114 LearningRate 0.0000 Epoch: 19 Global Step: 100120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:27,611-Speed 5593.08 samples/sec Loss 1.6246 LearningRate 0.0000 Epoch: 19 Global Step: 100130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:29,439-Speed 5604.59 samples/sec Loss 1.4899 LearningRate 0.0000 Epoch: 19 Global Step: 100140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:31,279-Speed 5567.35 samples/sec Loss 1.4984 LearningRate 0.0000 Epoch: 19 Global Step: 100150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:33,114-Speed 5583.59 samples/sec Loss 1.5484 LearningRate 0.0000 Epoch: 19 Global Step: 100160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:34,940-Speed 5609.03 samples/sec Loss 1.5152 LearningRate 0.0000 Epoch: 19 Global Step: 100170 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:36,771-Speed 5596.28 samples/sec Loss 1.5305 LearningRate 0.0000 Epoch: 19 Global Step: 100180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:38,602-Speed 5593.80 samples/sec Loss 1.6447 LearningRate 0.0000 Epoch: 19 Global Step: 100190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:40,433-Speed 5593.40 samples/sec Loss 1.5478 LearningRate 0.0000 Epoch: 19 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:42,269-Speed 5581.70 samples/sec Loss 1.5400 LearningRate 0.0000 Epoch: 19 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:44,100-Speed 5595.70 samples/sec Loss 1.5053 LearningRate 0.0000 Epoch: 19 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:45,932-Speed 5589.48 samples/sec Loss 1.5701 LearningRate 0.0000 Epoch: 19 Global Step: 100230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:47,767-Speed 5581.32 samples/sec Loss 1.6357 LearningRate 0.0000 Epoch: 19 Global Step: 100240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:49,646-Speed 5454.79 samples/sec Loss 1.5208 LearningRate 0.0000 Epoch: 19 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:51,490-Speed 5554.27 samples/sec Loss 1.5365 LearningRate 0.0000 Epoch: 19 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:53,318-Speed 5603.69 samples/sec Loss 1.5162 LearningRate 0.0000 Epoch: 19 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:55,153-Speed 5585.08 samples/sec Loss 1.4981 LearningRate 0.0000 Epoch: 19 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:41:56,987-Speed 5585.02 samples/sec Loss 1.5505 LearningRate 0.0000 Epoch: 19 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:41:58,822-Speed 5583.53 samples/sec Loss 1.5597 LearningRate 0.0000 Epoch: 19 Global Step: 100300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:00,650-Speed 5601.68 samples/sec Loss 1.4801 LearningRate 0.0000 Epoch: 19 Global Step: 100310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:02,483-Speed 5589.31 samples/sec Loss 1.5708 LearningRate 0.0000 Epoch: 19 Global Step: 100320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:04,328-Speed 5553.57 samples/sec Loss 1.6345 LearningRate 0.0000 Epoch: 19 Global Step: 100330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:06,160-Speed 5589.16 samples/sec Loss 1.4848 LearningRate 0.0000 Epoch: 19 Global Step: 100340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:07,993-Speed 5590.67 samples/sec Loss 1.5180 LearningRate 0.0000 Epoch: 19 Global Step: 100350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:09,833-Speed 5566.34 samples/sec Loss 1.5533 LearningRate 0.0000 Epoch: 19 Global Step: 100360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:11,681-Speed 5542.58 samples/sec Loss 1.5494 LearningRate 0.0000 Epoch: 19 Global Step: 100370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:13,527-Speed 5549.71 samples/sec Loss 1.6425 LearningRate 0.0000 Epoch: 19 Global Step: 100380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:15,374-Speed 5547.63 samples/sec Loss 1.5772 LearningRate 0.0000 Epoch: 19 Global Step: 100390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:17,207-Speed 5586.91 samples/sec Loss 1.5299 LearningRate 0.0000 Epoch: 19 Global Step: 100400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:42:19,040-Speed 5589.34 samples/sec Loss 1.4655 LearningRate 0.0000 Epoch: 19 Global Step: 100410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:20,870-Speed 5599.38 
samples/sec Loss 1.4868 LearningRate 0.0000 Epoch: 19 Global Step: 100420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:22,705-Speed 5581.82 samples/sec Loss 1.5557 LearningRate 0.0000 Epoch: 19 Global Step: 100430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:24,539-Speed 5585.06 samples/sec Loss 1.5357 LearningRate 0.0000 Epoch: 19 Global Step: 100440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:26,389-Speed 5536.67 samples/sec Loss 1.5620 LearningRate 0.0000 Epoch: 19 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:28,225-Speed 5580.68 samples/sec Loss 1.5270 LearningRate 0.0000 Epoch: 19 Global Step: 100460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:30,059-Speed 5584.71 samples/sec Loss 1.6223 LearningRate 0.0000 Epoch: 19 Global Step: 100470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:31,890-Speed 5593.53 samples/sec Loss 1.4744 LearningRate 0.0000 Epoch: 19 Global Step: 100480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:33,721-Speed 5597.60 samples/sec Loss 1.5056 LearningRate 0.0000 Epoch: 19 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:35,574-Speed 5526.92 samples/sec Loss 1.4846 LearningRate 0.0000 Epoch: 19 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:37,405-Speed 5593.80 samples/sec Loss 1.5588 LearningRate 0.0000 Epoch: 19 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:42:39,242-Speed 5578.72 samples/sec Loss 1.5121 LearningRate 0.0000 Epoch: 19 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:42:41,088-Speed 5549.08 samples/sec Loss 1.4202 LearningRate 0.0000 Epoch: 19 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:42,941-Speed 5527.77 samples/sec Loss 1.5759 LearningRate 0.0000 
Epoch: 19 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:44,773-Speed 5593.22 samples/sec Loss 1.5173 LearningRate 0.0000 Epoch: 19 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:46,609-Speed 5576.99 samples/sec Loss 1.5604 LearningRate 0.0000 Epoch: 19 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:48,453-Speed 5555.03 samples/sec Loss 1.4869 LearningRate 0.0000 Epoch: 19 Global Step: 100570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:50,302-Speed 5542.52 samples/sec Loss 1.5304 LearningRate 0.0000 Epoch: 19 Global Step: 100580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:52,140-Speed 5572.98 samples/sec Loss 1.5888 LearningRate 0.0000 Epoch: 19 Global Step: 100590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:54,002-Speed 5503.27 samples/sec Loss 1.5424 LearningRate 0.0000 Epoch: 19 Global Step: 100600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:55,848-Speed 5548.41 samples/sec Loss 1.5421 LearningRate 0.0000 Epoch: 19 Global Step: 100610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:57,685-Speed 5574.42 samples/sec Loss 1.6090 LearningRate 0.0000 Epoch: 19 Global Step: 100620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:42:59,510-Speed 5613.50 samples/sec Loss 1.5747 LearningRate 0.0000 Epoch: 19 Global Step: 100630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:01,354-Speed 5556.15 samples/sec Loss 1.4801 LearningRate 0.0000 Epoch: 19 Global Step: 100640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:03,195-Speed 5562.69 samples/sec Loss 1.5378 LearningRate 0.0000 Epoch: 19 Global Step: 100650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:05,032-Speed 5577.43 samples/sec Loss 1.5450 LearningRate 0.0000 Epoch: 19 Global Step: 100660 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:06,868-Speed 5579.58 samples/sec Loss 1.5391 LearningRate 0.0000 Epoch: 19 Global Step: 100670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:08,698-Speed 5598.59 samples/sec Loss 1.5710 LearningRate 0.0000 Epoch: 19 Global Step: 100680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:10,551-Speed 5528.73 samples/sec Loss 1.4891 LearningRate 0.0000 Epoch: 19 Global Step: 100690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:12,382-Speed 5596.29 samples/sec Loss 1.6110 LearningRate 0.0000 Epoch: 19 Global Step: 100700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:14,231-Speed 5539.39 samples/sec Loss 1.4947 LearningRate 0.0000 Epoch: 19 Global Step: 100710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:16,065-Speed 5584.73 samples/sec Loss 1.6215 LearningRate 0.0000 Epoch: 19 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:17,917-Speed 5530.12 samples/sec Loss 1.5123 LearningRate 0.0000 Epoch: 19 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:43:19,738-Speed 5627.34 samples/sec Loss 1.4700 LearningRate 0.0000 Epoch: 19 Global Step: 100740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:21,569-Speed 5593.05 samples/sec Loss 1.4786 LearningRate 0.0000 Epoch: 19 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:23,411-Speed 5560.56 samples/sec Loss 1.5158 LearningRate 0.0000 Epoch: 19 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:25,256-Speed 5553.86 samples/sec Loss 1.5067 LearningRate 0.0000 Epoch: 19 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:27,108-Speed 5532.32 samples/sec Loss 1.6094 LearningRate 0.0000 Epoch: 19 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 16:43:28,953-Speed 5550.47 samples/sec Loss 1.5322 LearningRate 0.0000 Epoch: 19 Global Step: 100790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:30,794-Speed 5567.16 samples/sec Loss 1.5195 LearningRate 0.0000 Epoch: 19 Global Step: 100800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:32,624-Speed 5597.71 samples/sec Loss 1.5206 LearningRate 0.0000 Epoch: 19 Global Step: 100810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:34,458-Speed 5586.01 samples/sec Loss 1.4713 LearningRate 0.0000 Epoch: 19 Global Step: 100820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:36,289-Speed 5592.56 samples/sec Loss 1.6088 LearningRate 0.0000 Epoch: 19 Global Step: 100830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:38,124-Speed 5581.88 samples/sec Loss 1.4945 LearningRate 0.0000 Epoch: 19 Global Step: 100840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:39,961-Speed 5579.29 samples/sec Loss 1.5250 LearningRate 0.0000 Epoch: 19 Global Step: 100850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:41,794-Speed 5586.83 samples/sec Loss 1.5099 LearningRate 0.0000 Epoch: 19 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:43,625-Speed 5594.96 samples/sec Loss 1.4279 LearningRate 0.0000 Epoch: 19 Global Step: 100870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:45,461-Speed 5578.02 samples/sec Loss 1.5159 LearningRate 0.0000 Epoch: 19 Global Step: 100880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:47,306-Speed 5552.04 samples/sec Loss 1.5473 LearningRate 0.0000 Epoch: 19 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:49,189-Speed 5442.03 samples/sec Loss 1.5156 LearningRate 0.0000 Epoch: 19 Global Step: 100900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:51,080-Speed 5417.35 
samples/sec Loss 1.5026 LearningRate 0.0000 Epoch: 19 Global Step: 100910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:52,918-Speed 5574.29 samples/sec Loss 1.5635 LearningRate 0.0000 Epoch: 19 Global Step: 100920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:54,756-Speed 5572.19 samples/sec Loss 1.5136 LearningRate 0.0000 Epoch: 19 Global Step: 100930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:56,580-Speed 5618.16 samples/sec Loss 1.4635 LearningRate 0.0000 Epoch: 19 Global Step: 100940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:43:58,413-Speed 5586.29 samples/sec Loss 1.5180 LearningRate 0.0000 Epoch: 19 Global Step: 100950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:00,247-Speed 5585.81 samples/sec Loss 1.5239 LearningRate 0.0000 Epoch: 19 Global Step: 100960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:02,087-Speed 5568.27 samples/sec Loss 1.5212 LearningRate 0.0000 Epoch: 19 Global Step: 100970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:03,926-Speed 5570.40 samples/sec Loss 1.5330 LearningRate 0.0000 Epoch: 19 Global Step: 100980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:05,762-Speed 5579.61 samples/sec Loss 1.5464 LearningRate 0.0000 Epoch: 19 Global Step: 100990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:07,596-Speed 5583.72 samples/sec Loss 1.5363 LearningRate 0.0000 Epoch: 19 Global Step: 101000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:09,438-Speed 5561.98 samples/sec Loss 1.5348 LearningRate 0.0000 Epoch: 19 Global Step: 101010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:11,271-Speed 5590.68 samples/sec Loss 1.5249 LearningRate 0.0000 Epoch: 19 Global Step: 101020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:13,120-Speed 5539.22 samples/sec Loss 1.5277 LearningRate 0.0000 
Epoch: 19 Global Step: 101030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:14,960-Speed 5567.56 samples/sec Loss 1.5634 LearningRate 0.0000 Epoch: 19 Global Step: 101040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:44:16,800-Speed 5568.38 samples/sec Loss 1.4445 LearningRate 0.0000 Epoch: 19 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:44:18,637-Speed 5574.81 samples/sec Loss 1.5661 LearningRate 0.0000 Epoch: 19 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 16:44:20,475-Speed 5574.33 samples/sec Loss 1.5339 LearningRate 0.0000 Epoch: 19 Global Step: 101070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:22,315-Speed 5567.23 samples/sec Loss 1.6465 LearningRate 0.0000 Epoch: 19 Global Step: 101080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:24,156-Speed 5566.19 samples/sec Loss 1.5718 LearningRate 0.0000 Epoch: 19 Global Step: 101090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 16:44:25,995-Speed 5569.65 samples/sec Loss 1.5917 LearningRate 0.0000 Epoch: 19 Global Step: 101100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:27,839-Speed 5553.29 samples/sec Loss 1.6061 LearningRate 0.0000 Epoch: 19 Global Step: 101110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:29,674-Speed 5582.25 samples/sec Loss 1.4670 LearningRate 0.0000 Epoch: 19 Global Step: 101120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:31,516-Speed 5563.60 samples/sec Loss 1.5379 LearningRate 0.0000 Epoch: 19 Global Step: 101130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:33,348-Speed 5592.15 samples/sec Loss 1.5829 LearningRate 0.0000 Epoch: 19 Global Step: 101140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:35,261-Speed 5355.66 samples/sec Loss 1.5560 LearningRate 0.0000 Epoch: 19 Global Step: 101150 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-04-11 16:44:37,066-Speed 5673.23 samples/sec Loss 1.5722 LearningRate 0.0000 Epoch: 19 Global Step: 101160 Fp16 Grad Scale: 16384 Required: -0 hours