Training: 2022-04-27 01:27:05,279-rank_id: 0
Training: 2022-04-27 01:27:19,616-: margin_list [1.0, 0.5, 0.0]
Training: 2022-04-27 01:27:19,617-: network r100
Training: 2022-04-27 01:27:19,617-: resume False
Training: 2022-04-27 01:27:19,617-: output work_dirs/ms1mv2_r100
Training: 2022-04-27 01:27:19,617-: embedding_size 512
Training: 2022-04-27 01:27:19,617-: sample_rate 1.0
Training: 2022-04-27 01:27:19,617-: interclass_filtering_threshold0
Training: 2022-04-27 01:27:19,617-: fp16 True
Training: 2022-04-27 01:27:19,617-: batch_size 128
Training: 2022-04-27 01:27:19,617-: optimizer sgd
Training: 2022-04-27 01:27:19,617-: lr 0.1
Training: 2022-04-27 01:27:19,617-: momentum 0.9
Training: 2022-04-27 01:27:19,617-: weight_decay 0.0005
Training: 2022-04-27 01:27:19,617-: verbose 2000
Training: 2022-04-27 01:27:19,617-: frequent 10
Training: 2022-04-27 01:27:19,617-: dali False
Training: 2022-04-27 01:27:19,617-: rec /train_tmp/faces_emore
Training: 2022-04-27 01:27:19,617-: num_classes 85742
Training: 2022-04-27 01:27:19,617-: num_image 5822653
Training: 2022-04-27 01:27:19,617-: num_epoch 20
Training: 2022-04-27 01:27:19,617-: warmup_epoch 0
Training: 2022-04-27 01:27:19,617-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-27 01:27:19,617-: total_batch_size 1024
Training: 2022-04-27 01:27:19,617-: warmup_step 0
Training: 2022-04-27 01:27:19,618-: total_step 113720
Training: 2022-04-27 01:28:26,950-Reducer buckets have been rebuilt in this iteration.
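The derived fields at the end of the config dump (`total_batch_size`, `warmup_step`, `total_step`) follow from the base ones. Below is a minimal sketch. It assumes 8 data-parallel ranks, which is inferred from `total_batch_size 1024` / `batch_size 128` and is not stated in the log. It also assumes that the last partial global batch of each epoch is dropped. `derived_schedule` is a hypothetical helper and is not part of the training code.

```python
# Hypothetical helper reproducing the derived config fields logged above.
# Assumption: world_size = 8, inferred from total_batch_size / batch_size.

def derived_schedule(num_image: int, batch_size: int, world_size: int,
                     num_epoch: int, warmup_epoch: int) -> dict:
    # Each optimizer step consumes one per-rank batch on every rank.
    total_batch_size = batch_size * world_size
    # Integer division: a trailing partial global batch is not counted.
    steps_per_epoch = num_image // total_batch_size
    return {
        "total_batch_size": total_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "warmup_step": steps_per_epoch * warmup_epoch,
        "total_step": steps_per_epoch * num_epoch,
    }

cfg = derived_schedule(num_image=5822653, batch_size=128, world_size=8,
                       num_epoch=20, warmup_epoch=0)
print(cfg)
# → {'total_batch_size': 1024, 'steps_per_epoch': 5686, 'warmup_step': 0, 'total_step': 113720}
```

The `Speed` figures below sit around 3,470 samples/sec. This appears to be a global figure: 10 steps × 1,024 samples per logging interval of roughly 2.95 s. At that rate, 113,720 steps × 1,024 samples take about 33,600 s, roughly 9.3 hours excluding evaluation. That is in line with the `Required` estimates once throughput stabilises.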
Training: 2022-04-27 01:28:32,357-Speed 3431.74 samples/sec Loss 46.7125 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 25 hours
Training: 2022-04-27 01:28:35,354-Speed 3417.36 samples/sec Loss 47.6395 LearningRate 0.0999 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-04-27 01:28:38,318-Speed 3455.95 samples/sec Loss 48.0146 LearningRate 0.0999 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-04-27 01:28:41,264-Speed 3477.52 samples/sec Loss 47.0877 LearningRate 0.0999 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-04-27 01:28:44,197-Speed 3492.18 samples/sec Loss 47.1361 LearningRate 0.0999 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-04-27 01:28:47,124-Speed 3499.40 samples/sec Loss 46.9640 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-04-27 01:28:50,053-Speed 3496.64 samples/sec Loss 46.7923 LearningRate 0.0999 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-04-27 01:28:52,997-Speed 3479.21 samples/sec Loss 46.2833 LearningRate 0.0998 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-04-27 01:28:55,948-Speed 3471.46 samples/sec Loss 46.2290 LearningRate 0.0998 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 12 hours
Training: 2022-04-27 01:28:58,922-Speed 3443.20 samples/sec Loss 46.1605 LearningRate 0.0998 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-04-27 01:29:01,855-Speed 3492.63 samples/sec Loss 46.0213 LearningRate 0.0998 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-04-27 01:29:04,785-Speed 3495.88 samples/sec Loss 45.8512 LearningRate 0.0998 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-04-27 01:29:07,720-Speed 3489.64 samples/sec Loss 45.6772 LearningRate 0.0998 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-04-27 01:29:10,649-Speed 3497.20 samples/sec Loss 45.6512 LearningRate 0.0997 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:13,580-Speed 3495.18 samples/sec Loss 45.4421 LearningRate 0.0997 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:16,572-Speed 3423.07 samples/sec Loss 45.2200 LearningRate 0.0997 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:19,509-Speed 3486.41 samples/sec Loss 45.1176 LearningRate 0.0997 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:22,444-Speed 3491.13 samples/sec Loss 44.8073 LearningRate 0.0997 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:25,376-Speed 3492.51 samples/sec Loss 44.6877 LearningRate 0.0996 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 11 hours
Training: 2022-04-27 01:29:28,312-Speed 3489.39 samples/sec Loss 44.5415 LearningRate 0.0996 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:31,249-Speed 3487.09 samples/sec Loss 44.2927 LearningRate 0.0996 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:34,198-Speed 3473.13 samples/sec Loss 44.1602 LearningRate 0.0996 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:37,136-Speed 3486.71 samples/sec Loss 43.8958 LearningRate 0.0996 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:40,074-Speed 3485.60 samples/sec Loss 43.6988 LearningRate 0.0996 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:43,017-Speed 3481.43 samples/sec Loss 43.5673 LearningRate 0.0995 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:29:45,953-Speed 3488.13 samples/sec Loss 43.3693 LearningRate 0.0995 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:29:48,891-Speed 3486.56 samples/sec Loss 43.1162 LearningRate 0.0995 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:29:51,833-Speed 3481.73 samples/sec Loss 43.0610 LearningRate 0.0995 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:29:54,773-Speed 3483.34 samples/sec Loss 42.8345 LearningRate 0.0995 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:29:57,715-Speed 3481.58 samples/sec Loss 42.6498 LearningRate 0.0995 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:00,652-Speed 3487.12 samples/sec Loss 42.4366 LearningRate 0.0994 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:03,591-Speed 3486.01 samples/sec Loss 42.2423 LearningRate 0.0994 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:06,531-Speed 3484.20 samples/sec Loss 42.0894 LearningRate 0.0994 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:09,471-Speed 3483.31 samples/sec Loss 41.8885 LearningRate 0.0994 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:12,411-Speed 3483.68 samples/sec Loss 41.6302 LearningRate 0.0994 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:15,349-Speed 3486.52 samples/sec Loss 41.4453 LearningRate 0.0994 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:18,287-Speed 3485.85 samples/sec Loss 41.3575 LearningRate 0.0993 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:21,230-Speed 3480.56 samples/sec Loss 41.1698 LearningRate 0.0993 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:24,168-Speed 3485.93 samples/sec Loss 40.8791 LearningRate 0.0993 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:30:27,116-Speed 3474.65 samples/sec Loss 40.7846 LearningRate 0.0993 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:30,054-Speed 3485.78 samples/sec Loss 40.7333 LearningRate 0.0993 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:32,994-Speed 3483.35 samples/sec Loss 40.3832 LearningRate 0.0992 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:35,950-Speed 3465.09 samples/sec Loss 40.3120 LearningRate 0.0992 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:38,891-Speed 3483.13 samples/sec Loss 40.0522 LearningRate 0.0992 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:41,842-Speed 3471.29 samples/sec Loss 39.9291 LearningRate 0.0992 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:44,789-Speed 3475.58 samples/sec Loss 39.7384 LearningRate 0.0992 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:47,731-Speed 3480.52 samples/sec Loss 39.5834 LearningRate 0.0992 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:50,678-Speed 3475.95 samples/sec Loss 39.4160 LearningRate 0.0991 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:53,623-Speed 3477.88 samples/sec Loss 39.1798 LearningRate 0.0991 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:56,553-Speed 3495.39 samples/sec Loss 39.1398 LearningRate 0.0991 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:30:59,499-Speed 3476.75 samples/sec Loss 38.9353 LearningRate 0.0991 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:02,450-Speed 3471.72 samples/sec Loss 38.6784 LearningRate 0.0991 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:05,395-Speed 3477.04 samples/sec Loss 38.4823 LearningRate 0.0991 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:08,342-Speed 3476.32 samples/sec Loss 38.3644 LearningRate 0.0990 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:11,285-Speed 3479.24 samples/sec Loss 38.1609 LearningRate 0.0990 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:14,227-Speed 3481.57 samples/sec Loss 37.9103 LearningRate 0.0990 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:17,174-Speed 3475.47 samples/sec Loss 37.8805 LearningRate 0.0990 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:20,117-Speed 3480.55 samples/sec Loss 37.6852 LearningRate 0.0990 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:23,061-Speed 3478.56 samples/sec Loss 37.5015 LearningRate 0.0989 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:25,997-Speed 3489.60 samples/sec Loss 37.2822 LearningRate 0.0989 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:28,950-Speed 3468.74 samples/sec Loss 37.0927 LearningRate 0.0989 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:31,894-Speed 3478.98 samples/sec Loss 36.7745 LearningRate 0.0989 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:34,838-Speed 3478.66 samples/sec Loss 36.7736 LearningRate 0.0989 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:37,781-Speed 3480.29 samples/sec Loss 36.6476 LearningRate 0.0989 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:40,726-Speed 3478.06 samples/sec Loss 36.2974 LearningRate 0.0988 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:43,669-Speed 3480.82 samples/sec Loss 36.1542 LearningRate 0.0988 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:46,619-Speed 3471.72 samples/sec Loss 35.9093 LearningRate 0.0988 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:49,569-Speed 3471.68 samples/sec Loss 35.7093 LearningRate 0.0988 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:52,518-Speed 3473.61 samples/sec Loss 35.5678 LearningRate 0.0988 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:55,450-Speed 3492.80 samples/sec Loss 35.3586 LearningRate 0.0988 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:31:58,404-Speed 3468.06 samples/sec Loss 35.2870 LearningRate 0.0987 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:01,355-Speed 3471.75 samples/sec Loss 35.1534 LearningRate 0.0987 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:04,302-Speed 3474.64 samples/sec Loss 34.8681 LearningRate 0.0987 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:07,254-Speed 3470.64 samples/sec Loss 34.7991 LearningRate 0.0987 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:10,203-Speed 3471.96 samples/sec Loss 34.4469 LearningRate 0.0987 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:13,158-Speed 3466.40 samples/sec Loss 34.2558 LearningRate 0.0987 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:16,107-Speed 3473.95 samples/sec Loss 34.1711 LearningRate 0.0986 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:19,054-Speed 3475.42 samples/sec Loss 34.0545 LearningRate 0.0986 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:22,017-Speed 3457.09 samples/sec Loss 33.7512 LearningRate 0.0986 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:24,966-Speed 3473.29 samples/sec Loss 33.5246 LearningRate 0.0986 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 524288 Required: 10 hours
Training: 2022-04-27 01:32:27,905-Speed 3485.28 samples/sec Loss 33.4970 LearningRate 0.0986 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:30,856-Speed 3470.61 samples/sec Loss 33.3390 LearningRate 0.0985 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:33,813-Speed 3463.89 samples/sec Loss 33.0562 LearningRate 0.0985 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:36,765-Speed 3469.13 samples/sec Loss 32.8666 LearningRate 0.0985 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:39,751-Speed 3430.58 samples/sec Loss 32.6138 LearningRate 0.0985 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:42,721-Speed 3448.07 samples/sec Loss 32.5411 LearningRate 0.0985 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:45,676-Speed 3465.70 samples/sec Loss 32.2798 LearningRate 0.0985 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:48,628-Speed 3470.55 samples/sec Loss 32.2559 LearningRate 0.0984 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:51,584-Speed 3465.29 samples/sec Loss 31.9369 LearningRate 0.0984 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:54,541-Speed 3463.37 samples/sec Loss 31.7570 LearningRate 0.0984 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:32:57,472-Speed 3494.92 samples/sec Loss 31.6951 LearningRate 0.0984 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:00,424-Speed 3469.43 samples/sec Loss 31.6234 LearningRate 0.0984 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:03,375-Speed 3471.31 samples/sec Loss 31.3588 LearningRate 0.0984 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:06,331-Speed 3464.47 samples/sec Loss 31.1023 LearningRate 0.0983 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:09,285-Speed 3466.55 samples/sec Loss 30.9703 LearningRate 0.0983 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:12,243-Speed 3463.17 samples/sec Loss 30.7795 LearningRate 0.0983 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:15,195-Speed 3469.27 samples/sec Loss 30.5400 LearningRate 0.0983 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:18,147-Speed 3470.27 samples/sec Loss 30.5388 LearningRate 0.0983 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:21,097-Speed 3472.36 samples/sec Loss 30.1616 LearningRate 0.0982 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:24,051-Speed 3466.81 samples/sec Loss 30.1865 LearningRate 0.0982 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:27,003-Speed 3469.88 samples/sec Loss 29.8027 LearningRate 0.0982 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:29,954-Speed 3471.35 samples/sec Loss 29.7471 LearningRate 0.0982 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:32,910-Speed 3464.59 samples/sec Loss 29.5194 LearningRate 0.0982 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:35,860-Speed 3472.49 samples/sec Loss 29.2013 LearningRate 0.0982 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:38,818-Speed 3462.08 samples/sec Loss 29.0126 LearningRate 0.0981 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:41,772-Speed 3467.29 samples/sec Loss 28.9054 LearningRate 0.0981 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:33:44,716-Speed 3479.87 samples/sec Loss 28.7258 LearningRate 0.0981 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:47,667-Speed 3471.26 samples/sec Loss 28.5205 LearningRate 0.0981 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:50,619-Speed 3469.59 samples/sec Loss 28.5390 LearningRate 0.0981 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:53,571-Speed 3468.85 samples/sec Loss 28.2381 LearningRate 0.0981 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:33:56,522-Speed 3471.32 samples/sec Loss 28.1222 LearningRate 0.0980 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 10 hours
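The `LearningRate` column starts at `lr 0.1` with no warmup (`warmup_step 0`) and decays slowly. The logged four-decimal values are consistent with polynomial decay of power 2 toward zero over `total_step`. The sketch below assumes that form; the power is inferred from the log, not read from the training code, and `poly_lr` is a hypothetical helper. It reproduces the values printed at several steps:

```python
# Hypothetical reconstruction of the LR schedule implied by the log above.
# Assumption: lr = base_lr * (1 - step / total_step) ** power, with power = 2.

def poly_lr(step: int, base_lr: float = 0.1, total_step: int = 113720,
            power: float = 2.0) -> float:
    progress = step / total_step  # fraction of training completed
    return base_lr * (1.0 - progress) ** power

for step in (20, 90, 150, 1000, 2000):
    print(step, f"{poly_lr(step):.4f}")
# → 0.1000, 0.0998, 0.0997, 0.0982, 0.0965, matching the logged values
```

Linear decay (`power=1.0`) would give 0.0991 at step 1000 instead of the logged 0.0982, so the quadratic form is the simpler fit to this run.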
Training: 2022-04-27 01:33:59,475-Speed 3468.02 samples/sec Loss 28.0501 LearningRate 0.0980 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:02,456-Speed 3436.36 samples/sec Loss 27.8529 LearningRate 0.0980 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:05,414-Speed 3462.33 samples/sec Loss 27.4859 LearningRate 0.0980 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:08,370-Speed 3465.22 samples/sec Loss 27.4816 LearningRate 0.0980 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:11,322-Speed 3469.30 samples/sec Loss 27.4047 LearningRate 0.0980 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:14,277-Speed 3466.47 samples/sec Loss 27.1757 LearningRate 0.0979 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:34:17,230-Speed 3468.93 samples/sec Loss 26.9348 LearningRate 0.0979 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:34:20,186-Speed 3465.48 samples/sec Loss 26.8652 LearningRate 0.0979 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:34:23,141-Speed 3465.41 samples/sec Loss 26.6820 LearningRate 0.0979 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:34:26,094-Speed 3468.61 samples/sec Loss 26.6253 LearningRate 0.0979 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:34:29,051-Speed 3463.96 samples/sec Loss 26.3177 LearningRate 0.0978 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:32,001-Speed 3471.45 samples/sec Loss 26.3232 LearningRate 0.0978 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:34,957-Speed 3464.87 samples/sec Loss 26.0734 LearningRate 0.0978 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:37,913-Speed 3465.48 samples/sec Loss 25.8342 LearningRate 0.0978 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:40,866-Speed 3468.71 samples/sec Loss 25.7510 LearningRate 0.0978 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:43,823-Speed 3464.53 samples/sec Loss 25.5355 LearningRate 0.0978 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:46,783-Speed 3459.71 samples/sec Loss 25.4521 LearningRate 0.0977 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:49,738-Speed 3465.97 samples/sec Loss 25.3086 LearningRate 0.0977 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:52,691-Speed 3468.50 samples/sec Loss 24.9476 LearningRate 0.0977 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:55,683-Speed 3422.61 samples/sec Loss 24.9923 LearningRate 0.0977 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:34:58,652-Speed 3450.33 samples/sec Loss 24.8184 LearningRate 0.0977 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:35:01,613-Speed 3459.64 samples/sec Loss 24.7787 LearningRate 0.0977 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:35:04,576-Speed 3456.34 samples/sec Loss 24.6523 LearningRate 0.0976 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:35:07,524-Speed 3474.77 samples/sec Loss 24.4503 LearningRate 0.0976 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:10,480-Speed 3464.97 samples/sec Loss 24.2784 LearningRate 0.0976 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:13,443-Speed 3456.38 samples/sec Loss 24.0857 LearningRate 0.0976 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:16,401-Speed 3463.28 samples/sec Loss 24.0810 LearningRate 0.0976 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:19,353-Speed 3469.10 samples/sec Loss 23.9469 LearningRate 0.0976 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:22,313-Speed 3460.43 samples/sec Loss 23.6692 LearningRate 0.0975 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:25,267-Speed 3467.39 samples/sec Loss 23.6292 LearningRate 0.0975 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:28,231-Speed 3455.77 samples/sec Loss 23.5041 LearningRate 0.0975 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:31,196-Speed 3453.70 samples/sec Loss 23.3429 LearningRate 0.0975 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:34,154-Speed 3462.96 samples/sec Loss 23.2969 LearningRate 0.0975 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:35:37,113-Speed 3461.96 samples/sec Loss 23.2366 LearningRate 0.0974 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 01:35:40,040-Speed 3500.05 samples/sec Loss 23.0133 LearningRate 0.0974 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:42,999-Speed 3461.25 samples/sec Loss 22.8900 LearningRate 0.0974 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:45,954-Speed 3465.16 samples/sec Loss 22.9868 LearningRate 0.0974 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:48,914-Speed 3460.88 samples/sec Loss 22.7094 LearningRate 0.0974 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:51,877-Speed 3457.08 samples/sec Loss 22.6335 LearningRate 0.0974 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:54,835-Speed 3461.70 samples/sec Loss 22.4347 LearningRate 0.0973 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:35:57,799-Speed 3455.83 samples/sec Loss 22.3818 LearningRate 0.0973 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:36:00,760-Speed 3459.62 samples/sec Loss 22.1709 LearningRate 0.0973 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:36:03,718-Speed 3462.31 samples/sec Loss 22.0594 LearningRate 0.0973 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:36:06,679-Speed 3459.13 samples/sec Loss 21.9179 LearningRate 0.0973 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 32768 Required: 9 hours
Training: 2022-04-27 01:36:09,639-Speed 3461.07 samples/sec Loss 21.7345 LearningRate 0.0973 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:12,604-Speed 3453.96 samples/sec Loss 21.9192 LearningRate 0.0972 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:15,573-Speed 3449.52 samples/sec Loss 21.4801 LearningRate 0.0972 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:18,536-Speed 3457.60 samples/sec Loss 21.4815 LearningRate 0.0972 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:21,507-Speed 3447.19 samples/sec Loss 21.5369 LearningRate 0.0972 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:24,466-Speed 3462.01 samples/sec Loss 21.4062 LearningRate 0.0972 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:27,430-Speed 3455.55 samples/sec Loss 21.3606 LearningRate 0.0972 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:30,394-Speed 3455.37 samples/sec Loss 21.1720 LearningRate 0.0971 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:33,354-Speed 3460.79 samples/sec Loss 21.2957 LearningRate 0.0971 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:36,312-Speed 3462.28 samples/sec Loss 21.0234 LearningRate 0.0971 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:39,270-Speed 3463.36 samples/sec Loss 21.0604 LearningRate 0.0971 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:36:42,231-Speed 3459.17 samples/sec Loss 20.7749 LearningRate 0.0971 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:36:45,181-Speed 3472.04 samples/sec Loss 20.7017 LearningRate 0.0970 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:48,142-Speed 3458.85 samples/sec Loss 20.6507 LearningRate 0.0970 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:51,103-Speed 3459.61 samples/sec Loss 20.6199 LearningRate 0.0970 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:54,111-Speed 3404.61 samples/sec Loss 20.4980 LearningRate 0.0970 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:36:57,129-Speed 3393.28 samples/sec Loss 20.4476 LearningRate 0.0970 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:00,090-Speed 3460.05 samples/sec Loss 20.1752 LearningRate 0.0970 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:03,052-Speed 3458.51 samples/sec Loss 20.2595 LearningRate 0.0969 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:06,060-Speed 3404.15 samples/sec Loss 20.0411 LearningRate 0.0969 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:09,021-Speed 3458.93 samples/sec Loss 20.0573 LearningRate 0.0969 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:11,983-Speed 3458.89 samples/sec Loss 19.9011 LearningRate 0.0969 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:14,946-Speed 3456.65 samples/sec Loss 19.9083 LearningRate 0.0969 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:17,917-Speed 3446.91 samples/sec Loss 19.8106 LearningRate 0.0969 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:20,884-Speed 3451.93 samples/sec Loss 19.6799 LearningRate 0.0968 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:23,851-Speed 3452.45 samples/sec Loss 19.6714 LearningRate 0.0968 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:26,820-Speed 3450.21 samples/sec Loss 19.6649 LearningRate 0.0968 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:29,787-Speed 3451.54 samples/sec Loss 19.4689 LearningRate 0.0968 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:32,747-Speed 3460.41 samples/sec Loss 19.4527 LearningRate 0.0968 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:37:35,699-Speed 3470.33 samples/sec Loss 19.3712 LearningRate 0.0968 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:38,662-Speed 3456.04 samples/sec Loss 19.4042 LearningRate 0.0967 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:41,624-Speed 3458.90 samples/sec Loss 19.0824 LearningRate 0.0967 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:44,589-Speed 3453.37 samples/sec Loss 18.8792 LearningRate 0.0967 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:47,550-Speed 3459.43 samples/sec Loss 19.0752 LearningRate 0.0967 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:50,511-Speed 3459.85 samples/sec Loss 18.7519 LearningRate 0.0967 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:53,475-Speed 3455.63 samples/sec Loss 18.8056 LearningRate 0.0967 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:56,436-Speed 3459.08 samples/sec Loss 18.9784 LearningRate 0.0966 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:37:59,398-Speed 3458.38 samples/sec Loss 18.6702 LearningRate 0.0966 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:38:02,365-Speed 3451.83 samples/sec Loss 18.6783 LearningRate 0.0966 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:38:05,355-Speed 3438.58 samples/sec Loss 18.6623 LearningRate 0.0966 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:38:08,317-Speed 3458.10 samples/sec Loss 18.5580 LearningRate 0.0966 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:38:11,286-Speed 3452.93 samples/sec Loss 18.4478 LearningRate 0.0965 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:38:14,252-Speed 3453.92 samples/sec Loss 18.5991 LearningRate 0.0965 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 01:38:17,203-Speed 3470.47 samples/sec Loss 18.4642 LearningRate 0.0965 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 01:39:00,686-[lfw][2000]XNorm: 22.268713
Training: 2022-04-27 01:39:00,691-[lfw][2000]Accuracy-Flip: 0.98117+-0.00606
Training: 2022-04-27 01:39:00,691-[lfw][2000]Accuracy-Highest: 0.98117
Training: 2022-04-27 01:39:50,905-[cfp_fp][2000]XNorm: 18.848542
Training: 2022-04-27 01:39:50,906-[cfp_fp][2000]Accuracy-Flip: 0.78671+-0.02209
Training: 2022-04-27 01:39:50,906-[cfp_fp][2000]Accuracy-Highest: 0.78671
Training: 2022-04-27 01:40:34,127-[agedb_30][2000]XNorm: 21.529968
Training: 2022-04-27 01:40:34,128-[agedb_30][2000]Accuracy-Flip: 0.89517+-0.01863
Training: 2022-04-27 01:40:34,129-[agedb_30][2000]Accuracy-Highest: 0.89517
Training: 2022-04-27 01:40:37,086-Speed 73.21 samples/sec Loss 18.1831 LearningRate 0.0965 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:40,047-Speed 3459.48 samples/sec Loss 18.0367 LearningRate 0.0965 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:43,012-Speed 3453.81 samples/sec Loss 18.0553 LearningRate 0.0965 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:45,977-Speed 3467.35 samples/sec Loss 18.1793 LearningRate 0.0964 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:48,931-Speed 3466.23 samples/sec Loss 18.0511 LearningRate 0.0964 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:51,888-Speed 3464.47 samples/sec Loss 18.1344 LearningRate 0.0964 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:54,858-Speed 3448.46 samples/sec Loss 18.2913 LearningRate 0.0964 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:40:57,837-Speed 3458.59 samples/sec Loss 18.0095 LearningRate 0.0964 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:41:00,797-Speed 3459.84 samples/sec Loss 17.8124 LearningRate 0.0964 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:41:03,760-Speed 3457.01 samples/sec Loss 17.9375 LearningRate 0.0963 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:06,742-Speed 3448.29 samples/sec Loss 17.6548 LearningRate 0.0963 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:09,706-Speed 3455.21 samples/sec Loss 17.5310 LearningRate 0.0963 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:12,671-Speed 3454.76 samples/sec Loss 17.4227 LearningRate 0.0963 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:15,637-Speed 3452.77 samples/sec Loss 17.3870 LearningRate 0.0963 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:18,603-Speed 3453.98 samples/sec Loss 17.2406 LearningRate 0.0963 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:21,607-Speed 3453.97 samples/sec Loss 17.3169 LearningRate 0.0962 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:24,573-Speed 3452.56 samples/sec Loss 17.4195 LearningRate 0.0962 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:27,553-Speed 3441.46 samples/sec Loss 17.1651 LearningRate 0.0962 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:41:30,528-Speed 3443.17 samples/sec Loss 17.2418 LearningRate 0.0962 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 
01:41:33,508-Speed 3437.39 samples/sec Loss 17.2135 LearningRate 0.0962 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:41:36,484-Speed 3447.40 samples/sec Loss 17.1875 LearningRate 0.0962 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:41:39,460-Speed 3441.44 samples/sec Loss 17.0191 LearningRate 0.0961 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:42,436-Speed 3442.48 samples/sec Loss 16.9554 LearningRate 0.0961 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:45,424-Speed 3427.51 samples/sec Loss 17.0467 LearningRate 0.0961 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:48,403-Speed 3438.29 samples/sec Loss 16.8527 LearningRate 0.0961 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:51,392-Speed 3436.79 samples/sec Loss 16.9890 LearningRate 0.0961 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:54,371-Speed 3437.56 samples/sec Loss 16.8323 LearningRate 0.0960 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:41:57,356-Speed 3437.51 samples/sec Loss 16.7029 LearningRate 0.0960 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:00,340-Speed 3432.31 samples/sec Loss 16.6687 LearningRate 0.0960 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:03,315-Speed 3442.65 samples/sec Loss 16.5367 LearningRate 0.0960 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:06,302-Speed 3444.57 samples/sec Loss 16.7361 LearningRate 0.0960 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:09,280-Speed 3438.64 samples/sec Loss 
16.4924 LearningRate 0.0960 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:42:12,253-Speed 3445.31 samples/sec Loss 16.7148 LearningRate 0.0959 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:42:15,212-Speed 3461.34 samples/sec Loss 16.2982 LearningRate 0.0959 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:18,180-Speed 3450.17 samples/sec Loss 16.3453 LearningRate 0.0959 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:21,150-Speed 3453.25 samples/sec Loss 16.3905 LearningRate 0.0959 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:24,113-Speed 3456.21 samples/sec Loss 16.4681 LearningRate 0.0959 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:27,083-Speed 3449.00 samples/sec Loss 16.4621 LearningRate 0.0959 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:30,058-Speed 3455.72 samples/sec Loss 16.3237 LearningRate 0.0958 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:33,022-Speed 3455.87 samples/sec Loss 16.1416 LearningRate 0.0958 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:35,985-Speed 3456.41 samples/sec Loss 16.2304 LearningRate 0.0958 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:38,961-Speed 3450.47 samples/sec Loss 16.0783 LearningRate 0.0958 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:41,929-Speed 3451.00 samples/sec Loss 16.2764 LearningRate 0.0958 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:44,894-Speed 3454.45 samples/sec Loss 16.3176 LearningRate 0.0958 Epoch: 0 Global 
Step: 2440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:42:47,844-Speed 3472.34 samples/sec Loss 16.0671 LearningRate 0.0957 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:50,822-Speed 3439.04 samples/sec Loss 16.0153 LearningRate 0.0957 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:53,825-Speed 3424.90 samples/sec Loss 15.8714 LearningRate 0.0957 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:56,782-Speed 3463.16 samples/sec Loss 15.8232 LearningRate 0.0957 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:42:59,751-Speed 3457.04 samples/sec Loss 15.8106 LearningRate 0.0957 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:02,716-Speed 3454.85 samples/sec Loss 15.7955 LearningRate 0.0957 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:05,675-Speed 3460.89 samples/sec Loss 15.9779 LearningRate 0.0956 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:08,635-Speed 3465.79 samples/sec Loss 15.8576 LearningRate 0.0956 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:11,594-Speed 3461.07 samples/sec Loss 15.7525 LearningRate 0.0956 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:14,553-Speed 3461.50 samples/sec Loss 15.6378 LearningRate 0.0956 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:17,541-Speed 3427.83 samples/sec Loss 15.7377 LearningRate 0.0956 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:43:20,479-Speed 3486.18 samples/sec Loss 15.7147 LearningRate 0.0955 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 65536 Required: 
11 hours Training: 2022-04-27 01:43:23,441-Speed 3463.26 samples/sec Loss 15.6587 LearningRate 0.0955 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:26,399-Speed 3462.42 samples/sec Loss 15.5164 LearningRate 0.0955 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:29,375-Speed 3449.33 samples/sec Loss 15.3378 LearningRate 0.0955 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:32,333-Speed 3462.65 samples/sec Loss 15.5933 LearningRate 0.0955 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:35,296-Speed 3456.29 samples/sec Loss 15.4742 LearningRate 0.0955 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:38,262-Speed 3461.91 samples/sec Loss 15.4657 LearningRate 0.0954 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:41,220-Speed 3462.59 samples/sec Loss 15.4065 LearningRate 0.0954 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:44,180-Speed 3459.51 samples/sec Loss 15.3714 LearningRate 0.0954 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:47,140-Speed 3460.88 samples/sec Loss 15.4012 LearningRate 0.0954 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 01:43:50,102-Speed 3457.51 samples/sec Loss 15.2465 LearningRate 0.0954 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:53,068-Speed 3457.70 samples/sec Loss 15.3444 LearningRate 0.0954 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:56,022-Speed 3467.50 samples/sec Loss 15.2158 LearningRate 0.0953 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:43:58,999-Speed 
3458.02 samples/sec Loss 15.3221 LearningRate 0.0953 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:01,962-Speed 3457.37 samples/sec Loss 15.4514 LearningRate 0.0953 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:04,925-Speed 3456.41 samples/sec Loss 15.1176 LearningRate 0.0953 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:07,893-Speed 3459.22 samples/sec Loss 15.1491 LearningRate 0.0953 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:10,851-Speed 3462.72 samples/sec Loss 15.1606 LearningRate 0.0953 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:13,810-Speed 3460.52 samples/sec Loss 14.9439 LearningRate 0.0952 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:16,770-Speed 3460.16 samples/sec Loss 14.9734 LearningRate 0.0952 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:44:19,733-Speed 3457.25 samples/sec Loss 15.0973 LearningRate 0.0952 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:22,698-Speed 3463.66 samples/sec Loss 15.0369 LearningRate 0.0952 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:25,658-Speed 3460.30 samples/sec Loss 15.0502 LearningRate 0.0952 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:28,662-Speed 3416.45 samples/sec Loss 15.0699 LearningRate 0.0952 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:31,626-Speed 3456.05 samples/sec Loss 15.0686 LearningRate 0.0951 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:34,583-Speed 3463.11 samples/sec Loss 14.8821 
LearningRate 0.0951 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:37,543-Speed 3460.85 samples/sec Loss 14.9850 LearningRate 0.0951 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:40,520-Speed 3440.01 samples/sec Loss 14.8803 LearningRate 0.0951 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:43,485-Speed 3454.72 samples/sec Loss 15.0090 LearningRate 0.0951 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:46,447-Speed 3457.41 samples/sec Loss 14.7879 LearningRate 0.0951 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:49,407-Speed 3460.58 samples/sec Loss 14.7958 LearningRate 0.0950 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:52,369-Speed 3458.28 samples/sec Loss 14.6468 LearningRate 0.0950 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:55,340-Speed 3447.23 samples/sec Loss 14.7457 LearningRate 0.0950 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:44:58,292-Speed 3469.94 samples/sec Loss 14.3814 LearningRate 0.0950 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:01,254-Speed 3458.24 samples/sec Loss 14.6328 LearningRate 0.0950 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:04,215-Speed 3458.16 samples/sec Loss 14.8609 LearningRate 0.0949 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:07,180-Speed 3454.80 samples/sec Loss 14.4447 LearningRate 0.0949 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:10,144-Speed 3455.89 samples/sec Loss 14.6500 LearningRate 0.0949 Epoch: 0 Global Step: 
2930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:13,117-Speed 3445.07 samples/sec Loss 14.6104 LearningRate 0.0949 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:16,085-Speed 3450.31 samples/sec Loss 14.6259 LearningRate 0.0949 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:19,046-Speed 3460.04 samples/sec Loss 14.5242 LearningRate 0.0949 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:22,012-Speed 3453.15 samples/sec Loss 14.4962 LearningRate 0.0948 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:24,975-Speed 3456.38 samples/sec Loss 14.7006 LearningRate 0.0948 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:27,941-Speed 3453.14 samples/sec Loss 14.4967 LearningRate 0.0948 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:45:30,907-Speed 3453.95 samples/sec Loss 14.4583 LearningRate 0.0948 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:45:33,862-Speed 3465.79 samples/sec Loss 14.3377 LearningRate 0.0948 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:36,824-Speed 3458.39 samples/sec Loss 14.4729 LearningRate 0.0948 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:39,790-Speed 3452.76 samples/sec Loss 14.4582 LearningRate 0.0947 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:42,751-Speed 3458.88 samples/sec Loss 14.6884 LearningRate 0.0947 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:45,715-Speed 3456.50 samples/sec Loss 14.3913 LearningRate 0.0947 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-27 01:45:48,680-Speed 3454.32 samples/sec Loss 14.3389 LearningRate 0.0947 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:51,655-Speed 3442.51 samples/sec Loss 14.2578 LearningRate 0.0947 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:54,617-Speed 3457.82 samples/sec Loss 14.1476 LearningRate 0.0947 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:45:57,584-Speed 3452.70 samples/sec Loss 14.1439 LearningRate 0.0946 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:00,548-Speed 3455.20 samples/sec Loss 14.2396 LearningRate 0.0946 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:03,524-Speed 3441.44 samples/sec Loss 14.2403 LearningRate 0.0946 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 01:46:06,475-Speed 3470.75 samples/sec Loss 14.2431 LearningRate 0.0946 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:09,438-Speed 3457.80 samples/sec Loss 14.1686 LearningRate 0.0946 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:12,402-Speed 3455.51 samples/sec Loss 14.2021 LearningRate 0.0946 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:15,370-Speed 3451.23 samples/sec Loss 14.0034 LearningRate 0.0945 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:18,338-Speed 3450.80 samples/sec Loss 14.0763 LearningRate 0.0945 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:21,303-Speed 3453.95 samples/sec Loss 13.9800 LearningRate 0.0945 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 
01:46:24,272-Speed 3449.69 samples/sec Loss 14.0742 LearningRate 0.0945 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 01:46:27,237-Speed 3455.07 samples/sec Loss 13.9726 LearningRate 0.0945 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:46:30,205-Speed 3450.38 samples/sec Loss 14.1402 LearningRate 0.0945 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:46:33,167-Speed 3457.35 samples/sec Loss 13.9443 LearningRate 0.0944 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:46:36,136-Speed 3451.08 samples/sec Loss 13.9287 LearningRate 0.0944 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 01:46:39,118-Speed 3434.38 samples/sec Loss 13.9053 LearningRate 0.0944 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 01:46:42,085-Speed 3451.97 samples/sec Loss 13.8279 LearningRate 0.0944 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 01:46:45,029-Speed 3479.56 samples/sec Loss 14.0538 LearningRate 0.0944 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:46:47,994-Speed 3454.38 samples/sec Loss 13.7916 LearningRate 0.0943 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:46:50,981-Speed 3428.27 samples/sec Loss 13.7024 LearningRate 0.0943 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:46:53,944-Speed 3457.03 samples/sec Loss 13.7790 LearningRate 0.0943 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:46:56,906-Speed 3457.86 samples/sec Loss 13.8780 LearningRate 0.0943 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:46:59,876-Speed 3448.51 samples/sec Loss 
13.8322 LearningRate 0.0943 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:47:02,843-Speed 3452.26 samples/sec Loss 13.7727 LearningRate 0.0943 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:47:05,807-Speed 3455.73 samples/sec Loss 13.8067 LearningRate 0.0942 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:47:08,775-Speed 3450.51 samples/sec Loss 13.6751 LearningRate 0.0942 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:47:11,738-Speed 3457.52 samples/sec Loss 13.7858 LearningRate 0.0942 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:47:14,700-Speed 3457.23 samples/sec Loss 13.7152 LearningRate 0.0942 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:17,668-Speed 3451.11 samples/sec Loss 13.5333 LearningRate 0.0942 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:20,632-Speed 3455.42 samples/sec Loss 13.8620 LearningRate 0.0942 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:23,602-Speed 3449.02 samples/sec Loss 13.6661 LearningRate 0.0941 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:26,575-Speed 3445.41 samples/sec Loss 13.6261 LearningRate 0.0941 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:29,543-Speed 3450.01 samples/sec Loss 13.4465 LearningRate 0.0941 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:32,516-Speed 3445.35 samples/sec Loss 13.6180 LearningRate 0.0941 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:35,481-Speed 3455.45 samples/sec Loss 13.5932 LearningRate 0.0941 Epoch: 0 Global 
Step: 3420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:38,456-Speed 3442.33 samples/sec Loss 13.2598 LearningRate 0.0941 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:41,422-Speed 3453.10 samples/sec Loss 13.6328 LearningRate 0.0940 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:44,389-Speed 3452.70 samples/sec Loss 13.6105 LearningRate 0.0940 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 01:47:47,343-Speed 3466.86 samples/sec Loss 13.4114 LearningRate 0.0940 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:50,322-Speed 3438.76 samples/sec Loss 13.7047 LearningRate 0.0940 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:53,289-Speed 3451.55 samples/sec Loss 13.5576 LearningRate 0.0940 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:56,255-Speed 3453.13 samples/sec Loss 13.4074 LearningRate 0.0940 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:47:59,224-Speed 3450.17 samples/sec Loss 13.6480 LearningRate 0.0939 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:02,187-Speed 3456.60 samples/sec Loss 13.6496 LearningRate 0.0939 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:05,166-Speed 3438.78 samples/sec Loss 13.4692 LearningRate 0.0939 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:08,139-Speed 3445.51 samples/sec Loss 13.4721 LearningRate 0.0939 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:11,118-Speed 3437.04 samples/sec Loss 13.3394 LearningRate 0.0939 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 01:48:14,088-Speed 3449.32 samples/sec Loss 13.2881 LearningRate 0.0939 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:17,044-Speed 3465.51 samples/sec Loss 13.3024 LearningRate 0.0938 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:20,011-Speed 3450.99 samples/sec Loss 13.3821 LearningRate 0.0938 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:22,980-Speed 3450.68 samples/sec Loss 13.2987 LearningRate 0.0938 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:25,948-Speed 3450.79 samples/sec Loss 13.1173 LearningRate 0.0938 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:48:28,907-Speed 3461.47 samples/sec Loss 12.9718 LearningRate 0.0938 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:31,877-Speed 3449.01 samples/sec Loss 13.3359 LearningRate 0.0938 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:34,859-Speed 3434.96 samples/sec Loss 13.4502 LearningRate 0.0937 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:37,829-Speed 3448.29 samples/sec Loss 13.2952 LearningRate 0.0937 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:40,797-Speed 3450.06 samples/sec Loss 13.1307 LearningRate 0.0937 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:43,765-Speed 3451.27 samples/sec Loss 13.3252 LearningRate 0.0937 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:46,740-Speed 3443.47 samples/sec Loss 13.1921 LearningRate 0.0937 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
01:48:49,715-Speed 3442.28 samples/sec Loss 13.1363 LearningRate 0.0936 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:52,703-Speed 3428.55 samples/sec Loss 13.2379 LearningRate 0.0936 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:55,674-Speed 3446.51 samples/sec Loss 13.2232 LearningRate 0.0936 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 01:48:58,642-Speed 3452.17 samples/sec Loss 13.1979 LearningRate 0.0936 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:01,630-Speed 3426.72 samples/sec Loss 13.2076 LearningRate 0.0936 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:04,611-Speed 3437.19 samples/sec Loss 13.2507 LearningRate 0.0936 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:07,578-Speed 3451.69 samples/sec Loss 12.8917 LearningRate 0.0935 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:10,548-Speed 3449.36 samples/sec Loss 13.1249 LearningRate 0.0935 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:13,518-Speed 3449.01 samples/sec Loss 12.9993 LearningRate 0.0935 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:16,485-Speed 3452.54 samples/sec Loss 13.2269 LearningRate 0.0935 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:19,459-Speed 3444.23 samples/sec Loss 13.2934 LearningRate 0.0935 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:22,429-Speed 3448.20 samples/sec Loss 13.1207 LearningRate 0.0935 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 01:49:25,398-Speed 3449.44 samples/sec Loss 
12.8527 LearningRate 0.0934 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:28,370-Speed 3447.27 samples/sec Loss 13.0059 LearningRate 0.0934 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:31,367-Speed 3417.05 samples/sec Loss 12.9649 LearningRate 0.0934 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:34,340-Speed 3445.19 samples/sec Loss 12.9584 LearningRate 0.0934 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:37,315-Speed 3443.41 samples/sec Loss 12.7697 LearningRate 0.0934 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:40,289-Speed 3443.47 samples/sec Loss 12.9830 LearningRate 0.0934 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:43,263-Speed 3443.88 samples/sec Loss 12.8748 LearningRate 0.0933 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:46,237-Speed 3444.16 samples/sec Loss 12.9882 LearningRate 0.0933 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:49,208-Speed 3447.05 samples/sec Loss 13.0280 LearningRate 0.0933 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:52,180-Speed 3446.37 samples/sec Loss 13.0841 LearningRate 0.0933 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:55,152-Speed 3447.01 samples/sec Loss 12.7353 LearningRate 0.0933 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:49:58,125-Speed 3445.02 samples/sec Loss 12.8807 LearningRate 0.0933 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:50:01,110-Speed 3430.95 samples/sec Loss 12.9883 LearningRate 0.0932 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:50:04,084-Speed 3444.71 samples/sec Loss 12.7149 LearningRate 0.0932 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 01:50:07,047-Speed 3456.24 samples/sec Loss 12.7856 LearningRate 0.0932 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:10,022-Speed 3442.28 samples/sec Loss 12.8257 LearningRate 0.0932 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:12,998-Speed 3442.55 samples/sec Loss 12.8412 LearningRate 0.0932 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:15,988-Speed 3425.11 samples/sec Loss 12.9467 LearningRate 0.0932 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:18,995-Speed 3406.95 samples/sec Loss 12.7295 LearningRate 0.0931 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:21,967-Speed 3445.93 samples/sec Loss 12.8251 LearningRate 0.0931 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:24,943-Speed 3441.78 samples/sec Loss 12.9355 LearningRate 0.0931 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:50:27,918-Speed 3443.98 samples/sec Loss 12.7543 LearningRate 0.0931 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 01:51:11,467-[lfw][4000]XNorm: 22.044406
Training: 2022-04-27 01:51:11,468-[lfw][4000]Accuracy-Flip: 0.99200+-0.00314
Training: 2022-04-27 01:51:11,468-[lfw][4000]Accuracy-Highest: 0.99200
Training: 2022-04-27 01:52:02,153-[cfp_fp][4000]XNorm: 19.166544
Training: 2022-04-27 01:52:02,153-[cfp_fp][4000]Accuracy-Flip: 0.87400+-0.01373
Training: 2022-04-27 01:52:02,154-[cfp_fp][4000]Accuracy-Highest: 0.87400
Training: 2022-04-27 01:52:45,825-[agedb_30][4000]XNorm: 21.483009
Training: 2022-04-27 01:52:45,825-[agedb_30][4000]Accuracy-Flip: 0.94000+-0.01402
Training: 2022-04-27 01:52:45,826-[agedb_30][4000]Accuracy-Highest: 0.94000
Training: 2022-04-27 01:52:48,782-Speed 72.69 samples/sec Loss 12.7220 LearningRate 0.0931 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:52:51,736-Speed 3466.51 samples/sec Loss 12.6761 LearningRate 0.0931 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:52:54,698-Speed 3458.05 samples/sec Loss 12.6618 LearningRate 0.0930 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:52:57,662-Speed 3456.00 samples/sec Loss 12.6964 LearningRate 0.0930 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:00,625-Speed 3456.67 samples/sec Loss 12.7081 LearningRate 0.0930 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:03,587-Speed 3458.15 samples/sec Loss 12.6644 LearningRate 0.0930 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:06,547-Speed 3459.77 samples/sec Loss 12.6412 LearningRate 0.0930 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:09,522-Speed 3442.35 samples/sec Loss 12.5219 LearningRate 0.0930 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:12,490-Speed 3451.46 samples/sec Loss 12.7559 LearningRate 0.0929 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:53:15,444-Speed 3467.01 samples/sec Loss 12.6429 LearningRate 0.0929 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:18,413-Speed 3450.61 samples/sec Loss 12.5599 LearningRate 0.0929 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:21,386-Speed 3445.29 samples/sec Loss 12.7687 LearningRate 0.0929 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:24,361-Speed 3442.69 samples/sec Loss 12.5487 LearningRate 0.0929 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:27,331-Speed 3448.38 samples/sec Loss 12.7661 LearningRate 0.0929 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:30,310-Speed 3438.14 samples/sec Loss 12.5928 LearningRate 0.0928 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:33,280-Speed 3448.99 samples/sec Loss 12.5189 LearningRate 0.0928 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:36,264-Speed 3431.53 samples/sec Loss 12.5061 LearningRate 0.0928 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:53:39,228-Speed 3456.30 samples/sec Loss 12.3325 LearningRate 0.0928 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:42,202-Speed 3443.72 samples/sec Loss 12.4509 LearningRate 0.0928 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:45,169-Speed 3452.18 samples/sec Loss 12.4642 LearningRate 0.0927 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:48,137-Speed 3451.11 samples/sec Loss 12.5399 LearningRate 0.0927 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:51,102-Speed 3454.45 samples/sec Loss 12.1225 LearningRate 0.0927 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:54,067-Speed 3454.15 samples/sec Loss 12.4095 LearningRate 0.0927 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:57,031-Speed 3456.16 samples/sec Loss 12.5147 LearningRate 0.0927 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:53:59,994-Speed 3455.60 samples/sec Loss 12.4771 LearningRate 0.0927 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:02,972-Speed 3439.93 samples/sec Loss 12.4487 LearningRate 0.0926 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:05,937-Speed 3454.43 samples/sec Loss 12.2745 LearningRate 0.0926 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:08,900-Speed 3457.31 samples/sec Loss 12.4280 LearningRate 0.0926 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:11,860-Speed 3459.38 samples/sec Loss 12.4596 LearningRate 0.0926 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:14,817-Speed 3464.02 samples/sec Loss 12.5640 LearningRate 0.0926 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:17,791-Speed 3444.63 samples/sec Loss 12.3124 LearningRate 0.0926 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:20,749-Speed 3462.29 samples/sec Loss 12.4590 LearningRate 0.0925 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:23,704-Speed 3466.01 samples/sec Loss 12.2962 LearningRate 0.0925 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:26,659-Speed 3466.43 samples/sec Loss 12.3322 LearningRate 0.0925 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:29,616-Speed 3463.60 samples/sec Loss 12.3833 LearningRate 0.0925 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:32,573-Speed 3463.52 samples/sec Loss 12.2661 LearningRate 0.0925 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:35,529-Speed 3465.27 samples/sec Loss 12.2460 LearningRate 0.0925 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:38,488-Speed 3461.05 samples/sec Loss 12.2668 LearningRate 0.0924 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:54:41,441-Speed 3468.96 samples/sec Loss 12.3868 LearningRate 0.0924 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:44,405-Speed 3455.49 samples/sec Loss 12.2766 LearningRate 0.0924 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:54:47,353-Speed 3474.77 samples/sec Loss 12.0989 LearningRate 0.0924 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:50,307-Speed 3467.55 samples/sec Loss 12.2504 LearningRate 0.0924 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:53,259-Speed 3469.91 samples/sec Loss 12.2606 LearningRate 0.0924 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:56,212-Speed 3467.41 samples/sec Loss 12.2635 LearningRate 0.0923 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:54:59,170-Speed 3462.92 samples/sec Loss 12.1805 LearningRate 0.0923 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:02,141-Speed 3448.04 samples/sec Loss 12.0922 LearningRate 0.0923 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:05,097-Speed 3464.68 samples/sec Loss 11.9597 LearningRate 0.0923 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:08,060-Speed 3456.22 samples/sec Loss 12.1669 LearningRate 0.0923 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:11,020-Speed 3460.60 samples/sec Loss 12.0596 LearningRate 0.0923 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:13,979-Speed 3461.41 samples/sec Loss 12.1990 LearningRate 0.0922 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:16,942-Speed 3457.15 samples/sec Loss 12.1878 LearningRate 0.0922 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:19,902-Speed 3459.71 samples/sec Loss 12.2039 LearningRate 0.0922 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:22,859-Speed 3464.06 samples/sec Loss 12.1032 LearningRate 0.0922 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:25,828-Speed 3449.40 samples/sec Loss 12.0611 LearningRate 0.0922 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:28,791-Speed 3457.16 samples/sec Loss 12.1608 LearningRate 0.0922 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:31,750-Speed 3461.86 samples/sec Loss 12.0708 LearningRate 0.0921 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:34,707-Speed 3463.95 samples/sec Loss 12.0174 LearningRate 0.0921 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:37,666-Speed 3461.52 samples/sec Loss 12.1736 LearningRate 0.0921 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:55:40,622-Speed 3464.54 samples/sec Loss 12.0749 LearningRate 0.0921 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:43,578-Speed 3464.57 samples/sec Loss 12.0388 LearningRate 0.0921 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:46,539-Speed 3460.22 samples/sec Loss 12.1564 LearningRate 0.0921 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:49,536-Speed 3417.38 samples/sec Loss 11.9758 LearningRate 0.0920 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:52,533-Speed 3417.86 samples/sec Loss 11.9349 LearningRate 0.0920 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:55,528-Speed 3419.59 samples/sec Loss 11.8644 LearningRate 0.0920 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:55:58,498-Speed 3448.86 samples/sec Loss 12.0072 LearningRate 0.0920 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:01,457-Speed 3460.90 samples/sec Loss 12.1318 LearningRate 0.0920 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:04,422-Speed 3455.04 samples/sec Loss 12.0745 LearningRate 0.0920 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:07,384-Speed 3457.47 samples/sec Loss 11.9165 LearningRate 0.0919 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:10,332-Speed 3474.55 samples/sec Loss 11.9810 LearningRate 0.0919 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:13,289-Speed 3463.08 samples/sec Loss 11.9553 LearningRate 0.0919 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:16,252-Speed 3457.01 samples/sec Loss 11.9592 LearningRate 0.0919 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:19,217-Speed 3455.05 samples/sec Loss 11.9294 LearningRate 0.0919 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:22,178-Speed 3459.57 samples/sec Loss 12.0220 LearningRate 0.0919 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:25,145-Speed 3451.29 samples/sec Loss 12.0176 LearningRate 0.0918 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:28,107-Speed 3457.76 samples/sec Loss 11.9086 LearningRate 0.0918 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:31,067-Speed 3461.01 samples/sec Loss 11.9220 LearningRate 0.0918 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:34,031-Speed 3456.11 samples/sec Loss 11.8397 LearningRate 0.0918 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:36,995-Speed 3454.86 samples/sec Loss 12.1992 LearningRate 0.0918 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:39,961-Speed 3453.01 samples/sec Loss 11.8839 LearningRate 0.0918 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:56:42,930-Speed 3449.94 samples/sec Loss 11.9992 LearningRate 0.0917 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:56:45,893-Speed 3457.18 samples/sec Loss 11.7339 LearningRate 0.0917 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:56:48,868-Speed 3441.88 samples/sec Loss 11.8986 LearningRate 0.0917 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:56:51,830-Speed 3458.46 samples/sec Loss 11.8758 LearningRate 0.0917 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:56:54,826-Speed 3418.86 samples/sec Loss 11.6342 LearningRate 0.0917 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:56:57,789-Speed 3456.83 samples/sec Loss 11.8295 LearningRate 0.0917 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:00,758-Speed 3450.29 samples/sec Loss 11.9476 LearningRate 0.0916 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:03,732-Speed 3443.71 samples/sec Loss 11.8129 LearningRate 0.0916 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:06,694-Speed 3458.12 samples/sec Loss 11.7009 LearningRate 0.0916 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:09,670-Speed 3441.18 samples/sec Loss 11.8622 LearningRate 0.0916 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:12,629-Speed 3460.87 samples/sec Loss 11.7870 LearningRate 0.0916 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:15,598-Speed 3449.85 samples/sec Loss 12.0083 LearningRate 0.0916 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:18,565-Speed 3451.84 samples/sec Loss 11.8415 LearningRate 0.0915 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:21,530-Speed 3454.90 samples/sec Loss 11.7247 LearningRate 0.0915 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:57:24,491-Speed 3458.97 samples/sec Loss 11.7343 LearningRate 0.0915 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:27,464-Speed 3446.31 samples/sec Loss 11.8280 LearningRate 0.0915 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:30,439-Speed 3442.62 samples/sec Loss 11.8131 LearningRate 0.0915 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:33,418-Speed 3437.42 samples/sec Loss 11.8861 LearningRate 0.0915 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:36,384-Speed 3453.89 samples/sec Loss 11.8026 LearningRate 0.0914 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:39,351-Speed 3452.01 samples/sec Loss 11.7676 LearningRate 0.0914 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:42,319-Speed 3451.09 samples/sec Loss 11.6989 LearningRate 0.0914 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:45,284-Speed 3453.79 samples/sec Loss 11.5832 LearningRate 0.0914 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:48,248-Speed 3456.04 samples/sec Loss 11.7145 LearningRate 0.0914 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:51,215-Speed 3451.84 samples/sec Loss 11.7248 LearningRate 0.0913 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:57:54,214-Speed 3415.74 samples/sec Loss 11.6808 LearningRate 0.0913 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:57:57,177-Speed 3456.94 samples/sec Loss 11.7859 LearningRate 0.0913 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:58:00,142-Speed 3454.02 samples/sec Loss 11.5850 LearningRate 0.0913 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:58:03,107-Speed 3455.14 samples/sec Loss 11.5795 LearningRate 0.0913 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:58:06,060-Speed 3467.94 samples/sec Loss 11.5968 LearningRate 0.0913 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:09,023-Speed 3456.49 samples/sec Loss 11.4955 LearningRate 0.0912 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:11,994-Speed 3447.81 samples/sec Loss 11.5108 LearningRate 0.0912 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:14,974-Speed 3437.17 samples/sec Loss 11.5584 LearningRate 0.0912 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:17,943-Speed 3448.75 samples/sec Loss 11.7204 LearningRate 0.0912 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:20,908-Speed 3454.70 samples/sec Loss 11.5258 LearningRate 0.0912 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:23,878-Speed 3449.31 samples/sec Loss 11.6642 LearningRate 0.0912 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:26,847-Speed 3449.73 samples/sec Loss 11.6276 LearningRate 0.0911 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:29,813-Speed 3453.50 samples/sec Loss 11.7060 LearningRate 0.0911 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:32,782-Speed 3449.50 samples/sec Loss 11.5916 LearningRate 0.0911 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:35,746-Speed 3455.06 samples/sec Loss 11.5142 LearningRate 0.0911 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:38,726-Speed 3437.92 samples/sec Loss 11.6085 LearningRate 0.0911 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:41,702-Speed 3441.38 samples/sec Loss 11.6366 LearningRate 0.0911 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:44,673-Speed 3447.12 samples/sec Loss 11.6984 LearningRate 0.0910 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:47,651-Speed 3439.13 samples/sec Loss 11.5753 LearningRate 0.0910 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:50,618-Speed 3452.03 samples/sec Loss 11.5495 LearningRate 0.0910 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:53,585-Speed 3453.28 samples/sec Loss 11.5842 LearningRate 0.0910 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:56,557-Speed 3445.76 samples/sec Loss 11.3767 LearningRate 0.0910 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:58:59,521-Speed 3456.36 samples/sec Loss 11.3179 LearningRate 0.0910 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:02,490-Speed 3448.85 samples/sec Loss 11.6310 LearningRate 0.0909 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:05,463-Speed 3445.82 samples/sec Loss 11.4901 LearningRate 0.0909 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-27 01:59:08,417-Speed 3466.68 samples/sec Loss 11.6950 LearningRate 0.0909 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:11,387-Speed 3448.87 samples/sec Loss 11.5676 LearningRate 0.0909 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:14,354-Speed 3451.43 samples/sec Loss 11.5911 LearningRate 0.0909 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:17,319-Speed 3454.61 samples/sec Loss 11.7352 LearningRate 0.0909 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:20,287-Speed 3451.47 samples/sec Loss 11.4962 LearningRate 0.0908 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:23,257-Speed 3448.43 samples/sec Loss 11.7128 LearningRate 0.0908 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:26,226-Speed 3449.76 samples/sec Loss 11.5622 LearningRate 0.0908 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:29,204-Speed 3440.16 samples/sec Loss 11.5311 LearningRate 0.0908 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 01:59:32,158-Speed 3466.83 samples/sec Loss 11.4741 LearningRate 0.0908 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:59:35,135-Speed 3440.85 samples/sec Loss 11.4841 LearningRate 0.0908 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:59:38,106-Speed 3446.91 samples/sec Loss 11.4434 LearningRate 0.0907 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:59:41,080-Speed 3444.06 samples/sec Loss 11.4643 LearningRate 0.0907 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 01:59:44,059-Speed 3438.37 samples/sec Loss 11.4438 LearningRate 0.0907 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:59:47,032-Speed 3445.63 samples/sec Loss 11.5566 LearningRate 0.0907 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:59:50,019-Speed 3428.71 samples/sec Loss 11.4290 LearningRate 0.0907 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:59:53,012-Speed 3429.48 samples/sec Loss 11.5018 LearningRate 0.0907 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:59:55,983-Speed 3447.76 samples/sec Loss 11.5005 LearningRate 0.0906 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 01:59:58,954-Speed 3448.15 samples/sec Loss 11.4596 LearningRate 0.0906 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:00:01,923-Speed 3449.30 samples/sec Loss 11.3218 LearningRate 0.0906 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:04,891-Speed 3451.07 samples/sec Loss 11.3086 LearningRate 0.0906 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:07,861-Speed 3448.62 samples/sec Loss 11.4523 LearningRate 0.0906 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:10,840-Speed 3438.29 samples/sec Loss 11.4423 LearningRate 0.0906 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:13,811-Speed 3447.17 samples/sec Loss 11.3704 LearningRate 0.0905 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:16,799-Speed 3428.01 samples/sec Loss 11.2679 LearningRate 0.0905 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:19,775-Speed 3441.64 samples/sec Loss 11.4258 LearningRate 0.0905 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:22,760-Speed 3432.79 samples/sec Loss 11.3139 LearningRate 0.0905 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:25,732-Speed 3445.29 samples/sec Loss 11.4127 LearningRate 0.0905 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:28,699-Speed 3452.06 samples/sec Loss 11.2902 LearningRate 0.0905 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:31,666-Speed 3452.81 samples/sec Loss 11.3586 LearningRate 0.0904 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:00:34,630-Speed 3455.57 samples/sec Loss 11.3395 LearningRate 0.0904 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:37,605-Speed 3442.32 samples/sec Loss 11.2115 LearningRate 0.0904 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:40,576-Speed 3448.37 samples/sec Loss 11.3782 LearningRate 0.0904 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:43,554-Speed 3439.30 samples/sec Loss 11.3691 LearningRate 0.0904 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:46,523-Speed 3449.27 samples/sec Loss 11.2051 LearningRate 0.0904 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:49,499-Speed 3441.51 samples/sec Loss 11.1439 LearningRate 0.0903 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:52,477-Speed 3439.70 samples/sec Loss 11.1068 LearningRate 0.0903 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:55,453-Speed 3442.01 samples/sec Loss 11.4076 LearningRate 0.0903 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:00:58,421-Speed 3450.81 samples/sec Loss 11.2732 LearningRate 0.0903 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:01,398-Speed 3439.75 samples/sec Loss 11.3598 LearningRate 0.0903 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:04,431-Speed 3377.23 samples/sec Loss 11.3454 LearningRate 0.0903 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:17,614-Speed 776.82 samples/sec Loss 10.8269 LearningRate 0.0902 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:20,787-Speed 3228.55 samples/sec Loss 10.6072 LearningRate 0.0902 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:23,754-Speed 3451.88 samples/sec Loss 10.4715 LearningRate 0.0902 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:26,758-Speed 3410.13 samples/sec Loss 10.5011 LearningRate 0.0902 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:29,734-Speed 3442.10 samples/sec Loss 10.3723 LearningRate 0.0902 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:32,698-Speed 3455.63 samples/sec Loss 10.5867 LearningRate 0.0902 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:35,667-Speed 3449.98 samples/sec Loss 10.5834 LearningRate 0.0901 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:38,634-Speed 3452.00 samples/sec Loss 10.4450 LearningRate 0.0901 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:41,602-Speed 3450.81 samples/sec Loss 10.5189 LearningRate 0.0901 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:44,570-Speed 3451.05 samples/sec Loss 10.6047 LearningRate 0.0901 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:47,536-Speed 3452.70 samples/sec Loss 10.6301 LearningRate 0.0901 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:01:50,506-Speed 3448.81 samples/sec Loss 10.6487 LearningRate 0.0901 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:53,476-Speed 3448.71 samples/sec Loss 10.5849 LearningRate 0.0900 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:56,455-Speed 3437.68 samples/sec Loss 10.5846 LearningRate 0.0900 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:01:59,433-Speed 3440.17 samples/sec Loss 10.7574 LearningRate 0.0900 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:02,411-Speed 3439.56 samples/sec Loss 10.7184 LearningRate 0.0900 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:05,384-Speed 3444.43 samples/sec Loss 10.7082 LearningRate 0.0900 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:08,366-Speed 3435.11 samples/sec Loss 10.5023 LearningRate 0.0900 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:11,339-Speed 3445.03 samples/sec Loss 10.6521 LearningRate 0.0899 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:14,321-Speed 3434.41 samples/sec Loss 10.6254 LearningRate 0.0899 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:17,331-Speed 3403.40 samples/sec Loss 10.7144 LearningRate 0.0899 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:20,309-Speed 3439.19 samples/sec Loss 10.7861 LearningRate 0.0899 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:23,290-Speed 3435.69 samples/sec Loss 10.6975 LearningRate 0.0899 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:02:26,262-Speed 3446.70 samples/sec Loss 10.6946 LearningRate 0.0899 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:29,249-Speed 3429.32 samples/sec Loss 10.7485 LearningRate 0.0898 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:32,233-Speed 3431.61 samples/sec Loss 10.7220 LearningRate 0.0898 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:35,222-Speed 3427.19 samples/sec Loss 10.9531 LearningRate 0.0898 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:38,253-Speed 3379.48 samples/sec Loss 10.8661 LearningRate 0.0898 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:41,260-Speed 3406.67 samples/sec Loss 10.7004 LearningRate 0.0898 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:44,241-Speed 3435.40 samples/sec Loss 10.7769 LearningRate 0.0898 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:47,231-Speed 3425.07 samples/sec Loss 10.6545 LearningRate 0.0897 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:02:50,216-Speed 3431.47 samples/sec Loss 10.7644 LearningRate 0.0897 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:03:33,585-[lfw][6000]XNorm: 23.073785
Training: 2022-04-27 02:03:33,585-[lfw][6000]Accuracy-Flip: 0.99367+-0.00371
Training: 2022-04-27 02:03:33,586-[lfw][6000]Accuracy-Highest: 0.99367
Training: 2022-04-27 02:04:23,975-[cfp_fp][6000]XNorm: 19.895782
Training: 2022-04-27 02:04:23,976-[cfp_fp][6000]Accuracy-Flip: 0.88571+-0.01898
Training: 2022-04-27 02:04:23,976-[cfp_fp][6000]Accuracy-Highest: 0.88571
Training: 2022-04-27 02:05:07,190-[agedb_30][6000]XNorm: 22.293554
Training: 2022-04-27 02:05:07,191-[agedb_30][6000]Accuracy-Flip: 0.95450+-0.00873
Training: 2022-04-27 02:05:07,191-[agedb_30][6000]Accuracy-Highest: 0.95450
Training: 2022-04-27 02:05:10,246-Speed 73.13 samples/sec Loss 10.5289 LearningRate 0.0897 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:13,184-Speed 3485.59 samples/sec Loss 10.7065 LearningRate 0.0897 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:05:16,124-Speed 3484.20 samples/sec Loss 10.7681 LearningRate 0.0897 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:19,084-Speed 3460.13 samples/sec Loss 10.8672 LearningRate 0.0897 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:22,035-Speed 3471.54 samples/sec Loss 10.6543 LearningRate 0.0896 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:24,984-Speed 3473.16 samples/sec Loss 10.6421 LearningRate 0.0896 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:27,933-Speed 3472.34 samples/sec Loss 10.6934 LearningRate 0.0896 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:30,889-Speed 3465.52 samples/sec Loss 10.7527 LearningRate 0.0896 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:33,846-Speed 3463.06 samples/sec Loss 10.7204 LearningRate 0.0896 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:36,808-Speed 3458.60 samples/sec Loss 10.6154 LearningRate 0.0896 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:39,767-Speed 3460.97 samples/sec Loss 10.6772 LearningRate 0.0895 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:42,725-Speed 3463.43 samples/sec Loss 10.7883 LearningRate 0.0895 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-27 02:05:45,681-Speed 3464.29 samples/sec Loss 10.8516 LearningRate 0.0895 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:05:48,645-Speed 3455.52 samples/sec Loss 10.8135 LearningRate 0.0895 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:05:51,648-Speed 3411.24 samples/sec Loss 10.8408 LearningRate 0.0895 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:05:54,620-Speed 3446.34 samples/sec Loss 10.7989 LearningRate 0.0895 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:05:57,581-Speed 3458.09 samples/sec Loss 10.9222 LearningRate 0.0894 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:00,553-Speed 3446.24 samples/sec Loss 10.7898 LearningRate 0.0894 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:03,524-Speed 3448.16 samples/sec Loss 10.7260 LearningRate 0.0894 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:06,493-Speed 3449.29 samples/sec Loss 10.6569 LearningRate 0.0894 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:09,464-Speed 3448.24 samples/sec Loss 10.6387 LearningRate 0.0894 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:12,435-Speed 3447.50 samples/sec Loss 10.6409 LearningRate 0.0894 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:15,395-Speed 3459.78 samples/sec Loss 10.7698 LearningRate 0.0893 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:18,379-Speed 3432.86 samples/sec Loss 10.5541 LearningRate 0.0893 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:21,355-Speed 3441.46 samples/sec Loss 10.7689 LearningRate 0.0893 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:24,325-Speed 3449.47 samples/sec Loss 10.6237 LearningRate 0.0893 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:27,294-Speed 3449.30 samples/sec Loss 10.8297 LearningRate 0.0893 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-27 02:06:30,261-Speed 3452.27 samples/sec Loss 10.7151 LearningRate 0.0893 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 11 hours
Training:
2022-04-27 02:06:33,232-Speed 3447.46 samples/sec Loss 10.5236 LearningRate 0.0892 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:06:36,196-Speed 3455.50 samples/sec Loss 10.7107 LearningRate 0.0892 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:06:39,163-Speed 3452.91 samples/sec Loss 10.6512 LearningRate 0.0892 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:06:42,123-Speed 3459.70 samples/sec Loss 10.7294 LearningRate 0.0892 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:06:45,064-Speed 3483.13 samples/sec Loss 10.6558 LearningRate 0.0892 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:06:48,006-Speed 3480.95 samples/sec Loss 10.6628 LearningRate 0.0892 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:06:50,965-Speed 3461.21 samples/sec Loss 10.6995 LearningRate 0.0891 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:06:53,921-Speed 3464.70 samples/sec Loss 10.7592 LearningRate 0.0891 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:06:56,876-Speed 3466.12 samples/sec Loss 10.6906 LearningRate 0.0891 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:06:59,834-Speed 3462.71 samples/sec Loss 10.6857 LearningRate 0.0891 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:02,796-Speed 3458.82 samples/sec Loss 10.7154 LearningRate 0.0891 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:05,755-Speed 3460.48 samples/sec Loss 10.6490 LearningRate 0.0891 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:08,708-Speed 3468.93 samples/sec 
Loss 10.7876 LearningRate 0.0890 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:11,664-Speed 3465.46 samples/sec Loss 10.6204 LearningRate 0.0890 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:14,618-Speed 3466.88 samples/sec Loss 10.5849 LearningRate 0.0890 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:18,635-Speed 2549.64 samples/sec Loss 10.6405 LearningRate 0.0890 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:07:21,989-Speed 3053.37 samples/sec Loss 10.9193 LearningRate 0.0890 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:07:24,934-Speed 3478.59 samples/sec Loss 10.7994 LearningRate 0.0890 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:27,896-Speed 3457.09 samples/sec Loss 10.6753 LearningRate 0.0889 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:30,855-Speed 3462.59 samples/sec Loss 10.6318 LearningRate 0.0889 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:33,814-Speed 3461.57 samples/sec Loss 10.6656 LearningRate 0.0889 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:36,769-Speed 3465.83 samples/sec Loss 10.7309 LearningRate 0.0889 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:39,738-Speed 3449.70 samples/sec Loss 10.5875 LearningRate 0.0889 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:42,695-Speed 3463.43 samples/sec Loss 10.6032 LearningRate 0.0889 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:45,649-Speed 3467.56 samples/sec Loss 10.5886 LearningRate 0.0888 Epoch: 1 Global 
Step: 6530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:48,612-Speed 3456.02 samples/sec Loss 10.6710 LearningRate 0.0888 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:51,593-Speed 3436.07 samples/sec Loss 10.5963 LearningRate 0.0888 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:07:54,547-Speed 3467.30 samples/sec Loss 10.6970 LearningRate 0.0888 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:07:57,500-Speed 3468.65 samples/sec Loss 10.7024 LearningRate 0.0888 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:00,469-Speed 3450.13 samples/sec Loss 10.7724 LearningRate 0.0888 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:03,421-Speed 3469.18 samples/sec Loss 10.7604 LearningRate 0.0887 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:06,380-Speed 3461.70 samples/sec Loss 10.8319 LearningRate 0.0887 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:09,334-Speed 3467.14 samples/sec Loss 10.7395 LearningRate 0.0887 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:12,291-Speed 3464.20 samples/sec Loss 10.5716 LearningRate 0.0887 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:15,288-Speed 3417.35 samples/sec Loss 10.7703 LearningRate 0.0887 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:18,251-Speed 3456.46 samples/sec Loss 10.4119 LearningRate 0.0887 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:21,206-Speed 3466.18 samples/sec Loss 10.5784 LearningRate 0.0886 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 
11 hours Training: 2022-04-27 02:08:24,164-Speed 3463.26 samples/sec Loss 10.6671 LearningRate 0.0886 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:08:27,122-Speed 3462.57 samples/sec Loss 10.5522 LearningRate 0.0886 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:08:30,079-Speed 3463.99 samples/sec Loss 10.6308 LearningRate 0.0886 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:08:33,039-Speed 3460.06 samples/sec Loss 10.5427 LearningRate 0.0886 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:36,000-Speed 3458.94 samples/sec Loss 10.6931 LearningRate 0.0886 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:38,960-Speed 3460.32 samples/sec Loss 10.6240 LearningRate 0.0885 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:41,927-Speed 3451.85 samples/sec Loss 10.4661 LearningRate 0.0885 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:44,887-Speed 3460.40 samples/sec Loss 10.5663 LearningRate 0.0885 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:47,842-Speed 3466.53 samples/sec Loss 10.5252 LearningRate 0.0885 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:50,799-Speed 3463.98 samples/sec Loss 10.4807 LearningRate 0.0885 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:53,758-Speed 3462.15 samples/sec Loss 10.7586 LearningRate 0.0885 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:08:56,715-Speed 3463.89 samples/sec Loss 10.6169 LearningRate 0.0884 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 
02:08:59,672-Speed 3463.34 samples/sec Loss 10.5026 LearningRate 0.0884 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:02,620-Speed 3474.81 samples/sec Loss 10.4474 LearningRate 0.0884 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:05,580-Speed 3459.46 samples/sec Loss 10.5860 LearningRate 0.0884 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:08,539-Speed 3462.56 samples/sec Loss 10.6524 LearningRate 0.0884 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:11,496-Speed 3463.26 samples/sec Loss 10.5506 LearningRate 0.0884 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:14,452-Speed 3464.55 samples/sec Loss 10.5429 LearningRate 0.0883 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:17,409-Speed 3464.60 samples/sec Loss 10.5179 LearningRate 0.0883 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:20,365-Speed 3464.63 samples/sec Loss 10.6633 LearningRate 0.0883 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:23,323-Speed 3462.92 samples/sec Loss 10.5832 LearningRate 0.0883 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:26,302-Speed 3438.09 samples/sec Loss 10.5510 LearningRate 0.0883 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:29,270-Speed 3451.38 samples/sec Loss 10.6934 LearningRate 0.0883 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:09:32,229-Speed 3461.05 samples/sec Loss 10.4751 LearningRate 0.0882 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:35,190-Speed 3458.35 samples/sec Loss 10.5096 
LearningRate 0.0882 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:38,152-Speed 3458.42 samples/sec Loss 10.4647 LearningRate 0.0882 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:41,114-Speed 3458.29 samples/sec Loss 10.5579 LearningRate 0.0882 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:44,076-Speed 3458.25 samples/sec Loss 10.5310 LearningRate 0.0882 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:47,036-Speed 3460.68 samples/sec Loss 10.4103 LearningRate 0.0882 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:50,015-Speed 3437.50 samples/sec Loss 10.5441 LearningRate 0.0882 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:52,979-Speed 3456.24 samples/sec Loss 10.5088 LearningRate 0.0881 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:55,943-Speed 3455.60 samples/sec Loss 10.5790 LearningRate 0.0881 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:09:58,905-Speed 3457.48 samples/sec Loss 10.5226 LearningRate 0.0881 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:01,853-Speed 3474.70 samples/sec Loss 10.4920 LearningRate 0.0881 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:04,811-Speed 3461.97 samples/sec Loss 10.4184 LearningRate 0.0881 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:07,804-Speed 3422.80 samples/sec Loss 10.7787 LearningRate 0.0881 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:10,773-Speed 3449.03 samples/sec Loss 10.4955 LearningRate 0.0880 Epoch: 1 Global Step: 7020 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:13,743-Speed 3449.60 samples/sec Loss 10.3526 LearningRate 0.0880 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:16,704-Speed 3458.66 samples/sec Loss 10.5644 LearningRate 0.0880 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:19,663-Speed 3461.61 samples/sec Loss 10.3965 LearningRate 0.0880 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:22,639-Speed 3442.62 samples/sec Loss 10.4579 LearningRate 0.0880 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:25,599-Speed 3459.48 samples/sec Loss 10.5719 LearningRate 0.0880 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:28,562-Speed 3457.20 samples/sec Loss 10.5213 LearningRate 0.0879 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:10:31,525-Speed 3457.58 samples/sec Loss 10.5082 LearningRate 0.0879 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:34,486-Speed 3458.27 samples/sec Loss 10.2785 LearningRate 0.0879 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:37,449-Speed 3456.63 samples/sec Loss 10.4097 LearningRate 0.0879 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:40,409-Speed 3460.09 samples/sec Loss 10.4781 LearningRate 0.0879 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:43,374-Speed 3455.25 samples/sec Loss 10.3461 LearningRate 0.0879 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:46,336-Speed 3458.31 samples/sec Loss 10.5360 LearningRate 0.0878 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-27 02:10:49,302-Speed 3453.76 samples/sec Loss 10.4254 LearningRate 0.0878 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:52,266-Speed 3455.65 samples/sec Loss 10.3127 LearningRate 0.0878 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:55,232-Speed 3452.17 samples/sec Loss 10.5698 LearningRate 0.0878 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:10:58,195-Speed 3457.52 samples/sec Loss 10.4690 LearningRate 0.0878 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:01,175-Speed 3436.65 samples/sec Loss 10.5089 LearningRate 0.0878 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:11:04,150-Speed 3443.29 samples/sec Loss 10.5650 LearningRate 0.0877 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:11:07,113-Speed 3457.20 samples/sec Loss 10.4711 LearningRate 0.0877 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:11:10,087-Speed 3442.80 samples/sec Loss 10.2449 LearningRate 0.0877 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:13,063-Speed 3443.31 samples/sec Loss 10.4969 LearningRate 0.0877 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:16,077-Speed 3397.33 samples/sec Loss 10.3287 LearningRate 0.0877 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:19,043-Speed 3454.44 samples/sec Loss 10.3897 LearningRate 0.0877 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:22,006-Speed 3455.90 samples/sec Loss 10.3020 LearningRate 0.0876 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:11:24,961-Speed 
3466.53 samples/sec Loss 10.3475 LearningRate 0.0876 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:27,924-Speed 3456.76 samples/sec Loss 10.2909 LearningRate 0.0876 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:30,887-Speed 3456.20 samples/sec Loss 10.2417 LearningRate 0.0876 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:33,853-Speed 3453.92 samples/sec Loss 10.2756 LearningRate 0.0876 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:36,833-Speed 3437.44 samples/sec Loss 10.2931 LearningRate 0.0876 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:39,803-Speed 3447.82 samples/sec Loss 10.3350 LearningRate 0.0875 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:42,772-Speed 3450.61 samples/sec Loss 10.2536 LearningRate 0.0875 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:45,734-Speed 3457.53 samples/sec Loss 10.1401 LearningRate 0.0875 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:48,715-Speed 3435.77 samples/sec Loss 10.3934 LearningRate 0.0875 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:11:51,702-Speed 3429.31 samples/sec Loss 10.2870 LearningRate 0.0875 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:11:54,683-Speed 3435.90 samples/sec Loss 10.2650 LearningRate 0.0875 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:11:57,655-Speed 3445.25 samples/sec Loss 10.1990 LearningRate 0.0874 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:00,635-Speed 3438.39 samples/sec Loss 10.4105 LearningRate 0.0874 
Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:03,605-Speed 3447.69 samples/sec Loss 10.1475 LearningRate 0.0874 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:06,575-Speed 3449.53 samples/sec Loss 10.1233 LearningRate 0.0874 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:09,543-Speed 3450.51 samples/sec Loss 10.3261 LearningRate 0.0874 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:12,510-Speed 3452.01 samples/sec Loss 10.2554 LearningRate 0.0874 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:15,477-Speed 3452.37 samples/sec Loss 10.1451 LearningRate 0.0873 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:18,448-Speed 3446.77 samples/sec Loss 10.2505 LearningRate 0.0873 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:21,414-Speed 3454.16 samples/sec Loss 10.1401 LearningRate 0.0873 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:24,398-Speed 3432.21 samples/sec Loss 10.2269 LearningRate 0.0873 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:12:27,368-Speed 3448.71 samples/sec Loss 10.3761 LearningRate 0.0873 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:12:30,341-Speed 3445.41 samples/sec Loss 10.2290 LearningRate 0.0873 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:12:33,314-Speed 3444.85 samples/sec Loss 10.2777 LearningRate 0.0872 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:36,280-Speed 3453.88 samples/sec Loss 9.8846 LearningRate 0.0872 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-04-27 02:12:39,260-Speed 3437.64 samples/sec Loss 10.1869 LearningRate 0.0872 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:42,240-Speed 3437.14 samples/sec Loss 10.0918 LearningRate 0.0872 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:45,207-Speed 3451.92 samples/sec Loss 10.2810 LearningRate 0.0872 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:48,186-Speed 3437.89 samples/sec Loss 10.1473 LearningRate 0.0872 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:51,157-Speed 3448.05 samples/sec Loss 10.1726 LearningRate 0.0871 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:54,127-Speed 3448.57 samples/sec Loss 10.2842 LearningRate 0.0871 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:12:57,100-Speed 3445.19 samples/sec Loss 10.3577 LearningRate 0.0871 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:00,079-Speed 3438.29 samples/sec Loss 10.2854 LearningRate 0.0871 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:03,052-Speed 3445.35 samples/sec Loss 10.2659 LearningRate 0.0871 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:06,026-Speed 3443.29 samples/sec Loss 10.2474 LearningRate 0.0871 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:09,000-Speed 3444.07 samples/sec Loss 10.3385 LearningRate 0.0870 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:11,962-Speed 3457.33 samples/sec Loss 10.2175 LearningRate 0.0870 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-27 02:13:14,938-Speed 3443.09 samples/sec Loss 10.0786 LearningRate 0.0870 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:17,914-Speed 3441.02 samples/sec Loss 10.1349 LearningRate 0.0870 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:20,883-Speed 3449.72 samples/sec Loss 10.2065 LearningRate 0.0870 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:23,859-Speed 3442.01 samples/sec Loss 10.0868 LearningRate 0.0870 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:26,843-Speed 3432.11 samples/sec Loss 10.1743 LearningRate 0.0869 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:29,817-Speed 3444.53 samples/sec Loss 10.0673 LearningRate 0.0869 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:32,787-Speed 3448.00 samples/sec Loss 10.3431 LearningRate 0.0869 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:35,756-Speed 3449.56 samples/sec Loss 10.2412 LearningRate 0.0869 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:38,725-Speed 3450.18 samples/sec Loss 10.1996 LearningRate 0.0869 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:41,709-Speed 3432.37 samples/sec Loss 10.1636 LearningRate 0.0869 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:44,683-Speed 3444.89 samples/sec Loss 10.1298 LearningRate 0.0869 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:47,656-Speed 3444.21 samples/sec Loss 10.0596 LearningRate 0.0868 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:50,648-Speed 3423.60 
samples/sec Loss 10.2902 LearningRate 0.0868 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:53,662-Speed 3397.89 samples/sec Loss 10.1462 LearningRate 0.0868 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:13:56,623-Speed 3460.20 samples/sec Loss 10.0607 LearningRate 0.0868 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:13:59,597-Speed 3442.88 samples/sec Loss 10.2453 LearningRate 0.0868 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:02,568-Speed 3447.74 samples/sec Loss 10.1503 LearningRate 0.0868 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:05,540-Speed 3446.35 samples/sec Loss 9.9707 LearningRate 0.0867 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:08,511-Speed 3447.09 samples/sec Loss 10.1844 LearningRate 0.0867 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:11,491-Speed 3437.78 samples/sec Loss 10.2345 LearningRate 0.0867 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:14,460-Speed 3450.30 samples/sec Loss 10.2028 LearningRate 0.0867 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:17,455-Speed 3419.56 samples/sec Loss 10.1004 LearningRate 0.0867 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:20,432-Speed 3440.73 samples/sec Loss 9.9670 LearningRate 0.0867 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:23,420-Speed 3427.24 samples/sec Loss 9.9192 LearningRate 0.0866 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:14:26,391-Speed 3447.34 samples/sec Loss 10.3640 LearningRate 0.0866 
Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:14:29,342-Speed 3471.26 samples/sec Loss 10.1644 LearningRate 0.0866 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:32,312-Speed 3448.22 samples/sec Loss 10.1325 LearningRate 0.0866 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:35,284-Speed 3446.57 samples/sec Loss 10.1559 LearningRate 0.0866 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:38,260-Speed 3441.42 samples/sec Loss 10.1023 LearningRate 0.0866 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:41,246-Speed 3430.25 samples/sec Loss 10.2757 LearningRate 0.0865 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:44,221-Speed 3443.27 samples/sec Loss 10.0308 LearningRate 0.0865 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:47,189-Speed 3450.54 samples/sec Loss 9.9515 LearningRate 0.0865 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:50,172-Speed 3434.03 samples/sec Loss 10.0592 LearningRate 0.0865 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:53,139-Speed 3452.27 samples/sec Loss 10.0240 LearningRate 0.0865 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:56,108-Speed 3448.93 samples/sec Loss 10.0941 LearningRate 0.0865 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:14:59,078-Speed 3449.26 samples/sec Loss 10.0679 LearningRate 0.0864 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:15:02,071-Speed 3421.45 samples/sec Loss 10.0461 LearningRate 0.0864 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 02:15:45,403-[lfw][8000]XNorm: 22.481622 Training: 2022-04-27 02:15:45,404-[lfw][8000]Accuracy-Flip: 0.99300+-0.00379 Training: 2022-04-27 02:15:45,404-[lfw][8000]Accuracy-Highest: 0.99367 Training: 2022-04-27 02:16:35,626-[cfp_fp][8000]XNorm: 19.511639 Training: 2022-04-27 02:16:35,626-[cfp_fp][8000]Accuracy-Flip: 0.91271+-0.01837 Training: 2022-04-27 02:16:35,627-[cfp_fp][8000]Accuracy-Highest: 0.91271 Training: 2022-04-27 02:17:19,001-[agedb_30][8000]XNorm: 22.350999 Training: 2022-04-27 02:17:19,002-[agedb_30][8000]Accuracy-Flip: 0.95733+-0.00964 Training: 2022-04-27 02:17:19,002-[agedb_30][8000]Accuracy-Highest: 0.95733 Training: 2022-04-27 02:17:21,961-Speed 73.20 samples/sec Loss 10.0692 LearningRate 0.0864 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:24,934-Speed 3444.50 samples/sec Loss 10.0313 LearningRate 0.0864 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:27,897-Speed 3456.84 samples/sec Loss 9.8848 LearningRate 0.0864 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:30,869-Speed 3446.81 samples/sec Loss 10.2110 LearningRate 0.0864 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:33,865-Speed 3418.43 samples/sec Loss 9.9268 LearningRate 0.0863 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:36,835-Speed 3449.06 samples/sec Loss 10.1880 LearningRate 0.0863 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:39,802-Speed 3451.63 samples/sec Loss 10.0784 LearningRate 0.0863 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:42,770-Speed 3451.64 samples/sec Loss 10.0917 LearningRate 0.0863 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 
02:17:45,740-Speed 3448.21 samples/sec Loss 9.8892 LearningRate 0.0863 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:17:48,708-Speed 3450.98 samples/sec Loss 9.9809 LearningRate 0.0863 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:17:51,675-Speed 3452.15 samples/sec Loss 10.0077 LearningRate 0.0862 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:17:54,648-Speed 3445.22 samples/sec Loss 10.1764 LearningRate 0.0862 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:17:57,629-Speed 3435.36 samples/sec Loss 10.1115 LearningRate 0.0862 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:00,595-Speed 3453.54 samples/sec Loss 10.0225 LearningRate 0.0862 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:03,566-Speed 3447.93 samples/sec Loss 9.9209 LearningRate 0.0862 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:06,533-Speed 3452.22 samples/sec Loss 9.9388 LearningRate 0.0862 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:09,497-Speed 3454.68 samples/sec Loss 10.0434 LearningRate 0.0861 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:12,487-Speed 3425.36 samples/sec Loss 10.1321 LearningRate 0.0861 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:15,465-Speed 3439.37 samples/sec Loss 10.0065 LearningRate 0.0861 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:18,432-Speed 3452.18 samples/sec Loss 10.0002 LearningRate 0.0861 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:18:21,396-Speed 3456.40 samples/sec Loss 9.9577 
LearningRate 0.0861 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:24,361-Speed 3454.81 samples/sec Loss 9.9631 LearningRate 0.0861 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:27,324-Speed 3456.03 samples/sec Loss 9.9240 LearningRate 0.0860 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:30,286-Speed 3458.44 samples/sec Loss 9.9797 LearningRate 0.0860 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:33,248-Speed 3457.51 samples/sec Loss 9.7905 LearningRate 0.0860 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:36,215-Speed 3451.76 samples/sec Loss 9.9893 LearningRate 0.0860 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:39,179-Speed 3455.71 samples/sec Loss 9.7925 LearningRate 0.0860 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:42,149-Speed 3449.45 samples/sec Loss 9.9353 LearningRate 0.0860 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:45,113-Speed 3455.11 samples/sec Loss 9.8447 LearningRate 0.0860 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:48,075-Speed 3457.90 samples/sec Loss 9.8963 LearningRate 0.0859 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:51,036-Speed 3459.71 samples/sec Loss 9.6828 LearningRate 0.0859 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:18:53,982-Speed 3476.16 samples/sec Loss 9.8360 LearningRate 0.0859 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:56,938-Speed 3464.58 samples/sec Loss 9.7089 LearningRate 0.0859 Epoch: 1 Global Step: 8330 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:18:59,898-Speed 3460.64 samples/sec Loss 9.9158 LearningRate 0.0859 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:02,859-Speed 3459.56 samples/sec Loss 9.9601 LearningRate 0.0859 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:05,825-Speed 3452.96 samples/sec Loss 10.0533 LearningRate 0.0858 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:08,786-Speed 3459.10 samples/sec Loss 9.9732 LearningRate 0.0858 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:11,746-Speed 3460.14 samples/sec Loss 9.8339 LearningRate 0.0858 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:14,699-Speed 3468.89 samples/sec Loss 9.8662 LearningRate 0.0858 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:17,667-Speed 3451.05 samples/sec Loss 9.9053 LearningRate 0.0858 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:20,627-Speed 3460.32 samples/sec Loss 9.9754 LearningRate 0.0858 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:23,609-Speed 3434.87 samples/sec Loss 10.0028 LearningRate 0.0857 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:26,578-Speed 3449.23 samples/sec Loss 10.0171 LearningRate 0.0857 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:29,556-Speed 3439.83 samples/sec Loss 9.7729 LearningRate 0.0857 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:32,523-Speed 3452.03 samples/sec Loss 9.7875 LearningRate 0.0857 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
02:19:35,485-Speed 3457.70 samples/sec Loss 9.9340 LearningRate 0.0857 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:38,445-Speed 3460.30 samples/sec Loss 9.9952 LearningRate 0.0857 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:41,402-Speed 3464.37 samples/sec Loss 9.8700 LearningRate 0.0856 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:19:44,361-Speed 3461.64 samples/sec Loss 9.8802 LearningRate 0.0856 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:47,320-Speed 3461.93 samples/sec Loss 9.8040 LearningRate 0.0856 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:50,280-Speed 3460.30 samples/sec Loss 9.9300 LearningRate 0.0856 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:53,241-Speed 3458.73 samples/sec Loss 9.7443 LearningRate 0.0856 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:56,199-Speed 3462.98 samples/sec Loss 9.8061 LearningRate 0.0856 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:19:59,158-Speed 3461.27 samples/sec Loss 9.8740 LearningRate 0.0855 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:02,119-Speed 3458.93 samples/sec Loss 9.6585 LearningRate 0.0855 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:05,087-Speed 3450.25 samples/sec Loss 9.8344 LearningRate 0.0855 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:08,045-Speed 3462.75 samples/sec Loss 9.7535 LearningRate 0.0855 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:11,003-Speed 3463.28 samples/sec Loss 9.9025 
LearningRate 0.0855 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:13,961-Speed 3462.24 samples/sec Loss 9.6444 LearningRate 0.0855 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:20:16,943-Speed 3435.29 samples/sec Loss 9.8880 LearningRate 0.0854 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:20:19,891-Speed 3474.36 samples/sec Loss 10.0078 LearningRate 0.0854 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:22,849-Speed 3462.38 samples/sec Loss 9.8217 LearningRate 0.0854 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:25,810-Speed 3458.92 samples/sec Loss 9.7415 LearningRate 0.0854 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:28,776-Speed 3453.35 samples/sec Loss 9.9836 LearningRate 0.0854 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:31,741-Speed 3454.02 samples/sec Loss 9.8748 LearningRate 0.0854 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:34,702-Speed 3459.64 samples/sec Loss 9.6805 LearningRate 0.0853 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:37,664-Speed 3457.66 samples/sec Loss 9.7511 LearningRate 0.0853 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:40,623-Speed 3461.58 samples/sec Loss 9.7492 LearningRate 0.0853 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:43,593-Speed 3449.45 samples/sec Loss 9.6790 LearningRate 0.0853 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:46,556-Speed 3456.19 samples/sec Loss 9.6578 LearningRate 0.0853 Epoch: 1 Global Step: 8700 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:20:49,517-Speed 3458.96 samples/sec Loss 9.6822 LearningRate 0.0853 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:20:52,481-Speed 3456.12 samples/sec Loss 9.8248 LearningRate 0.0853 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:20:55,444-Speed 3456.18 samples/sec Loss 9.7289 LearningRate 0.0852 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:20:58,419-Speed 3443.19 samples/sec Loss 9.8617 LearningRate 0.0852 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:01,386-Speed 3452.44 samples/sec Loss 9.7760 LearningRate 0.0852 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:04,364-Speed 3438.94 samples/sec Loss 9.6532 LearningRate 0.0852 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:07,328-Speed 3456.22 samples/sec Loss 9.7384 LearningRate 0.0852 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:10,289-Speed 3458.93 samples/sec Loss 9.7544 LearningRate 0.0852 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:13,258-Speed 3449.92 samples/sec Loss 9.7269 LearningRate 0.0851 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:16,220-Speed 3457.62 samples/sec Loss 9.7420 LearningRate 0.0851 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:19,170-Speed 3471.56 samples/sec Loss 9.6505 LearningRate 0.0851 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-27 02:21:22,134-Speed 3455.84 samples/sec Loss 9.7786 LearningRate 0.0851 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 
2022-04-27 02:21:25,094-Speed 3459.66 samples/sec Loss 9.7086 LearningRate 0.0851 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:28,063-Speed 3451.33 samples/sec Loss 9.6077 LearningRate 0.0851 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:31,027-Speed 3454.93 samples/sec Loss 9.6119 LearningRate 0.0850 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:33,992-Speed 3454.67 samples/sec Loss 9.5182 LearningRate 0.0850 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:36,958-Speed 3453.51 samples/sec Loss 9.6045 LearningRate 0.0850 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:39,923-Speed 3454.64 samples/sec Loss 9.6888 LearningRate 0.0850 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:42,889-Speed 3452.86 samples/sec Loss 9.4488 LearningRate 0.0850 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:45,854-Speed 3454.92 samples/sec Loss 9.5248 LearningRate 0.0850 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:21:48,820-Speed 3452.55 samples/sec Loss 9.6814 LearningRate 0.0849 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:21:51,792-Speed 3446.48 samples/sec Loss 9.6512 LearningRate 0.0849 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:21:54,763-Speed 3447.31 samples/sec Loss 9.5142 LearningRate 0.0849 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:21:57,729-Speed 3454.16 samples/sec Loss 9.6403 LearningRate 0.0849 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:00,708-Speed 3438.34 samples/sec Loss 
9.5796 LearningRate 0.0849 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:03,678-Speed 3448.25 samples/sec Loss 9.6602 LearningRate 0.0849 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:06,650-Speed 3445.50 samples/sec Loss 9.7572 LearningRate 0.0848 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:09,606-Speed 3465.01 samples/sec Loss 9.6384 LearningRate 0.0848 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:12,574-Speed 3451.51 samples/sec Loss 9.5106 LearningRate 0.0848 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:15,584-Speed 3402.47 samples/sec Loss 9.6941 LearningRate 0.0848 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:18,553-Speed 3450.45 samples/sec Loss 9.5933 LearningRate 0.0848 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:21,517-Speed 3455.04 samples/sec Loss 9.6645 LearningRate 0.0848 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:24,490-Speed 3445.04 samples/sec Loss 9.6580 LearningRate 0.0847 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:27,459-Speed 3450.78 samples/sec Loss 9.7104 LearningRate 0.0847 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:30,425-Speed 3453.70 samples/sec Loss 9.6876 LearningRate 0.0847 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:33,389-Speed 3454.65 samples/sec Loss 9.5230 LearningRate 0.0847 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:36,359-Speed 3448.65 samples/sec Loss 9.8033 LearningRate 0.0847 Epoch: 1 Global Step: 9070 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:39,328-Speed 3449.65 samples/sec Loss 9.6954 LearningRate 0.0847 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:42,313-Speed 3431.39 samples/sec Loss 9.7620 LearningRate 0.0847 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:22:45,269-Speed 3464.92 samples/sec Loss 9.6830 LearningRate 0.0846 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:48,241-Speed 3446.64 samples/sec Loss 9.7062 LearningRate 0.0846 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:51,209-Speed 3450.56 samples/sec Loss 9.6835 LearningRate 0.0846 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:54,190-Speed 3436.51 samples/sec Loss 9.5929 LearningRate 0.0846 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:22:57,172-Speed 3434.94 samples/sec Loss 9.7697 LearningRate 0.0846 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:00,144-Speed 3445.97 samples/sec Loss 9.7771 LearningRate 0.0846 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:03,111-Speed 3452.67 samples/sec Loss 9.6915 LearningRate 0.0845 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:06,078-Speed 3451.81 samples/sec Loss 9.7020 LearningRate 0.0845 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:09,030-Speed 3469.30 samples/sec Loss 9.5135 LearningRate 0.0845 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:12,000-Speed 3448.17 samples/sec Loss 9.5059 LearningRate 0.0845 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-27 02:23:14,975-Speed 3443.37 samples/sec Loss 9.6408 LearningRate 0.0845 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:17,944-Speed 3449.95 samples/sec Loss 9.5421 LearningRate 0.0845 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:20,915-Speed 3447.86 samples/sec Loss 9.6718 LearningRate 0.0844 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:23,894-Speed 3437.56 samples/sec Loss 9.5909 LearningRate 0.0844 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:26,874-Speed 3436.81 samples/sec Loss 9.4600 LearningRate 0.0844 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:29,838-Speed 3455.43 samples/sec Loss 9.5935 LearningRate 0.0844 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:32,803-Speed 3454.99 samples/sec Loss 9.5456 LearningRate 0.0844 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:35,768-Speed 3454.27 samples/sec Loss 9.6474 LearningRate 0.0844 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:23:38,748-Speed 3436.56 samples/sec Loss 9.5867 LearningRate 0.0843 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:41,720-Speed 3446.94 samples/sec Loss 9.6007 LearningRate 0.0843 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:44,684-Speed 3454.73 samples/sec Loss 9.5647 LearningRate 0.0843 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:47,654-Speed 3450.10 samples/sec Loss 9.5831 LearningRate 0.0843 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:50,663-Speed 3403.04 samples/sec Loss 9.7703 
LearningRate 0.0843 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:53,636-Speed 3445.52 samples/sec Loss 9.6329 LearningRate 0.0843 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:56,602-Speed 3453.20 samples/sec Loss 9.5410 LearningRate 0.0842 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:23:59,572-Speed 3448.57 samples/sec Loss 9.6053 LearningRate 0.0842 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:02,544-Speed 3446.28 samples/sec Loss 9.4859 LearningRate 0.0842 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:05,512-Speed 3450.61 samples/sec Loss 9.5707 LearningRate 0.0842 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:08,480-Speed 3451.16 samples/sec Loss 9.5174 LearningRate 0.0842 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:11,436-Speed 3464.92 samples/sec Loss 9.5813 LearningRate 0.0842 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:14,431-Speed 3420.09 samples/sec Loss 9.5239 LearningRate 0.0842 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:17,415-Speed 3433.83 samples/sec Loss 9.6059 LearningRate 0.0841 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:20,390-Speed 3442.67 samples/sec Loss 9.5937 LearningRate 0.0841 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:23,363-Speed 3445.09 samples/sec Loss 9.5035 LearningRate 0.0841 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:26,333-Speed 3448.70 samples/sec Loss 9.5874 LearningRate 0.0841 Epoch: 1 Global Step: 9440 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:29,299-Speed 3452.61 samples/sec Loss 9.4835 LearningRate 0.0841 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:32,265-Speed 3453.65 samples/sec Loss 9.5529 LearningRate 0.0841 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:35,257-Speed 3423.01 samples/sec Loss 9.5599 LearningRate 0.0840 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:38,230-Speed 3445.62 samples/sec Loss 9.5257 LearningRate 0.0840 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:41,200-Speed 3448.23 samples/sec Loss 9.6115 LearningRate 0.0840 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:44,168-Speed 3451.24 samples/sec Loss 9.5972 LearningRate 0.0840 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:47,136-Speed 3451.35 samples/sec Loss 9.5231 LearningRate 0.0840 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:50,102-Speed 3453.66 samples/sec Loss 9.4963 LearningRate 0.0840 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:53,072-Speed 3448.25 samples/sec Loss 9.4817 LearningRate 0.0839 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:24:56,030-Speed 3462.42 samples/sec Loss 9.5215 LearningRate 0.0839 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:24:58,999-Speed 3449.75 samples/sec Loss 9.5892 LearningRate 0.0839 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:01,974-Speed 3442.89 samples/sec Loss 9.4250 LearningRate 0.0839 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-27 02:25:04,981-Speed 3406.53 samples/sec Loss 9.4262 LearningRate 0.0839 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:07,956-Speed 3442.73 samples/sec Loss 9.6550 LearningRate 0.0839 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:10,928-Speed 3447.01 samples/sec Loss 9.4395 LearningRate 0.0838 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:13,895-Speed 3451.29 samples/sec Loss 9.5077 LearningRate 0.0838 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:16,865-Speed 3449.06 samples/sec Loss 9.4784 LearningRate 0.0838 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:19,836-Speed 3447.48 samples/sec Loss 9.4517 LearningRate 0.0838 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:22,841-Speed 3407.98 samples/sec Loss 9.5163 LearningRate 0.0838 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:25,831-Speed 3425.47 samples/sec Loss 9.6028 LearningRate 0.0838 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:25:28,816-Speed 3431.40 samples/sec Loss 9.5215 LearningRate 0.0837 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:25:31,789-Speed 3445.30 samples/sec Loss 9.6140 LearningRate 0.0837 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:25:34,765-Speed 3441.87 samples/sec Loss 9.4768 LearningRate 0.0837 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:25:37,732-Speed 3451.76 samples/sec Loss 9.2860 LearningRate 0.0837 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:25:40,728-Speed 3418.59 samples/sec Loss 
9.6005 LearningRate 0.0837 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:43,698-Speed 3449.20 samples/sec Loss 9.5165 LearningRate 0.0837 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:46,685-Speed 3429.39 samples/sec Loss 9.4554 LearningRate 0.0837 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:49,665-Speed 3436.84 samples/sec Loss 9.4322 LearningRate 0.0836 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:52,636-Speed 3447.58 samples/sec Loss 9.5387 LearningRate 0.0836 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:55,606-Speed 3448.21 samples/sec Loss 9.3915 LearningRate 0.0836 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:25:58,576-Speed 3448.31 samples/sec Loss 9.3292 LearningRate 0.0836 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:01,554-Speed 3439.69 samples/sec Loss 9.3896 LearningRate 0.0836 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:04,522-Speed 3450.56 samples/sec Loss 9.4528 LearningRate 0.0836 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:07,493-Speed 3447.76 samples/sec Loss 9.4742 LearningRate 0.0835 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:10,462-Speed 3450.34 samples/sec Loss 9.2487 LearningRate 0.0835 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:26:13,441-Speed 3437.30 samples/sec Loss 9.3288 LearningRate 0.0835 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:26:16,413-Speed 3446.85 samples/sec Loss 9.1837 LearningRate 0.0835 Epoch: 1 Global Step: 9810 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:19,383-Speed 3448.61 samples/sec Loss 9.3296 LearningRate 0.0835 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:22,358-Speed 3442.09 samples/sec Loss 9.3860 LearningRate 0.0835 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:25,333-Speed 3443.87 samples/sec Loss 9.5475 LearningRate 0.0834 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:28,313-Speed 3436.31 samples/sec Loss 9.3958 LearningRate 0.0834 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:31,291-Speed 3440.16 samples/sec Loss 9.5296 LearningRate 0.0834 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:34,267-Speed 3441.34 samples/sec Loss 9.2516 LearningRate 0.0834 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:37,247-Speed 3437.47 samples/sec Loss 9.1116 LearningRate 0.0834 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:40,235-Speed 3427.59 samples/sec Loss 9.2642 LearningRate 0.0834 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:43,245-Speed 3401.85 samples/sec Loss 9.3862 LearningRate 0.0833 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:46,194-Speed 3473.33 samples/sec Loss 9.4544 LearningRate 0.0833 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:49,162-Speed 3452.72 samples/sec Loss 9.3918 LearningRate 0.0833 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:52,137-Speed 3442.28 samples/sec Loss 9.2695 LearningRate 0.0833 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-27 02:26:55,105-Speed 3450.80 samples/sec Loss 9.2940 LearningRate 0.0833 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:26:58,074-Speed 3449.98 samples/sec Loss 9.4831 LearningRate 0.0833 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:27:01,044-Speed 3449.26 samples/sec Loss 9.4688 LearningRate 0.0833 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:27:04,005-Speed 3459.35 samples/sec Loss 9.3840 LearningRate 0.0832 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:27:06,971-Speed 3453.08 samples/sec Loss 9.3375 LearningRate 0.0832 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:27:09,946-Speed 3443.26 samples/sec Loss 9.4971 LearningRate 0.0832 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:27:12,916-Speed 3449.24 samples/sec Loss 9.3752 LearningRate 0.0832 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:27:56,368-[lfw][10000]XNorm: 22.190784 Training: 2022-04-27 02:27:56,369-[lfw][10000]Accuracy-Flip: 0.99483+-0.00263 Training: 2022-04-27 02:27:56,369-[lfw][10000]Accuracy-Highest: 0.99483 Training: 2022-04-27 02:28:46,783-[cfp_fp][10000]XNorm: 19.089473 Training: 2022-04-27 02:28:46,784-[cfp_fp][10000]Accuracy-Flip: 0.90543+-0.01380 Training: 2022-04-27 02:28:46,784-[cfp_fp][10000]Accuracy-Highest: 0.91271 Training: 2022-04-27 02:29:30,045-[agedb_30][10000]XNorm: 22.184881 Training: 2022-04-27 02:29:30,045-[agedb_30][10000]Accuracy-Flip: 0.95550+-0.00827 Training: 2022-04-27 02:29:30,046-[agedb_30][10000]Accuracy-Highest: 0.95733 Training: 2022-04-27 02:29:32,989-Speed 73.10 samples/sec Loss 9.2891 LearningRate 0.0832 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:35,934-Speed 3477.91 
samples/sec Loss 9.2535 LearningRate 0.0832 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:38,875-Speed 3482.51 samples/sec Loss 9.3335 LearningRate 0.0831 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:41,835-Speed 3460.70 samples/sec Loss 9.3081 LearningRate 0.0831 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:44,778-Speed 3480.44 samples/sec Loss 9.3989 LearningRate 0.0831 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:47,726-Speed 3473.48 samples/sec Loss 9.2462 LearningRate 0.0831 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:29:50,694-Speed 3451.38 samples/sec Loss 9.3533 LearningRate 0.0831 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:29:53,648-Speed 3466.87 samples/sec Loss 9.3950 LearningRate 0.0831 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:29:56,606-Speed 3462.39 samples/sec Loss 9.4901 LearningRate 0.0830 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:29:59,561-Speed 3466.76 samples/sec Loss 9.5309 LearningRate 0.0830 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:02,517-Speed 3464.84 samples/sec Loss 9.3611 LearningRate 0.0830 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:05,475-Speed 3463.12 samples/sec Loss 9.3376 LearningRate 0.0830 Epoch: 1 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:08,435-Speed 3460.21 samples/sec Loss 9.3695 LearningRate 0.0830 Epoch: 1 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:11,403-Speed 3451.03 samples/sec Loss 9.3354 LearningRate 0.0830 
Epoch: 1 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:14,360-Speed 3463.11 samples/sec Loss 9.4076 LearningRate 0.0829 Epoch: 1 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:17,312-Speed 3470.04 samples/sec Loss 9.4701 LearningRate 0.0829 Epoch: 1 Global Step: 10160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:20,283-Speed 3447.35 samples/sec Loss 9.2816 LearningRate 0.0829 Epoch: 1 Global Step: 10170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:23,247-Speed 3455.02 samples/sec Loss 9.2815 LearningRate 0.0829 Epoch: 1 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:26,219-Speed 3446.81 samples/sec Loss 9.3077 LearningRate 0.0829 Epoch: 1 Global Step: 10190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:29,189-Speed 3448.09 samples/sec Loss 9.3229 LearningRate 0.0829 Epoch: 1 Global Step: 10200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:32,156-Speed 3452.47 samples/sec Loss 9.1343 LearningRate 0.0828 Epoch: 1 Global Step: 10210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:35,124-Speed 3451.43 samples/sec Loss 9.2167 LearningRate 0.0828 Epoch: 1 Global Step: 10220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:38,097-Speed 3445.17 samples/sec Loss 9.1998 LearningRate 0.0828 Epoch: 1 Global Step: 10230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:41,068-Speed 3447.12 samples/sec Loss 9.3163 LearningRate 0.0828 Epoch: 1 Global Step: 10240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:44,040-Speed 3446.72 samples/sec Loss 9.1943 LearningRate 0.0828 Epoch: 1 Global Step: 10250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 02:30:47,024-Speed 3431.89 samples/sec Loss 9.3244 LearningRate 0.0828 Epoch: 1 Global Step: 10260 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-27 02:30:49,994-Speed 3448.65 samples/sec Loss 9.1728 LearningRate 0.0828 Epoch: 1 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 02:30:52,969-Speed 3442.99 samples/sec Loss 9.2947 LearningRate 0.0827 Epoch: 1 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:30:55,932-Speed 3456.21 samples/sec Loss 9.3106 LearningRate 0.0827 Epoch: 1 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:30:58,899-Speed 3452.98 samples/sec Loss 9.3902 LearningRate 0.0827 Epoch: 1 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:01,869-Speed 3448.08 samples/sec Loss 9.2540 LearningRate 0.0827 Epoch: 1 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:04,834-Speed 3454.92 samples/sec Loss 9.3407 LearningRate 0.0827 Epoch: 1 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:07,797-Speed 3456.89 samples/sec Loss 9.2041 LearningRate 0.0827 Epoch: 1 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:10,758-Speed 3459.36 samples/sec Loss 9.1264 LearningRate 0.0826 Epoch: 1 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:13,726-Speed 3450.79 samples/sec Loss 9.3196 LearningRate 0.0826 Epoch: 1 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:16,680-Speed 3466.98 samples/sec Loss 9.3807 LearningRate 0.0826 Epoch: 1 Global Step: 10360 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:31:19,638-Speed 3463.45 samples/sec Loss 9.4370 LearningRate 0.0826 Epoch: 1 Global Step: 10370 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:31:22,591-Speed 3467.73 samples/sec Loss 9.4078 LearningRate 0.0826 Epoch: 1 Global Step: 10380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 
02:31:25,549-Speed 3463.39 samples/sec Loss 9.3608 LearningRate 0.0826 Epoch: 1 Global Step: 10390 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:31:28,494-Speed 3477.90 samples/sec Loss 9.3442 LearningRate 0.0825 Epoch: 1 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:31,447-Speed 3467.84 samples/sec Loss 9.1051 LearningRate 0.0825 Epoch: 1 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:34,402-Speed 3466.99 samples/sec Loss 9.3177 LearningRate 0.0825 Epoch: 1 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:37,354-Speed 3469.76 samples/sec Loss 9.2233 LearningRate 0.0825 Epoch: 1 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:40,313-Speed 3460.84 samples/sec Loss 9.1675 LearningRate 0.0825 Epoch: 1 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:43,269-Speed 3465.48 samples/sec Loss 9.2768 LearningRate 0.0825 Epoch: 1 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:46,232-Speed 3456.47 samples/sec Loss 9.4785 LearningRate 0.0824 Epoch: 1 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:49,187-Speed 3465.68 samples/sec Loss 9.4253 LearningRate 0.0824 Epoch: 1 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:52,143-Speed 3465.17 samples/sec Loss 9.1086 LearningRate 0.0824 Epoch: 1 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:55,097-Speed 3467.35 samples/sec Loss 9.1740 LearningRate 0.0824 Epoch: 1 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:31:58,050-Speed 3468.71 samples/sec Loss 9.2070 LearningRate 0.0824 Epoch: 1 Global Step: 10500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:32:01,006-Speed 3465.65 samples/sec Loss 
9.1098 LearningRate 0.0824 Epoch: 1 Global Step: 10510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:32:03,953-Speed 3475.29 samples/sec Loss 9.1563 LearningRate 0.0824 Epoch: 1 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:06,907-Speed 3467.02 samples/sec Loss 9.1144 LearningRate 0.0823 Epoch: 1 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:09,863-Speed 3464.87 samples/sec Loss 9.2930 LearningRate 0.0823 Epoch: 1 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:12,853-Speed 3425.08 samples/sec Loss 9.3373 LearningRate 0.0823 Epoch: 1 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:15,815-Speed 3458.81 samples/sec Loss 9.1407 LearningRate 0.0823 Epoch: 1 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:18,767-Speed 3469.25 samples/sec Loss 9.1812 LearningRate 0.0823 Epoch: 1 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:21,731-Speed 3456.11 samples/sec Loss 9.0756 LearningRate 0.0823 Epoch: 1 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:24,685-Speed 3466.95 samples/sec Loss 9.1564 LearningRate 0.0822 Epoch: 1 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:27,640-Speed 3465.65 samples/sec Loss 9.0358 LearningRate 0.0822 Epoch: 1 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:30,596-Speed 3465.64 samples/sec Loss 9.1526 LearningRate 0.0822 Epoch: 1 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:33,549-Speed 3467.84 samples/sec Loss 9.1457 LearningRate 0.0822 Epoch: 1 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:36,511-Speed 3458.60 samples/sec Loss 8.9709 LearningRate 0.0822 Epoch: 1 Global 
Step: 10630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:39,478-Speed 3451.75 samples/sec Loss 9.1944 LearningRate 0.0822 Epoch: 1 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:42,434-Speed 3464.79 samples/sec Loss 9.1838 LearningRate 0.0821 Epoch: 1 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:45,388-Speed 3466.83 samples/sec Loss 9.2665 LearningRate 0.0821 Epoch: 1 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:48,354-Speed 3454.60 samples/sec Loss 9.2384 LearningRate 0.0821 Epoch: 1 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:51,314-Speed 3460.15 samples/sec Loss 9.2665 LearningRate 0.0821 Epoch: 1 Global Step: 10680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:32:54,268-Speed 3466.67 samples/sec Loss 9.1575 LearningRate 0.0821 Epoch: 1 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:32:57,224-Speed 3464.97 samples/sec Loss 9.1672 LearningRate 0.0821 Epoch: 1 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:00,182-Speed 3463.00 samples/sec Loss 8.9677 LearningRate 0.0821 Epoch: 1 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:03,143-Speed 3459.25 samples/sec Loss 9.0035 LearningRate 0.0820 Epoch: 1 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:06,106-Speed 3456.65 samples/sec Loss 9.1638 LearningRate 0.0820 Epoch: 1 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:09,062-Speed 3464.49 samples/sec Loss 9.2630 LearningRate 0.0820 Epoch: 1 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:12,020-Speed 3462.27 samples/sec Loss 9.1905 LearningRate 0.0820 Epoch: 1 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-27 02:33:14,988-Speed 3452.05 samples/sec Loss 9.2254 LearningRate 0.0820 Epoch: 1 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:17,954-Speed 3453.45 samples/sec Loss 9.0129 LearningRate 0.0820 Epoch: 1 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:20,915-Speed 3458.96 samples/sec Loss 9.2196 LearningRate 0.0819 Epoch: 1 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:23,872-Speed 3463.24 samples/sec Loss 9.0397 LearningRate 0.0819 Epoch: 1 Global Step: 10790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:33:26,828-Speed 3464.54 samples/sec Loss 9.1843 LearningRate 0.0819 Epoch: 1 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:29,785-Speed 3463.95 samples/sec Loss 9.0446 LearningRate 0.0819 Epoch: 1 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:32,746-Speed 3459.49 samples/sec Loss 9.0373 LearningRate 0.0819 Epoch: 1 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:35,705-Speed 3460.51 samples/sec Loss 8.9260 LearningRate 0.0819 Epoch: 1 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:38,664-Speed 3462.23 samples/sec Loss 9.0704 LearningRate 0.0818 Epoch: 1 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:41,626-Speed 3458.77 samples/sec Loss 9.2277 LearningRate 0.0818 Epoch: 1 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:44,583-Speed 3463.83 samples/sec Loss 9.3381 LearningRate 0.0818 Epoch: 1 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:47,538-Speed 3465.32 samples/sec Loss 9.0980 LearningRate 0.0818 Epoch: 1 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 
02:33:50,500-Speed 3458.81 samples/sec Loss 9.1473 LearningRate 0.0818 Epoch: 1 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:53,458-Speed 3461.63 samples/sec Loss 9.2201 LearningRate 0.0818 Epoch: 1 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:56,405-Speed 3475.40 samples/sec Loss 9.1369 LearningRate 0.0817 Epoch: 1 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:33:59,351-Speed 3476.79 samples/sec Loss 9.1255 LearningRate 0.0817 Epoch: 1 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:02,310-Speed 3461.14 samples/sec Loss 8.9893 LearningRate 0.0817 Epoch: 1 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:05,277-Speed 3452.08 samples/sec Loss 9.1727 LearningRate 0.0817 Epoch: 1 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:08,251-Speed 3445.03 samples/sec Loss 9.1715 LearningRate 0.0817 Epoch: 1 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:11,209-Speed 3462.65 samples/sec Loss 9.3148 LearningRate 0.0817 Epoch: 1 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:14,164-Speed 3465.93 samples/sec Loss 9.0534 LearningRate 0.0817 Epoch: 1 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:17,124-Speed 3460.35 samples/sec Loss 9.0863 LearningRate 0.0816 Epoch: 1 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:20,085-Speed 3458.50 samples/sec Loss 9.1506 LearningRate 0.0816 Epoch: 1 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:23,052-Speed 3452.57 samples/sec Loss 9.1438 LearningRate 0.0816 Epoch: 1 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:26,020-Speed 3450.84 samples/sec Loss 9.2623 
LearningRate 0.0816 Epoch: 1 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:34:28,993-Speed 3445.45 samples/sec Loss 9.1859 LearningRate 0.0816 Epoch: 1 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:31,955-Speed 3457.29 samples/sec Loss 9.1339 LearningRate 0.0816 Epoch: 1 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:34,940-Speed 3431.57 samples/sec Loss 8.9219 LearningRate 0.0815 Epoch: 1 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:37,912-Speed 3446.48 samples/sec Loss 9.0326 LearningRate 0.0815 Epoch: 1 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:40,887-Speed 3443.00 samples/sec Loss 9.1718 LearningRate 0.0815 Epoch: 1 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:43,861-Speed 3445.45 samples/sec Loss 9.0993 LearningRate 0.0815 Epoch: 1 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:46,825-Speed 3455.67 samples/sec Loss 9.0670 LearningRate 0.0815 Epoch: 1 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:49,784-Speed 3460.75 samples/sec Loss 9.0391 LearningRate 0.0815 Epoch: 1 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:52,744-Speed 3460.64 samples/sec Loss 9.0347 LearningRate 0.0814 Epoch: 1 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:55,706-Speed 3457.30 samples/sec Loss 9.0816 LearningRate 0.0814 Epoch: 1 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:34:58,676-Speed 3449.26 samples/sec Loss 9.0009 LearningRate 0.0814 Epoch: 1 Global Step: 11110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:35:01,641-Speed 3454.53 samples/sec Loss 8.9568 LearningRate 0.0814 Epoch: 1 Global Step: 
11120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:35:04,590-Speed 3472.84 samples/sec Loss 9.1947 LearningRate 0.0814 Epoch: 1 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:07,552-Speed 3457.76 samples/sec Loss 9.1268 LearningRate 0.0814 Epoch: 1 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:10,522-Speed 3449.44 samples/sec Loss 9.1328 LearningRate 0.0814 Epoch: 1 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:13,483-Speed 3459.06 samples/sec Loss 9.0296 LearningRate 0.0813 Epoch: 1 Global Step: 11160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:16,444-Speed 3458.71 samples/sec Loss 9.2550 LearningRate 0.0813 Epoch: 1 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:19,403-Speed 3461.75 samples/sec Loss 9.1382 LearningRate 0.0813 Epoch: 1 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:22,384-Speed 3435.23 samples/sec Loss 9.1580 LearningRate 0.0813 Epoch: 1 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:25,349-Speed 3454.52 samples/sec Loss 9.1952 LearningRate 0.0813 Epoch: 1 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:28,314-Speed 3454.99 samples/sec Loss 9.1366 LearningRate 0.0813 Epoch: 1 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:31,284-Speed 3448.87 samples/sec Loss 9.1156 LearningRate 0.0812 Epoch: 1 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:34,249-Speed 3454.55 samples/sec Loss 8.9286 LearningRate 0.0812 Epoch: 1 Global Step: 11230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:35:37,199-Speed 3472.54 samples/sec Loss 8.9594 LearningRate 0.0812 Epoch: 1 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-27 02:35:40,161-Speed 3457.19 samples/sec Loss 9.0871 LearningRate 0.0812 Epoch: 1 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:43,125-Speed 3455.58 samples/sec Loss 8.9413 LearningRate 0.0812 Epoch: 1 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:46,085-Speed 3460.57 samples/sec Loss 9.1832 LearningRate 0.0812 Epoch: 1 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:49,069-Speed 3432.67 samples/sec Loss 9.1366 LearningRate 0.0811 Epoch: 1 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:52,027-Speed 3462.08 samples/sec Loss 8.9133 LearningRate 0.0811 Epoch: 1 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:54,990-Speed 3456.84 samples/sec Loss 8.9714 LearningRate 0.0811 Epoch: 1 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:35:57,954-Speed 3456.05 samples/sec Loss 9.0260 LearningRate 0.0811 Epoch: 1 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:00,923-Speed 3450.00 samples/sec Loss 9.0391 LearningRate 0.0811 Epoch: 1 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:03,887-Speed 3455.25 samples/sec Loss 9.0272 LearningRate 0.0811 Epoch: 1 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:06,845-Speed 3462.74 samples/sec Loss 9.0132 LearningRate 0.0811 Epoch: 1 Global Step: 11340 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:36:09,805-Speed 3460.21 samples/sec Loss 9.0413 LearningRate 0.0810 Epoch: 1 Global Step: 11350 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:36:12,757-Speed 3470.10 samples/sec Loss 9.1754 LearningRate 0.0810 Epoch: 1 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 
02:36:15,795-Speed 3370.44 samples/sec Loss 8.9825 LearningRate 0.0810 Epoch: 1 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:28,798-Speed 787.63 samples/sec Loss 8.6045 LearningRate 0.0810 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:31,761-Speed 3456.39 samples/sec Loss 8.4724 LearningRate 0.0810 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:34,729-Speed 3451.88 samples/sec Loss 8.5073 LearningRate 0.0810 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:37,750-Speed 3390.23 samples/sec Loss 8.3042 LearningRate 0.0809 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:40,732-Speed 3434.35 samples/sec Loss 8.3410 LearningRate 0.0809 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:43,707-Speed 3443.62 samples/sec Loss 8.3430 LearningRate 0.0809 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:46,682-Speed 3442.62 samples/sec Loss 8.4696 LearningRate 0.0809 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:49,654-Speed 3446.69 samples/sec Loss 8.2999 LearningRate 0.0809 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:36:52,635-Speed 3435.31 samples/sec Loss 8.3283 LearningRate 0.0809 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:36:55,613-Speed 3439.56 samples/sec Loss 8.5107 LearningRate 0.0808 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:36:58,583-Speed 3448.42 samples/sec Loss 8.3868 LearningRate 0.0808 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:01,554-Speed 3447.94 samples/sec Loss 
8.5992 LearningRate 0.0808 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:04,542-Speed 3427.69 samples/sec Loss 8.4515 LearningRate 0.0808 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:07,548-Speed 3407.53 samples/sec Loss 8.5580 LearningRate 0.0808 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:10,541-Speed 3422.06 samples/sec Loss 8.5337 LearningRate 0.0808 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:13,522-Speed 3436.43 samples/sec Loss 8.4040 LearningRate 0.0808 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:16,503-Speed 3435.72 samples/sec Loss 8.5570 LearningRate 0.0807 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:19,487-Speed 3433.34 samples/sec Loss 8.6215 LearningRate 0.0807 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:22,465-Speed 3438.71 samples/sec Loss 8.7114 LearningRate 0.0807 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:25,443-Speed 3439.21 samples/sec Loss 8.5851 LearningRate 0.0807 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:28,412-Speed 3450.33 samples/sec Loss 8.4642 LearningRate 0.0807 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:31,406-Speed 3420.77 samples/sec Loss 8.6710 LearningRate 0.0807 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:34,386-Speed 3436.49 samples/sec Loss 8.6856 LearningRate 0.0806 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:37,371-Speed 3431.85 samples/sec Loss 8.7338 LearningRate 0.0806 Epoch: 2 Global 
Step: 11610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:40,360-Speed 3427.07 samples/sec Loss 8.5024 LearningRate 0.0806 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:43,349-Speed 3426.07 samples/sec Loss 8.5492 LearningRate 0.0806 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:46,408-Speed 3348.11 samples/sec Loss 8.4412 LearningRate 0.0806 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:49,402-Speed 3421.44 samples/sec Loss 8.6084 LearningRate 0.0806 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:52,390-Speed 3428.19 samples/sec Loss 8.6236 LearningRate 0.0805 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:55,372-Speed 3434.76 samples/sec Loss 8.5805 LearningRate 0.0805 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:37:58,347-Speed 3443.05 samples/sec Loss 8.6169 LearningRate 0.0805 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:01,352-Speed 3408.28 samples/sec Loss 8.5896 LearningRate 0.0805 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:04,333-Speed 3436.10 samples/sec Loss 8.4465 LearningRate 0.0805 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:07,319-Speed 3429.47 samples/sec Loss 8.6017 LearningRate 0.0805 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:10,292-Speed 3445.73 samples/sec Loss 8.5032 LearningRate 0.0805 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:13,280-Speed 3427.73 samples/sec Loss 8.6757 LearningRate 0.0804 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 02:38:16,269-Speed 3426.42 samples/sec Loss 8.9050 LearningRate 0.0804 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:19,236-Speed 3452.20 samples/sec Loss 8.8143 LearningRate 0.0804 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:22,213-Speed 3440.10 samples/sec Loss 8.7233 LearningRate 0.0804 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:25,184-Speed 3447.98 samples/sec Loss 8.7255 LearningRate 0.0804 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:28,156-Speed 3445.75 samples/sec Loss 8.7900 LearningRate 0.0804 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:31,142-Speed 3432.20 samples/sec Loss 8.7415 LearningRate 0.0803 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:34,133-Speed 3424.23 samples/sec Loss 8.6530 LearningRate 0.0803 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:37,116-Speed 3432.55 samples/sec Loss 8.5816 LearningRate 0.0803 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:40,090-Speed 3443.86 samples/sec Loss 8.6871 LearningRate 0.0803 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:43,058-Speed 3451.54 samples/sec Loss 8.5467 LearningRate 0.0803 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:46,025-Speed 3451.89 samples/sec Loss 8.7309 LearningRate 0.0803 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:38:48,997-Speed 3446.71 samples/sec Loss 8.6451 LearningRate 0.0802 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 
02:38:51,974-Speed 3440.28 samples/sec Loss 8.7882 LearningRate 0.0802 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:54,951-Speed 3440.75 samples/sec Loss 8.7368 LearningRate 0.0802 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:38:57,922-Speed 3447.55 samples/sec Loss 8.6361 LearningRate 0.0802 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:00,890-Speed 3450.54 samples/sec Loss 8.5299 LearningRate 0.0802 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:03,901-Speed 3401.71 samples/sec Loss 8.6496 LearningRate 0.0802 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:06,880-Speed 3438.03 samples/sec Loss 8.7293 LearningRate 0.0802 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:09,849-Speed 3450.05 samples/sec Loss 8.6040 LearningRate 0.0801 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:12,822-Speed 3445.55 samples/sec Loss 8.6795 LearningRate 0.0801 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:15,792-Speed 3448.71 samples/sec Loss 8.7296 LearningRate 0.0801 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:18,752-Speed 3460.59 samples/sec Loss 8.8028 LearningRate 0.0801 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:39:21,780-Speed 3382.19 samples/sec Loss 8.8391 LearningRate 0.0801 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:39:24,772-Speed 3423.15 samples/sec Loss 8.7708 LearningRate 0.0801 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:39:27,747-Speed 3442.14 samples/sec Loss 
8.6809 LearningRate 0.0800 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:39:30,717-Speed 3448.98 samples/sec Loss 8.6785 LearningRate 0.0800 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:39:33,685-Speed 3450.82 samples/sec Loss 8.5722 LearningRate 0.0800 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:40:16,957-[lfw][12000]XNorm: 22.718744 Training: 2022-04-27 02:40:16,958-[lfw][12000]Accuracy-Flip: 0.99600+-0.00291 Training: 2022-04-27 02:40:16,958-[lfw][12000]Accuracy-Highest: 0.99600 Training: 2022-04-27 02:41:07,144-[cfp_fp][12000]XNorm: 19.773736 Training: 2022-04-27 02:41:07,145-[cfp_fp][12000]Accuracy-Flip: 0.93214+-0.01344 Training: 2022-04-27 02:41:07,145-[cfp_fp][12000]Accuracy-Highest: 0.93214 Training: 2022-04-27 02:41:50,248-[agedb_30][12000]XNorm: 22.603312 Training: 2022-04-27 02:41:50,249-[agedb_30][12000]Accuracy-Flip: 0.96700+-0.00985 Training: 2022-04-27 02:41:50,249-[agedb_30][12000]Accuracy-Highest: 0.96700 Training: 2022-04-27 02:41:53,212-Speed 73.39 samples/sec Loss 8.8026 LearningRate 0.0800 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:41:56,163-Speed 3470.53 samples/sec Loss 8.6237 LearningRate 0.0800 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:41:59,126-Speed 3456.82 samples/sec Loss 8.7304 LearningRate 0.0800 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:42:02,083-Speed 3464.29 samples/sec Loss 8.6652 LearningRate 0.0799 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:42:05,047-Speed 3455.71 samples/sec Loss 8.6697 LearningRate 0.0799 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:42:08,007-Speed 3459.39 samples/sec Loss 8.6882 LearningRate 0.0799 Epoch: 2 Global 
Step: 12060 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:10,992-Speed 3432.12 samples/sec Loss 8.6894 LearningRate 0.0799 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:13,961-Speed 3449.39 samples/sec Loss 8.7036 LearningRate 0.0799 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:17,022-Speed 3346.20 samples/sec Loss 8.7359 LearningRate 0.0799 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:20,010-Speed 3428.27 samples/sec Loss 8.5775 LearningRate 0.0799 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:22,974-Speed 3455.15 samples/sec Loss 8.8298 LearningRate 0.0798 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:25,963-Speed 3426.54 samples/sec Loss 8.7553 LearningRate 0.0798 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:28,930-Speed 3452.11 samples/sec Loss 8.6425 LearningRate 0.0798 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:31,900-Speed 3449.06 samples/sec Loss 8.6533 LearningRate 0.0798 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:34,875-Speed 3442.35 samples/sec Loss 8.6975 LearningRate 0.0798 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:37,823-Speed 3474.36 samples/sec Loss 8.8720 LearningRate 0.0798 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:42:40,772-Speed 3474.14 samples/sec Loss 8.7509 LearningRate 0.0797 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:43,738-Speed 3452.43 samples/sec Loss 8.7746 LearningRate 0.0797 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:46,698-Speed 3460.87 samples/sec Loss 8.7567 LearningRate 0.0797 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:49,666-Speed 3449.91 samples/sec Loss 8.6847 LearningRate 0.0797 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:52,634-Speed 3451.46 samples/sec Loss 8.6322 LearningRate 0.0797 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:55,590-Speed 3464.84 samples/sec Loss 8.7045 LearningRate 0.0797 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:42:58,566-Speed 3442.41 samples/sec Loss 8.7654 LearningRate 0.0796 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:01,523-Speed 3463.13 samples/sec Loss 8.6886 LearningRate 0.0796 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:04,486-Speed 3456.47 samples/sec Loss 8.9470 LearningRate 0.0796 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:07,474-Speed 3428.42 samples/sec Loss 8.7775 LearningRate 0.0796 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:10,431-Speed 3464.50 samples/sec Loss 8.7681 LearningRate 0.0796 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:13,394-Speed 3456.62 samples/sec Loss 8.6623 LearningRate 0.0796 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:16,355-Speed 3458.42 samples/sec Loss 8.8449 LearningRate 0.0796 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:19,314-Speed 3461.25 samples/sec Loss 8.6800 LearningRate 0.0795 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:22,297-Speed 3433.91 samples/sec Loss 8.5401 LearningRate 0.0795 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:25,289-Speed 3423.19 samples/sec Loss 8.8403 LearningRate 0.0795 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:28,262-Speed 3444.92 samples/sec Loss 8.7064 LearningRate 0.0795 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:31,223-Speed 3459.69 samples/sec Loss 8.7585 LearningRate 0.0795 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:34,180-Speed 3463.29 samples/sec Loss 8.6683 LearningRate 0.0795 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:37,144-Speed 3455.95 samples/sec Loss 8.8703 LearningRate 0.0794 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:40,102-Speed 3462.50 samples/sec Loss 8.7548 LearningRate 0.0794 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:43,058-Speed 3465.59 samples/sec Loss 8.8291 LearningRate 0.0794 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:46,012-Speed 3466.82 samples/sec Loss 8.6946 LearningRate 0.0794 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:48,968-Speed 3464.60 samples/sec Loss 8.7860 LearningRate 0.0794 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:51,926-Speed 3463.27 samples/sec Loss 8.7044 LearningRate 0.0794 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:43:54,884-Speed 3462.56 samples/sec Loss 8.6381 LearningRate 0.0793 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:43:57,854-Speed 3448.88 samples/sec Loss 8.5345 LearningRate 0.0793 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:00,817-Speed 3456.04 samples/sec Loss 8.7549 LearningRate 0.0793 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:03,785-Speed 3451.50 samples/sec Loss 8.7293 LearningRate 0.0793 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:06,750-Speed 3455.23 samples/sec Loss 8.5426 LearningRate 0.0793 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:09,715-Speed 3453.45 samples/sec Loss 8.7692 LearningRate 0.0793 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:12,701-Speed 3429.83 samples/sec Loss 8.8623 LearningRate 0.0793 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:15,670-Speed 3449.66 samples/sec Loss 8.6734 LearningRate 0.0792 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:18,646-Speed 3441.85 samples/sec Loss 8.6806 LearningRate 0.0792 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:21,611-Speed 3454.90 samples/sec Loss 8.6066 LearningRate 0.0792 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:24,590-Speed 3438.03 samples/sec Loss 8.6651 LearningRate 0.0792 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:44:27,564-Speed 3444.76 samples/sec Loss 8.6569 LearningRate 0.0792 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:44:30,525-Speed 3460.03 samples/sec Loss 8.7488 LearningRate 0.0792 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:44:33,470-Speed 3477.87 samples/sec Loss 8.6435 LearningRate 0.0791 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:36,450-Speed 3436.71 samples/sec Loss 8.6274 LearningRate 0.0791 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:39,412-Speed 3458.24 samples/sec Loss 8.4967 LearningRate 0.0791 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:42,377-Speed 3453.72 samples/sec Loss 8.6512 LearningRate 0.0791 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:45,344-Speed 3452.22 samples/sec Loss 8.8442 LearningRate 0.0791 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:48,304-Speed 3460.24 samples/sec Loss 8.6951 LearningRate 0.0791 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:51,272-Speed 3451.55 samples/sec Loss 8.7102 LearningRate 0.0791 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:54,234-Speed 3457.68 samples/sec Loss 8.7412 LearningRate 0.0790 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:44:57,200-Speed 3453.74 samples/sec Loss 8.8401 LearningRate 0.0790 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:45:00,179-Speed 3437.84 samples/sec Loss 8.7314 LearningRate 0.0790 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 32768 Required: 10 hours
Training: 2022-04-27 02:45:03,161-Speed 3434.27 samples/sec Loss 8.5555 LearningRate 0.0790 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:06,155-Speed 3422.35 samples/sec Loss 8.8152 LearningRate 0.0790 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:09,116-Speed 3459.30 samples/sec Loss 8.6343 LearningRate 0.0790 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:12,089-Speed 3444.59 samples/sec Loss 8.6766 LearningRate 0.0789 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:15,071-Speed 3434.23 samples/sec Loss 8.5970 LearningRate 0.0789 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:18,039-Speed 3451.55 samples/sec Loss 8.6494 LearningRate 0.0789 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:21,004-Speed 3454.48 samples/sec Loss 8.4130 LearningRate 0.0789 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:23,972-Speed 3451.02 samples/sec Loss 8.5432 LearningRate 0.0789 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:26,933-Speed 3458.78 samples/sec Loss 8.6331 LearningRate 0.0789 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:29,900-Speed 3453.09 samples/sec Loss 8.7386 LearningRate 0.0788 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:32,864-Speed 3454.45 samples/sec Loss 8.5989 LearningRate 0.0788 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:35,835-Speed 3448.45 samples/sec Loss 8.5863 LearningRate 0.0788 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:38,808-Speed 3444.27 samples/sec Loss 8.7080 LearningRate 0.0788 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:41,801-Speed 3422.31 samples/sec Loss 8.5807 LearningRate 0.0788 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:44,782-Speed 3435.84 samples/sec Loss 8.7273 LearningRate 0.0788 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:47,768-Speed 3429.56 samples/sec Loss 8.6895 LearningRate 0.0788 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:45:50,737-Speed 3450.53 samples/sec Loss 8.5051 LearningRate 0.0787 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:53,709-Speed 3446.47 samples/sec Loss 8.5868 LearningRate 0.0787 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:56,691-Speed 3434.91 samples/sec Loss 8.7688 LearningRate 0.0787 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:45:59,668-Speed 3440.58 samples/sec Loss 8.6002 LearningRate 0.0787 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:02,638-Speed 3448.93 samples/sec Loss 8.6849 LearningRate 0.0787 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:05,608-Speed 3448.33 samples/sec Loss 8.6045 LearningRate 0.0787 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:08,570-Speed 3457.60 samples/sec Loss 8.8041 LearningRate 0.0786 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:11,548-Speed 3440.31 samples/sec Loss 8.6302 LearningRate 0.0786 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:14,518-Speed 3447.83 samples/sec Loss 8.6677 LearningRate 0.0786 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:17,499-Speed 3435.52 samples/sec Loss 8.6984 LearningRate 0.0786 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:46:20,470-Speed 3448.50 samples/sec Loss 8.7387 LearningRate 0.0786 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:23,441-Speed 3446.94 samples/sec Loss 8.6353 LearningRate 0.0786 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:26,408-Speed 3452.53 samples/sec Loss 8.6365 LearningRate 0.0786 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:29,384-Speed 3442.08 samples/sec Loss 8.8280 LearningRate 0.0785 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:32,361-Speed 3440.23 samples/sec Loss 8.7879 LearningRate 0.0785 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:35,330-Speed 3450.05 samples/sec Loss 8.6240 LearningRate 0.0785 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:38,303-Speed 3446.20 samples/sec Loss 8.5727 LearningRate 0.0785 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:41,324-Speed 3389.52 samples/sec Loss 8.5543 LearningRate 0.0785 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:44,296-Speed 3446.23 samples/sec Loss 8.7376 LearningRate 0.0785 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:47,262-Speed 3453.91 samples/sec Loss 8.6671 LearningRate 0.0784 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:46:50,250-Speed 3427.81 samples/sec Loss 8.6905 LearningRate 0.0784 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:46:53,217-Speed 3452.27 samples/sec Loss 8.6740 LearningRate 0.0784 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:46:56,186-Speed 3450.04 samples/sec Loss 8.6933 LearningRate 0.0784 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:46:59,157-Speed 3446.27 samples/sec Loss 8.6197 LearningRate 0.0784 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:47:02,120-Speed 3457.44 samples/sec Loss 8.8459 LearningRate 0.0784 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:05,124-Speed 3409.87 samples/sec Loss 8.6361 LearningRate 0.0784 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:08,087-Speed 3456.22 samples/sec Loss 8.5787 LearningRate 0.0783 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:11,056-Speed 3450.31 samples/sec Loss 8.6464 LearningRate 0.0783 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:14,031-Speed 3443.36 samples/sec Loss 8.5983 LearningRate 0.0783 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:16,999-Speed 3450.84 samples/sec Loss 8.7832 LearningRate 0.0783 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:19,989-Speed 3425.58 samples/sec Loss 8.6987 LearningRate 0.0783 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:22,961-Speed 3446.61 samples/sec Loss 8.7217 LearningRate 0.0783 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:25,952-Speed 3423.58 samples/sec Loss 8.5792 LearningRate 0.0782 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:28,924-Speed 3446.75 samples/sec Loss 8.6400 LearningRate 0.0782 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:31,889-Speed 3453.82 samples/sec Loss 8.5550 LearningRate 0.0782 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:47:34,854-Speed 3454.13 samples/sec Loss 8.6245 LearningRate 0.0782 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:47:37,814-Speed 3461.51 samples/sec Loss 8.6155 LearningRate 0.0782 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:40,785-Speed 3446.84 samples/sec Loss 8.5742 LearningRate 0.0782 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:47:43,755-Speed 3449.52 samples/sec Loss 8.7935 LearningRate 0.0781 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:47:46,818-Speed 3343.24 samples/sec Loss 8.4200 LearningRate 0.0781 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:47:49,857-Speed 3370.95 samples/sec Loss 8.5739 LearningRate 0.0781 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:47:52,873-Speed 3395.30 samples/sec Loss 8.6461 LearningRate 0.0781 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:47:55,850-Speed 3440.32 samples/sec Loss 8.6824 LearningRate 0.0781 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:47:58,824-Speed 3444.10 samples/sec Loss 8.3772 LearningRate 0.0781 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:48:01,808-Speed 3432.80 samples/sec Loss 8.7459 LearningRate 0.0781 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:48:04,788-Speed 3437.63 samples/sec Loss 8.5587 LearningRate 0.0780 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:48:07,766-Speed 3439.17 samples/sec Loss 8.7301 LearningRate 0.0780 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:48:10,751-Speed 3430.81 samples/sec Loss 8.6433 LearningRate 0.0780 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:48:13,758-Speed 3406.14 samples/sec Loss 8.5048 LearningRate 0.0780 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:16,724-Speed 3453.20 samples/sec Loss 8.3605 LearningRate 0.0780 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:19,699-Speed 3442.78 samples/sec Loss 8.5453 LearningRate 0.0780 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:22,680-Speed 3436.01 samples/sec Loss 8.4781 LearningRate 0.0779 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:25,683-Speed 3410.27 samples/sec Loss 8.6381 LearningRate 0.0779 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:28,652-Speed 3450.17 samples/sec Loss 8.5182 LearningRate 0.0779 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:31,619-Speed 3452.96 samples/sec Loss 8.5179 LearningRate 0.0779 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:34,604-Speed 3430.77 samples/sec Loss 8.5631 LearningRate 0.0779 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:37,572-Speed 3451.30 samples/sec Loss 8.5247 LearningRate 0.0779 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:40,540-Speed 3450.48 samples/sec Loss 8.6175 LearningRate 0.0779 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:43,498-Speed 3463.03 samples/sec Loss 8.4313 LearningRate 0.0778 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:46,480-Speed 3434.78 samples/sec Loss 8.4873 LearningRate 0.0778 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:49,466-Speed 3429.94 samples/sec Loss 8.4842 LearningRate 0.0778 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:52,453-Speed 3429.25 samples/sec Loss 8.4239 LearningRate 0.0778 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:55,426-Speed 3444.07 samples/sec Loss 8.4318 LearningRate 0.0778 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:48:58,401-Speed 3443.28 samples/sec Loss 8.3408 LearningRate 0.0778 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:01,385-Speed 3432.66 samples/sec Loss 8.7077 LearningRate 0.0777 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:04,368-Speed 3433.37 samples/sec Loss 8.5171 LearningRate 0.0777 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:07,338-Speed 3449.60 samples/sec Loss 8.3781 LearningRate 0.0777 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:10,314-Speed 3441.13 samples/sec Loss 8.6604 LearningRate 0.0777 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:13,294-Speed 3437.04 samples/sec Loss 8.5272 LearningRate 0.0777 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:49:16,264-Speed 3448.77 samples/sec Loss 8.4905 LearningRate 0.0777 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:49:19,228-Speed 3455.22 samples/sec Loss 8.5595 LearningRate 0.0777 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:22,205-Speed 3440.14 samples/sec Loss 8.6737 LearningRate 0.0776 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:25,178-Speed 3446.13 samples/sec Loss 8.4256 LearningRate 0.0776 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:28,158-Speed 3436.25 samples/sec Loss 8.7204 LearningRate 0.0776 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:31,155-Speed 3418.98 samples/sec Loss 8.4390 LearningRate 0.0776 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:34,126-Speed 3447.04 samples/sec Loss 8.5222 LearningRate 0.0776 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:37,119-Speed 3422.13 samples/sec Loss 8.4967 LearningRate 0.0776 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:40,095-Speed 3441.86 samples/sec Loss 8.5641 LearningRate 0.0775 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:43,062-Speed 3451.60 samples/sec Loss 8.6211 LearningRate 0.0775 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:46,031-Speed 3450.15 samples/sec Loss 8.4624 LearningRate 0.0775 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:49,027-Speed 3417.69 samples/sec Loss 8.4569 LearningRate 0.0775 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:49:51,991-Speed 3455.40 samples/sec Loss 8.6149 LearningRate 0.0775 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:49:54,960-Speed 3450.96 samples/sec Loss 8.5797 LearningRate 0.0775 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:49:57,934-Speed 3443.31 samples/sec Loss 8.4693 LearningRate 0.0774 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:00,910-Speed 3443.36 samples/sec Loss 8.5435 LearningRate 0.0774 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:03,888-Speed 3438.80 samples/sec Loss 8.5398 LearningRate 0.0774 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:06,866-Speed 3438.72 samples/sec Loss 8.4982 LearningRate 0.0774 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:09,836-Speed 3449.60 samples/sec Loss 8.5365 LearningRate 0.0774 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:12,808-Speed 3445.42 samples/sec Loss 8.4630 LearningRate 0.0774 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:15,777-Speed 3449.53 samples/sec Loss 8.2404 LearningRate 0.0774 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:18,757-Speed 3438.17 samples/sec Loss 8.3486 LearningRate 0.0773 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:21,733-Speed 3441.04 samples/sec Loss 8.4595 LearningRate 0.0773 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:24,706-Speed 3445.23 samples/sec Loss 8.4331 LearningRate 0.0773 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:50:27,688-Speed 3434.93 samples/sec Loss 8.5395 LearningRate 0.0773 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:50:30,677-Speed 3427.14 samples/sec Loss 8.4466 LearningRate 0.0773 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:50:33,631-Speed 3467.22 samples/sec Loss 8.4398 LearningRate 0.0773 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:36,609-Speed 3438.60 samples/sec Loss 8.4833 LearningRate 0.0772 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:39,598-Speed 3427.38 samples/sec Loss 8.4017 LearningRate 0.0772 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:42,590-Speed 3423.32 samples/sec Loss 8.6306 LearningRate 0.0772 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:45,573-Speed 3433.36 samples/sec Loss 8.5203 LearningRate 0.0772 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:48,545-Speed 3446.76 samples/sec Loss 8.4349 LearningRate 0.0772 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:51,514-Speed 3449.86 samples/sec Loss 8.5099 LearningRate 0.0772 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:54,491-Speed 3440.11 samples/sec Loss 8.4302 LearningRate 0.0772 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:50:57,479-Speed 3427.52 samples/sec Loss 8.4114 LearningRate 0.0771 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:51:00,460-Speed 3435.57 samples/sec Loss 8.4112 LearningRate 0.0771 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:51:03,433-Speed 3446.04 samples/sec Loss 8.5778 LearningRate 0.0771 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:06,403-Speed 3447.91 samples/sec Loss 8.4096 LearningRate 0.0771 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:09,394-Speed 3424.54 samples/sec Loss 8.5038 LearningRate 0.0771 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:12,366-Speed 3446.61 samples/sec Loss 8.3273 LearningRate 0.0771 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:15,355-Speed 3427.41 samples/sec Loss 8.4162 LearningRate 0.0770 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:18,325-Speed 3449.19 samples/sec Loss 8.5203 LearningRate 0.0770 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:21,298-Speed 3445.13 samples/sec Loss 8.4826 LearningRate 0.0770 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:24,270-Speed 3445.38 samples/sec Loss 8.5018 LearningRate 0.0770 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:27,268-Speed 3417.87 samples/sec Loss 8.4656 LearningRate 0.0770 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:30,237-Speed 3448.83 samples/sec Loss 8.3657 LearningRate 0.0770 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:33,213-Speed 3441.87 samples/sec Loss 8.4240 LearningRate 0.0770 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:51:36,185-Speed 3446.20 samples/sec Loss 8.4003 LearningRate 0.0769 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:51:39,156-Speed 3447.99 samples/sec Loss 8.4584 LearningRate 0.0769 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:42,150-Speed 3420.99 samples/sec Loss 8.4895 LearningRate 0.0769 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:51:45,125-Speed 3442.95 samples/sec Loss 8.5025 LearningRate 0.0769 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:52:28,336-[lfw][14000]XNorm: 21.906293
Training: 2022-04-27 02:52:28,336-[lfw][14000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-04-27 02:52:28,337-[lfw][14000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:53:18,525-[cfp_fp][14000]XNorm: 19.340853
Training: 2022-04-27 02:53:18,526-[cfp_fp][14000]Accuracy-Flip: 0.93029+-0.01142
Training: 2022-04-27 02:53:18,526-[cfp_fp][14000]Accuracy-Highest: 0.93214
Training: 2022-04-27 02:54:01,701-[agedb_30][14000]XNorm: 21.228078
Training: 2022-04-27 02:54:01,702-[agedb_30][14000]Accuracy-Flip: 0.96217+-0.00940
Training: 2022-04-27 02:54:01,702-[agedb_30][14000]Accuracy-Highest: 0.96700
Training: 2022-04-27 02:54:04,658-Speed 73.39 samples/sec Loss 8.5295 LearningRate 0.0769 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:07,611-Speed 3468.93 samples/sec Loss 8.5830 LearningRate 0.0769 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:10,580-Speed 3449.71 samples/sec Loss 8.4358 LearningRate 0.0768 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:13,536-Speed 3464.91 samples/sec Loss 8.3354 LearningRate 0.0768 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:16,506-Speed 3448.39 samples/sec Loss 8.5037 LearningRate 0.0768 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:19,466-Speed 3460.79 samples/sec Loss 8.4184 LearningRate 0.0768 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:22,423-Speed 3463.71 samples/sec Loss 8.4899 LearningRate 0.0768 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:25,382-Speed 3460.69 samples/sec Loss 8.2598 LearningRate 0.0768 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:28,357-Speed 3443.51 samples/sec Loss 8.3635 LearningRate 0.0768 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:31,326-Speed 3449.14 samples/sec Loss 8.4603 LearningRate 0.0767 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:34,289-Speed 3457.64 samples/sec Loss 8.2471 LearningRate 0.0767 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:54:37,262-Speed 3444.50 samples/sec Loss 8.4468 LearningRate 0.0767 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:40,232-Speed 3448.99 samples/sec Loss 8.4797 LearningRate 0.0767 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:43,201-Speed 3449.77 samples/sec Loss 8.4550 LearningRate 0.0767 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:46,171-Speed 3448.13 samples/sec Loss 8.4771 LearningRate 0.0767 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:49,152-Speed 3435.91 samples/sec Loss 8.4527 LearningRate 0.0766 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:52,129-Speed 3441.16 samples/sec Loss 8.3487 LearningRate 0.0766 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:55,120-Speed 3424.46 samples/sec Loss 8.3548 LearningRate 0.0766 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:54:58,095-Speed 3442.11 samples/sec Loss 8.3656 LearningRate 0.0766 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:01,060-Speed 3455.10 samples/sec Loss 8.2896 LearningRate 0.0766 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:04,037-Speed 3440.99 samples/sec Loss 8.5388 LearningRate 0.0766 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:07,073-Speed 3373.73 samples/sec Loss 8.4654 LearningRate 0.0766 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:10,042-Speed 3448.63 samples/sec Loss 8.3958 LearningRate 0.0765 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:13,024-Speed 3434.99 samples/sec Loss 8.3633 LearningRate 0.0765 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:16,000-Speed 3441.45 samples/sec Loss 8.3778 LearningRate 0.0765 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:18,971-Speed 3447.65 samples/sec Loss 8.2928 LearningRate 0.0765 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:21,941-Speed 3448.68 samples/sec Loss 8.3900 LearningRate 0.0765 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:24,909-Speed 3450.86 samples/sec Loss 8.3474 LearningRate 0.0765 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:27,879-Speed 3448.42 samples/sec Loss 8.2086 LearningRate 0.0764 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-27 02:55:30,853-Speed 3444.59 samples/sec Loss 8.4201 LearningRate 0.0764 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:33,815-Speed 3457.63 samples/sec Loss 8.3307 LearningRate 0.0764 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:36,795-Speed 3437.89 samples/sec Loss 8.4095 LearningRate 0.0764 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:39,760-Speed 3453.36 samples/sec Loss 8.4761 LearningRate 0.0764 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:42,734-Speed 3444.78 samples/sec Loss 8.4837 LearningRate 0.0764 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:45,703-Speed 3448.70 samples/sec Loss 8.3683 LearningRate 0.0764 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:48,671-Speed 3451.84 samples/sec Loss 8.4261 LearningRate 0.0763 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:51,631-Speed 3459.52 samples/sec Loss 8.4429 LearningRate 0.0763 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:54,586-Speed 3466.16 samples/sec Loss 8.3968 LearningRate 0.0763 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:55:57,571-Speed 3431.40 samples/sec Loss 8.2451 LearningRate 0.0763 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:00,529-Speed 3463.52 samples/sec Loss 8.2697 LearningRate 0.0763 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-04-27 02:56:03,476-Speed 3475.61 samples/sec Loss 8.2790 LearningRate 0.0763 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:06,442-Speed 3453.21 samples/sec Loss 8.4530 LearningRate 0.0762 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:09,399-Speed 3463.35 samples/sec Loss 8.2594 LearningRate 0.0762 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:12,359-Speed 3460.31 samples/sec Loss 8.3954 LearningRate 0.0762 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:15,312-Speed 3467.72 samples/sec Loss 8.3766 LearningRate 0.0762 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:18,267-Speed 3466.58 samples/sec Loss 8.3360 LearningRate 0.0762 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27 02:56:21,228-Speed 3459.50 samples/sec Loss 8.3473 LearningRate 0.0762 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-27
02:56:24,192-Speed 3455.21 samples/sec Loss 8.3535 LearningRate 0.0762 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:27,159-Speed 3451.96 samples/sec Loss 8.2597 LearningRate 0.0761 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:30,136-Speed 3441.12 samples/sec Loss 8.3162 LearningRate 0.0761 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:33,101-Speed 3454.16 samples/sec Loss 8.2425 LearningRate 0.0761 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:56:36,053-Speed 3469.23 samples/sec Loss 8.4379 LearningRate 0.0761 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:56:39,020-Speed 3453.53 samples/sec Loss 8.3927 LearningRate 0.0761 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:56:41,992-Speed 3445.28 samples/sec Loss 8.3298 LearningRate 0.0761 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:56:44,947-Speed 3466.65 samples/sec Loss 8.2128 LearningRate 0.0760 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:56:47,890-Speed 3480.16 samples/sec Loss 8.4296 LearningRate 0.0760 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:50,847-Speed 3464.10 samples/sec Loss 8.3902 LearningRate 0.0760 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:53,824-Speed 3440.52 samples/sec Loss 8.4108 LearningRate 0.0760 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:56,809-Speed 3431.06 samples/sec Loss 8.3978 LearningRate 0.0760 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:56:59,822-Speed 3399.88 samples/sec Loss 
8.3595 LearningRate 0.0760 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:02,791-Speed 3449.29 samples/sec Loss 8.2810 LearningRate 0.0760 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:05,748-Speed 3463.60 samples/sec Loss 8.5684 LearningRate 0.0759 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:08,711-Speed 3456.88 samples/sec Loss 8.3291 LearningRate 0.0759 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:11,687-Speed 3442.03 samples/sec Loss 8.3178 LearningRate 0.0759 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:14,646-Speed 3461.46 samples/sec Loss 8.4948 LearningRate 0.0759 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:17,605-Speed 3461.00 samples/sec Loss 8.2148 LearningRate 0.0759 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:57:20,562-Speed 3463.98 samples/sec Loss 8.2745 LearningRate 0.0759 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:57:23,537-Speed 3442.82 samples/sec Loss 8.2326 LearningRate 0.0758 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:57:26,498-Speed 3459.26 samples/sec Loss 8.3338 LearningRate 0.0758 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:57:29,448-Speed 3471.98 samples/sec Loss 8.3241 LearningRate 0.0758 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:32,415-Speed 3452.67 samples/sec Loss 8.3180 LearningRate 0.0758 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:35,369-Speed 3466.85 samples/sec Loss 8.2791 LearningRate 0.0758 Epoch: 2 Global 
Step: 14720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:38,336-Speed 3452.19 samples/sec Loss 8.2646 LearningRate 0.0758 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:41,312-Speed 3441.46 samples/sec Loss 8.3229 LearningRate 0.0758 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:44,303-Speed 3424.25 samples/sec Loss 8.4101 LearningRate 0.0757 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:47,269-Speed 3453.68 samples/sec Loss 8.1902 LearningRate 0.0757 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:50,244-Speed 3443.49 samples/sec Loss 8.1727 LearningRate 0.0757 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:53,210-Speed 3452.77 samples/sec Loss 8.4986 LearningRate 0.0757 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:56,194-Speed 3432.38 samples/sec Loss 8.3335 LearningRate 0.0757 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:57:59,152-Speed 3462.59 samples/sec Loss 8.3376 LearningRate 0.0757 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:58:02,100-Speed 3475.10 samples/sec Loss 8.3684 LearningRate 0.0756 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:05,058-Speed 3462.08 samples/sec Loss 8.2210 LearningRate 0.0756 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:08,028-Speed 3448.32 samples/sec Loss 8.2045 LearningRate 0.0756 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:10,989-Speed 3459.69 samples/sec Loss 8.2258 LearningRate 0.0756 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 02:58:13,954-Speed 3454.51 samples/sec Loss 8.3122 LearningRate 0.0756 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:16,919-Speed 3454.49 samples/sec Loss 8.2813 LearningRate 0.0756 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:19,882-Speed 3456.64 samples/sec Loss 8.3198 LearningRate 0.0756 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:22,851-Speed 3449.48 samples/sec Loss 8.4261 LearningRate 0.0755 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:25,816-Speed 3454.33 samples/sec Loss 8.2075 LearningRate 0.0755 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:28,789-Speed 3444.96 samples/sec Loss 8.2911 LearningRate 0.0755 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:31,757-Speed 3451.87 samples/sec Loss 8.0955 LearningRate 0.0755 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 02:58:34,722-Speed 3454.30 samples/sec Loss 8.2851 LearningRate 0.0755 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:37,719-Speed 3416.84 samples/sec Loss 8.5049 LearningRate 0.0755 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:40,684-Speed 3455.53 samples/sec Loss 8.3148 LearningRate 0.0755 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:43,648-Speed 3455.54 samples/sec Loss 8.3003 LearningRate 0.0754 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 02:58:46,619-Speed 3446.87 samples/sec Loss 8.2256 LearningRate 0.0754 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
02:58:49,592-Speed 3446.12 samples/sec Loss 8.4484 LearningRate 0.0754 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:58:52,549-Speed 3463.49 samples/sec Loss 8.1994 LearningRate 0.0754 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:58:55,538-Speed 3426.69 samples/sec Loss 8.2216 LearningRate 0.0754 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:58:58,544-Speed 3407.10 samples/sec Loss 8.1492 LearningRate 0.0754 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:01,531-Speed 3428.51 samples/sec Loss 8.3323 LearningRate 0.0753 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:04,525-Speed 3421.22 samples/sec Loss 8.3638 LearningRate 0.0753 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:07,484-Speed 3461.96 samples/sec Loss 8.2598 LearningRate 0.0753 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:10,462-Speed 3439.89 samples/sec Loss 8.1607 LearningRate 0.0753 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:13,425-Speed 3456.55 samples/sec Loss 8.3903 LearningRate 0.0753 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:16,387-Speed 3457.15 samples/sec Loss 8.2856 LearningRate 0.0753 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:19,353-Speed 3453.96 samples/sec Loss 8.3065 LearningRate 0.0753 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:22,316-Speed 3456.22 samples/sec Loss 8.3096 LearningRate 0.0752 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:25,283-Speed 3451.94 samples/sec Loss 8.3070 
LearningRate 0.0752 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:28,262-Speed 3437.81 samples/sec Loss 8.0976 LearningRate 0.0752 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:31,252-Speed 3425.83 samples/sec Loss 8.2906 LearningRate 0.0752 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:34,243-Speed 3425.13 samples/sec Loss 8.3447 LearningRate 0.0752 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 02:59:37,208-Speed 3454.16 samples/sec Loss 8.2849 LearningRate 0.0752 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:40,190-Speed 3434.98 samples/sec Loss 8.2596 LearningRate 0.0751 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:43,170-Speed 3436.48 samples/sec Loss 8.3797 LearningRate 0.0751 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:46,153-Speed 3433.63 samples/sec Loss 8.3523 LearningRate 0.0751 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:49,116-Speed 3457.02 samples/sec Loss 8.2519 LearningRate 0.0751 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:52,080-Speed 3455.20 samples/sec Loss 8.1404 LearningRate 0.0751 Epoch: 2 Global Step: 15180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:55,054-Speed 3444.49 samples/sec Loss 8.4225 LearningRate 0.0751 Epoch: 2 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 02:59:58,001-Speed 3475.76 samples/sec Loss 8.3189 LearningRate 0.0751 Epoch: 2 Global Step: 15200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:00,980-Speed 3437.75 samples/sec Loss 8.2578 LearningRate 0.0750 Epoch: 2 Global Step: 15210 Fp16 
Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:03,967-Speed 3429.94 samples/sec Loss 8.2152 LearningRate 0.0750 Epoch: 2 Global Step: 15220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:06,947-Speed 3437.29 samples/sec Loss 8.1660 LearningRate 0.0750 Epoch: 2 Global Step: 15230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:09,922-Speed 3442.77 samples/sec Loss 8.2570 LearningRate 0.0750 Epoch: 2 Global Step: 15240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:12,908-Speed 3430.75 samples/sec Loss 8.3088 LearningRate 0.0750 Epoch: 2 Global Step: 15250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:15,892-Speed 3431.68 samples/sec Loss 8.1541 LearningRate 0.0750 Epoch: 2 Global Step: 15260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:18,880-Speed 3427.67 samples/sec Loss 8.0980 LearningRate 0.0749 Epoch: 2 Global Step: 15270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:21,850-Speed 3448.67 samples/sec Loss 8.2992 LearningRate 0.0749 Epoch: 2 Global Step: 15280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:24,818-Speed 3450.91 samples/sec Loss 8.2189 LearningRate 0.0749 Epoch: 2 Global Step: 15290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 03:00:27,804-Speed 3430.71 samples/sec Loss 8.2446 LearningRate 0.0749 Epoch: 2 Global Step: 15300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:30,793-Speed 3426.57 samples/sec Loss 8.2214 LearningRate 0.0749 Epoch: 2 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:33,761-Speed 3451.41 samples/sec Loss 8.3153 LearningRate 0.0749 Epoch: 2 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:36,723-Speed 3458.31 samples/sec Loss 8.1616 LearningRate 0.0749 Epoch: 2 Global Step: 15330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-27 03:00:39,698-Speed 3442.19 samples/sec Loss 8.3175 LearningRate 0.0748 Epoch: 2 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:42,664-Speed 3453.04 samples/sec Loss 8.1649 LearningRate 0.0748 Epoch: 2 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:45,633-Speed 3450.74 samples/sec Loss 8.1547 LearningRate 0.0748 Epoch: 2 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:48,594-Speed 3458.14 samples/sec Loss 8.3579 LearningRate 0.0748 Epoch: 2 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:51,560-Speed 3453.69 samples/sec Loss 8.2532 LearningRate 0.0748 Epoch: 2 Global Step: 15380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:54,540-Speed 3436.86 samples/sec Loss 8.1322 LearningRate 0.0748 Epoch: 2 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:00:57,522-Speed 3435.05 samples/sec Loss 8.2173 LearningRate 0.0747 Epoch: 2 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:00,514-Speed 3423.45 samples/sec Loss 8.3037 LearningRate 0.0747 Epoch: 2 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:03,498-Speed 3432.41 samples/sec Loss 8.2617 LearningRate 0.0747 Epoch: 2 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:06,489-Speed 3424.00 samples/sec Loss 8.0422 LearningRate 0.0747 Epoch: 2 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:09,473-Speed 3433.06 samples/sec Loss 8.2081 LearningRate 0.0747 Epoch: 2 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:12,465-Speed 3423.56 samples/sec Loss 8.1742 LearningRate 0.0747 Epoch: 2 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:15,467-Speed 3411.08 samples/sec 
Loss 8.2639 LearningRate 0.0747 Epoch: 2 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:18,442-Speed 3442.63 samples/sec Loss 8.3024 LearningRate 0.0746 Epoch: 2 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:21,434-Speed 3423.53 samples/sec Loss 8.0773 LearningRate 0.0746 Epoch: 2 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:24,426-Speed 3423.64 samples/sec Loss 8.2233 LearningRate 0.0746 Epoch: 2 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:27,412-Speed 3430.36 samples/sec Loss 8.4285 LearningRate 0.0746 Epoch: 2 Global Step: 15500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:01:30,397-Speed 3430.87 samples/sec Loss 8.2252 LearningRate 0.0746 Epoch: 2 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:33,374-Speed 3440.36 samples/sec Loss 8.3581 LearningRate 0.0746 Epoch: 2 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:36,347-Speed 3452.57 samples/sec Loss 8.1522 LearningRate 0.0746 Epoch: 2 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:39,314-Speed 3452.18 samples/sec Loss 8.1462 LearningRate 0.0745 Epoch: 2 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:42,298-Speed 3432.37 samples/sec Loss 8.3290 LearningRate 0.0745 Epoch: 2 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:45,279-Speed 3435.07 samples/sec Loss 8.2315 LearningRate 0.0745 Epoch: 2 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:48,276-Speed 3418.47 samples/sec Loss 8.1719 LearningRate 0.0745 Epoch: 2 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:51,249-Speed 3445.10 samples/sec Loss 8.1776 LearningRate 0.0745 Epoch: 2 
Global Step: 15580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:54,236-Speed 3428.65 samples/sec Loss 8.2328 LearningRate 0.0745 Epoch: 2 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:01:57,212-Speed 3442.13 samples/sec Loss 8.2551 LearningRate 0.0744 Epoch: 2 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:00,153-Speed 3482.29 samples/sec Loss 8.2033 LearningRate 0.0744 Epoch: 2 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:03,138-Speed 3430.76 samples/sec Loss 8.1103 LearningRate 0.0744 Epoch: 2 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:06,112-Speed 3444.08 samples/sec Loss 8.2428 LearningRate 0.0744 Epoch: 2 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:09,077-Speed 3455.14 samples/sec Loss 8.0873 LearningRate 0.0744 Epoch: 2 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:12,051-Speed 3443.67 samples/sec Loss 8.0008 LearningRate 0.0744 Epoch: 2 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:15,018-Speed 3452.52 samples/sec Loss 8.0327 LearningRate 0.0744 Epoch: 2 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:17,997-Speed 3438.09 samples/sec Loss 8.1009 LearningRate 0.0743 Epoch: 2 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:20,985-Speed 3428.05 samples/sec Loss 8.1380 LearningRate 0.0743 Epoch: 2 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:23,965-Speed 3437.15 samples/sec Loss 8.2772 LearningRate 0.0743 Epoch: 2 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:02:26,942-Speed 3440.76 samples/sec Loss 8.2560 LearningRate 0.0743 Epoch: 2 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 
10 hours Training: 2022-04-27 03:02:29,916-Speed 3443.47 samples/sec Loss 8.1503 LearningRate 0.0743 Epoch: 2 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:32,902-Speed 3430.40 samples/sec Loss 8.1991 LearningRate 0.0743 Epoch: 2 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:35,865-Speed 3456.02 samples/sec Loss 8.0511 LearningRate 0.0742 Epoch: 2 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:38,863-Speed 3416.67 samples/sec Loss 8.0180 LearningRate 0.0742 Epoch: 2 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:41,846-Speed 3434.45 samples/sec Loss 8.2056 LearningRate 0.0742 Epoch: 2 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:44,812-Speed 3452.95 samples/sec Loss 8.1963 LearningRate 0.0742 Epoch: 2 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:47,801-Speed 3426.23 samples/sec Loss 8.0445 LearningRate 0.0742 Epoch: 2 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:50,770-Speed 3450.74 samples/sec Loss 8.1893 LearningRate 0.0742 Epoch: 2 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:53,754-Speed 3431.45 samples/sec Loss 8.0255 LearningRate 0.0742 Epoch: 2 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:56,751-Speed 3418.17 samples/sec Loss 8.0969 LearningRate 0.0741 Epoch: 2 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:02:59,723-Speed 3446.13 samples/sec Loss 8.1474 LearningRate 0.0741 Epoch: 2 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:03:02,702-Speed 3437.60 samples/sec Loss 8.1543 LearningRate 0.0741 Epoch: 2 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 
03:03:05,648-Speed 3477.02 samples/sec Loss 8.2207 LearningRate 0.0741 Epoch: 2 Global Step: 15830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:08,638-Speed 3425.81 samples/sec Loss 8.2136 LearningRate 0.0741 Epoch: 2 Global Step: 15840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:11,609-Speed 3446.98 samples/sec Loss 8.1597 LearningRate 0.0741 Epoch: 2 Global Step: 15850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:14,585-Speed 3442.01 samples/sec Loss 8.1060 LearningRate 0.0741 Epoch: 2 Global Step: 15860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:17,549-Speed 3455.33 samples/sec Loss 8.1628 LearningRate 0.0740 Epoch: 2 Global Step: 15870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:20,526-Speed 3441.73 samples/sec Loss 8.3012 LearningRate 0.0740 Epoch: 2 Global Step: 15880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:23,518-Speed 3422.12 samples/sec Loss 8.1810 LearningRate 0.0740 Epoch: 2 Global Step: 15890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:26,532-Speed 3398.71 samples/sec Loss 8.1397 LearningRate 0.0740 Epoch: 2 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:29,513-Speed 3435.65 samples/sec Loss 8.0612 LearningRate 0.0740 Epoch: 2 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:32,484-Speed 3447.03 samples/sec Loss 7.9337 LearningRate 0.0740 Epoch: 2 Global Step: 15920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:03:35,455-Speed 3448.30 samples/sec Loss 7.9544 LearningRate 0.0739 Epoch: 2 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:38,428-Speed 3445.20 samples/sec Loss 8.0290 LearningRate 0.0739 Epoch: 2 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:41,409-Speed 3435.61 samples/sec Loss 8.1144 
LearningRate 0.0739 Epoch: 2 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:44,394-Speed 3431.62 samples/sec Loss 8.0211 LearningRate 0.0739 Epoch: 2 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:47,397-Speed 3410.53 samples/sec Loss 8.1448 LearningRate 0.0739 Epoch: 2 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:50,370-Speed 3445.78 samples/sec Loss 8.0580 LearningRate 0.0739 Epoch: 2 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:53,382-Speed 3399.63 samples/sec Loss 8.1522 LearningRate 0.0739 Epoch: 2 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:03:56,361-Speed 3438.04 samples/sec Loss 8.0119 LearningRate 0.0738 Epoch: 2 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:04:39,843-[lfw][16000]XNorm: 21.961942 Training: 2022-04-27 03:04:39,844-[lfw][16000]Accuracy-Flip: 0.99633+-0.00306 Training: 2022-04-27 03:04:39,844-[lfw][16000]Accuracy-Highest: 0.99633 Training: 2022-04-27 03:05:30,035-[cfp_fp][16000]XNorm: 18.888790 Training: 2022-04-27 03:05:30,036-[cfp_fp][16000]Accuracy-Flip: 0.92114+-0.01830 Training: 2022-04-27 03:05:30,036-[cfp_fp][16000]Accuracy-Highest: 0.93214 Training: 2022-04-27 03:06:13,131-[agedb_30][16000]XNorm: 21.890852 Training: 2022-04-27 03:06:13,132-[agedb_30][16000]Accuracy-Flip: 0.96700+-0.01115 Training: 2022-04-27 03:06:13,132-[agedb_30][16000]Accuracy-Highest: 0.96700 Training: 2022-04-27 03:06:16,103-Speed 73.28 samples/sec Loss 8.2020 LearningRate 0.0738 Epoch: 2 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:19,057-Speed 3466.93 samples/sec Loss 8.2026 LearningRate 0.0738 Epoch: 2 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:22,027-Speed 3448.35 samples/sec Loss 8.0855 LearningRate 0.0738 Epoch: 2 
Global Step: 16030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:24,999-Speed 3445.66 samples/sec Loss 8.0881 LearningRate 0.0738 Epoch: 2 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:27,964-Speed 3454.74 samples/sec Loss 7.9979 LearningRate 0.0738 Epoch: 2 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:30,934-Speed 3449.16 samples/sec Loss 7.9406 LearningRate 0.0737 Epoch: 2 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:33,893-Speed 3461.09 samples/sec Loss 7.8188 LearningRate 0.0737 Epoch: 2 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:36,864-Speed 3446.83 samples/sec Loss 8.0811 LearningRate 0.0737 Epoch: 2 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:39,946-Speed 3324.20 samples/sec Loss 8.0774 LearningRate 0.0737 Epoch: 2 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:42,922-Speed 3441.26 samples/sec Loss 8.2206 LearningRate 0.0737 Epoch: 2 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:45,910-Speed 3427.46 samples/sec Loss 8.1274 LearningRate 0.0737 Epoch: 2 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:48,885-Speed 3442.75 samples/sec Loss 8.0555 LearningRate 0.0737 Epoch: 2 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:06:51,860-Speed 3443.97 samples/sec Loss 8.0486 LearningRate 0.0736 Epoch: 2 Global Step: 16130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:06:54,824-Speed 3455.14 samples/sec Loss 7.8538 LearningRate 0.0736 Epoch: 2 Global Step: 16140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:06:57,794-Speed 3448.19 samples/sec Loss 8.1264 LearningRate 0.0736 Epoch: 2 Global Step: 16150 Fp16 Grad Scale: 262144 
Required: 10 hours Training: 2022-04-27 03:07:00,782-Speed 3428.49 samples/sec Loss 8.1562 LearningRate 0.0736 Epoch: 2 Global Step: 16160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:07:03,770-Speed 3427.24 samples/sec Loss 8.2240 LearningRate 0.0736 Epoch: 2 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:06,760-Speed 3425.67 samples/sec Loss 8.1748 LearningRate 0.0736 Epoch: 2 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:09,736-Speed 3441.86 samples/sec Loss 8.1162 LearningRate 0.0736 Epoch: 2 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:12,735-Speed 3414.63 samples/sec Loss 8.1315 LearningRate 0.0735 Epoch: 2 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:15,712-Speed 3440.78 samples/sec Loss 8.1171 LearningRate 0.0735 Epoch: 2 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:18,685-Speed 3445.49 samples/sec Loss 7.9731 LearningRate 0.0735 Epoch: 2 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:21,662-Speed 3440.67 samples/sec Loss 8.0577 LearningRate 0.0735 Epoch: 2 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:24,653-Speed 3424.05 samples/sec Loss 8.0651 LearningRate 0.0735 Epoch: 2 Global Step: 16240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:27,657-Speed 3409.92 samples/sec Loss 8.0525 LearningRate 0.0735 Epoch: 2 Global Step: 16250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:30,653-Speed 3418.56 samples/sec Loss 8.0797 LearningRate 0.0734 Epoch: 2 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:33,644-Speed 3424.32 samples/sec Loss 7.8837 LearningRate 0.0734 Epoch: 2 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 
03:07:36,619-Speed 3442.81 samples/sec Loss 7.9646 LearningRate 0.0734 Epoch: 2 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:39,606-Speed 3429.71 samples/sec Loss 8.1348 LearningRate 0.0734 Epoch: 2 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:42,582-Speed 3440.90 samples/sec Loss 8.0615 LearningRate 0.0734 Epoch: 2 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:45,576-Speed 3421.72 samples/sec Loss 8.0419 LearningRate 0.0734 Epoch: 2 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:48,540-Speed 3455.42 samples/sec Loss 7.9949 LearningRate 0.0734 Epoch: 2 Global Step: 16320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:51,517-Speed 3440.94 samples/sec Loss 8.0353 LearningRate 0.0733 Epoch: 2 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:07:54,484-Speed 3451.11 samples/sec Loss 8.1269 LearningRate 0.0733 Epoch: 2 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:07:57,479-Speed 3419.92 samples/sec Loss 8.1334 LearningRate 0.0733 Epoch: 2 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:00,464-Speed 3432.47 samples/sec Loss 8.0932 LearningRate 0.0733 Epoch: 2 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:03,440-Speed 3440.56 samples/sec Loss 8.1254 LearningRate 0.0733 Epoch: 2 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:06,435-Speed 3420.00 samples/sec Loss 8.0495 LearningRate 0.0733 Epoch: 2 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:09,418-Speed 3433.96 samples/sec Loss 8.0090 LearningRate 0.0733 Epoch: 2 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:12,393-Speed 3442.81 samples/sec Loss 
7.9731 LearningRate 0.0732 Epoch: 2 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:15,365-Speed 3445.69 samples/sec Loss 7.8356 LearningRate 0.0732 Epoch: 2 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:18,356-Speed 3424.99 samples/sec Loss 8.1536 LearningRate 0.0732 Epoch: 2 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:21,328-Speed 3446.36 samples/sec Loss 7.8782 LearningRate 0.0732 Epoch: 2 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:08:24,329-Speed 3413.17 samples/sec Loss 8.0700 LearningRate 0.0732 Epoch: 2 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:27,321-Speed 3422.62 samples/sec Loss 7.9279 LearningRate 0.0732 Epoch: 2 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:30,332-Speed 3402.77 samples/sec Loss 8.2090 LearningRate 0.0731 Epoch: 2 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:33,322-Speed 3424.51 samples/sec Loss 8.0693 LearningRate 0.0731 Epoch: 2 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:36,303-Speed 3436.13 samples/sec Loss 7.8687 LearningRate 0.0731 Epoch: 2 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:39,315-Speed 3400.69 samples/sec Loss 8.1544 LearningRate 0.0731 Epoch: 2 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:42,334-Speed 3391.83 samples/sec Loss 7.9584 LearningRate 0.0731 Epoch: 2 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:45,310-Speed 3441.70 samples/sec Loss 7.9711 LearningRate 0.0731 Epoch: 2 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:48,288-Speed 3439.43 samples/sec Loss 7.9416 LearningRate 0.0731 Epoch: 2 Global 
Step: 16520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:51,288-Speed 3414.24 samples/sec Loss 8.0273 LearningRate 0.0730 Epoch: 2 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:08:54,256-Speed 3451.13 samples/sec Loss 8.0653 LearningRate 0.0730 Epoch: 2 Global Step: 16540 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:08:57,211-Speed 3466.37 samples/sec Loss 8.0316 LearningRate 0.0730 Epoch: 2 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:00,183-Speed 3446.34 samples/sec Loss 8.0037 LearningRate 0.0730 Epoch: 2 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:03,151-Speed 3451.39 samples/sec Loss 7.8970 LearningRate 0.0730 Epoch: 2 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:06,129-Speed 3438.64 samples/sec Loss 8.0820 LearningRate 0.0730 Epoch: 2 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:09,119-Speed 3425.58 samples/sec Loss 7.9661 LearningRate 0.0730 Epoch: 2 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:12,120-Speed 3412.82 samples/sec Loss 7.8978 LearningRate 0.0729 Epoch: 2 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:15,099-Speed 3438.74 samples/sec Loss 7.9587 LearningRate 0.0729 Epoch: 2 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:18,076-Speed 3440.58 samples/sec Loss 7.8931 LearningRate 0.0729 Epoch: 2 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:21,063-Speed 3429.85 samples/sec Loss 8.0525 LearningRate 0.0729 Epoch: 2 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:24,034-Speed 3448.03 samples/sec Loss 7.8939 LearningRate 0.0729 Epoch: 2 Global Step: 16640 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 03:09:27,029-Speed 3419.98 samples/sec Loss 8.1917 LearningRate 0.0729 Epoch: 2 Global Step: 16650 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:09:29,980-Speed 3470.27 samples/sec Loss 7.9754 LearningRate 0.0728 Epoch: 2 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:32,948-Speed 3451.12 samples/sec Loss 7.9723 LearningRate 0.0728 Epoch: 2 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:35,939-Speed 3423.68 samples/sec Loss 8.0555 LearningRate 0.0728 Epoch: 2 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:38,927-Speed 3428.49 samples/sec Loss 8.0453 LearningRate 0.0728 Epoch: 2 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:41,910-Speed 3432.81 samples/sec Loss 8.0264 LearningRate 0.0728 Epoch: 2 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:44,899-Speed 3426.96 samples/sec Loss 8.0593 LearningRate 0.0728 Epoch: 2 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:47,899-Speed 3414.80 samples/sec Loss 8.0475 LearningRate 0.0728 Epoch: 2 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:50,891-Speed 3423.67 samples/sec Loss 8.0690 LearningRate 0.0727 Epoch: 2 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:53,891-Speed 3413.35 samples/sec Loss 8.0722 LearningRate 0.0727 Epoch: 2 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:56,866-Speed 3443.16 samples/sec Loss 7.9896 LearningRate 0.0727 Epoch: 2 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:09:59,850-Speed 3432.24 samples/sec Loss 7.8893 LearningRate 0.0727 Epoch: 2 Global Step: 16760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 
03:10:02,851-Speed 3413.23 samples/sec Loss 7.8627 LearningRate 0.0727 Epoch: 2 Global Step: 16770 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:10:05,843-Speed 3422.60 samples/sec Loss 7.9459 LearningRate 0.0727 Epoch: 2 Global Step: 16780 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:10:08,836-Speed 3422.90 samples/sec Loss 7.9429 LearningRate 0.0727 Epoch: 2 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:10:11,917-Speed 3324.37 samples/sec Loss 8.0394 LearningRate 0.0726 Epoch: 2 Global Step: 16800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:10:14,898-Speed 3436.00 samples/sec Loss 7.8870 LearningRate 0.0726 Epoch: 2 Global Step: 16810 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:10:17,878-Speed 3437.14 samples/sec Loss 7.7558 LearningRate 0.0726 Epoch: 2 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:20,870-Speed 3422.82 samples/sec Loss 8.0028 LearningRate 0.0726 Epoch: 2 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:23,857-Speed 3429.18 samples/sec Loss 7.8892 LearningRate 0.0726 Epoch: 2 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:26,860-Speed 3411.04 samples/sec Loss 7.9306 LearningRate 0.0726 Epoch: 2 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:29,839-Speed 3438.19 samples/sec Loss 7.8935 LearningRate 0.0725 Epoch: 2 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:32,824-Speed 3431.41 samples/sec Loss 8.0811 LearningRate 0.0725 Epoch: 2 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:35,804-Speed 3436.75 samples/sec Loss 7.9572 LearningRate 0.0725 Epoch: 2 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:38,808-Speed 3409.23 samples/sec Loss 
8.1454 LearningRate 0.0725 Epoch: 2 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:41,799-Speed 3424.54 samples/sec Loss 7.9187 LearningRate 0.0725 Epoch: 2 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:44,785-Speed 3430.69 samples/sec Loss 7.8788 LearningRate 0.0725 Epoch: 2 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:47,758-Speed 3444.84 samples/sec Loss 7.8339 LearningRate 0.0725 Epoch: 2 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:50,741-Speed 3433.24 samples/sec Loss 8.0237 LearningRate 0.0724 Epoch: 2 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:53,727-Speed 3430.40 samples/sec Loss 7.9960 LearningRate 0.0724 Epoch: 2 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:56,700-Speed 3444.82 samples/sec Loss 7.9259 LearningRate 0.0724 Epoch: 2 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:10:59,676-Speed 3441.53 samples/sec Loss 7.9621 LearningRate 0.0724 Epoch: 2 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:02,645-Speed 3450.78 samples/sec Loss 7.9584 LearningRate 0.0724 Epoch: 2 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:05,631-Speed 3429.34 samples/sec Loss 7.8662 LearningRate 0.0724 Epoch: 2 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:08,612-Speed 3436.11 samples/sec Loss 7.8672 LearningRate 0.0724 Epoch: 2 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:11,610-Speed 3416.66 samples/sec Loss 7.9038 LearningRate 0.0723 Epoch: 2 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:14,603-Speed 3422.70 samples/sec Loss 7.9785 LearningRate 0.0723 Epoch: 2 Global 
Step: 17010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:17,590-Speed 3428.96 samples/sec Loss 7.9395 LearningRate 0.0723 Epoch: 2 Global Step: 17020 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:11:20,596-Speed 3407.30 samples/sec Loss 7.8231 LearningRate 0.0723 Epoch: 2 Global Step: 17030 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:11:23,563-Speed 3451.56 samples/sec Loss 7.9000 LearningRate 0.0723 Epoch: 2 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:11:26,607-Speed 3364.91 samples/sec Loss 8.1746 LearningRate 0.0723 Epoch: 2 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:39,751-Speed 779.10 samples/sec Loss 7.8263 LearningRate 0.0722 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:42,745-Speed 3422.43 samples/sec Loss 7.3061 LearningRate 0.0722 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:45,877-Speed 3270.96 samples/sec Loss 7.1845 LearningRate 0.0722 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:48,875-Speed 3416.25 samples/sec Loss 7.3882 LearningRate 0.0722 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:51,881-Speed 3407.69 samples/sec Loss 7.2892 LearningRate 0.0722 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:54,885-Speed 3410.51 samples/sec Loss 7.2068 LearningRate 0.0722 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:11:57,900-Speed 3397.23 samples/sec Loss 7.2141 LearningRate 0.0722 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:12:00,900-Speed 3415.23 samples/sec Loss 7.1894 LearningRate 0.0721 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-27 03:12:03,900-Speed 3414.37 samples/sec Loss 7.4200 LearningRate 0.0721 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:12:06,886-Speed 3430.24 samples/sec Loss 7.4053 LearningRate 0.0721 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:09,878-Speed 3423.78 samples/sec Loss 7.5607 LearningRate 0.0721 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:12,879-Speed 3413.20 samples/sec Loss 7.5331 LearningRate 0.0721 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:15,861-Speed 3435.30 samples/sec Loss 7.2968 LearningRate 0.0721 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:18,856-Speed 3419.20 samples/sec Loss 7.4130 LearningRate 0.0721 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:21,833-Speed 3441.75 samples/sec Loss 7.3662 LearningRate 0.0720 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:24,967-Speed 3267.79 samples/sec Loss 7.4494 LearningRate 0.0720 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:27,962-Speed 3420.33 samples/sec Loss 7.5304 LearningRate 0.0720 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:30,965-Speed 3410.22 samples/sec Loss 7.5298 LearningRate 0.0720 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:33,962-Speed 3418.69 samples/sec Loss 7.5157 LearningRate 0.0720 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:36,937-Speed 3442.60 samples/sec Loss 7.4297 LearningRate 0.0720 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 
03:12:39,933-Speed 3418.51 samples/sec Loss 7.5177 LearningRate 0.0719 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:12:42,930-Speed 3416.76 samples/sec Loss 7.6374 LearningRate 0.0719 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:12:45,914-Speed 3432.33 samples/sec Loss 7.5380 LearningRate 0.0719 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:48,914-Speed 3414.55 samples/sec Loss 7.5632 LearningRate 0.0719 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:51,915-Speed 3413.44 samples/sec Loss 7.6450 LearningRate 0.0719 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:12:54,894-Speed 3438.12 samples/sec Loss 7.5645 LearningRate 0.0719 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:12:57,878-Speed 3432.61 samples/sec Loss 7.6225 LearningRate 0.0719 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:00,864-Speed 3430.02 samples/sec Loss 7.4636 LearningRate 0.0718 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:03,969-Speed 3298.35 samples/sec Loss 7.5329 LearningRate 0.0718 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:06,969-Speed 3414.58 samples/sec Loss 7.6636 LearningRate 0.0718 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:09,964-Speed 3419.69 samples/sec Loss 7.4500 LearningRate 0.0718 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:12,946-Speed 3434.64 samples/sec Loss 7.6630 LearningRate 0.0718 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:15,926-Speed 3437.64 samples/sec Loss 7.6570 
LearningRate 0.0718 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:18,908-Speed 3434.69 samples/sec Loss 7.4985 LearningRate 0.0718 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:21,909-Speed 3412.32 samples/sec Loss 7.5661 LearningRate 0.0717 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:13:24,886-Speed 3440.64 samples/sec Loss 7.6062 LearningRate 0.0717 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:27,893-Speed 3406.05 samples/sec Loss 7.6564 LearningRate 0.0717 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:30,896-Speed 3410.53 samples/sec Loss 7.6646 LearningRate 0.0717 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:33,887-Speed 3424.60 samples/sec Loss 7.5751 LearningRate 0.0717 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:36,877-Speed 3425.82 samples/sec Loss 7.6410 LearningRate 0.0717 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:39,870-Speed 3422.57 samples/sec Loss 7.7013 LearningRate 0.0717 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:42,851-Speed 3435.44 samples/sec Loss 7.6950 LearningRate 0.0716 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:45,841-Speed 3425.25 samples/sec Loss 7.6038 LearningRate 0.0716 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:48,858-Speed 3395.73 samples/sec Loss 7.5904 LearningRate 0.0716 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:51,835-Speed 3440.29 samples/sec Loss 7.5920 LearningRate 0.0716 Epoch: 3 Global Step: 
17500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:13:54,821-Speed 3430.12 samples/sec Loss 7.5945 LearningRate 0.0716 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:13:57,783-Speed 3457.88 samples/sec Loss 7.4592 LearningRate 0.0716 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:00,775-Speed 3422.63 samples/sec Loss 7.5083 LearningRate 0.0715 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:03,775-Speed 3414.64 samples/sec Loss 7.5330 LearningRate 0.0715 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:06,776-Speed 3413.13 samples/sec Loss 7.6959 LearningRate 0.0715 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:09,783-Speed 3406.53 samples/sec Loss 7.6123 LearningRate 0.0715 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:12,788-Speed 3407.92 samples/sec Loss 7.6903 LearningRate 0.0715 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:15,787-Speed 3415.78 samples/sec Loss 7.5350 LearningRate 0.0715 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:18,800-Speed 3399.20 samples/sec Loss 7.8129 LearningRate 0.0715 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:21,804-Speed 3409.82 samples/sec Loss 7.6198 LearningRate 0.0714 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:24,912-Speed 3294.89 samples/sec Loss 7.7478 LearningRate 0.0714 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:27,912-Speed 3413.73 samples/sec Loss 7.6231 LearningRate 0.0714 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 10 
hours Training: 2022-04-27 03:14:30,903-Speed 3425.25 samples/sec Loss 7.6840 LearningRate 0.0714 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:33,893-Speed 3424.80 samples/sec Loss 7.6827 LearningRate 0.0714 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:36,884-Speed 3425.14 samples/sec Loss 7.6618 LearningRate 0.0714 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:39,869-Speed 3431.38 samples/sec Loss 7.6193 LearningRate 0.0714 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:42,848-Speed 3437.84 samples/sec Loss 7.6040 LearningRate 0.0713 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:45,832-Speed 3432.04 samples/sec Loss 7.7618 LearningRate 0.0713 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:48,819-Speed 3429.50 samples/sec Loss 7.7916 LearningRate 0.0713 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:51,793-Speed 3444.44 samples/sec Loss 7.5851 LearningRate 0.0713 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:54,796-Speed 3410.13 samples/sec Loss 7.7041 LearningRate 0.0713 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:14:57,793-Speed 3418.25 samples/sec Loss 7.6435 LearningRate 0.0713 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:00,779-Speed 3430.01 samples/sec Loss 7.6448 LearningRate 0.0712 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:15:03,754-Speed 3442.78 samples/sec Loss 7.8288 LearningRate 0.0712 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 
03:15:06,765-Speed 3401.64 samples/sec Loss 7.6389 LearningRate 0.0712 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:09,762-Speed 3417.23 samples/sec Loss 7.8096 LearningRate 0.0712 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:12,767-Speed 3408.24 samples/sec Loss 7.7188 LearningRate 0.0712 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:15,749-Speed 3435.27 samples/sec Loss 7.5655 LearningRate 0.0712 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:18,729-Speed 3437.14 samples/sec Loss 7.7416 LearningRate 0.0712 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:21,722-Speed 3421.78 samples/sec Loss 7.6062 LearningRate 0.0711 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:24,722-Speed 3414.28 samples/sec Loss 7.5862 LearningRate 0.0711 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:27,741-Speed 3392.20 samples/sec Loss 7.7084 LearningRate 0.0711 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:30,759-Speed 3394.03 samples/sec Loss 7.7732 LearningRate 0.0711 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:33,777-Speed 3394.41 samples/sec Loss 7.7383 LearningRate 0.0711 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:15:36,773-Speed 3419.14 samples/sec Loss 7.6951 LearningRate 0.0711 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:39,791-Speed 3393.73 samples/sec Loss 7.7693 LearningRate 0.0711 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:42,803-Speed 3400.54 samples/sec Loss 
7.6512 LearningRate 0.0710 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:45,835-Speed 3378.31 samples/sec Loss 7.6271 LearningRate 0.0710 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:48,835-Speed 3414.68 samples/sec Loss 7.6921 LearningRate 0.0710 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:51,828-Speed 3422.01 samples/sec Loss 7.7503 LearningRate 0.0710 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:54,839-Speed 3400.89 samples/sec Loss 7.8306 LearningRate 0.0710 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:15:57,857-Speed 3394.58 samples/sec Loss 7.6995 LearningRate 0.0710 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:00,886-Speed 3381.10 samples/sec Loss 7.8562 LearningRate 0.0710 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:03,911-Speed 3386.67 samples/sec Loss 7.8840 LearningRate 0.0709 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:06,935-Speed 3387.03 samples/sec Loss 7.7127 LearningRate 0.0709 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:16:09,924-Speed 3425.87 samples/sec Loss 7.7293 LearningRate 0.0709 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:12,910-Speed 3431.09 samples/sec Loss 7.5480 LearningRate 0.0709 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:15,889-Speed 3437.57 samples/sec Loss 7.8240 LearningRate 0.0709 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:18,911-Speed 3389.22 samples/sec Loss 7.7183 LearningRate 0.0709 Epoch: 3 Global 
Step: 17990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:16:21,897-Speed 3430.11 samples/sec Loss 7.7898 LearningRate 0.0708 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:17:05,215-[lfw][18000]XNorm: 21.702406 Training: 2022-04-27 03:17:05,216-[lfw][18000]Accuracy-Flip: 0.99533+-0.00348 Training: 2022-04-27 03:17:05,216-[lfw][18000]Accuracy-Highest: 0.99633 Training: 2022-04-27 03:17:55,618-[cfp_fp][18000]XNorm: 18.879926 Training: 2022-04-27 03:17:55,619-[cfp_fp][18000]Accuracy-Flip: 0.92943+-0.01472 Training: 2022-04-27 03:17:55,619-[cfp_fp][18000]Accuracy-Highest: 0.93214 Training: 2022-04-27 03:18:39,074-[agedb_30][18000]XNorm: 21.278972 Training: 2022-04-27 03:18:39,075-[agedb_30][18000]Accuracy-Flip: 0.96583+-0.00790 Training: 2022-04-27 03:18:39,075-[agedb_30][18000]Accuracy-Highest: 0.96700 Training: 2022-04-27 03:18:42,070-Speed 73.05 samples/sec Loss 7.7985 LearningRate 0.0708 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:18:45,034-Speed 3455.41 samples/sec Loss 7.6265 LearningRate 0.0708 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:18:48,036-Speed 3412.04 samples/sec Loss 7.7247 LearningRate 0.0708 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:18:51,025-Speed 3426.97 samples/sec Loss 7.8285 LearningRate 0.0708 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:18:54,002-Speed 3440.67 samples/sec Loss 7.6176 LearningRate 0.0708 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:18:56,998-Speed 3418.65 samples/sec Loss 7.7467 LearningRate 0.0708 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:00,024-Speed 3384.86 samples/sec Loss 7.7955 LearningRate 0.0707 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-27 03:19:03,069-Speed 3363.86 samples/sec Loss 7.7107 LearningRate 0.0707 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:06,072-Speed 3410.72 samples/sec Loss 7.8319 LearningRate 0.0707 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:09,070-Speed 3416.62 samples/sec Loss 7.6809 LearningRate 0.0707 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:12,051-Speed 3435.78 samples/sec Loss 7.7490 LearningRate 0.0707 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:15,063-Speed 3399.83 samples/sec Loss 7.8048 LearningRate 0.0707 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:18,118-Speed 3352.91 samples/sec Loss 7.7618 LearningRate 0.0707 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:21,107-Speed 3427.03 samples/sec Loss 7.5527 LearningRate 0.0706 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:24,094-Speed 3429.22 samples/sec Loss 7.6804 LearningRate 0.0706 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:27,085-Speed 3424.60 samples/sec Loss 7.6763 LearningRate 0.0706 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:30,083-Speed 3416.17 samples/sec Loss 7.7382 LearningRate 0.0706 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:19:33,041-Speed 3462.77 samples/sec Loss 7.6534 LearningRate 0.0706 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:36,037-Speed 3417.98 samples/sec Loss 7.7553 LearningRate 0.0706 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
03:19:39,023-Speed 3430.41 samples/sec Loss 7.7666 LearningRate 0.0706 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:42,011-Speed 3427.70 samples/sec Loss 7.6131 LearningRate 0.0705 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:44,990-Speed 3437.85 samples/sec Loss 7.6836 LearningRate 0.0705 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:47,986-Speed 3418.56 samples/sec Loss 7.8663 LearningRate 0.0705 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:50,985-Speed 3415.76 samples/sec Loss 7.5930 LearningRate 0.0705 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:53,977-Speed 3423.14 samples/sec Loss 7.8730 LearningRate 0.0705 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:56,958-Speed 3435.98 samples/sec Loss 7.6953 LearningRate 0.0705 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:19:59,919-Speed 3458.61 samples/sec Loss 7.6864 LearningRate 0.0704 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:02,923-Speed 3410.75 samples/sec Loss 7.6364 LearningRate 0.0704 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:05,921-Speed 3416.01 samples/sec Loss 7.7808 LearningRate 0.0704 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:08,900-Speed 3437.87 samples/sec Loss 7.7535 LearningRate 0.0704 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:11,933-Speed 3377.56 samples/sec Loss 7.6288 LearningRate 0.0704 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:14,925-Speed 3422.18 samples/sec Loss 7.5487 
LearningRate 0.0704 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:17,914-Speed 3427.00 samples/sec Loss 7.7234 LearningRate 0.0704 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:20,891-Speed 3441.09 samples/sec Loss 7.6883 LearningRate 0.0703 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:23,893-Speed 3412.23 samples/sec Loss 7.6098 LearningRate 0.0703 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:26,884-Speed 3423.87 samples/sec Loss 7.8244 LearningRate 0.0703 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:29,886-Speed 3411.61 samples/sec Loss 7.8459 LearningRate 0.0703 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:32,894-Speed 3405.62 samples/sec Loss 7.6137 LearningRate 0.0703 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:20:35,887-Speed 3422.44 samples/sec Loss 7.6798 LearningRate 0.0703 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:38,856-Speed 3448.89 samples/sec Loss 7.6111 LearningRate 0.0703 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:41,843-Speed 3428.95 samples/sec Loss 7.7119 LearningRate 0.0702 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:20:44,825-Speed 3434.70 samples/sec Loss 7.6681 LearningRate 0.0702 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:47,822-Speed 3417.49 samples/sec Loss 7.7973 LearningRate 0.0702 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:50,788-Speed 3453.77 samples/sec Loss 7.5511 LearningRate 0.0702 Epoch: 3 Global Step: 
18440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:53,781-Speed 3422.57 samples/sec Loss 7.7019 LearningRate 0.0702 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:56,785-Speed 3408.99 samples/sec Loss 7.8359 LearningRate 0.0702 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:20:59,784-Speed 3415.45 samples/sec Loss 7.6144 LearningRate 0.0702 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:21:02,789-Speed 3408.80 samples/sec Loss 7.9264 LearningRate 0.0701 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:21:05,793-Speed 3408.74 samples/sec Loss 7.6969 LearningRate 0.0701 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:21:08,793-Speed 3414.28 samples/sec Loss 7.7355 LearningRate 0.0701 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:21:11,798-Speed 3408.25 samples/sec Loss 7.7713 LearningRate 0.0701 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:21:14,812-Speed 3399.18 samples/sec Loss 7.8533 LearningRate 0.0701 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:17,798-Speed 3429.71 samples/sec Loss 7.7450 LearningRate 0.0701 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:20,795-Speed 3418.47 samples/sec Loss 7.7435 LearningRate 0.0701 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:23,780-Speed 3430.56 samples/sec Loss 7.7312 LearningRate 0.0700 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:26,779-Speed 3414.86 samples/sec Loss 7.7524 LearningRate 0.0700 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-27 03:21:29,747-Speed 3451.00 samples/sec Loss 7.6475 LearningRate 0.0700 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:32,718-Speed 3447.33 samples/sec Loss 7.7429 LearningRate 0.0700 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:35,715-Speed 3417.50 samples/sec Loss 7.7190 LearningRate 0.0700 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:38,705-Speed 3426.13 samples/sec Loss 7.6042 LearningRate 0.0700 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:41,683-Speed 3439.31 samples/sec Loss 7.6608 LearningRate 0.0699 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:44,663-Speed 3437.35 samples/sec Loss 7.6961 LearningRate 0.0699 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:21:47,658-Speed 3419.81 samples/sec Loss 7.8637 LearningRate 0.0699 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:50,668-Speed 3402.75 samples/sec Loss 7.7290 LearningRate 0.0699 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:53,737-Speed 3336.97 samples/sec Loss 7.7663 LearningRate 0.0699 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:56,701-Speed 3455.37 samples/sec Loss 7.7239 LearningRate 0.0699 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:21:59,706-Speed 3409.17 samples/sec Loss 7.7168 LearningRate 0.0699 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:02,693-Speed 3428.61 samples/sec Loss 7.4515 LearningRate 0.0698 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:05,696-Speed 
3410.44 samples/sec Loss 7.7857 LearningRate 0.0698 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:08,680-Speed 3432.38 samples/sec Loss 7.6572 LearningRate 0.0698 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:11,711-Speed 3380.31 samples/sec Loss 7.5355 LearningRate 0.0698 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:14,720-Speed 3403.31 samples/sec Loss 7.6151 LearningRate 0.0698 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:17,731-Speed 3402.20 samples/sec Loss 7.7179 LearningRate 0.0698 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:20,727-Speed 3417.87 samples/sec Loss 7.6252 LearningRate 0.0698 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:23,747-Speed 3391.79 samples/sec Loss 7.7119 LearningRate 0.0697 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:26,729-Speed 3434.58 samples/sec Loss 7.6702 LearningRate 0.0697 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:29,714-Speed 3431.02 samples/sec Loss 7.6384 LearningRate 0.0697 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:32,696-Speed 3434.70 samples/sec Loss 7.7072 LearningRate 0.0697 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:35,694-Speed 3417.23 samples/sec Loss 7.5547 LearningRate 0.0697 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:38,692-Speed 3416.21 samples/sec Loss 7.6689 LearningRate 0.0697 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:41,697-Speed 3409.12 samples/sec Loss 7.6189 LearningRate 0.0697 
Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:44,708-Speed 3401.29 samples/sec Loss 7.6273 LearningRate 0.0696 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:47,711-Speed 3410.37 samples/sec Loss 7.5945 LearningRate 0.0696 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:22:50,692-Speed 3435.49 samples/sec Loss 7.5921 LearningRate 0.0696 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:53,694-Speed 3412.11 samples/sec Loss 7.5833 LearningRate 0.0696 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:56,680-Speed 3429.72 samples/sec Loss 7.6398 LearningRate 0.0696 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:22:59,669-Speed 3426.88 samples/sec Loss 7.6218 LearningRate 0.0696 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:02,677-Speed 3405.05 samples/sec Loss 7.7118 LearningRate 0.0696 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:05,659-Speed 3434.87 samples/sec Loss 7.6237 LearningRate 0.0695 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:08,656-Speed 3417.90 samples/sec Loss 7.6511 LearningRate 0.0695 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:11,648-Speed 3423.07 samples/sec Loss 7.6979 LearningRate 0.0695 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:14,674-Speed 3384.63 samples/sec Loss 7.5621 LearningRate 0.0695 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:23:17,676-Speed 3411.51 samples/sec Loss 7.7125 LearningRate 0.0695 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-27 03:23:20,694-Speed 3394.19 samples/sec Loss 7.5855 LearningRate 0.0695 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:23,687-Speed 3422.71 samples/sec Loss 7.6522 LearningRate 0.0694 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:26,681-Speed 3420.55 samples/sec Loss 7.5076 LearningRate 0.0694 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:29,670-Speed 3426.63 samples/sec Loss 7.6540 LearningRate 0.0694 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:32,664-Speed 3420.76 samples/sec Loss 7.5686 LearningRate 0.0694 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:35,690-Speed 3385.57 samples/sec Loss 7.6426 LearningRate 0.0694 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:38,679-Speed 3426.68 samples/sec Loss 7.5768 LearningRate 0.0694 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:41,673-Speed 3420.97 samples/sec Loss 7.6927 LearningRate 0.0694 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:44,690-Speed 3394.52 samples/sec Loss 7.6787 LearningRate 0.0693 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:47,683-Speed 3422.00 samples/sec Loss 7.5778 LearningRate 0.0693 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:23:50,675-Speed 3422.65 samples/sec Loss 7.6053 LearningRate 0.0693 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:23:53,655-Speed 3437.00 samples/sec Loss 7.4828 LearningRate 0.0693 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 
03:23:56,665-Speed 3402.76 samples/sec Loss 7.7067 LearningRate 0.0693 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:23:59,670-Speed 3409.11 samples/sec Loss 7.7008 LearningRate 0.0693 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:02,648-Speed 3439.45 samples/sec Loss 7.6218 LearningRate 0.0693 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:05,650-Speed 3412.23 samples/sec Loss 7.4842 LearningRate 0.0692 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:08,660-Speed 3402.38 samples/sec Loss 7.5629 LearningRate 0.0692 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:11,675-Speed 3397.08 samples/sec Loss 7.4711 LearningRate 0.0692 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:14,678-Speed 3410.81 samples/sec Loss 7.6747 LearningRate 0.0692 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:17,695-Speed 3394.52 samples/sec Loss 7.6338 LearningRate 0.0692 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:20,712-Speed 3394.90 samples/sec Loss 7.6456 LearningRate 0.0692 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:23,710-Speed 3416.12 samples/sec Loss 7.5422 LearningRate 0.0692 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:26,698-Speed 3428.70 samples/sec Loss 7.7426 LearningRate 0.0691 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:30,001-Speed 3100.68 samples/sec Loss 7.6659 LearningRate 0.0691 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:24:33,000-Speed 3415.36 samples/sec Loss 7.7848 
LearningRate 0.0691 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:35,995-Speed 3419.60 samples/sec Loss 7.4875 LearningRate 0.0691 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:38,996-Speed 3413.08 samples/sec Loss 7.4876 LearningRate 0.0691 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:41,981-Speed 3430.95 samples/sec Loss 7.6384 LearningRate 0.0691 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:44,985-Speed 3409.09 samples/sec Loss 7.6934 LearningRate 0.0691 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:47,986-Speed 3412.93 samples/sec Loss 7.6750 LearningRate 0.0690 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:50,985-Speed 3415.37 samples/sec Loss 7.7137 LearningRate 0.0690 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:53,983-Speed 3416.39 samples/sec Loss 7.6491 LearningRate 0.0690 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:24:57,011-Speed 3382.54 samples/sec Loss 7.6775 LearningRate 0.0690 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:00,034-Speed 3388.79 samples/sec Loss 7.5309 LearningRate 0.0690 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:03,045-Speed 3402.16 samples/sec Loss 7.4758 LearningRate 0.0690 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:25:06,068-Speed 3388.58 samples/sec Loss 7.7104 LearningRate 0.0690 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:25:09,078-Speed 3402.46 samples/sec Loss 7.6014 LearningRate 0.0689 Epoch: 3 Global Step: 
19300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:12,059-Speed 3436.63 samples/sec Loss 7.5011 LearningRate 0.0689 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:15,056-Speed 3417.34 samples/sec Loss 7.5211 LearningRate 0.0689 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:18,062-Speed 3406.91 samples/sec Loss 7.5312 LearningRate 0.0689 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:21,169-Speed 3296.42 samples/sec Loss 7.6088 LearningRate 0.0689 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:24,290-Speed 3282.18 samples/sec Loss 7.6826 LearningRate 0.0689 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:27,324-Speed 3376.47 samples/sec Loss 7.6802 LearningRate 0.0688 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:30,341-Speed 3394.91 samples/sec Loss 7.5817 LearningRate 0.0688 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:33,346-Speed 3407.59 samples/sec Loss 7.5857 LearningRate 0.0688 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:36,351-Speed 3408.73 samples/sec Loss 7.7001 LearningRate 0.0688 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:39,362-Speed 3401.72 samples/sec Loss 7.6537 LearningRate 0.0688 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:25:42,368-Speed 3407.45 samples/sec Loss 7.6037 LearningRate 0.0688 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:45,392-Speed 3387.29 samples/sec Loss 7.6189 LearningRate 0.0688 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-27 03:25:48,419-Speed 3383.64 samples/sec Loss 7.7574 LearningRate 0.0687 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:51,413-Speed 3420.67 samples/sec Loss 7.5587 LearningRate 0.0687 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:54,426-Speed 3400.08 samples/sec Loss 7.6314 LearningRate 0.0687 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:25:57,435-Speed 3403.49 samples/sec Loss 7.4862 LearningRate 0.0687 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:26:00,446-Speed 3402.01 samples/sec Loss 7.5442 LearningRate 0.0687 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:03,458-Speed 3399.80 samples/sec Loss 7.5191 LearningRate 0.0687 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:06,446-Speed 3428.72 samples/sec Loss 7.4440 LearningRate 0.0687 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:09,470-Speed 3386.21 samples/sec Loss 7.6514 LearningRate 0.0686 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:12,493-Speed 3389.18 samples/sec Loss 7.7438 LearningRate 0.0686 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:26:15,483-Speed 3424.65 samples/sec Loss 7.4790 LearningRate 0.0686 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:18,486-Speed 3411.99 samples/sec Loss 7.5406 LearningRate 0.0686 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:21,480-Speed 3420.00 samples/sec Loss 7.4391 LearningRate 0.0686 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:24,497-Speed 3395.25 
samples/sec Loss 7.5903 LearningRate 0.0686 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:27,511-Speed 3398.02 samples/sec Loss 7.5264 LearningRate 0.0686 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:30,520-Speed 3403.62 samples/sec Loss 7.5722 LearningRate 0.0685 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:33,517-Speed 3417.98 samples/sec Loss 7.5189 LearningRate 0.0685 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:36,513-Speed 3418.05 samples/sec Loss 7.4658 LearningRate 0.0685 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:39,536-Speed 3389.25 samples/sec Loss 7.6546 LearningRate 0.0685 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:42,555-Speed 3392.22 samples/sec Loss 7.7609 LearningRate 0.0685 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:45,569-Speed 3398.43 samples/sec Loss 7.5181 LearningRate 0.0685 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:26:48,559-Speed 3426.32 samples/sec Loss 7.5483 LearningRate 0.0685 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:51,576-Speed 3393.91 samples/sec Loss 7.7061 LearningRate 0.0684 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:54,589-Speed 3399.80 samples/sec Loss 7.6449 LearningRate 0.0684 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:26:57,626-Speed 3372.00 samples/sec Loss 7.6496 LearningRate 0.0684 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:00,649-Speed 3388.18 samples/sec Loss 7.4137 LearningRate 0.0684 Epoch: 3 Global 
Step: 19670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:03,678-Speed 3381.26 samples/sec Loss 7.5955 LearningRate 0.0684 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:06,684-Speed 3407.30 samples/sec Loss 7.5434 LearningRate 0.0684 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:09,685-Speed 3414.13 samples/sec Loss 7.4623 LearningRate 0.0684 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:12,692-Speed 3405.39 samples/sec Loss 7.5666 LearningRate 0.0683 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:15,706-Speed 3398.14 samples/sec Loss 7.4482 LearningRate 0.0683 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:18,709-Speed 3411.76 samples/sec Loss 7.7088 LearningRate 0.0683 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:21,716-Speed 3405.35 samples/sec Loss 7.5852 LearningRate 0.0683 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:24,726-Speed 3403.20 samples/sec Loss 7.5290 LearningRate 0.0683 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:27,760-Speed 3375.51 samples/sec Loss 7.4765 LearningRate 0.0683 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:30,786-Speed 3385.08 samples/sec Loss 7.5030 LearningRate 0.0683 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:33,818-Speed 3377.75 samples/sec Loss 7.6792 LearningRate 0.0682 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:36,811-Speed 3422.47 samples/sec Loss 7.5889 LearningRate 0.0682 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-27 03:27:39,833-Speed 3389.25 samples/sec Loss 7.6579 LearningRate 0.0682 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:42,866-Speed 3377.50 samples/sec Loss 7.6460 LearningRate 0.0682 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:45,894-Speed 3381.99 samples/sec Loss 7.5330 LearningRate 0.0682 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:48,930-Speed 3373.49 samples/sec Loss 7.4628 LearningRate 0.0682 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:27:51,954-Speed 3386.94 samples/sec Loss 7.6671 LearningRate 0.0682 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:54,992-Speed 3371.76 samples/sec Loss 7.5224 LearningRate 0.0681 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:27:58,019-Speed 3383.93 samples/sec Loss 7.5998 LearningRate 0.0681 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:28:01,057-Speed 3370.85 samples/sec Loss 7.4758 LearningRate 0.0681 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:28:04,068-Speed 3402.06 samples/sec Loss 7.4956 LearningRate 0.0681 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:28:07,083-Speed 3396.91 samples/sec Loss 7.6408 LearningRate 0.0681 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:10,089-Speed 3407.35 samples/sec Loss 7.5708 LearningRate 0.0681 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:13,078-Speed 3426.64 samples/sec Loss 7.4473 LearningRate 0.0680 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:16,118-Speed 3369.88 samples/sec 
Loss 7.5498 LearningRate 0.0680 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:19,131-Speed 3399.47 samples/sec Loss 7.4117 LearningRate 0.0680 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:22,157-Speed 3384.98 samples/sec Loss 7.3860 LearningRate 0.0680 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:25,169-Speed 3400.91 samples/sec Loss 7.5044 LearningRate 0.0680 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:28,183-Speed 3397.72 samples/sec Loss 7.4408 LearningRate 0.0680 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:31,193-Speed 3403.20 samples/sec Loss 7.5799 LearningRate 0.0680 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:34,219-Speed 3385.08 samples/sec Loss 7.5435 LearningRate 0.0679 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:28:37,261-Speed 3367.17 samples/sec Loss 7.5722 LearningRate 0.0679 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:28:40,296-Speed 3373.76 samples/sec Loss 7.5731 LearningRate 0.0679 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:29:23,734-[lfw][20000]XNorm: 21.442830 Training: 2022-04-27 03:29:23,734-[lfw][20000]Accuracy-Flip: 0.99600+-0.00351 Training: 2022-04-27 03:29:23,735-[lfw][20000]Accuracy-Highest: 0.99633 Training: 2022-04-27 03:30:14,203-[cfp_fp][20000]XNorm: 19.539685 Training: 2022-04-27 03:30:14,203-[cfp_fp][20000]Accuracy-Flip: 0.93114+-0.01514 Training: 2022-04-27 03:30:14,204-[cfp_fp][20000]Accuracy-Highest: 0.93214 Training: 2022-04-27 03:30:57,601-[agedb_30][20000]XNorm: 21.576006 Training: 2022-04-27 03:30:57,602-[agedb_30][20000]Accuracy-Flip: 0.97167+-0.00771 Training: 2022-04-27 
03:30:57,602-[agedb_30][20000]Accuracy-Highest: 0.97167 Training: 2022-04-27 03:31:00,614-Speed 72.98 samples/sec Loss 7.5742 LearningRate 0.0679 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:03,607-Speed 3421.31 samples/sec Loss 7.3463 LearningRate 0.0679 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:06,604-Speed 3417.79 samples/sec Loss 7.4157 LearningRate 0.0679 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:09,582-Speed 3439.00 samples/sec Loss 7.4190 LearningRate 0.0679 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:12,606-Speed 3387.37 samples/sec Loss 7.5717 LearningRate 0.0678 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:15,595-Speed 3425.88 samples/sec Loss 7.5570 LearningRate 0.0678 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:18,614-Speed 3393.50 samples/sec Loss 7.5244 LearningRate 0.0678 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:21,621-Speed 3406.09 samples/sec Loss 7.3571 LearningRate 0.0678 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:24,641-Speed 3391.63 samples/sec Loss 7.4560 LearningRate 0.0678 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-27 03:31:27,735-Speed 3310.49 samples/sec Loss 7.5317 LearningRate 0.0678 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:30,739-Speed 3409.62 samples/sec Loss 7.4287 LearningRate 0.0678 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:33,754-Speed 3396.40 samples/sec Loss 7.5535 LearningRate 0.0677 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-27 03:31:36,771-Speed 3395.08 samples/sec Loss 7.5843 LearningRate 0.0677 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:39,768-Speed 3417.41 samples/sec Loss 7.4061 LearningRate 0.0677 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:31:42,757-Speed 3427.08 samples/sec Loss 7.4228 LearningRate 0.0677 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:31:45,778-Speed 3390.73 samples/sec Loss 7.4979 LearningRate 0.0677 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:31:48,795-Speed 3394.72 samples/sec Loss 7.5426 LearningRate 0.0677 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:31:51,825-Speed 3380.83 samples/sec Loss 7.5500 LearningRate 0.0677 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:31:54,877-Speed 3355.38 samples/sec Loss 7.5545 LearningRate 0.0676 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:31:57,895-Speed 3394.12 samples/sec Loss 7.6523 LearningRate 0.0676 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:00,913-Speed 3394.12 samples/sec Loss 7.4397 LearningRate 0.0676 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:03,941-Speed 3382.86 samples/sec Loss 7.4859 LearningRate 0.0676 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:06,968-Speed 3383.29 samples/sec Loss 7.5125 LearningRate 0.0676 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:09,989-Speed 3390.39 samples/sec Loss 7.5852 LearningRate 0.0676 Epoch: 3 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
03:32:13,004-Speed 3397.48 samples/sec Loss 7.4494 LearningRate 0.0676 Epoch: 3 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:16,037-Speed 3377.31 samples/sec Loss 7.3952 LearningRate 0.0675 Epoch: 3 Global Step: 20260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:19,066-Speed 3381.38 samples/sec Loss 7.3298 LearningRate 0.0675 Epoch: 3 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:22,085-Speed 3392.81 samples/sec Loss 7.4873 LearningRate 0.0675 Epoch: 3 Global Step: 20280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:25,106-Speed 3390.55 samples/sec Loss 7.4090 LearningRate 0.0675 Epoch: 3 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:28,130-Speed 3386.88 samples/sec Loss 7.3767 LearningRate 0.0675 Epoch: 3 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:31,135-Speed 3408.10 samples/sec Loss 7.3997 LearningRate 0.0675 Epoch: 3 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:34,153-Speed 3394.10 samples/sec Loss 7.4026 LearningRate 0.0675 Epoch: 3 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:32:37,150-Speed 3417.30 samples/sec Loss 7.3439 LearningRate 0.0674 Epoch: 3 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:40,178-Speed 3382.58 samples/sec Loss 7.4667 LearningRate 0.0674 Epoch: 3 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:43,184-Speed 3408.18 samples/sec Loss 7.4904 LearningRate 0.0674 Epoch: 3 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:46,191-Speed 3405.32 samples/sec Loss 7.4895 LearningRate 0.0674 Epoch: 3 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:49,190-Speed 3415.67 samples/sec Loss 
7.3712 LearningRate 0.0674 Epoch: 3 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:52,204-Speed 3398.27 samples/sec Loss 7.3810 LearningRate 0.0674 Epoch: 3 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:55,204-Speed 3413.79 samples/sec Loss 7.3774 LearningRate 0.0674 Epoch: 3 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:32:58,191-Speed 3429.45 samples/sec Loss 7.3719 LearningRate 0.0673 Epoch: 3 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:33:01,205-Speed 3397.58 samples/sec Loss 7.5305 LearningRate 0.0673 Epoch: 3 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:33:04,204-Speed 3416.30 samples/sec Loss 7.2459 LearningRate 0.0673 Epoch: 3 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 03:33:07,224-Speed 3391.57 samples/sec Loss 7.5860 LearningRate 0.0673 Epoch: 3 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 03:33:10,244-Speed 3391.22 samples/sec Loss 7.4569 LearningRate 0.0673 Epoch: 3 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:13,254-Speed 3403.07 samples/sec Loss 7.6904 LearningRate 0.0673 Epoch: 3 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:16,263-Speed 3403.22 samples/sec Loss 7.3755 LearningRate 0.0673 Epoch: 3 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:19,284-Speed 3391.02 samples/sec Loss 7.3978 LearningRate 0.0672 Epoch: 3 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:22,301-Speed 3394.70 samples/sec Loss 7.4320 LearningRate 0.0672 Epoch: 3 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:25,323-Speed 3389.31 samples/sec Loss 7.4796 LearningRate 0.0672 Epoch: 3 Global Step: 20490 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:28,321-Speed 3416.28 samples/sec Loss 7.2400 LearningRate 0.0672 Epoch: 3 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:31,342-Speed 3389.98 samples/sec Loss 7.5217 LearningRate 0.0672 Epoch: 3 Global Step: 20510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:33:34,324-Speed 3436.05 samples/sec Loss 7.3600 LearningRate 0.0672 Epoch: 3 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:37,341-Speed 3393.97 samples/sec Loss 7.3467 LearningRate 0.0672 Epoch: 3 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:40,340-Speed 3416.15 samples/sec Loss 7.4513 LearningRate 0.0671 Epoch: 3 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:43,347-Speed 3404.92 samples/sec Loss 7.3992 LearningRate 0.0671 Epoch: 3 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:46,380-Speed 3378.06 samples/sec Loss 7.5161 LearningRate 0.0671 Epoch: 3 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:49,403-Speed 3387.66 samples/sec Loss 7.4147 LearningRate 0.0671 Epoch: 3 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:52,425-Speed 3389.63 samples/sec Loss 7.5711 LearningRate 0.0671 Epoch: 3 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:55,432-Speed 3405.12 samples/sec Loss 7.4032 LearningRate 0.0671 Epoch: 3 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:33:58,455-Speed 3388.62 samples/sec Loss 7.5301 LearningRate 0.0671 Epoch: 3 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:01,477-Speed 3389.95 samples/sec Loss 7.5856 LearningRate 0.0670 Epoch: 3 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
03:34:04,492-Speed 3397.10 samples/sec Loss 7.5411 LearningRate 0.0670 Epoch: 3 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:07,505-Speed 3399.69 samples/sec Loss 7.4747 LearningRate 0.0670 Epoch: 3 Global Step: 20630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:10,526-Speed 3390.55 samples/sec Loss 7.5534 LearningRate 0.0670 Epoch: 3 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:13,547-Speed 3389.62 samples/sec Loss 7.5776 LearningRate 0.0670 Epoch: 3 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:16,555-Speed 3405.16 samples/sec Loss 7.6411 LearningRate 0.0670 Epoch: 3 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:19,554-Speed 3414.97 samples/sec Loss 7.4576 LearningRate 0.0670 Epoch: 3 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:22,573-Speed 3392.80 samples/sec Loss 7.6101 LearningRate 0.0669 Epoch: 3 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:25,599-Speed 3385.35 samples/sec Loss 7.5818 LearningRate 0.0669 Epoch: 3 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:34:28,574-Speed 3442.53 samples/sec Loss 7.4362 LearningRate 0.0669 Epoch: 3 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:31,572-Speed 3416.27 samples/sec Loss 7.4263 LearningRate 0.0669 Epoch: 3 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:34,593-Speed 3391.07 samples/sec Loss 7.4473 LearningRate 0.0669 Epoch: 3 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:37,602-Speed 3404.46 samples/sec Loss 7.3261 LearningRate 0.0669 Epoch: 3 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:40,620-Speed 3393.71 samples/sec Loss 7.5023 
LearningRate 0.0669 Epoch: 3 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:43,638-Speed 3392.76 samples/sec Loss 7.4754 LearningRate 0.0668 Epoch: 3 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:46,666-Speed 3382.64 samples/sec Loss 7.4602 LearningRate 0.0668 Epoch: 3 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:49,687-Speed 3390.89 samples/sec Loss 7.3565 LearningRate 0.0668 Epoch: 3 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:52,702-Speed 3397.33 samples/sec Loss 7.4476 LearningRate 0.0668 Epoch: 3 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:55,719-Speed 3395.23 samples/sec Loss 7.3586 LearningRate 0.0668 Epoch: 3 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:34:58,743-Speed 3386.70 samples/sec Loss 7.3502 LearningRate 0.0668 Epoch: 3 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:01,759-Speed 3396.46 samples/sec Loss 7.4882 LearningRate 0.0667 Epoch: 3 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:04,777-Speed 3394.55 samples/sec Loss 7.3885 LearningRate 0.0667 Epoch: 3 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:07,794-Speed 3394.60 samples/sec Loss 7.4521 LearningRate 0.0667 Epoch: 3 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:10,818-Speed 3387.03 samples/sec Loss 7.4854 LearningRate 0.0667 Epoch: 3 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:13,839-Speed 3390.23 samples/sec Loss 7.4102 LearningRate 0.0667 Epoch: 3 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:16,859-Speed 3391.62 samples/sec Loss 7.3850 LearningRate 0.0667 Epoch: 3 Global Step: 20860 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:19,878-Speed 3393.04 samples/sec Loss 7.5158 LearningRate 0.0667 Epoch: 3 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:22,906-Speed 3381.72 samples/sec Loss 7.3493 LearningRate 0.0666 Epoch: 3 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:25,918-Speed 3400.99 samples/sec Loss 7.5158 LearningRate 0.0666 Epoch: 3 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:28,946-Speed 3382.86 samples/sec Loss 7.3947 LearningRate 0.0666 Epoch: 3 Global Step: 20900 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:35:31,967-Speed 3390.06 samples/sec Loss 7.2988 LearningRate 0.0666 Epoch: 3 Global Step: 20910 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:35:34,992-Speed 3386.61 samples/sec Loss 7.5861 LearningRate 0.0666 Epoch: 3 Global Step: 20920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:35:37,987-Speed 3419.17 samples/sec Loss 7.4642 LearningRate 0.0666 Epoch: 3 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:41,019-Speed 3378.03 samples/sec Loss 7.3765 LearningRate 0.0666 Epoch: 3 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:44,046-Speed 3383.81 samples/sec Loss 7.4054 LearningRate 0.0665 Epoch: 3 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:35:47,050-Speed 3409.28 samples/sec Loss 7.2580 LearningRate 0.0665 Epoch: 3 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:35:50,053-Speed 3410.60 samples/sec Loss 7.3064 LearningRate 0.0665 Epoch: 3 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:35:53,065-Speed 3400.61 samples/sec Loss 7.3127 LearningRate 0.0665 Epoch: 3 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
03:35:56,097-Speed 3378.24 samples/sec Loss 7.4150 LearningRate 0.0665 Epoch: 3 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:35:59,119-Speed 3389.19 samples/sec Loss 7.4917 LearningRate 0.0665 Epoch: 3 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:02,140-Speed 3390.64 samples/sec Loss 7.2726 LearningRate 0.0665 Epoch: 3 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:05,166-Speed 3384.39 samples/sec Loss 7.2524 LearningRate 0.0664 Epoch: 3 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:08,183-Speed 3394.74 samples/sec Loss 7.2490 LearningRate 0.0664 Epoch: 3 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:11,208-Speed 3386.33 samples/sec Loss 7.4101 LearningRate 0.0664 Epoch: 3 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:14,229-Speed 3390.25 samples/sec Loss 7.2374 LearningRate 0.0664 Epoch: 3 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:17,256-Speed 3383.74 samples/sec Loss 7.3333 LearningRate 0.0664 Epoch: 3 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:36:20,281-Speed 3385.84 samples/sec Loss 7.4061 LearningRate 0.0664 Epoch: 3 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:36:23,290-Speed 3403.97 samples/sec Loss 7.3892 LearningRate 0.0664 Epoch: 3 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:26,307-Speed 3395.63 samples/sec Loss 7.3607 LearningRate 0.0663 Epoch: 3 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:29,315-Speed 3405.14 samples/sec Loss 7.3781 LearningRate 0.0663 Epoch: 3 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:32,370-Speed 3351.92 samples/sec Loss 7.4789 LearningRate 
0.0663 Epoch: 3 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:35,386-Speed 3396.00 samples/sec Loss 7.3596 LearningRate 0.0663 Epoch: 3 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:38,413-Speed 3384.17 samples/sec Loss 7.5909 LearningRate 0.0663 Epoch: 3 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:41,436-Speed 3387.46 samples/sec Loss 7.5336 LearningRate 0.0663 Epoch: 3 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:44,468-Speed 3378.24 samples/sec Loss 7.3601 LearningRate 0.0663 Epoch: 3 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:47,493-Speed 3386.38 samples/sec Loss 7.5063 LearningRate 0.0662 Epoch: 3 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:50,518-Speed 3385.65 samples/sec Loss 7.3390 LearningRate 0.0662 Epoch: 3 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:36:53,535-Speed 3395.03 samples/sec Loss 7.4207 LearningRate 0.0662 Epoch: 3 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:36:56,554-Speed 3392.63 samples/sec Loss 7.1832 LearningRate 0.0662 Epoch: 3 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:36:59,579-Speed 3386.28 samples/sec Loss 7.2922 LearningRate 0.0662 Epoch: 3 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:37:02,594-Speed 3397.29 samples/sec Loss 7.3450 LearningRate 0.0662 Epoch: 3 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:05,629-Speed 3374.52 samples/sec Loss 7.2852 LearningRate 0.0662 Epoch: 3 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:08,646-Speed 3394.81 samples/sec Loss 7.2811 LearningRate 0.0661 Epoch: 3 Global Step: 21230 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-27 03:37:11,651-Speed 3407.72 samples/sec Loss 7.3711 LearningRate 0.0661 Epoch: 3 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:14,666-Speed 3397.54 samples/sec Loss 7.3408 LearningRate 0.0661 Epoch: 3 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:17,694-Speed 3383.33 samples/sec Loss 7.4455 LearningRate 0.0661 Epoch: 3 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:20,724-Speed 3380.09 samples/sec Loss 7.4233 LearningRate 0.0661 Epoch: 3 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:23,756-Speed 3377.84 samples/sec Loss 7.2034 LearningRate 0.0661 Epoch: 3 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:26,816-Speed 3347.56 samples/sec Loss 7.4340 LearningRate 0.0661 Epoch: 3 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:29,904-Speed 3316.07 samples/sec Loss 7.5451 LearningRate 0.0660 Epoch: 3 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:32,934-Speed 3380.46 samples/sec Loss 7.2225 LearningRate 0.0660 Epoch: 3 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:37:35,939-Speed 3407.95 samples/sec Loss 7.3841 LearningRate 0.0660 Epoch: 3 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:38,965-Speed 3385.61 samples/sec Loss 7.3533 LearningRate 0.0660 Epoch: 3 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:41,982-Speed 3395.13 samples/sec Loss 7.3068 LearningRate 0.0660 Epoch: 3 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:44,995-Speed 3399.06 samples/sec Loss 7.3676 LearningRate 0.0660 Epoch: 3 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:48,043-Speed 3360.90 
samples/sec Loss 7.3115 LearningRate 0.0660 Epoch: 3 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:51,073-Speed 3380.13 samples/sec Loss 7.1999 LearningRate 0.0659 Epoch: 3 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:54,089-Speed 3395.67 samples/sec Loss 7.3164 LearningRate 0.0659 Epoch: 3 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:37:57,130-Speed 3368.44 samples/sec Loss 7.5458 LearningRate 0.0659 Epoch: 3 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:00,160-Speed 3380.03 samples/sec Loss 7.5167 LearningRate 0.0659 Epoch: 3 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:03,180-Speed 3391.32 samples/sec Loss 7.3017 LearningRate 0.0659 Epoch: 3 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:06,197-Speed 3395.75 samples/sec Loss 7.4277 LearningRate 0.0659 Epoch: 3 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:09,227-Speed 3380.52 samples/sec Loss 7.3236 LearningRate 0.0659 Epoch: 3 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:12,380-Speed 3247.83 samples/sec Loss 7.4059 LearningRate 0.0658 Epoch: 3 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:15,399-Speed 3392.84 samples/sec Loss 7.3912 LearningRate 0.0658 Epoch: 3 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:18,420-Speed 3390.60 samples/sec Loss 7.4795 LearningRate 0.0658 Epoch: 3 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:21,448-Speed 3381.83 samples/sec Loss 7.3969 LearningRate 0.0658 Epoch: 3 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:24,482-Speed 3376.77 samples/sec Loss 7.3820 LearningRate 0.0658 Epoch: 3 Global 
Step: 21480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:27,520-Speed 3370.91 samples/sec Loss 7.4172 LearningRate 0.0658 Epoch: 3 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:30,555-Speed 3374.65 samples/sec Loss 7.3697 LearningRate 0.0658 Epoch: 3 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:33,629-Speed 3332.64 samples/sec Loss 7.3566 LearningRate 0.0657 Epoch: 3 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:36,659-Speed 3380.04 samples/sec Loss 7.3092 LearningRate 0.0657 Epoch: 3 Global Step: 21520 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:38:39,675-Speed 3396.39 samples/sec Loss 7.4339 LearningRate 0.0657 Epoch: 3 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:38:42,680-Speed 3408.51 samples/sec Loss 7.3215 LearningRate 0.0657 Epoch: 3 Global Step: 21540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:45,691-Speed 3401.23 samples/sec Loss 7.3601 LearningRate 0.0657 Epoch: 3 Global Step: 21550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:48,717-Speed 3384.71 samples/sec Loss 7.4138 LearningRate 0.0657 Epoch: 3 Global Step: 21560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:51,757-Speed 3369.68 samples/sec Loss 7.2597 LearningRate 0.0657 Epoch: 3 Global Step: 21570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:54,789-Speed 3377.91 samples/sec Loss 7.3113 LearningRate 0.0656 Epoch: 3 Global Step: 21580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:38:57,860-Speed 3335.66 samples/sec Loss 7.3136 LearningRate 0.0656 Epoch: 3 Global Step: 21590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:00,886-Speed 3384.15 samples/sec Loss 7.3416 LearningRate 0.0656 Epoch: 3 Global Step: 21600 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-27 03:39:03,920-Speed 3375.65 samples/sec Loss 7.2265 LearningRate 0.0656 Epoch: 3 Global Step: 21610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:06,949-Speed 3382.51 samples/sec Loss 7.3868 LearningRate 0.0656 Epoch: 3 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:09,971-Speed 3388.52 samples/sec Loss 7.2496 LearningRate 0.0656 Epoch: 3 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:13,003-Speed 3378.94 samples/sec Loss 7.2762 LearningRate 0.0656 Epoch: 3 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:39:16,034-Speed 3379.15 samples/sec Loss 7.3369 LearningRate 0.0655 Epoch: 3 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:39:19,065-Speed 3379.71 samples/sec Loss 7.2310 LearningRate 0.0655 Epoch: 3 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:39:22,076-Speed 3401.26 samples/sec Loss 7.1326 LearningRate 0.0655 Epoch: 3 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:25,133-Speed 3350.71 samples/sec Loss 7.2321 LearningRate 0.0655 Epoch: 3 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:28,171-Speed 3371.70 samples/sec Loss 7.5001 LearningRate 0.0655 Epoch: 3 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:31,204-Speed 3376.42 samples/sec Loss 7.2116 LearningRate 0.0655 Epoch: 3 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:34,227-Speed 3387.72 samples/sec Loss 7.3895 LearningRate 0.0655 Epoch: 3 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:37,247-Speed 3392.35 samples/sec Loss 7.2271 LearningRate 0.0654 Epoch: 3 Global Step: 21720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:40,277-Speed 3380.57 samples/sec Loss 
7.2941 LearningRate 0.0654 Epoch: 3 Global Step: 21730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:43,304-Speed 3384.16 samples/sec Loss 7.2964 LearningRate 0.0654 Epoch: 3 Global Step: 21740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:46,328-Speed 3386.10 samples/sec Loss 7.1904 LearningRate 0.0654 Epoch: 3 Global Step: 21750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:49,356-Speed 3382.85 samples/sec Loss 7.3764 LearningRate 0.0654 Epoch: 3 Global Step: 21760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:39:52,384-Speed 3382.79 samples/sec Loss 7.4272 LearningRate 0.0654 Epoch: 3 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:39:55,410-Speed 3384.94 samples/sec Loss 7.2999 LearningRate 0.0654 Epoch: 3 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:39:58,446-Speed 3373.56 samples/sec Loss 7.3786 LearningRate 0.0653 Epoch: 3 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:01,448-Speed 3411.98 samples/sec Loss 7.3168 LearningRate 0.0653 Epoch: 3 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:04,479-Speed 3378.83 samples/sec Loss 7.3860 LearningRate 0.0653 Epoch: 3 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:07,507-Speed 3382.18 samples/sec Loss 7.2010 LearningRate 0.0653 Epoch: 3 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:10,535-Speed 3382.70 samples/sec Loss 7.2805 LearningRate 0.0653 Epoch: 3 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:13,591-Speed 3351.57 samples/sec Loss 7.2132 LearningRate 0.0653 Epoch: 3 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:16,632-Speed 3368.60 samples/sec Loss 7.3148 LearningRate 0.0653 Epoch: 3 Global Step: 21850 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:19,667-Speed 3374.24 samples/sec Loss 7.1790 LearningRate 0.0652 Epoch: 3 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:22,699-Speed 3378.05 samples/sec Loss 7.2492 LearningRate 0.0652 Epoch: 3 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:25,732-Speed 3378.13 samples/sec Loss 7.2733 LearningRate 0.0652 Epoch: 3 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:28,762-Speed 3379.25 samples/sec Loss 7.2250 LearningRate 0.0652 Epoch: 3 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:31,785-Speed 3388.91 samples/sec Loss 7.2055 LearningRate 0.0652 Epoch: 3 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:34,819-Speed 3375.29 samples/sec Loss 7.1567 LearningRate 0.0652 Epoch: 3 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:37,840-Speed 3390.56 samples/sec Loss 7.3674 LearningRate 0.0652 Epoch: 3 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:40,865-Speed 3385.80 samples/sec Loss 7.3102 LearningRate 0.0652 Epoch: 3 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:43,894-Speed 3381.50 samples/sec Loss 7.3008 LearningRate 0.0651 Epoch: 3 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:40:46,895-Speed 3413.59 samples/sec Loss 7.2111 LearningRate 0.0651 Epoch: 3 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:49,920-Speed 3385.33 samples/sec Loss 7.0888 LearningRate 0.0651 Epoch: 3 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:40:52,933-Speed 3399.07 samples/sec Loss 7.2024 LearningRate 0.0651 Epoch: 3 Global Step: 21970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
03:40:55,959-Speed 3385.65 samples/sec Loss 7.3254 LearningRate 0.0651 Epoch: 3 Global Step: 21980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:40:58,989-Speed 3379.74 samples/sec Loss 7.4264 LearningRate 0.0651 Epoch: 3 Global Step: 21990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:41:02,094-Speed 3298.81 samples/sec Loss 7.3613 LearningRate 0.0651 Epoch: 3 Global Step: 22000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:41:45,527-[lfw][22000]XNorm: 22.253873 Training: 2022-04-27 03:41:45,528-[lfw][22000]Accuracy-Flip: 0.99717+-0.00325 Training: 2022-04-27 03:41:45,528-[lfw][22000]Accuracy-Highest: 0.99717 Training: 2022-04-27 03:42:36,070-[cfp_fp][22000]XNorm: 19.574763 Training: 2022-04-27 03:42:36,071-[cfp_fp][22000]Accuracy-Flip: 0.94257+-0.01108 Training: 2022-04-27 03:42:36,071-[cfp_fp][22000]Accuracy-Highest: 0.94257 Training: 2022-04-27 03:43:19,382-[agedb_30][22000]XNorm: 22.148440 Training: 2022-04-27 03:43:19,383-[agedb_30][22000]Accuracy-Flip: 0.96983+-0.00893 Training: 2022-04-27 03:43:19,383-[agedb_30][22000]Accuracy-Highest: 0.97167 Training: 2022-04-27 03:43:22,398-Speed 72.98 samples/sec Loss 7.1951 LearningRate 0.0650 Epoch: 3 Global Step: 22010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:25,410-Speed 3400.79 samples/sec Loss 7.3108 LearningRate 0.0650 Epoch: 3 Global Step: 22020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:28,428-Speed 3394.27 samples/sec Loss 7.2444 LearningRate 0.0650 Epoch: 3 Global Step: 22030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:31,441-Speed 3399.34 samples/sec Loss 7.2526 LearningRate 0.0650 Epoch: 3 Global Step: 22040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:34,456-Speed 3397.46 samples/sec Loss 7.1681 LearningRate 0.0650 Epoch: 3 Global Step: 22050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:37,470-Speed 3397.29 samples/sec Loss 7.3932 
LearningRate 0.0650 Epoch: 3 Global Step: 22060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:43:40,495-Speed 3386.00 samples/sec Loss 7.2190 LearningRate 0.0650 Epoch: 3 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:43,503-Speed 3405.91 samples/sec Loss 7.0947 LearningRate 0.0649 Epoch: 3 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:46,518-Speed 3396.42 samples/sec Loss 7.3178 LearningRate 0.0649 Epoch: 3 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:49,530-Speed 3400.72 samples/sec Loss 7.1310 LearningRate 0.0649 Epoch: 3 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:52,551-Speed 3390.61 samples/sec Loss 7.3329 LearningRate 0.0649 Epoch: 3 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:55,571-Speed 3391.34 samples/sec Loss 7.1886 LearningRate 0.0649 Epoch: 3 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:43:58,595-Speed 3386.92 samples/sec Loss 7.2998 LearningRate 0.0649 Epoch: 3 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:44:01,617-Speed 3389.86 samples/sec Loss 7.2480 LearningRate 0.0649 Epoch: 3 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:44:04,636-Speed 3392.06 samples/sec Loss 7.1749 LearningRate 0.0648 Epoch: 3 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:44:07,648-Speed 3400.30 samples/sec Loss 7.1324 LearningRate 0.0648 Epoch: 3 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:44:10,688-Speed 3369.29 samples/sec Loss 7.1769 LearningRate 0.0648 Epoch: 3 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:13,712-Speed 3387.58 samples/sec Loss 7.4067 LearningRate 0.0648 Epoch: 3 Global Step: 22180 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-04-27 03:44:16,729-Speed 3394.04 samples/sec Loss 7.3266 LearningRate 0.0648 Epoch: 3 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:19,754-Speed 3386.28 samples/sec Loss 7.1780 LearningRate 0.0648 Epoch: 3 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:22,786-Speed 3377.87 samples/sec Loss 7.2092 LearningRate 0.0648 Epoch: 3 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:25,810-Speed 3387.67 samples/sec Loss 7.2251 LearningRate 0.0647 Epoch: 3 Global Step: 22220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:28,830-Speed 3391.75 samples/sec Loss 7.1543 LearningRate 0.0647 Epoch: 3 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:31,837-Speed 3406.52 samples/sec Loss 7.1482 LearningRate 0.0647 Epoch: 3 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:34,866-Speed 3380.52 samples/sec Loss 7.2625 LearningRate 0.0647 Epoch: 3 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:37,898-Speed 3378.53 samples/sec Loss 7.2666 LearningRate 0.0647 Epoch: 3 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:40,898-Speed 3413.50 samples/sec Loss 7.2566 LearningRate 0.0647 Epoch: 3 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:44:43,875-Speed 3441.39 samples/sec Loss 7.1614 LearningRate 0.0647 Epoch: 3 Global Step: 22280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:44:46,891-Speed 3395.08 samples/sec Loss 7.2449 LearningRate 0.0646 Epoch: 3 Global Step: 22290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:44:49,933-Speed 3367.35 samples/sec Loss 7.3266 LearningRate 0.0646 Epoch: 3 Global Step: 22300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
03:44:52,944-Speed 3401.73 samples/sec Loss 7.0772 LearningRate 0.0646 Epoch: 3 Global Step: 22310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:44:55,961-Speed 3395.43 samples/sec Loss 7.1493 LearningRate 0.0646 Epoch: 3 Global Step: 22320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:44:58,978-Speed 3394.49 samples/sec Loss 7.2628 LearningRate 0.0646 Epoch: 3 Global Step: 22330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:45:01,995-Speed 3394.81 samples/sec Loss 7.3449 LearningRate 0.0646 Epoch: 3 Global Step: 22340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:45:05,015-Speed 3391.66 samples/sec Loss 7.3359 LearningRate 0.0646 Epoch: 3 Global Step: 22350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:45:08,031-Speed 3395.83 samples/sec Loss 7.3629 LearningRate 0.0645 Epoch: 3 Global Step: 22360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:45:11,064-Speed 3376.51 samples/sec Loss 7.3341 LearningRate 0.0645 Epoch: 3 Global Step: 22370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 03:45:14,073-Speed 3404.36 samples/sec Loss 7.1860 LearningRate 0.0645 Epoch: 3 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:17,090-Speed 3395.29 samples/sec Loss 7.2047 LearningRate 0.0645 Epoch: 3 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:20,107-Speed 3394.98 samples/sec Loss 7.2736 LearningRate 0.0645 Epoch: 3 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:23,122-Speed 3397.07 samples/sec Loss 7.2243 LearningRate 0.0645 Epoch: 3 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:26,139-Speed 3394.93 samples/sec Loss 7.3620 LearningRate 0.0645 Epoch: 3 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:29,152-Speed 3399.00 samples/sec Loss 7.2114 LearningRate 
0.0644 Epoch: 3 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:32,167-Speed 3397.76 samples/sec Loss 7.1278 LearningRate 0.0644 Epoch: 3 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:35,191-Speed 3386.28 samples/sec Loss 7.3203 LearningRate 0.0644 Epoch: 3 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:38,207-Speed 3396.26 samples/sec Loss 7.2422 LearningRate 0.0644 Epoch: 3 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:41,228-Speed 3390.28 samples/sec Loss 7.2338 LearningRate 0.0644 Epoch: 3 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:45:44,243-Speed 3396.95 samples/sec Loss 7.2629 LearningRate 0.0644 Epoch: 3 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:45:47,258-Speed 3397.98 samples/sec Loss 7.1305 LearningRate 0.0644 Epoch: 3 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:45:50,278-Speed 3391.65 samples/sec Loss 7.1476 LearningRate 0.0643 Epoch: 3 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:45:53,304-Speed 3384.55 samples/sec Loss 7.2821 LearningRate 0.0643 Epoch: 3 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:45:56,323-Speed 3392.11 samples/sec Loss 7.1256 LearningRate 0.0643 Epoch: 3 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:45:59,341-Speed 3394.22 samples/sec Loss 7.1364 LearningRate 0.0643 Epoch: 3 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:02,361-Speed 3390.85 samples/sec Loss 7.1376 LearningRate 0.0643 Epoch: 3 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:05,370-Speed 3403.90 samples/sec Loss 7.1777 LearningRate 0.0643 Epoch: 3 Global Step: 22550 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-27 03:46:08,384-Speed 3398.10 samples/sec Loss 7.0889 LearningRate 0.0643 Epoch: 3 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:11,399-Speed 3397.61 samples/sec Loss 7.1909 LearningRate 0.0642 Epoch: 3 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:14,413-Speed 3398.58 samples/sec Loss 7.2363 LearningRate 0.0642 Epoch: 3 Global Step: 22580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:46:17,413-Speed 3413.65 samples/sec Loss 7.0739 LearningRate 0.0642 Epoch: 3 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:20,428-Speed 3398.04 samples/sec Loss 7.1825 LearningRate 0.0642 Epoch: 3 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:23,442-Speed 3398.26 samples/sec Loss 7.1457 LearningRate 0.0642 Epoch: 3 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:26,458-Speed 3395.01 samples/sec Loss 7.2651 LearningRate 0.0642 Epoch: 3 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:29,473-Speed 3397.60 samples/sec Loss 7.2603 LearningRate 0.0642 Epoch: 3 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:32,481-Speed 3405.59 samples/sec Loss 7.1472 LearningRate 0.0641 Epoch: 3 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:35,504-Speed 3388.80 samples/sec Loss 7.1489 LearningRate 0.0641 Epoch: 3 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:38,521-Speed 3394.73 samples/sec Loss 7.1999 LearningRate 0.0641 Epoch: 3 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:41,548-Speed 3383.48 samples/sec Loss 7.1950 LearningRate 0.0641 Epoch: 3 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:44,567-Speed 
3393.76 samples/sec Loss 7.3588 LearningRate 0.0641 Epoch: 3 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:47,584-Speed 3393.93 samples/sec Loss 7.2577 LearningRate 0.0641 Epoch: 3 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:50,598-Speed 3398.22 samples/sec Loss 7.2089 LearningRate 0.0641 Epoch: 3 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:53,611-Speed 3400.12 samples/sec Loss 7.2031 LearningRate 0.0640 Epoch: 3 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:46:56,617-Speed 3407.34 samples/sec Loss 7.1487 LearningRate 0.0640 Epoch: 3 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:46:59,641-Speed 3385.88 samples/sec Loss 7.3585 LearningRate 0.0640 Epoch: 3 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:02,743-Speed 3302.36 samples/sec Loss 7.1574 LearningRate 0.0640 Epoch: 3 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:16,257-Speed 757.81 samples/sec Loss 6.7898 LearningRate 0.0640 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:19,275-Speed 3393.67 samples/sec Loss 6.6017 LearningRate 0.0640 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:22,331-Speed 3351.76 samples/sec Loss 6.5562 LearningRate 0.0640 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:25,510-Speed 3222.21 samples/sec Loss 6.5530 LearningRate 0.0639 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:47:28,528-Speed 3393.63 samples/sec Loss 6.5103 LearningRate 0.0639 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:31,558-Speed 3379.88 samples/sec Loss 6.4951 LearningRate 0.0639 Epoch: 4 
Global Step: 22800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:36,065-Speed 2272.61 samples/sec Loss 6.6043 LearningRate 0.0639 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:39,096-Speed 3378.60 samples/sec Loss 6.6183 LearningRate 0.0639 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:42,119-Speed 3388.23 samples/sec Loss 6.5144 LearningRate 0.0639 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:45,139-Speed 3391.88 samples/sec Loss 6.6169 LearningRate 0.0639 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:48,151-Speed 3400.53 samples/sec Loss 6.7527 LearningRate 0.0639 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:51,183-Speed 3378.83 samples/sec Loss 6.6572 LearningRate 0.0638 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:54,198-Speed 3397.28 samples/sec Loss 6.6876 LearningRate 0.0638 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:47:57,213-Speed 3396.58 samples/sec Loss 6.6435 LearningRate 0.0638 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:00,252-Speed 3370.49 samples/sec Loss 6.8037 LearningRate 0.0638 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:03,286-Speed 3376.30 samples/sec Loss 6.9326 LearningRate 0.0638 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:06,310-Speed 3386.12 samples/sec Loss 6.6491 LearningRate 0.0638 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:09,360-Speed 3358.33 samples/sec Loss 6.7099 LearningRate 0.0638 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-27 03:48:12,415-Speed 3352.65 samples/sec Loss 6.6289 LearningRate 0.0637 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:15,473-Speed 3349.08 samples/sec Loss 6.7882 LearningRate 0.0637 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:18,564-Speed 3313.80 samples/sec Loss 6.8623 LearningRate 0.0637 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:21,597-Speed 3377.42 samples/sec Loss 6.4916 LearningRate 0.0637 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:24,623-Speed 3384.47 samples/sec Loss 6.8169 LearningRate 0.0637 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:27,654-Speed 3379.94 samples/sec Loss 6.7553 LearningRate 0.0637 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:30,686-Speed 3377.18 samples/sec Loss 6.8742 LearningRate 0.0637 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:33,734-Speed 3360.97 samples/sec Loss 6.7060 LearningRate 0.0636 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:36,788-Speed 3353.58 samples/sec Loss 6.7027 LearningRate 0.0636 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:39,884-Speed 3307.94 samples/sec Loss 6.6878 LearningRate 0.0636 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:42,917-Speed 3377.66 samples/sec Loss 6.7254 LearningRate 0.0636 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:45,936-Speed 3393.00 samples/sec Loss 6.6523 LearningRate 0.0636 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:48:48,972-Speed 3373.14 samples/sec Loss 
6.7925 LearningRate 0.0636 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:51,997-Speed 3385.74 samples/sec Loss 6.8050 LearningRate 0.0636 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:55,015-Speed 3393.88 samples/sec Loss 6.8258 LearningRate 0.0635 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:48:58,039-Speed 3386.53 samples/sec Loss 6.8302 LearningRate 0.0635 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:01,062-Speed 3388.74 samples/sec Loss 6.9066 LearningRate 0.0635 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:04,091-Speed 3381.63 samples/sec Loss 6.7539 LearningRate 0.0635 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:07,115-Speed 3386.55 samples/sec Loss 6.8540 LearningRate 0.0635 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:10,134-Speed 3393.18 samples/sec Loss 6.7214 LearningRate 0.0635 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:13,151-Speed 3395.14 samples/sec Loss 6.8733 LearningRate 0.0635 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:16,169-Speed 3393.29 samples/sec Loss 6.8242 LearningRate 0.0634 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:19,172-Speed 3411.67 samples/sec Loss 6.8441 LearningRate 0.0634 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:22,215-Speed 3365.75 samples/sec Loss 7.0100 LearningRate 0.0634 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:25,240-Speed 3385.20 samples/sec Loss 7.0237 LearningRate 0.0634 Epoch: 4 Global Step: 23170 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:28,291-Speed 3356.71 samples/sec Loss 6.8283 LearningRate 0.0634 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:31,314-Speed 3388.94 samples/sec Loss 6.8590 LearningRate 0.0634 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:34,324-Speed 3402.79 samples/sec Loss 6.8233 LearningRate 0.0634 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:37,341-Speed 3394.82 samples/sec Loss 6.9689 LearningRate 0.0633 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:49:40,346-Speed 3408.15 samples/sec Loss 6.8694 LearningRate 0.0633 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:43,373-Speed 3384.31 samples/sec Loss 7.0409 LearningRate 0.0633 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:46,404-Speed 3378.45 samples/sec Loss 6.9650 LearningRate 0.0633 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:49,427-Speed 3388.42 samples/sec Loss 6.9681 LearningRate 0.0633 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:52,446-Speed 3392.91 samples/sec Loss 6.8105 LearningRate 0.0633 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:55,462-Speed 3395.75 samples/sec Loss 7.1044 LearningRate 0.0633 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:49:58,482-Speed 3391.71 samples/sec Loss 6.9878 LearningRate 0.0632 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:50:01,490-Speed 3405.55 samples/sec Loss 6.8987 LearningRate 0.0632 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
03:50:04,503-Speed 3398.76 samples/sec Loss 7.0096 LearningRate 0.0632 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:50:07,528-Speed 3386.28 samples/sec Loss 6.7895 LearningRate 0.0632 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:50:10,566-Speed 3371.50 samples/sec Loss 6.9209 LearningRate 0.0632 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:13,673-Speed 3296.07 samples/sec Loss 6.7811 LearningRate 0.0632 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:16,712-Speed 3370.34 samples/sec Loss 7.0568 LearningRate 0.0632 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:19,747-Speed 3388.26 samples/sec Loss 6.8411 LearningRate 0.0632 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:22,768-Speed 3390.16 samples/sec Loss 6.9517 LearningRate 0.0631 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:25,803-Speed 3374.93 samples/sec Loss 7.0548 LearningRate 0.0631 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:28,850-Speed 3361.43 samples/sec Loss 6.9913 LearningRate 0.0631 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:31,880-Speed 3380.59 samples/sec Loss 6.8433 LearningRate 0.0631 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:34,893-Speed 3399.25 samples/sec Loss 6.8868 LearningRate 0.0631 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:37,975-Speed 3323.13 samples/sec Loss 6.9364 LearningRate 0.0631 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:40,997-Speed 3389.00 samples/sec Loss 6.9801 
LearningRate 0.0631 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:50:44,037-Speed 3369.75 samples/sec Loss 7.1047 LearningRate 0.0630 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:50:47,060-Speed 3387.54 samples/sec Loss 6.9310 LearningRate 0.0630 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:50,090-Speed 3380.47 samples/sec Loss 7.0243 LearningRate 0.0630 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:50:53,109-Speed 3393.22 samples/sec Loss 6.9547 LearningRate 0.0630 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:50:56,142-Speed 3376.68 samples/sec Loss 6.9852 LearningRate 0.0630 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:50:59,166-Speed 3387.15 samples/sec Loss 6.9156 LearningRate 0.0630 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:02,217-Speed 3357.53 samples/sec Loss 6.7963 LearningRate 0.0630 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:05,245-Speed 3382.44 samples/sec Loss 6.9727 LearningRate 0.0629 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:08,272-Speed 3382.81 samples/sec Loss 7.0155 LearningRate 0.0629 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:11,305-Speed 3377.29 samples/sec Loss 7.0667 LearningRate 0.0629 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:14,334-Speed 3381.24 samples/sec Loss 7.0227 LearningRate 0.0629 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:17,365-Speed 3379.72 samples/sec Loss 6.9159 LearningRate 0.0629 Epoch: 4 Global Step: 23540 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:20,389-Speed 3387.22 samples/sec Loss 6.8723 LearningRate 0.0629 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:23,468-Speed 3326.17 samples/sec Loss 6.9424 LearningRate 0.0629 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:51:26,494-Speed 3385.17 samples/sec Loss 7.0527 LearningRate 0.0628 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:51:29,522-Speed 3381.92 samples/sec Loss 6.9586 LearningRate 0.0628 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:51:32,528-Speed 3408.11 samples/sec Loss 6.9985 LearningRate 0.0628 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:35,564-Speed 3373.58 samples/sec Loss 7.0018 LearningRate 0.0628 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:38,591-Speed 3383.95 samples/sec Loss 6.9442 LearningRate 0.0628 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:41,616-Speed 3384.94 samples/sec Loss 6.7636 LearningRate 0.0628 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:44,648-Speed 3378.49 samples/sec Loss 6.9398 LearningRate 0.0628 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:47,672-Speed 3387.50 samples/sec Loss 7.0540 LearningRate 0.0627 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:50,707-Speed 3375.12 samples/sec Loss 6.8654 LearningRate 0.0627 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:53,729-Speed 3389.15 samples/sec Loss 7.0181 LearningRate 0.0627 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
03:51:56,755-Speed 3384.59 samples/sec Loss 7.0439 LearningRate 0.0627 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:51:59,775-Speed 3391.20 samples/sec Loss 6.9708 LearningRate 0.0627 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:02,801-Speed 3385.53 samples/sec Loss 7.0013 LearningRate 0.0627 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:05,806-Speed 3407.35 samples/sec Loss 7.0808 LearningRate 0.0627 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:08,829-Speed 3388.35 samples/sec Loss 6.8819 LearningRate 0.0626 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:11,918-Speed 3315.30 samples/sec Loss 6.9598 LearningRate 0.0626 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:14,990-Speed 3335.14 samples/sec Loss 6.9753 LearningRate 0.0626 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:18,014-Speed 3387.34 samples/sec Loss 7.0131 LearningRate 0.0626 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:21,041-Speed 3383.22 samples/sec Loss 6.8682 LearningRate 0.0626 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:24,063-Speed 3389.36 samples/sec Loss 6.9526 LearningRate 0.0626 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:27,084-Speed 3389.72 samples/sec Loss 6.9935 LearningRate 0.0626 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:30,120-Speed 3373.89 samples/sec Loss 6.9128 LearningRate 0.0626 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:33,162-Speed 3367.25 samples/sec Loss 6.8785 LearningRate 
0.0625 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:52:36,265-Speed 3300.94 samples/sec Loss 6.9934 LearningRate 0.0625 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:39,367-Speed 3301.75 samples/sec Loss 6.9463 LearningRate 0.0625 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:42,397-Speed 3380.07 samples/sec Loss 6.9298 LearningRate 0.0625 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:45,426-Speed 3381.80 samples/sec Loss 7.0811 LearningRate 0.0625 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:48,454-Speed 3382.82 samples/sec Loss 6.8784 LearningRate 0.0625 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:51,479-Speed 3385.71 samples/sec Loss 7.0261 LearningRate 0.0625 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:54,503-Speed 3387.43 samples/sec Loss 7.1284 LearningRate 0.0624 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:52:57,516-Speed 3398.66 samples/sec Loss 6.9937 LearningRate 0.0624 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:00,542-Speed 3385.30 samples/sec Loss 7.1572 LearningRate 0.0624 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:03,566-Speed 3386.25 samples/sec Loss 6.9075 LearningRate 0.0624 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:06,602-Speed 3373.99 samples/sec Loss 7.0283 LearningRate 0.0624 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:09,632-Speed 3379.81 samples/sec Loss 7.0193 LearningRate 0.0624 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-27 03:53:12,659-Speed 3384.32 samples/sec Loss 6.9727 LearningRate 0.0624 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:15,684-Speed 3385.96 samples/sec Loss 6.8192 LearningRate 0.0623 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:18,726-Speed 3366.59 samples/sec Loss 7.1133 LearningRate 0.0623 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:21,755-Speed 3381.47 samples/sec Loss 6.9897 LearningRate 0.0623 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:24,781-Speed 3384.65 samples/sec Loss 6.9858 LearningRate 0.0623 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:53:27,817-Speed 3373.69 samples/sec Loss 7.1994 LearningRate 0.0623 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:53:30,840-Speed 3388.88 samples/sec Loss 7.0030 LearningRate 0.0623 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:53:33,874-Speed 3375.43 samples/sec Loss 6.9236 LearningRate 0.0623 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:53:36,905-Speed 3379.14 samples/sec Loss 6.9334 LearningRate 0.0622 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:54:20,188-[lfw][24000]XNorm: 21.892247 Training: 2022-04-27 03:54:20,189-[lfw][24000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-27 03:54:20,189-[lfw][24000]Accuracy-Highest: 0.99717 Training: 2022-04-27 03:55:10,431-[cfp_fp][24000]XNorm: 19.193311 Training: 2022-04-27 03:55:10,431-[cfp_fp][24000]Accuracy-Flip: 0.93714+-0.01194 Training: 2022-04-27 03:55:10,432-[cfp_fp][24000]Accuracy-Highest: 0.94257 Training: 2022-04-27 03:55:53,781-[agedb_30][24000]XNorm: 21.897793 Training: 2022-04-27 
03:55:53,782-[agedb_30][24000]Accuracy-Flip: 0.96750+-0.00739 Training: 2022-04-27 03:55:53,782-[agedb_30][24000]Accuracy-Highest: 0.97167 Training: 2022-04-27 03:55:56,800-Speed 73.20 samples/sec Loss 7.1318 LearningRate 0.0622 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:55:59,810-Speed 3403.66 samples/sec Loss 7.0214 LearningRate 0.0622 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:02,822-Speed 3401.04 samples/sec Loss 6.9673 LearningRate 0.0622 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:05,835-Speed 3398.58 samples/sec Loss 7.0037 LearningRate 0.0622 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:08,876-Speed 3368.57 samples/sec Loss 6.8831 LearningRate 0.0622 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:11,885-Speed 3403.34 samples/sec Loss 7.0923 LearningRate 0.0622 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:14,902-Speed 3395.12 samples/sec Loss 6.8972 LearningRate 0.0621 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:56:17,899-Speed 3417.05 samples/sec Loss 6.8309 LearningRate 0.0621 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:20,899-Speed 3413.87 samples/sec Loss 6.9708 LearningRate 0.0621 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:23,918-Speed 3393.22 samples/sec Loss 6.9136 LearningRate 0.0621 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:26,936-Speed 3394.40 samples/sec Loss 6.9582 LearningRate 0.0621 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:29,961-Speed 3385.41 samples/sec Loss 7.0657 
LearningRate 0.0621 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:32,980-Speed 3393.31 samples/sec Loss 6.9219 LearningRate 0.0621 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:35,997-Speed 3395.13 samples/sec Loss 6.9136 LearningRate 0.0621 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:39,012-Speed 3396.18 samples/sec Loss 6.9942 LearningRate 0.0620 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:42,031-Speed 3393.10 samples/sec Loss 7.0611 LearningRate 0.0620 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:45,047-Speed 3395.75 samples/sec Loss 6.9189 LearningRate 0.0620 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:48,139-Speed 3313.04 samples/sec Loss 7.0116 LearningRate 0.0620 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:56:51,162-Speed 3387.45 samples/sec Loss 7.0538 LearningRate 0.0620 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:54,177-Speed 3397.67 samples/sec Loss 6.8926 LearningRate 0.0620 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:56:57,228-Speed 3357.08 samples/sec Loss 7.0350 LearningRate 0.0620 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:00,264-Speed 3373.82 samples/sec Loss 6.9672 LearningRate 0.0619 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:03,284-Speed 3390.90 samples/sec Loss 6.8827 LearningRate 0.0619 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:06,302-Speed 3394.59 samples/sec Loss 6.9614 LearningRate 0.0619 Epoch: 4 Global Step: 24240 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:09,319-Speed 3394.70 samples/sec Loss 6.9703 LearningRate 0.0619 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:12,339-Speed 3391.26 samples/sec Loss 7.0011 LearningRate 0.0619 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:15,356-Speed 3394.75 samples/sec Loss 7.1137 LearningRate 0.0619 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:18,383-Speed 3383.94 samples/sec Loss 7.0750 LearningRate 0.0619 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:21,394-Speed 3401.53 samples/sec Loss 7.1105 LearningRate 0.0618 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:57:24,408-Speed 3398.24 samples/sec Loss 7.0896 LearningRate 0.0618 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:57:27,427-Speed 3392.28 samples/sec Loss 7.2381 LearningRate 0.0618 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 03:57:30,417-Speed 3425.85 samples/sec Loss 6.8338 LearningRate 0.0618 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:33,431-Speed 3398.63 samples/sec Loss 6.9893 LearningRate 0.0618 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:36,440-Speed 3404.10 samples/sec Loss 7.0407 LearningRate 0.0618 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 03:57:39,438-Speed 3416.47 samples/sec Loss 7.0127 LearningRate 0.0618 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 03:57:42,452-Speed 3399.18 samples/sec Loss 6.8160 LearningRate 0.0617 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
03:57:45,469-Speed 3394.49 samples/sec Loss 6.9770 LearningRate 0.0617 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:57:48,482-Speed 3400.40 samples/sec Loss 7.1098 LearningRate 0.0617 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:57:51,491-Speed 3403.22 samples/sec Loss 6.9342 LearningRate 0.0617 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:57:54,515-Speed 3386.80 samples/sec Loss 6.8485 LearningRate 0.0617 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:57:57,528-Speed 3400.14 samples/sec Loss 6.8527 LearningRate 0.0617 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:00,577-Speed 3358.88 samples/sec Loss 7.0268 LearningRate 0.0617 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:03,586-Speed 3403.68 samples/sec Loss 6.9613 LearningRate 0.0616 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:06,597-Speed 3402.56 samples/sec Loss 6.8921 LearningRate 0.0616 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:09,614-Speed 3394.12 samples/sec Loss 7.0286 LearningRate 0.0616 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:12,647-Speed 3377.66 samples/sec Loss 7.2039 LearningRate 0.0616 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:15,662-Speed 3396.36 samples/sec Loss 6.9931 LearningRate 0.0616 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:18,678-Speed 3396.60 samples/sec Loss 6.8814 LearningRate 0.0616 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:21,693-Speed 3397.59 samples/sec Loss 6.9578 LearningRate 0.0616 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:24,686-Speed 3421.81 samples/sec Loss 7.0226 LearningRate 0.0616 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:27,699-Speed 3398.83 samples/sec Loss 6.9571 LearningRate 0.0615 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:30,713-Speed 3398.22 samples/sec Loss 6.9832 LearningRate 0.0615 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:33,731-Speed 3393.71 samples/sec Loss 6.9816 LearningRate 0.0615 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:36,955-Speed 3177.92 samples/sec Loss 6.8685 LearningRate 0.0615 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:39,989-Speed 3375.26 samples/sec Loss 7.1216 LearningRate 0.0615 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:43,010-Speed 3391.00 samples/sec Loss 7.0109 LearningRate 0.0615 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:46,022-Speed 3400.89 samples/sec Loss 6.9841 LearningRate 0.0615 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:49,052-Speed 3380.11 samples/sec Loss 6.9929 LearningRate 0.0614 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:52,070-Speed 3394.22 samples/sec Loss 6.9676 LearningRate 0.0614 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:58:55,088-Speed 3393.46 samples/sec Loss 6.9112 LearningRate 0.0614 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:58:58,153-Speed 3341.61 samples/sec Loss 6.8743 LearningRate 0.0614 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:01,190-Speed 3373.10 samples/sec Loss 7.0400 LearningRate 0.0614 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:04,407-Speed 3183.28 samples/sec Loss 6.8661 LearningRate 0.0614 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:07,420-Speed 3400.29 samples/sec Loss 6.9797 LearningRate 0.0614 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:10,433-Speed 3399.05 samples/sec Loss 6.9435 LearningRate 0.0613 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:13,427-Speed 3421.21 samples/sec Loss 6.9783 LearningRate 0.0613 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:16,441-Speed 3398.12 samples/sec Loss 7.0019 LearningRate 0.0613 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:19,458-Speed 3395.57 samples/sec Loss 7.0640 LearningRate 0.0613 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:22,485-Speed 3383.48 samples/sec Loss 6.9274 LearningRate 0.0613 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:25,529-Speed 3364.65 samples/sec Loss 7.0313 LearningRate 0.0613 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:28,569-Speed 3368.09 samples/sec Loss 6.8108 LearningRate 0.0613 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:31,589-Speed 3392.74 samples/sec Loss 6.7941 LearningRate 0.0613 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:34,608-Speed 3392.86 samples/sec Loss 6.9716 LearningRate 0.0612 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:37,624-Speed 3396.47 samples/sec Loss 7.0566 LearningRate 0.0612 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:40,643-Speed 3392.64 samples/sec Loss 7.0487 LearningRate 0.0612 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 03:59:43,667-Speed 3387.18 samples/sec Loss 6.9587 LearningRate 0.0612 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:46,722-Speed 3352.22 samples/sec Loss 6.8824 LearningRate 0.0612 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:49,738-Speed 3396.70 samples/sec Loss 6.9587 LearningRate 0.0612 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:52,752-Speed 3397.36 samples/sec Loss 7.0160 LearningRate 0.0612 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:55,771-Speed 3393.44 samples/sec Loss 6.8691 LearningRate 0.0611 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 03:59:58,786-Speed 3396.08 samples/sec Loss 7.0207 LearningRate 0.0611 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:01,803-Speed 3395.59 samples/sec Loss 6.9553 LearningRate 0.0611 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:04,817-Speed 3397.96 samples/sec Loss 6.9049 LearningRate 0.0611 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:07,832-Speed 3397.37 samples/sec Loss 6.7832 LearningRate 0.0611 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:10,860-Speed 3382.67 samples/sec Loss 6.9198 LearningRate 0.0611 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:13,909-Speed 3359.10 samples/sec Loss 6.9813 LearningRate 0.0611 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:16,937-Speed 3382.53 samples/sec Loss 6.9101 LearningRate 0.0610 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:19,957-Speed 3392.07 samples/sec Loss 7.0022 LearningRate 0.0610 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:22,973-Speed 3395.16 samples/sec Loss 6.8952 LearningRate 0.0610 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:25,998-Speed 3386.46 samples/sec Loss 6.8327 LearningRate 0.0610 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:29,013-Speed 3396.93 samples/sec Loss 6.8397 LearningRate 0.0610 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:32,054-Speed 3367.99 samples/sec Loss 7.0024 LearningRate 0.0610 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:35,097-Speed 3366.14 samples/sec Loss 6.9247 LearningRate 0.0610 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:38,166-Speed 3337.54 samples/sec Loss 6.8048 LearningRate 0.0609 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:00:41,168-Speed 3412.27 samples/sec Loss 6.7550 LearningRate 0.0609 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:44,189-Speed 3389.89 samples/sec Loss 6.8939 LearningRate 0.0609 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:47,248-Speed 3351.51 samples/sec Loss 6.9368 LearningRate 0.0609 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:50,267-Speed 3392.90 samples/sec Loss 6.9149 LearningRate 0.0609 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:53,290-Speed 3388.37 samples/sec Loss 7.0899 LearningRate 0.0609 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:56,308-Speed 3392.56 samples/sec Loss 6.8495 LearningRate 0.0609 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:00:59,407-Speed 3305.30 samples/sec Loss 6.9051 LearningRate 0.0609 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:01:02,429-Speed 3389.44 samples/sec Loss 6.9963 LearningRate 0.0608 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:01:05,475-Speed 3363.11 samples/sec Loss 6.9077 LearningRate 0.0608 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:01:08,494-Speed 3392.50 samples/sec Loss 6.9277 LearningRate 0.0608 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:01:11,525-Speed 3380.13 samples/sec Loss 6.9139 LearningRate 0.0608 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:14,544-Speed 3392.35 samples/sec Loss 6.8926 LearningRate 0.0608 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:17,558-Speed 3398.06 samples/sec Loss 6.7052 LearningRate 0.0608 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:20,574-Speed 3396.40 samples/sec Loss 6.8686 LearningRate 0.0608 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:23,603-Speed 3381.19 samples/sec Loss 6.8747 LearningRate 0.0607 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:26,644-Speed 3367.12 samples/sec Loss 6.8772 LearningRate 0.0607 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:29,668-Speed 3387.09 samples/sec Loss 6.8912 LearningRate 0.0607 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:32,689-Speed 3390.87 samples/sec Loss 6.8752 LearningRate 0.0607 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:35,704-Speed 3397.61 samples/sec Loss 6.9194 LearningRate 0.0607 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:38,726-Speed 3389.39 samples/sec Loss 6.7855 LearningRate 0.0607 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:41,750-Speed 3387.40 samples/sec Loss 6.9122 LearningRate 0.0607 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:01:44,752-Speed 3411.62 samples/sec Loss 7.0176 LearningRate 0.0606 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:47,790-Speed 3371.56 samples/sec Loss 6.7969 LearningRate 0.0606 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:50,808-Speed 3393.66 samples/sec Loss 6.9564 LearningRate 0.0606 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:53,838-Speed 3380.48 samples/sec Loss 6.9789 LearningRate 0.0606 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:56,864-Speed 3383.79 samples/sec Loss 6.7670 LearningRate 0.0606 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:01:59,941-Speed 3329.39 samples/sec Loss 6.8831 LearningRate 0.0606 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:02,965-Speed 3387.35 samples/sec Loss 6.8846 LearningRate 0.0606 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:05,994-Speed 3380.54 samples/sec Loss 6.9633 LearningRate 0.0606 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:09,019-Speed 3386.45 samples/sec Loss 6.8487 LearningRate 0.0605 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:12,044-Speed 3385.56 samples/sec Loss 6.8384 LearningRate 0.0605 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:15,078-Speed 3375.81 samples/sec Loss 6.8887 LearningRate 0.0605 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:18,151-Speed 3334.08 samples/sec Loss 7.0122 LearningRate 0.0605 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:21,177-Speed 3384.33 samples/sec Loss 7.0200 LearningRate 0.0605 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:24,206-Speed 3381.02 samples/sec Loss 6.8980 LearningRate 0.0605 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:27,248-Speed 3367.83 samples/sec Loss 6.9379 LearningRate 0.0605 Epoch: 4 Global Step: 25300 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:30,281-Speed 3377.00 samples/sec Loss 6.8976 LearningRate 0.0604 Epoch: 4 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:33,381-Speed 3303.03 samples/sec Loss 6.8942 LearningRate 0.0604 Epoch: 4 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:02:36,409-Speed 3382.97 samples/sec Loss 6.8847 LearningRate 0.0604 Epoch: 4 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:39,432-Speed 3388.62 samples/sec Loss 6.7174 LearningRate 0.0604 Epoch: 4 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:42,472-Speed 3369.04 samples/sec Loss 6.9310 LearningRate 0.0604 Epoch: 4 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:45,511-Speed 3370.19 samples/sec Loss 6.8429 LearningRate 0.0604 Epoch: 4 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:48,541-Speed 3380.01 samples/sec Loss 6.8986 LearningRate 0.0604 Epoch: 4 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:51,566-Speed 3385.94 samples/sec Loss 6.7821 LearningRate 0.0603 Epoch: 4 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:54,591-Speed 3386.22 samples/sec Loss 6.9524 LearningRate 0.0603 Epoch: 4 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:02:57,615-Speed 3386.80 samples/sec Loss 6.7681 LearningRate 0.0603 Epoch: 4 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:00,641-Speed 3384.67 samples/sec Loss 6.9054 LearningRate 0.0603 Epoch: 4 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:03,741-Speed 3304.45 samples/sec Loss 6.8082 LearningRate 0.0603 Epoch: 4 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:06,764-Speed 3388.26 samples/sec Loss 6.9432 LearningRate 0.0603 Epoch: 4 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:09,789-Speed 3385.26 samples/sec Loss 6.8705 LearningRate 0.0603 Epoch: 4 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:12,832-Speed 3366.52 samples/sec Loss 6.8071 LearningRate 0.0602 Epoch: 4 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:15,883-Speed 3356.85 samples/sec Loss 7.0616 LearningRate 0.0602 Epoch: 4 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:18,922-Speed 3370.02 samples/sec Loss 6.9063 LearningRate 0.0602 Epoch: 4 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:21,949-Speed 3384.19 samples/sec Loss 6.9006 LearningRate 0.0602 Epoch: 4 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:24,979-Speed 3380.14 samples/sec Loss 6.8240 LearningRate 0.0602 Epoch: 4 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:03:28,037-Speed 3349.07 samples/sec Loss 6.7634 LearningRate 0.0602 Epoch: 4 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:31,057-Speed 3391.54 samples/sec Loss 6.9503 LearningRate 0.0602 Epoch: 4 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:34,083-Speed 3384.66 samples/sec Loss 6.6305 LearningRate 0.0602 Epoch: 4 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:37,108-Speed 3385.89 samples/sec Loss 6.8707 LearningRate 0.0601 Epoch: 4 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:40,134-Speed 3385.36 samples/sec Loss 7.0121 LearningRate 0.0601 Epoch: 4 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:43,179-Speed 3363.11 samples/sec Loss 6.6298 LearningRate 0.0601 Epoch: 4 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:46,206-Speed 3384.43 samples/sec Loss 6.8233 LearningRate 0.0601 Epoch: 4 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:49,231-Speed 3385.95 samples/sec Loss 6.7964 LearningRate 0.0601 Epoch: 4 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:52,257-Speed 3385.65 samples/sec Loss 6.6841 LearningRate 0.0601 Epoch: 4 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:55,282-Speed 3385.11 samples/sec Loss 7.0152 LearningRate 0.0601 Epoch: 4 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:03:58,320-Speed 3371.31 samples/sec Loss 6.7267 LearningRate 0.0600 Epoch: 4 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:01,346-Speed 3385.32 samples/sec Loss 6.8325 LearningRate 0.0600 Epoch: 4 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:04,376-Speed 3380.46 samples/sec Loss 6.8505 LearningRate 0.0600 Epoch: 4 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:07,403-Speed 3383.76 samples/sec Loss 6.9037 LearningRate 0.0600 Epoch: 4 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:10,441-Speed 3371.22 samples/sec Loss 6.8035 LearningRate 0.0600 Epoch: 4 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:13,466-Speed 3386.09 samples/sec Loss 6.7800 LearningRate 0.0600 Epoch: 4 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:16,518-Speed 3355.46 samples/sec Loss 6.9495 LearningRate 0.0600 Epoch: 4 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:19,541-Speed 3388.18 samples/sec Loss 6.8498 LearningRate 0.0599 Epoch: 4 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:22,565-Speed 3387.31 samples/sec Loss 6.8386 LearningRate 0.0599 Epoch: 4 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:25,588-Speed 3387.83 samples/sec Loss 6.8264 LearningRate 0.0599 Epoch: 4 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:28,616-Speed 3382.61 samples/sec Loss 6.9016 LearningRate 0.0599 Epoch: 4 Global Step: 25700 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:04:31,623-Speed 3406.44 samples/sec Loss 6.7773 LearningRate 0.0599 Epoch: 4 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:34,647-Speed 3387.36 samples/sec Loss 6.7116 LearningRate 0.0599 Epoch: 4 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:37,727-Speed 3324.97 samples/sec Loss 6.7152 LearningRate 0.0599 Epoch: 4 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:40,799-Speed 3334.50 samples/sec Loss 6.8629 LearningRate 0.0599 Epoch: 4 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:43,872-Speed 3332.95 samples/sec Loss 6.9347 LearningRate 0.0598 Epoch: 4 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:46,961-Speed 3316.66 samples/sec Loss 6.8273 LearningRate 0.0598 Epoch: 4 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:50,011-Speed 3357.40 samples/sec Loss 6.8011 LearningRate 0.0598 Epoch: 4 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:53,124-Speed 3290.89 samples/sec Loss 6.8964 LearningRate 0.0598 Epoch: 4 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:56,195-Speed 3335.05 samples/sec Loss 6.6366 LearningRate 0.0598 Epoch: 4 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:04:59,243-Speed 3359.56 samples/sec Loss 6.7981 LearningRate 0.0598 Epoch: 4 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:02,290-Speed 3362.13 samples/sec Loss 6.9820 LearningRate 0.0598 Epoch: 4 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:05:05,298-Speed 3404.52 samples/sec Loss 6.8463 LearningRate 0.0597 Epoch: 4 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:08,321-Speed 3388.24 samples/sec Loss 6.8264 LearningRate 0.0597 Epoch: 4 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:11,358-Speed 3373.34 samples/sec Loss 6.8848 LearningRate 0.0597 Epoch: 4 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:14,386-Speed 3382.04 samples/sec Loss 6.7269 LearningRate 0.0597 Epoch: 4 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:17,408-Speed 3389.14 samples/sec Loss 6.9116 LearningRate 0.0597 Epoch: 4 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:20,435-Speed 3383.58 samples/sec Loss 6.8238 LearningRate 0.0597 Epoch: 4 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:23,459-Speed 3388.05 samples/sec Loss 6.9882 LearningRate 0.0597 Epoch: 4 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:26,590-Speed 3271.50 samples/sec Loss 6.9246 LearningRate 0.0597 Epoch: 4 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:29,727-Speed 3264.43 samples/sec Loss 6.7857 LearningRate 0.0596 Epoch: 4 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:32,753-Speed 3385.47 samples/sec Loss 6.7974 LearningRate 0.0596 Epoch: 4 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:35,758-Speed 3408.90 samples/sec Loss 6.8032 LearningRate 0.0596 Epoch: 4 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:38,786-Speed 3381.57 samples/sec Loss 6.8789 LearningRate 0.0596 Epoch: 4 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:41,808-Speed 3389.46 samples/sec Loss 7.0051 LearningRate 0.0596 Epoch: 4 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:44,834-Speed 3384.55 samples/sec Loss 6.7266 LearningRate 0.0596 Epoch: 4 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:47,867-Speed 3377.75 samples/sec Loss 6.9106 LearningRate 0.0596 Epoch: 4 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:50,890-Speed 3388.14 samples/sec Loss 6.9267 LearningRate 0.0595 Epoch: 4 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:53,916-Speed 3384.05 samples/sec Loss 6.7661 LearningRate 0.0595 Epoch: 4 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:56,937-Speed 3391.28 samples/sec Loss 6.8812 LearningRate 0.0595 Epoch: 4 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:05:59,964-Speed 3383.44 samples/sec Loss 6.8243 LearningRate 0.0595 Epoch: 4 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:06:43,508-[lfw][26000]XNorm: 23.087392
Training: 2022-04-27 04:06:43,509-[lfw][26000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-27 04:06:43,509-[lfw][26000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:07:34,265-[cfp_fp][26000]XNorm: 20.063840
Training: 2022-04-27 04:07:34,266-[cfp_fp][26000]Accuracy-Flip: 0.94443+-0.01470
Training: 2022-04-27 04:07:34,266-[cfp_fp][26000]Accuracy-Highest: 0.94443
Training: 2022-04-27 04:08:17,832-[agedb_30][26000]XNorm: 22.752791
Training: 2022-04-27 04:08:17,832-[agedb_30][26000]Accuracy-Flip: 0.96917+-0.00955
Training: 2022-04-27 04:08:17,833-[agedb_30][26000]Accuracy-Highest: 0.97167
Training: 2022-04-27 04:08:20,845-Speed 72.69 samples/sec Loss 6.8609 LearningRate 0.0595 Epoch: 4 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:23,851-Speed 3408.32 samples/sec Loss 6.7603 LearningRate 0.0595 Epoch: 4 Global Step: 26020 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:08:26,842-Speed 3424.34 samples/sec Loss 6.8454 LearningRate 0.0595 Epoch: 4 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:29,850-Speed 3404.92 samples/sec Loss 6.9258 LearningRate 0.0594 Epoch: 4 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:32,857-Speed 3405.43 samples/sec Loss 6.7818 LearningRate 0.0594 Epoch: 4 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:36,100-Speed 3158.13 samples/sec Loss 6.6342 LearningRate 0.0594 Epoch: 4 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:39,136-Speed 3374.04 samples/sec Loss 6.5885 LearningRate 0.0594 Epoch: 4 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:42,152-Speed 3396.66 samples/sec Loss 6.8445 LearningRate 0.0594 Epoch: 4 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:45,168-Speed 3395.45 samples/sec Loss 6.8144 LearningRate 0.0594 Epoch: 4 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:48,186-Speed 3394.17 samples/sec Loss 6.7033 LearningRate 0.0594 Epoch: 4 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:51,226-Speed 3369.24 samples/sec Loss 6.8972 LearningRate 0.0594 Epoch: 4 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:54,255-Speed 3381.91 samples/sec Loss 6.8564 LearningRate 0.0593 Epoch: 4 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:08:57,279-Speed 3387.22 samples/sec Loss 6.9184 LearningRate 0.0593 Epoch: 4 Global Step: 26130 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:09:00,356-Speed 3329.01 samples/sec Loss 6.7975 LearningRate 0.0593 Epoch: 4 Global Step: 26140 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:03,577-Speed 3179.70 samples/sec Loss 6.8512 LearningRate 0.0593 Epoch: 4 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:06,611-Speed 3375.85 samples/sec Loss 6.8672 LearningRate 0.0593 Epoch: 4 Global Step: 26160 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:09,639-Speed 3381.98 samples/sec Loss 6.7486 LearningRate 0.0593 Epoch: 4 Global Step: 26170 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:12,678-Speed 3370.78 samples/sec Loss 6.8675 LearningRate 0.0593 Epoch: 4 Global Step: 26180 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:15,709-Speed 3379.79 samples/sec Loss 6.8344 LearningRate 0.0592 Epoch: 4 Global Step: 26190 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:18,731-Speed 3389.06 samples/sec Loss 6.7177 LearningRate 0.0592 Epoch: 4 Global Step: 26200 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:21,753-Speed 3389.05 samples/sec Loss 6.7505 LearningRate 0.0592 Epoch: 4 Global Step: 26210 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:24,793-Speed 3369.59 samples/sec Loss 6.9399 LearningRate 0.0592 Epoch: 4 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:27,826-Speed 3377.18 samples/sec Loss 6.7522 LearningRate 0.0592 Epoch: 4 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:30,852-Speed 3384.71 samples/sec Loss 6.7169 LearningRate 0.0592 Epoch: 4 Global Step: 26240 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:09:33,851-Speed 3415.18 samples/sec Loss 6.7648 LearningRate 0.0592 Epoch: 4 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:36,886-Speed 3375.02 samples/sec Loss 6.8394 LearningRate 0.0591 Epoch: 4 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:39,904-Speed 3393.28 samples/sec Loss 6.7351 LearningRate 0.0591 Epoch: 4 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:42,916-Speed 3400.73 samples/sec Loss 6.7181 LearningRate 0.0591 Epoch: 4 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:09:45,904-Speed 3428.65 samples/sec Loss 6.7559 LearningRate 0.0591 Epoch: 4 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:09:48,927-Speed 3387.76 samples/sec Loss 6.7363 LearningRate 0.0591 Epoch: 4 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:09:51,943-Speed 3396.56 samples/sec Loss 6.5860 LearningRate 0.0591 Epoch: 4 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:09:54,957-Speed 3398.42 samples/sec Loss 6.6887 LearningRate 0.0591 Epoch: 4 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:09:57,967-Speed 3401.98 samples/sec Loss 6.8324 LearningRate 0.0591 Epoch: 4 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:00,972-Speed 3409.19 samples/sec Loss 6.9529 LearningRate 0.0590 Epoch: 4 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:03,981-Speed 3403.70 samples/sec Loss 6.5581 LearningRate 0.0590 Epoch: 4 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:06,996-Speed 3397.43 samples/sec Loss 6.5834 LearningRate 0.0590 Epoch: 4 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:10,006-Speed 3401.60 samples/sec Loss 6.8721 LearningRate 0.0590 Epoch: 4 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:13,033-Speed 3384.05 samples/sec Loss 6.7056 LearningRate 0.0590 Epoch: 4 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:10:16,055-Speed 3389.80 samples/sec Loss 6.7667 LearningRate 0.0590 Epoch: 4 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:19,103-Speed 3360.12 samples/sec Loss 6.7467 LearningRate 0.0590 Epoch: 4 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:22,110-Speed 3406.73 samples/sec Loss 6.6128 LearningRate 0.0589 Epoch: 4 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:25,128-Speed 3393.73 samples/sec Loss 6.8131 LearningRate 0.0589 Epoch: 4 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:28,132-Speed 3408.99 samples/sec Loss 6.8180 LearningRate 0.0589 Epoch: 4 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:31,138-Speed 3407.60 samples/sec Loss 6.7356 LearningRate 0.0589 Epoch: 4 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:34,148-Speed 3403.55 samples/sec Loss 6.7482 LearningRate 0.0589 Epoch: 4 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:37,171-Speed 3387.28 samples/sec Loss 6.6906 LearningRate 0.0589 Epoch: 4 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:40,235-Speed 3342.56 samples/sec Loss 6.8702 LearningRate 0.0589 Epoch: 4 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:43,259-Speed 3388.07 samples/sec Loss 6.7702 LearningRate 0.0589 Epoch: 4 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:46,281-Speed 3388.89 samples/sec Loss 6.7452 LearningRate 0.0588 Epoch: 4 Global Step: 26490 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:10:49,275-Speed 3422.10 samples/sec Loss 6.6135 LearningRate 0.0588 Epoch: 4 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:52,283-Speed 3404.91 samples/sec Loss 6.7105 LearningRate 0.0588 Epoch: 4 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:55,291-Speed 3405.73 samples/sec Loss 6.7238 LearningRate 0.0588 Epoch: 4 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:10:58,299-Speed 3404.77 samples/sec Loss 6.6160 LearningRate 0.0588 Epoch: 4 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:01,348-Speed 3359.52 samples/sec Loss 6.5864 LearningRate 0.0588 Epoch: 4 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:04,366-Speed 3393.94 samples/sec Loss 6.6253 LearningRate 0.0588 Epoch: 4 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:07,377-Speed 3401.63 samples/sec Loss 6.6714 LearningRate 0.0587 Epoch: 4 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:10,453-Speed 3329.89 samples/sec Loss 6.8414 LearningRate 0.0587 Epoch: 4 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:13,476-Speed 3387.99 samples/sec Loss 6.9059 LearningRate 0.0587 Epoch: 4 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:16,493-Speed 3394.82 samples/sec Loss 6.7918 LearningRate 0.0587 Epoch: 4 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:19,486-Speed 3422.20 samples/sec Loss 6.6893 LearningRate 0.0587 Epoch: 4 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:22,497-Speed 3401.01 samples/sec Loss 6.8121 LearningRate 0.0587 Epoch: 4 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:25,523-Speed 3385.85 samples/sec Loss 6.7591 LearningRate 0.0587 Epoch: 4 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:28,564-Speed 3367.56 samples/sec Loss 6.7503 LearningRate 0.0586 Epoch: 4 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:31,581-Speed 3394.89 samples/sec Loss 6.7778 LearningRate 0.0586 Epoch: 4 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:34,604-Speed 3388.47 samples/sec Loss 6.8014 LearningRate 0.0586 Epoch: 4 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:37,620-Speed 3395.80 samples/sec Loss 6.7168 LearningRate 0.0586 Epoch: 4 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:40,679-Speed 3348.22 samples/sec Loss 6.7502 LearningRate 0.0586 Epoch: 4 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:43,694-Speed 3397.37 samples/sec Loss 6.8532 LearningRate 0.0586 Epoch: 4 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:46,733-Speed 3370.08 samples/sec Loss 6.8040 LearningRate 0.0586 Epoch: 4 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:49,763-Speed 3380.48 samples/sec Loss 6.7219 LearningRate 0.0586 Epoch: 4 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:52,780-Speed 3394.65 samples/sec Loss 6.7279 LearningRate 0.0585 Epoch: 4 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:11:55,777-Speed 3417.39 samples/sec Loss 6.8945 LearningRate 0.0585 Epoch: 4 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:11:58,790-Speed 3400.02 samples/sec Loss 6.7330 LearningRate 0.0585 Epoch: 4 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:01,802-Speed 3400.48 samples/sec Loss 6.7314 LearningRate 0.0585 Epoch: 4 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:04,827-Speed 3385.41 samples/sec Loss 6.7616 LearningRate 0.0585 Epoch: 4 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:07,840-Speed 3399.32 samples/sec Loss 6.7422 LearningRate 0.0585 Epoch: 4 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:10,853-Speed 3399.57 samples/sec Loss 6.6298 LearningRate 0.0585 Epoch: 4 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:13,873-Speed 3391.83 samples/sec Loss 6.6233 LearningRate 0.0584 Epoch: 4 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:12:16,889-Speed 3395.53 samples/sec Loss 6.7909 LearningRate 0.0584 Epoch: 4 Global Step: 26790 Fp16 Grad
Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:19,899-Speed 3403.00 samples/sec Loss 6.7288 LearningRate 0.0584 Epoch: 4 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:22,919-Speed 3391.41 samples/sec Loss 6.8621 LearningRate 0.0584 Epoch: 4 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:25,918-Speed 3416.15 samples/sec Loss 6.9222 LearningRate 0.0584 Epoch: 4 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:28,931-Speed 3398.47 samples/sec Loss 6.6971 LearningRate 0.0584 Epoch: 4 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:31,946-Speed 3397.94 samples/sec Loss 6.6842 LearningRate 0.0584 Epoch: 4 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:34,958-Speed 3400.08 samples/sec Loss 6.7677 LearningRate 0.0584 Epoch: 4 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:37,981-Speed 3387.93 samples/sec Loss 6.6751 LearningRate 0.0583 Epoch: 4 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:41,005-Speed 3387.22 samples/sec Loss 6.7660 LearningRate 0.0583 Epoch: 4 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:44,022-Speed 3394.49 samples/sec Loss 6.8619 LearningRate 0.0583 Epoch: 4 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:47,040-Speed 3394.69 samples/sec Loss 6.6523 LearningRate 0.0583 Epoch: 4 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:50,062-Speed 3389.43 samples/sec Loss 6.8414 LearningRate 0.0583 Epoch: 4 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:12:53,083-Speed 3390.18 samples/sec Loss 6.8157 LearningRate 0.0583 Epoch: 4 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
04:12:56,099-Speed 3395.33 samples/sec Loss 6.7727 LearningRate 0.0583 Epoch: 4 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:12:59,128-Speed 3382.05 samples/sec Loss 6.7874 LearningRate 0.0582 Epoch: 4 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:02,157-Speed 3381.24 samples/sec Loss 6.7154 LearningRate 0.0582 Epoch: 4 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:05,227-Speed 3336.77 samples/sec Loss 6.7189 LearningRate 0.0582 Epoch: 4 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:08,271-Speed 3364.77 samples/sec Loss 6.8711 LearningRate 0.0582 Epoch: 4 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:11,291-Speed 3391.39 samples/sec Loss 6.9450 LearningRate 0.0582 Epoch: 4 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:14,310-Speed 3393.54 samples/sec Loss 6.6661 LearningRate 0.0582 Epoch: 4 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:17,326-Speed 3395.46 samples/sec Loss 6.7253 LearningRate 0.0582 Epoch: 4 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:20,337-Speed 3401.55 samples/sec Loss 6.8504 LearningRate 0.0582 Epoch: 4 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:23,373-Speed 3374.42 samples/sec Loss 6.7398 LearningRate 0.0581 Epoch: 4 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:26,372-Speed 3415.40 samples/sec Loss 6.5897 LearningRate 0.0581 Epoch: 4 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:29,387-Speed 3396.64 samples/sec Loss 6.6672 LearningRate 0.0581 Epoch: 4 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:32,407-Speed 3391.24 samples/sec Loss 6.7585 
LearningRate 0.0581 Epoch: 4 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:35,425-Speed 3393.57 samples/sec Loss 6.6929 LearningRate 0.0581 Epoch: 4 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:38,438-Speed 3399.14 samples/sec Loss 6.6121 LearningRate 0.0581 Epoch: 4 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:41,459-Speed 3390.91 samples/sec Loss 6.6858 LearningRate 0.0581 Epoch: 4 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:44,476-Speed 3395.55 samples/sec Loss 6.7128 LearningRate 0.0580 Epoch: 4 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:47,513-Speed 3372.25 samples/sec Loss 6.6505 LearningRate 0.0580 Epoch: 4 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:50,537-Speed 3387.41 samples/sec Loss 6.6508 LearningRate 0.0580 Epoch: 4 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:53,555-Speed 3393.81 samples/sec Loss 6.6177 LearningRate 0.0580 Epoch: 4 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:13:56,569-Speed 3398.55 samples/sec Loss 6.7253 LearningRate 0.0580 Epoch: 4 Global Step: 27120 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 04:13:59,585-Speed 3395.46 samples/sec Loss 6.6897 LearningRate 0.0580 Epoch: 4 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:02,611-Speed 3384.23 samples/sec Loss 6.6404 LearningRate 0.0580 Epoch: 4 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:05,627-Speed 3397.21 samples/sec Loss 6.6928 LearningRate 0.0580 Epoch: 4 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:08,646-Speed 3392.13 samples/sec Loss 6.7091 LearningRate 0.0579 Epoch: 4 Global Step: 27160 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:11,670-Speed 3387.70 samples/sec Loss 6.5581 LearningRate 0.0579 Epoch: 4 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:14,732-Speed 3344.72 samples/sec Loss 6.7593 LearningRate 0.0579 Epoch: 4 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:17,778-Speed 3362.61 samples/sec Loss 6.6592 LearningRate 0.0579 Epoch: 4 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:20,817-Speed 3370.75 samples/sec Loss 6.7124 LearningRate 0.0579 Epoch: 4 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:23,847-Speed 3379.85 samples/sec Loss 6.6230 LearningRate 0.0579 Epoch: 4 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:26,865-Speed 3393.32 samples/sec Loss 6.6559 LearningRate 0.0579 Epoch: 4 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:29,863-Speed 3416.23 samples/sec Loss 6.7926 LearningRate 0.0578 Epoch: 4 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:32,881-Speed 3394.45 samples/sec Loss 6.7405 LearningRate 0.0578 Epoch: 4 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:35,920-Speed 3370.49 samples/sec Loss 6.5588 LearningRate 0.0578 Epoch: 4 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:38,943-Speed 3388.08 samples/sec Loss 6.6366 LearningRate 0.0578 Epoch: 4 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:41,961-Speed 3393.99 samples/sec Loss 6.6606 LearningRate 0.0578 Epoch: 4 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:14:44,985-Speed 3387.44 samples/sec Loss 6.5953 LearningRate 0.0578 Epoch: 4 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-27 04:14:47,990-Speed 3407.83 samples/sec Loss 6.7246 LearningRate 0.0578 Epoch: 4 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:14:51,004-Speed 3398.22 samples/sec Loss 6.8245 LearningRate 0.0578 Epoch: 4 Global Step: 27300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:14:54,024-Speed 3392.08 samples/sec Loss 6.7670 LearningRate 0.0577 Epoch: 4 Global Step: 27310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:14:57,049-Speed 3385.86 samples/sec Loss 6.6087 LearningRate 0.0577 Epoch: 4 Global Step: 27320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:00,068-Speed 3392.57 samples/sec Loss 6.8179 LearningRate 0.0577 Epoch: 4 Global Step: 27330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:03,094-Speed 3384.22 samples/sec Loss 6.6023 LearningRate 0.0577 Epoch: 4 Global Step: 27340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:06,130-Speed 3375.53 samples/sec Loss 6.7188 LearningRate 0.0577 Epoch: 4 Global Step: 27350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:09,152-Speed 3388.83 samples/sec Loss 6.5692 LearningRate 0.0577 Epoch: 4 Global Step: 27360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:12,175-Speed 3388.46 samples/sec Loss 6.5228 LearningRate 0.0577 Epoch: 4 Global Step: 27370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:15,191-Speed 3395.49 samples/sec Loss 6.6627 LearningRate 0.0576 Epoch: 4 Global Step: 27380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:18,210-Speed 3393.26 samples/sec Loss 6.7884 LearningRate 0.0576 Epoch: 4 Global Step: 27390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 04:15:21,230-Speed 3390.90 samples/sec Loss 6.6879 LearningRate 0.0576 Epoch: 4 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:24,262-Speed 3378.10 samples/sec Loss 6.6973 
LearningRate 0.0576 Epoch: 4 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:27,290-Speed 3382.36 samples/sec Loss 6.6793 LearningRate 0.0576 Epoch: 4 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:30,318-Speed 3383.03 samples/sec Loss 6.6419 LearningRate 0.0576 Epoch: 4 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:33,340-Speed 3389.83 samples/sec Loss 6.6978 LearningRate 0.0576 Epoch: 4 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:36,358-Speed 3393.42 samples/sec Loss 6.5733 LearningRate 0.0576 Epoch: 4 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:39,379-Speed 3390.22 samples/sec Loss 6.7159 LearningRate 0.0575 Epoch: 4 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:42,401-Speed 3389.36 samples/sec Loss 6.6857 LearningRate 0.0575 Epoch: 4 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:45,420-Speed 3392.97 samples/sec Loss 6.7726 LearningRate 0.0575 Epoch: 4 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:48,469-Speed 3358.60 samples/sec Loss 6.6284 LearningRate 0.0575 Epoch: 4 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:15:51,491-Speed 3389.55 samples/sec Loss 6.6125 LearningRate 0.0575 Epoch: 4 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:15:54,518-Speed 3383.77 samples/sec Loss 6.6494 LearningRate 0.0575 Epoch: 4 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:15:57,536-Speed 3393.82 samples/sec Loss 6.7549 LearningRate 0.0575 Epoch: 4 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:00,553-Speed 3395.10 samples/sec Loss 6.4713 LearningRate 0.0574 Epoch: 4 Global Step: 27530 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-04-27 04:16:03,579-Speed 3384.85 samples/sec Loss 6.6805 LearningRate 0.0574 Epoch: 4 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:06,604-Speed 3385.92 samples/sec Loss 6.7249 LearningRate 0.0574 Epoch: 4 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:09,635-Speed 3379.39 samples/sec Loss 6.6892 LearningRate 0.0574 Epoch: 4 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:12,664-Speed 3382.17 samples/sec Loss 6.6448 LearningRate 0.0574 Epoch: 4 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:15,691-Speed 3382.69 samples/sec Loss 6.7297 LearningRate 0.0574 Epoch: 4 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:18,711-Speed 3392.29 samples/sec Loss 6.5607 LearningRate 0.0574 Epoch: 4 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:21,715-Speed 3408.83 samples/sec Loss 6.6876 LearningRate 0.0574 Epoch: 4 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:24,741-Speed 3385.70 samples/sec Loss 6.5975 LearningRate 0.0573 Epoch: 4 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:27,764-Speed 3387.31 samples/sec Loss 6.6593 LearningRate 0.0573 Epoch: 4 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:30,785-Speed 3390.80 samples/sec Loss 6.6981 LearningRate 0.0573 Epoch: 4 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:33,900-Speed 3288.68 samples/sec Loss 6.7159 LearningRate 0.0573 Epoch: 4 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:36,943-Speed 3365.52 samples/sec Loss 6.5528 LearningRate 0.0573 Epoch: 4 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 
04:16:40,133-Speed 3211.09 samples/sec Loss 6.6697 LearningRate 0.0573 Epoch: 4 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:43,242-Speed 3294.55 samples/sec Loss 6.5614 LearningRate 0.0573 Epoch: 4 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:46,262-Speed 3391.36 samples/sec Loss 6.7548 LearningRate 0.0572 Epoch: 4 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:49,285-Speed 3387.57 samples/sec Loss 6.7949 LearningRate 0.0572 Epoch: 4 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:52,296-Speed 3401.84 samples/sec Loss 6.5456 LearningRate 0.0572 Epoch: 4 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:16:55,300-Speed 3409.97 samples/sec Loss 6.5964 LearningRate 0.0572 Epoch: 4 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:16:58,325-Speed 3385.47 samples/sec Loss 6.6071 LearningRate 0.0572 Epoch: 4 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:01,349-Speed 3386.80 samples/sec Loss 6.5212 LearningRate 0.0572 Epoch: 4 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:04,403-Speed 3354.34 samples/sec Loss 6.7428 LearningRate 0.0572 Epoch: 4 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:07,428-Speed 3385.97 samples/sec Loss 6.5696 LearningRate 0.0572 Epoch: 4 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:10,455-Speed 3383.46 samples/sec Loss 6.5948 LearningRate 0.0571 Epoch: 4 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:13,484-Speed 3381.97 samples/sec Loss 6.5816 LearningRate 0.0571 Epoch: 4 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:16,517-Speed 3376.71 samples/sec Loss 6.5480 
LearningRate 0.0571 Epoch: 4 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:19,545-Speed 3382.47 samples/sec Loss 6.6394 LearningRate 0.0571 Epoch: 4 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:22,575-Speed 3380.18 samples/sec Loss 6.6532 LearningRate 0.0571 Epoch: 4 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:25,615-Speed 3370.05 samples/sec Loss 6.5295 LearningRate 0.0571 Epoch: 4 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:28,648-Speed 3376.99 samples/sec Loss 6.5040 LearningRate 0.0571 Epoch: 4 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:31,668-Speed 3390.44 samples/sec Loss 6.6268 LearningRate 0.0570 Epoch: 4 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:34,694-Speed 3385.61 samples/sec Loss 6.5905 LearningRate 0.0570 Epoch: 4 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:37,733-Speed 3370.07 samples/sec Loss 6.6355 LearningRate 0.0570 Epoch: 4 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:40,752-Speed 3392.13 samples/sec Loss 6.5910 LearningRate 0.0570 Epoch: 4 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:43,786-Speed 3375.63 samples/sec Loss 6.6946 LearningRate 0.0570 Epoch: 4 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:46,816-Speed 3380.94 samples/sec Loss 6.5928 LearningRate 0.0570 Epoch: 4 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:17:49,845-Speed 3381.36 samples/sec Loss 6.6521 LearningRate 0.0570 Epoch: 4 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:52,872-Speed 3383.68 samples/sec Loss 6.6342 LearningRate 0.0570 Epoch: 4 Global Step: 27900 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:55,896-Speed 3387.47 samples/sec Loss 6.6379 LearningRate 0.0569 Epoch: 4 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:17:58,919-Speed 3388.11 samples/sec Loss 6.3860 LearningRate 0.0569 Epoch: 4 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:01,940-Speed 3389.65 samples/sec Loss 6.5595 LearningRate 0.0569 Epoch: 4 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:04,966-Speed 3384.92 samples/sec Loss 6.6465 LearningRate 0.0569 Epoch: 4 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:07,993-Speed 3383.81 samples/sec Loss 6.6515 LearningRate 0.0569 Epoch: 4 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:11,026-Speed 3377.22 samples/sec Loss 6.6718 LearningRate 0.0569 Epoch: 4 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:14,050-Speed 3386.72 samples/sec Loss 6.5897 LearningRate 0.0569 Epoch: 4 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:17,080-Speed 3381.38 samples/sec Loss 6.5766 LearningRate 0.0568 Epoch: 4 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:18:20,109-Speed 3380.65 samples/sec Loss 6.7657 LearningRate 0.0568 Epoch: 4 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:18:23,200-Speed 3313.72 samples/sec Loss 6.6644 LearningRate 0.0568 Epoch: 4 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:19:06,448-[lfw][28000]XNorm: 20.600976 Training: 2022-04-27 04:19:06,448-[lfw][28000]Accuracy-Flip: 0.99600+-0.00260 Training: 2022-04-27 04:19:06,449-[lfw][28000]Accuracy-Highest: 0.99750 Training: 2022-04-27 04:19:56,931-[cfp_fp][28000]XNorm: 18.298377 Training: 2022-04-27 04:19:56,932-[cfp_fp][28000]Accuracy-Flip: 
0.93886+-0.01355 Training: 2022-04-27 04:19:56,932-[cfp_fp][28000]Accuracy-Highest: 0.94443 Training: 2022-04-27 04:20:40,342-[agedb_30][28000]XNorm: 20.489127 Training: 2022-04-27 04:20:40,343-[agedb_30][28000]Accuracy-Flip: 0.97133+-0.01011 Training: 2022-04-27 04:20:40,343-[agedb_30][28000]Accuracy-Highest: 0.97167 Training: 2022-04-27 04:20:43,364-Speed 73.06 samples/sec Loss 6.7090 LearningRate 0.0568 Epoch: 4 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:20:46,387-Speed 3387.46 samples/sec Loss 6.5221 LearningRate 0.0568 Epoch: 4 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:20:49,385-Speed 3416.96 samples/sec Loss 6.5855 LearningRate 0.0568 Epoch: 4 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:20:52,397-Speed 3400.42 samples/sec Loss 6.5815 LearningRate 0.0568 Epoch: 4 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:20:55,411-Speed 3398.35 samples/sec Loss 6.6225 LearningRate 0.0568 Epoch: 4 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:20:58,429-Speed 3393.78 samples/sec Loss 6.4728 LearningRate 0.0567 Epoch: 4 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:01,463-Speed 3375.93 samples/sec Loss 6.5367 LearningRate 0.0567 Epoch: 4 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:04,484-Speed 3389.56 samples/sec Loss 6.6303 LearningRate 0.0567 Epoch: 4 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:07,503-Speed 3392.53 samples/sec Loss 6.6451 LearningRate 0.0567 Epoch: 4 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:10,524-Speed 3390.71 samples/sec Loss 6.7046 LearningRate 0.0567 Epoch: 4 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:13,545-Speed 3390.48 samples/sec Loss 
6.6895 LearningRate 0.0567 Epoch: 4 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:16,566-Speed 3391.48 samples/sec Loss 6.6635 LearningRate 0.0567 Epoch: 4 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 04:21:19,592-Speed 3384.27 samples/sec Loss 6.5855 LearningRate 0.0566 Epoch: 4 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:22,614-Speed 3389.42 samples/sec Loss 6.6673 LearningRate 0.0566 Epoch: 4 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:25,636-Speed 3388.77 samples/sec Loss 6.5562 LearningRate 0.0566 Epoch: 4 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:28,660-Speed 3386.95 samples/sec Loss 6.4710 LearningRate 0.0566 Epoch: 4 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:31,680-Speed 3391.22 samples/sec Loss 6.4610 LearningRate 0.0566 Epoch: 4 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:34,704-Speed 3388.01 samples/sec Loss 6.6731 LearningRate 0.0566 Epoch: 4 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:37,740-Speed 3373.01 samples/sec Loss 6.5271 LearningRate 0.0566 Epoch: 4 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:40,769-Speed 3381.45 samples/sec Loss 6.5566 LearningRate 0.0566 Epoch: 4 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:43,814-Speed 3363.94 samples/sec Loss 6.5637 LearningRate 0.0565 Epoch: 4 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:46,843-Speed 3381.69 samples/sec Loss 6.4893 LearningRate 0.0565 Epoch: 4 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:49,877-Speed 3375.61 samples/sec Loss 6.6909 LearningRate 0.0565 Epoch: 4 Global Step: 28230 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:52,921-Speed 3364.96 samples/sec Loss 6.6809 LearningRate 0.0565 Epoch: 4 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:55,957-Speed 3373.27 samples/sec Loss 6.5695 LearningRate 0.0565 Epoch: 4 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:21:59,008-Speed 3357.08 samples/sec Loss 6.5994 LearningRate 0.0565 Epoch: 4 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:02,042-Speed 3375.80 samples/sec Loss 6.4801 LearningRate 0.0565 Epoch: 4 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:05,069-Speed 3384.08 samples/sec Loss 6.5425 LearningRate 0.0564 Epoch: 4 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:08,093-Speed 3387.84 samples/sec Loss 6.5049 LearningRate 0.0564 Epoch: 4 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:11,160-Speed 3339.47 samples/sec Loss 6.4665 LearningRate 0.0564 Epoch: 4 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:14,187-Speed 3383.62 samples/sec Loss 6.4931 LearningRate 0.0564 Epoch: 4 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:17,208-Speed 3389.65 samples/sec Loss 6.6132 LearningRate 0.0564 Epoch: 4 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 04:22:20,232-Speed 3386.98 samples/sec Loss 6.7068 LearningRate 0.0564 Epoch: 4 Global Step: 28330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 04:22:23,253-Speed 3390.74 samples/sec Loss 6.5797 LearningRate 0.0564 Epoch: 4 Global Step: 28340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-27 04:22:26,258-Speed 3408.68 samples/sec Loss 6.5406 LearningRate 0.0564 Epoch: 4 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-27 04:22:29,280-Speed 3388.55 samples/sec Loss 6.6048 LearningRate 0.0563 Epoch: 4 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:22:32,282-Speed 3412.47 samples/sec Loss 6.6766 LearningRate 0.0563 Epoch: 4 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:35,302-Speed 3392.10 samples/sec Loss 6.6452 LearningRate 0.0563 Epoch: 4 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:38,324-Speed 3389.29 samples/sec Loss 6.6440 LearningRate 0.0563 Epoch: 4 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:41,344-Speed 3390.74 samples/sec Loss 6.4998 LearningRate 0.0563 Epoch: 4 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:44,364-Speed 3391.60 samples/sec Loss 6.5512 LearningRate 0.0563 Epoch: 4 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:47,562-Speed 3203.33 samples/sec Loss 6.6577 LearningRate 0.0563 Epoch: 4 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:22:50,583-Speed 3389.27 samples/sec Loss 6.5507 LearningRate 0.0563 Epoch: 4 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:03,920-Speed 767.87 samples/sec Loss 6.0581 LearningRate 0.0562 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:06,962-Speed 3368.35 samples/sec Loss 5.9586 LearningRate 0.0562 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:09,997-Speed 3375.28 samples/sec Loss 6.1011 LearningRate 0.0562 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:13,011-Speed 3397.81 samples/sec Loss 5.9808 LearningRate 0.0562 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:16,051-Speed 3368.99 samples/sec Loss 5.9908 LearningRate 0.0562 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:19,069-Speed 3395.08 samples/sec Loss 5.9135 LearningRate 0.0562 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:22,089-Speed 3390.97 samples/sec Loss 5.9870 LearningRate 0.0562 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:25,129-Speed 3369.04 samples/sec Loss 6.0094 LearningRate 0.0561 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:28,163-Speed 3376.54 samples/sec Loss 6.0446 LearningRate 0.0561 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:23:31,166-Speed 3410.25 samples/sec Loss 6.0945 LearningRate 0.0561 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:34,186-Speed 3391.43 samples/sec Loss 6.1125 LearningRate 0.0561 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:37,209-Speed 3388.94 samples/sec Loss 6.0254 LearningRate 0.0561 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:40,235-Speed 3384.10 samples/sec Loss 6.0734 LearningRate 0.0561 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:43,259-Speed 3387.63 samples/sec Loss 6.0741 LearningRate 0.0561 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:46,291-Speed 3377.50 samples/sec Loss 5.9974 LearningRate 0.0561 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:49,342-Speed 3356.88 samples/sec Loss 6.0468 LearningRate 0.0560 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:52,370-Speed 3383.43 samples/sec Loss 6.0610 LearningRate 0.0560 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:55,403-Speed 3377.32 samples/sec Loss 6.1386 LearningRate 0.0560 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:23:58,435-Speed 3377.70 samples/sec Loss 5.9311 LearningRate 0.0560 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:01,471-Speed 3373.41 samples/sec Loss 6.2108 LearningRate 0.0560 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:04,505-Speed 3375.70 samples/sec Loss 6.1234 LearningRate 0.0560 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:07,546-Speed 3368.24 samples/sec Loss 6.1246 LearningRate 0.0560 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:10,587-Speed 3368.16 samples/sec Loss 6.2423 LearningRate 0.0559 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:13,649-Speed 3345.44 samples/sec Loss 6.1418 LearningRate 0.0559 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:16,672-Speed 3387.93 samples/sec Loss 6.1166 LearningRate 0.0559 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:19,708-Speed 3373.33 samples/sec Loss 6.1114 LearningRate 0.0559 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:22,769-Speed 3346.86 samples/sec Loss 6.2816 LearningRate 0.0559 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:25,802-Speed 3376.23 samples/sec Loss 6.2510 LearningRate 0.0559 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:28,855-Speed 3355.13 samples/sec Loss 6.2471 LearningRate 0.0559 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:31,884-Speed 3382.10 samples/sec Loss 6.2027 LearningRate 0.0559 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:34,912-Speed 3381.67 samples/sec Loss 6.1075 LearningRate 0.0558 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:37,951-Speed 3370.15 samples/sec Loss 6.2437 LearningRate 0.0558 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:41,003-Speed 3357.17 samples/sec Loss 6.1197 LearningRate 0.0558 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:44,035-Speed 3377.69 samples/sec Loss 6.2133 LearningRate 0.0558 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:24:47,065-Speed 3380.13 samples/sec Loss 6.0631 LearningRate 0.0558 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:50,087-Speed 3390.15 samples/sec Loss 6.2110 LearningRate 0.0558 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:53,107-Speed 3391.35 samples/sec Loss 6.2302 LearningRate 0.0558 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:56,132-Speed 3385.16 samples/sec Loss 6.2842 LearningRate 0.0557 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:24:59,153-Speed 3391.46 samples/sec Loss 6.2979 LearningRate 0.0557 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:02,219-Speed 3340.55 samples/sec Loss 6.3249 LearningRate 0.0557 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:05,240-Speed 3390.06 samples/sec Loss 6.1415 LearningRate 0.0557 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:08,262-Speed 3388.86 samples/sec Loss 6.4004 LearningRate 0.0557 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:11,284-Speed 3389.18 samples/sec Loss 6.2302 LearningRate 0.0557 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:14,298-Speed 3399.26 samples/sec Loss 6.2347 LearningRate 0.0557 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:17,318-Speed 3391.25 samples/sec Loss 6.1551 LearningRate 0.0557 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:25:20,315-Speed 3417.37 samples/sec Loss 6.1446 LearningRate 0.0556 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:23,359-Speed 3364.46 samples/sec Loss 6.1764 LearningRate 0.0556 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:26,375-Speed 3396.61 samples/sec Loss 6.2682 LearningRate 0.0556 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:29,385-Speed 3402.04 samples/sec Loss 6.2383 LearningRate 0.0556 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:32,411-Speed 3384.65 samples/sec Loss 6.2872 LearningRate 0.0556 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:35,432-Speed 3391.24 samples/sec Loss 6.4516 LearningRate 0.0556 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:38,457-Speed 3385.33 samples/sec Loss 6.3799 LearningRate 0.0556 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:41,481-Speed 3387.52 samples/sec Loss 6.2466 LearningRate 0.0556 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:44,495-Speed 3398.34 samples/sec Loss 6.2302 LearningRate 0.0555 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:47,576-Speed 3324.70 samples/sec Loss 6.2843 LearningRate 0.0555 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:50,616-Speed 3368.43 samples/sec Loss 6.2597 LearningRate 0.0555 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:25:53,660-Speed 3365.30 samples/sec Loss 6.2839 LearningRate 0.0555 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:25:56,653-Speed 3421.16 samples/sec Loss 6.3063 LearningRate 0.0555 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:25:59,681-Speed 3382.72 samples/sec Loss 6.2427 LearningRate 0.0555 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:02,708-Speed 3383.79 samples/sec Loss 6.4099 LearningRate 0.0555 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:05,730-Speed 3390.11 samples/sec Loss 6.2059 LearningRate 0.0554 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:08,751-Speed 3390.25 samples/sec Loss 6.3701 LearningRate 0.0554 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:11,767-Speed 3395.53 samples/sec Loss 6.1499 LearningRate 0.0554 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:14,796-Speed 3381.87 samples/sec Loss 6.4190 LearningRate 0.0554 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:17,823-Speed 3383.89 samples/sec Loss 6.2339 LearningRate 0.0554 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:20,843-Speed 3391.43 samples/sec Loss 6.1581 LearningRate 0.0554 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:23,863-Speed 3390.98 samples/sec Loss 6.1593 LearningRate 0.0554 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:26:26,882-Speed 3392.21 samples/sec Loss 6.3320 LearningRate 0.0554 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:29,904-Speed 3389.66 samples/sec Loss 6.3120 LearningRate 0.0553 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:32,928-Speed 3387.46 samples/sec Loss 6.3593 LearningRate 0.0553 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:35,960-Speed 3378.15 samples/sec Loss 6.2970 LearningRate 0.0553 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:38,989-Speed 3380.93 samples/sec Loss 6.3713 LearningRate 0.0553 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:42,021-Speed 3378.44 samples/sec Loss 6.2687 LearningRate 0.0553 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:45,041-Speed 3391.27 samples/sec Loss 6.2558 LearningRate 0.0553 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:48,074-Speed 3376.85 samples/sec Loss 6.3296 LearningRate 0.0553 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:51,097-Speed 3388.44 samples/sec Loss 6.4390 LearningRate 0.0553 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:54,118-Speed 3390.16 samples/sec Loss 6.3096 LearningRate 0.0552 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:26:57,145-Speed 3384.38 samples/sec Loss 6.3339 LearningRate 0.0552 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:27:00,180-Speed 3374.55 samples/sec Loss 6.3648 LearningRate 0.0552 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:27:03,204-Speed 3387.23 samples/sec Loss 6.1761 LearningRate 0.0552 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:27:06,220-Speed 3395.86 samples/sec Loss 6.3021 LearningRate 0.0552 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:09,241-Speed 3390.22 samples/sec Loss 6.2159 LearningRate 0.0552 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:12,261-Speed 3391.74 samples/sec Loss 6.2571 LearningRate 0.0552 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:15,281-Speed 3391.51 samples/sec Loss 6.2604 LearningRate 0.0551 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:18,303-Speed 3389.67 samples/sec Loss 6.4101 LearningRate 0.0551 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:21,335-Speed 3377.96 samples/sec Loss 6.2315 LearningRate 0.0551 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:24,374-Speed 3370.02 samples/sec Loss 6.2628 LearningRate 0.0551 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:27,394-Speed 3391.88 samples/sec Loss 6.2386 LearningRate 0.0551 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:30,416-Speed 3389.55 samples/sec Loss 6.4200 LearningRate 0.0551 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:33,434-Speed 3393.33 samples/sec Loss 6.2536 LearningRate 0.0551 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:36,454-Speed 3391.51 samples/sec Loss 6.3048 LearningRate 0.0551 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:27:39,461-Speed 3405.87 samples/sec Loss 6.2734 LearningRate 0.0550 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:42,482-Speed 3389.91 samples/sec Loss 6.3984 LearningRate 0.0550 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:45,509-Speed 3384.39 samples/sec Loss 6.2112 LearningRate 0.0550 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:48,536-Speed 3383.95 samples/sec Loss 6.3482 LearningRate 0.0550 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:51,560-Speed 3385.92 samples/sec Loss 6.4955 LearningRate 0.0550 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:54,599-Speed 3370.97 samples/sec Loss 6.4213 LearningRate 0.0550 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:27:57,625-Speed 3385.29 samples/sec Loss 6.1749 LearningRate 0.0550 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:00,662-Speed 3372.22 samples/sec Loss 6.2676 LearningRate 0.0550 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:03,710-Speed 3360.75 samples/sec Loss 6.3748 LearningRate 0.0549 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:06,732-Speed 3389.23 samples/sec Loss 6.2022 LearningRate 0.0549 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:09,798-Speed 3340.69 samples/sec Loss 6.3842 LearningRate 0.0549 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-27 04:28:12,806-Speed 3405.04 samples/sec Loss 6.2207 LearningRate 0.0549 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:15,827-Speed 3390.17 samples/sec Loss 6.3131 LearningRate 0.0549 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:18,860-Speed 3377.51 samples/sec Loss 6.4202 LearningRate 0.0549 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:21,861-Speed 3413.06 samples/sec Loss 6.3223 LearningRate 0.0549 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:24,883-Speed 3389.35 samples/sec Loss 6.3742 LearningRate 0.0548 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:27,907-Speed 3387.26 samples/sec Loss 6.3160 LearningRate 0.0548 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:30,932-Speed 3385.15 samples/sec Loss 6.3385 LearningRate 0.0548 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:33,953-Speed 3390.86 samples/sec Loss 6.2444 LearningRate 0.0548 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:36,972-Speed 3392.91 samples/sec Loss 6.4784 LearningRate 0.0548 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:40,003-Speed 3378.86 samples/sec Loss 6.3011 LearningRate 0.0548 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:43,029-Speed 3384.52 samples/sec Loss 6.4632 LearningRate 0.0548 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:46,057-Speed 3382.88 samples/sec Loss 6.4195 LearningRate 0.0548 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:49,083-Speed 3385.25 samples/sec Loss 6.4155 LearningRate 0.0547 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:52,100-Speed 3394.87 samples/sec Loss 6.2864 LearningRate 0.0547 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:28:55,101-Speed 3412.28 samples/sec Loss 6.3902 LearningRate 0.0547 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:28:58,122-Speed 3391.10 samples/sec Loss 6.5820 LearningRate 0.0547 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:01,139-Speed 3394.44 samples/sec Loss 6.2833 LearningRate 0.0547 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:04,165-Speed 3384.40 samples/sec Loss 6.2001 LearningRate 0.0547 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:07,189-Speed 3386.92 samples/sec Loss 6.3935 LearningRate 0.0547 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:10,227-Speed 3372.63 samples/sec Loss 6.3798 LearningRate 0.0547 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:13,257-Speed 3379.27 samples/sec Loss 6.4024 LearningRate 0.0546 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:16,280-Speed 3388.68 samples/sec Loss 6.3428 LearningRate 0.0546 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:19,341-Speed 3346.13 samples/sec Loss 6.3590 LearningRate 0.0546 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:22,365-Speed 3386.92 samples/sec Loss 6.4183 LearningRate 0.0546 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:25,413-Speed 3360.23 samples/sec Loss 6.3040 LearningRate 0.0546 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:28,454-Speed 3368.47 samples/sec Loss 6.4194 LearningRate 0.0546 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:31,477-Speed 3387.53 samples/sec Loss 6.2951 LearningRate 0.0546 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:34,499-Speed 3389.93 samples/sec Loss 6.3687 LearningRate 0.0545 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:37,525-Speed 3385.03 samples/sec Loss 6.4550 LearningRate 0.0545 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:40,545-Speed 3391.51 samples/sec Loss 6.2130 LearningRate 0.0545 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:43,570-Speed 3386.09 samples/sec Loss 6.2982 LearningRate 0.0545 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:29:46,566-Speed 3418.63 samples/sec Loss 6.3446 LearningRate 0.0545 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:49,596-Speed 3380.59 samples/sec Loss 6.5474 LearningRate 0.0545 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:52,628-Speed 3377.95 samples/sec Loss 6.4196 LearningRate 0.0545 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:55,659-Speed 3378.56 samples/sec Loss 6.3488 LearningRate 0.0545 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:29:58,694-Speed 3374.62 samples/sec Loss 6.2327 LearningRate 0.0544 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:01,715-Speed 3390.59 samples/sec Loss 6.3546 LearningRate 0.0544 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:04,755-Speed 3369.05 samples/sec Loss 6.3971 LearningRate 0.0544 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:07,776-Speed 3390.64 samples/sec Loss 6.3958 LearningRate 0.0544 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:10,797-Speed 3390.10 samples/sec Loss 6.3306 LearningRate 0.0544 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:13,820-Speed 3388.67 samples/sec Loss 6.3475 LearningRate 0.0544 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:16,843-Speed 3388.13 samples/sec Loss 6.2396 LearningRate 0.0544 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:30:19,875-Speed 3377.96 samples/sec Loss 6.3806 LearningRate 0.0544 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:30:22,917-Speed 3366.99 samples/sec Loss 6.4120 LearningRate 0.0543 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:30:25,925-Speed 3405.35 samples/sec Loss 6.1548 LearningRate 0.0543 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:28,994-Speed 3336.57 samples/sec Loss 6.4759 LearningRate 0.0543 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:32,058-Speed 3343.77 samples/sec Loss 6.4614 LearningRate 0.0543 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:35,100-Speed 3366.08 samples/sec Loss 6.3105 LearningRate 0.0543 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:38,125-Speed 3386.94 samples/sec Loss 6.2954 LearningRate 0.0543 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:41,144-Speed 3392.71 samples/sec Loss 6.2281 LearningRate 0.0543 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:44,162-Speed 3393.14 samples/sec Loss 6.4878 LearningRate 0.0542 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:47,187-Speed 3386.41 samples/sec Loss 6.3575 LearningRate 0.0542 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:30:50,214-Speed 3384.17 samples/sec Loss 6.2969 LearningRate 0.0542 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:30:53,242-Speed 3382.27 samples/sec Loss 6.2774 LearningRate 0.0542 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:30:56,279-Speed 3372.47 samples/sec Loss 6.4366 LearningRate 0.0542 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 04:31:39,774-[lfw][30000]XNorm: 21.604066
Training: 2022-04-27 04:31:39,775-[lfw][30000]Accuracy-Flip: 0.99683+-0.00329
Training: 2022-04-27 04:31:39,775-[lfw][30000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:32:30,254-[cfp_fp][30000]XNorm: 19.520216
Training: 2022-04-27 04:32:30,255-[cfp_fp][30000]Accuracy-Flip: 0.95071+-0.01122
Training: 2022-04-27 04:32:30,255-[cfp_fp][30000]Accuracy-Highest: 0.95071
Training: 2022-04-27 04:33:13,673-[agedb_30][30000]XNorm: 21.513922
Training: 2022-04-27 04:33:13,673-[agedb_30][30000]Accuracy-Flip: 0.97233+-0.00782
Training: 2022-04-27 04:33:13,674-[agedb_30][30000]Accuracy-Highest: 0.97233
Training: 2022-04-27 04:33:16,701-Speed 72.92 samples/sec Loss 6.2756 LearningRate 0.0542 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:19,709-Speed 3405.12 samples/sec Loss 6.3951 LearningRate 0.0542 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:22,741-Speed 3378.00 samples/sec Loss 6.2914 LearningRate 0.0542 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:25,759-Speed 3393.87 samples/sec Loss 6.4194 LearningRate 0.0541 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:28,767-Speed 3405.20 samples/sec Loss 6.3582 LearningRate 0.0541 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:31,778-Speed 3401.63 samples/sec Loss 6.3833 LearningRate 0.0541 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:34,799-Speed 3391.06 samples/sec Loss 6.4203 LearningRate 0.0541 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:37,811-Speed 3400.43 samples/sec Loss 6.2717 LearningRate 0.0541 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:40,845-Speed 3375.74 samples/sec Loss 6.4813 LearningRate 0.0541 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:43,852-Speed 3406.04 samples/sec Loss 6.3175 LearningRate 0.0541 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:46,859-Speed 3406.55 samples/sec Loss 6.4768 LearningRate 0.0541 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:49,888-Speed 3381.50 samples/sec Loss 6.3412 LearningRate 0.0540 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:52,901-Speed 3398.45 samples/sec Loss 6.2620 LearningRate 0.0540 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:33:55,902-Speed 3413.51 samples/sec Loss 6.2955 LearningRate 0.0540 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:33:58,917-Speed 3397.66 samples/sec Loss 6.2445 LearningRate 0.0540 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:01,940-Speed 3387.75 samples/sec Loss 6.3804 LearningRate 0.0540 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:04,990-Speed 3357.65 samples/sec Loss 6.3545 LearningRate 0.0540 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:08,029-Speed 3370.49 samples/sec Loss 6.5291 LearningRate 0.0540 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:11,040-Speed 3401.87 samples/sec Loss 6.4499 LearningRate 0.0540 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:14,049-Speed 3404.82 samples/sec Loss 6.3602 LearningRate 0.0539 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:17,059-Speed 3402.36 samples/sec Loss 6.1988 LearningRate 0.0539 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:20,070-Speed 3401.59 samples/sec Loss 6.4357 LearningRate 0.0539 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:23,095-Speed 3386.36 samples/sec Loss 6.3172 LearningRate 0.0539 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:26,108-Speed 3398.56 samples/sec Loss 6.2439 LearningRate 0.0539 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:29,127-Speed 3393.07 samples/sec Loss 6.4076 LearningRate 0.0539 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:32,143-Speed 3395.93 samples/sec Loss 6.4199 LearningRate 0.0539 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:35,159-Speed 3395.45 samples/sec Loss 6.4761 LearningRate 0.0538 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:38,178-Speed 3393.07 samples/sec Loss 6.3507 LearningRate 0.0538 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:41,207-Speed 3382.13 samples/sec Loss 6.3848 LearningRate 0.0538 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:44,250-Speed 3365.52 samples/sec Loss 6.1614 LearningRate 0.0538 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:47,262-Speed 3399.90 samples/sec Loss 6.3050 LearningRate 0.0538 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:50,297-Speed 3375.34 samples/sec Loss 6.4188 LearningRate 0.0538 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:34:53,301-Speed 3409.40 samples/sec Loss 6.2933 LearningRate 0.0538 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:56,316-Speed 3397.31 samples/sec Loss 6.5575 LearningRate 0.0538 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:34:59,333-Speed 3395.00 samples/sec Loss 6.3334 LearningRate 0.0537 Epoch: 5 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:02,356-Speed 3388.25 samples/sec Loss 6.3934 LearningRate 0.0537 Epoch: 5 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:05,371-Speed 3397.33 samples/sec Loss 6.3429 LearningRate 0.0537 Epoch: 5 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:08,425-Speed 3353.90 samples/sec Loss 6.4585 LearningRate 0.0537 Epoch: 5 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:11,442-Speed 3394.74 samples/sec Loss 6.2511 LearningRate 0.0537 Epoch: 5 Global Step: 30390 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:14,458-Speed 3396.00 samples/sec Loss 6.2164 LearningRate 0.0537 Epoch: 5 Global Step: 30400 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:17,491-Speed 3376.55 samples/sec Loss 6.3746 LearningRate 0.0537 Epoch: 5 Global Step: 30410 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:20,515-Speed 3387.56 samples/sec Loss 6.3052 LearningRate 0.0537 Epoch: 5 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:23,534-Speed 3392.41 samples/sec Loss 6.4998 LearningRate 0.0536 Epoch: 5 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:35:26,562-Speed 3382.68 samples/sec Loss 6.3605 LearningRate 0.0536 Epoch: 5 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:29,578-Speed 3395.72 samples/sec Loss 6.1637 LearningRate 0.0536 Epoch: 5 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:32,595-Speed 3395.30 samples/sec Loss 6.2758 LearningRate 0.0536 Epoch: 5 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:35,609-Speed 3398.37 samples/sec Loss 6.4242 LearningRate 0.0536 Epoch: 5 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:38,624-Speed 3396.94 samples/sec Loss 6.2592 LearningRate 0.0536 Epoch: 5 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:41,633-Speed 3404.18 samples/sec Loss 6.2080 LearningRate 0.0536 Epoch: 5 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:44,648-Speed 3396.09 samples/sec Loss 6.2874 LearningRate 0.0536 Epoch: 5 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:47,676-Speed 3382.67 samples/sec Loss 6.3909 LearningRate 0.0535 Epoch: 5 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:50,690-Speed 3398.18 samples/sec Loss 6.4039 LearningRate 0.0535 Epoch: 5 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:53,704-Speed 3398.82 samples/sec Loss 6.3794 LearningRate 0.0535 Epoch: 5 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:35:56,716-Speed 3400.82 samples/sec Loss 6.5153 LearningRate 0.0535 Epoch: 5 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:35:59,739-Speed 3388.70 samples/sec Loss 6.3711 LearningRate 0.0535 Epoch: 5 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:02,769-Speed 3380.11 samples/sec Loss 6.2829 LearningRate 0.0535 Epoch: 5 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:05,784-Speed 3397.05 samples/sec Loss 6.1660 LearningRate 0.0535 Epoch: 5 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:08,804-Speed 3391.17 samples/sec Loss 6.5331 LearningRate 0.0534 Epoch: 5 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:11,830-Speed 3385.15 samples/sec Loss 6.2788 LearningRate 0.0534 Epoch: 5 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:14,867-Speed 3371.77 samples/sec Loss 6.4323 LearningRate 0.0534 Epoch: 5 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:17,896-Speed 3381.22 samples/sec Loss 6.3775 LearningRate 0.0534 Epoch: 5 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:20,917-Speed 3391.33 samples/sec Loss 6.4249 LearningRate 0.0534 Epoch: 5 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-27 04:36:23,931-Speed 3398.21 samples/sec Loss 6.2433 LearningRate 0.0534 Epoch: 5 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:36:26,962-Speed 3379.55 samples/sec Loss 6.3170 LearningRate 0.0534 Epoch: 5 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:36:29,982-Speed 3391.19 samples/sec Loss 6.2737 LearningRate 0.0534 Epoch: 5 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:36:32,996-Speed 3398.51 samples/sec Loss 6.1907 LearningRate 0.0533 Epoch: 5 Global Step: 30660 Fp16 Grad Scale: 65536 Required: 9 hours
Training: 2022-04-27 04:36:36,012-Speed 3396.16 samples/sec Loss 6.4344 LearningRate 0.0533 Epoch: 5 Global Step: 30670 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:36:39,014-Speed 3411.49 samples/sec Loss 6.2493 LearningRate 0.0533 Epoch: 5 Global Step: 30680 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:42,056-Speed 3366.88 samples/sec Loss 6.2831 LearningRate 0.0533 Epoch: 5 Global Step: 30690 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:45,067-Speed 3401.24 samples/sec Loss 6.3254 LearningRate 0.0533 Epoch: 5 Global Step: 30700 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:48,085-Speed 3394.40 samples/sec Loss 6.2591 LearningRate 0.0533 Epoch: 5 Global Step: 30710 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:51,108-Speed 3388.33 samples/sec Loss 6.1826 LearningRate 0.0533 Epoch: 5 Global Step: 30720 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:54,130-Speed 3388.96 samples/sec Loss 6.1771 LearningRate 0.0533 Epoch: 5 Global Step: 30730 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:36:57,153-Speed 3388.79 samples/sec Loss 6.1174 LearningRate 0.0532 Epoch: 5 Global Step: 30740 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:37:00,217-Speed 3343.00 samples/sec Loss 6.2946 LearningRate 0.0532 Epoch: 5 Global Step: 30750 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:37:03,240-Speed 3388.20 samples/sec Loss 6.3185 LearningRate 0.0532 Epoch: 5 Global Step: 30760 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:37:06,261-Speed 3390.07 samples/sec Loss 6.4845 LearningRate 0.0532 Epoch: 5 Global Step: 30770 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 04:37:09,298-Speed 3371.78 samples/sec Loss 6.3317 LearningRate 0.0532 Epoch: 5 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:12,336-Speed 3372.29 samples/sec Loss 6.2381 LearningRate 0.0532 Epoch: 5 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:15,378-Speed 3366.65 samples/sec Loss 6.3671 LearningRate 0.0532 Epoch: 5 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:18,428-Speed 3357.60 samples/sec Loss 6.2496 LearningRate 0.0532 Epoch: 5 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:21,458-Speed 3380.63 samples/sec Loss 6.4826 LearningRate 0.0531 Epoch: 5 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:24,481-Speed 3388.48 samples/sec Loss 6.2999 LearningRate 0.0531 Epoch: 5 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:27,514-Speed 3376.97 samples/sec Loss 6.2295 LearningRate 0.0531 Epoch: 5 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:30,568-Speed 3353.58 samples/sec Loss 6.3628 LearningRate 0.0531 Epoch: 5 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:33,599-Speed 3379.12 samples/sec Loss 6.3784 LearningRate 0.0531 Epoch: 5 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:36,624-Speed 3385.55 samples/sec Loss 6.2226 LearningRate 0.0531 Epoch: 5 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 04:37:39,706-Speed 3324.03 samples/sec Loss 6.3615 LearningRate 0.0531 Epoch: 5 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 04:37:42,729-Speed 3388.29 samples/sec Loss 6.2246 LearningRate 0.0531 Epoch: 5 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 04:37:45,756-Speed 3383.07 samples/sec Loss 6.1711 LearningRate 0.0530 Epoch: 5 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 04:37:48,781-Speed 
3386.15 samples/sec Loss 6.3236 LearningRate 0.0530 Epoch: 5 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:37:51,829-Speed 3361.27 samples/sec Loss 6.2590 LearningRate 0.0530 Epoch: 5 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:37:54,874-Speed 3363.21 samples/sec Loss 6.1984 LearningRate 0.0530 Epoch: 5 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:37:57,897-Speed 3388.66 samples/sec Loss 6.2181 LearningRate 0.0530 Epoch: 5 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:00,918-Speed 3390.26 samples/sec Loss 6.2887 LearningRate 0.0530 Epoch: 5 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:03,965-Speed 3361.30 samples/sec Loss 6.3845 LearningRate 0.0530 Epoch: 5 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:06,985-Speed 3391.97 samples/sec Loss 6.3277 LearningRate 0.0529 Epoch: 5 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:10,009-Speed 3387.31 samples/sec Loss 6.2365 LearningRate 0.0529 Epoch: 5 Global Step: 30980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:38:13,032-Speed 3387.85 samples/sec Loss 6.2945 LearningRate 0.0529 Epoch: 5 Global Step: 30990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:38:16,053-Speed 3390.77 samples/sec Loss 6.2487 LearningRate 0.0529 Epoch: 5 Global Step: 31000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:38:19,069-Speed 3395.48 samples/sec Loss 6.0778 LearningRate 0.0529 Epoch: 5 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:22,142-Speed 3332.98 samples/sec Loss 6.1560 LearningRate 0.0529 Epoch: 5 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:25,166-Speed 3387.19 samples/sec Loss 6.4276 LearningRate 0.0529 
Epoch: 5 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:28,195-Speed 3381.88 samples/sec Loss 6.3290 LearningRate 0.0529 Epoch: 5 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:31,218-Speed 3388.43 samples/sec Loss 6.3200 LearningRate 0.0528 Epoch: 5 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:34,248-Speed 3380.53 samples/sec Loss 6.1336 LearningRate 0.0528 Epoch: 5 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:37,274-Speed 3384.44 samples/sec Loss 6.4003 LearningRate 0.0528 Epoch: 5 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:40,299-Speed 3385.75 samples/sec Loss 6.2570 LearningRate 0.0528 Epoch: 5 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:43,320-Speed 3390.64 samples/sec Loss 6.0915 LearningRate 0.0528 Epoch: 5 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:46,342-Speed 3389.37 samples/sec Loss 6.1999 LearningRate 0.0528 Epoch: 5 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:49,349-Speed 3406.08 samples/sec Loss 6.2095 LearningRate 0.0528 Epoch: 5 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:52,368-Speed 3393.06 samples/sec Loss 6.3172 LearningRate 0.0528 Epoch: 5 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:55,388-Speed 3390.90 samples/sec Loss 6.2641 LearningRate 0.0527 Epoch: 5 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:38:58,421-Speed 3376.92 samples/sec Loss 6.1707 LearningRate 0.0527 Epoch: 5 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:01,441-Speed 3392.44 samples/sec Loss 6.2290 LearningRate 0.0527 Epoch: 5 Global Step: 31150 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-27 04:39:04,514-Speed 3331.91 samples/sec Loss 6.3768 LearningRate 0.0527 Epoch: 5 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:07,547-Speed 3376.84 samples/sec Loss 6.3302 LearningRate 0.0527 Epoch: 5 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:10,586-Speed 3370.63 samples/sec Loss 6.3171 LearningRate 0.0527 Epoch: 5 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:13,614-Speed 3383.36 samples/sec Loss 6.2726 LearningRate 0.0527 Epoch: 5 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:16,635-Speed 3390.68 samples/sec Loss 6.2542 LearningRate 0.0527 Epoch: 5 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:19,657-Speed 3389.14 samples/sec Loss 6.4300 LearningRate 0.0526 Epoch: 5 Global Step: 31210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:39:22,679-Speed 3388.99 samples/sec Loss 6.3566 LearningRate 0.0526 Epoch: 5 Global Step: 31220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:39:25,687-Speed 3404.81 samples/sec Loss 6.3687 LearningRate 0.0526 Epoch: 5 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:28,710-Speed 3388.31 samples/sec Loss 6.3189 LearningRate 0.0526 Epoch: 5 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:39:31,726-Speed 3395.95 samples/sec Loss 6.3049 LearningRate 0.0526 Epoch: 5 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:34,799-Speed 3333.40 samples/sec Loss 6.3597 LearningRate 0.0526 Epoch: 5 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:37,831-Speed 3377.34 samples/sec Loss 6.2274 LearningRate 0.0526 Epoch: 5 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:40,860-Speed 
3381.49 samples/sec Loss 6.5353 LearningRate 0.0526 Epoch: 5 Global Step: 31280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:43,910-Speed 3358.86 samples/sec Loss 6.4605 LearningRate 0.0525 Epoch: 5 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:46,951-Speed 3367.51 samples/sec Loss 6.2728 LearningRate 0.0525 Epoch: 5 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:49,967-Speed 3395.98 samples/sec Loss 6.2698 LearningRate 0.0525 Epoch: 5 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:52,988-Speed 3390.06 samples/sec Loss 6.1402 LearningRate 0.0525 Epoch: 5 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:56,010-Speed 3389.41 samples/sec Loss 6.1321 LearningRate 0.0525 Epoch: 5 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:39:59,047-Speed 3372.66 samples/sec Loss 6.2130 LearningRate 0.0525 Epoch: 5 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:02,094-Speed 3374.75 samples/sec Loss 6.2227 LearningRate 0.0525 Epoch: 5 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:05,107-Speed 3400.48 samples/sec Loss 6.1132 LearningRate 0.0525 Epoch: 5 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:08,128-Speed 3389.64 samples/sec Loss 6.2373 LearningRate 0.0524 Epoch: 5 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:11,153-Speed 3386.55 samples/sec Loss 6.3474 LearningRate 0.0524 Epoch: 5 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:14,178-Speed 3385.28 samples/sec Loss 6.4063 LearningRate 0.0524 Epoch: 5 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:17,227-Speed 3359.85 samples/sec Loss 6.1046 LearningRate 0.0524 Epoch: 5 
Global Step: 31400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:20,249-Speed 3389.07 samples/sec Loss 6.2057 LearningRate 0.0524 Epoch: 5 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:23,276-Speed 3382.90 samples/sec Loss 6.1441 LearningRate 0.0524 Epoch: 5 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:26,327-Speed 3357.58 samples/sec Loss 6.3205 LearningRate 0.0524 Epoch: 5 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:29,372-Speed 3363.81 samples/sec Loss 6.3245 LearningRate 0.0523 Epoch: 5 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:32,398-Speed 3384.92 samples/sec Loss 6.2431 LearningRate 0.0523 Epoch: 5 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:40:35,425-Speed 3384.04 samples/sec Loss 6.2965 LearningRate 0.0523 Epoch: 5 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:38,457-Speed 3377.84 samples/sec Loss 6.0664 LearningRate 0.0523 Epoch: 5 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:41,501-Speed 3364.71 samples/sec Loss 6.2964 LearningRate 0.0523 Epoch: 5 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:44,525-Speed 3387.63 samples/sec Loss 6.2811 LearningRate 0.0523 Epoch: 5 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:47,549-Speed 3387.10 samples/sec Loss 6.3015 LearningRate 0.0523 Epoch: 5 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:50,578-Speed 3380.82 samples/sec Loss 6.1244 LearningRate 0.0523 Epoch: 5 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:53,612-Speed 3375.69 samples/sec Loss 6.2104 LearningRate 0.0522 Epoch: 5 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-27 04:40:56,641-Speed 3382.49 samples/sec Loss 6.1877 LearningRate 0.0522 Epoch: 5 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:40:59,665-Speed 3386.34 samples/sec Loss 6.4050 LearningRate 0.0522 Epoch: 5 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:02,717-Speed 3357.87 samples/sec Loss 6.2202 LearningRate 0.0522 Epoch: 5 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:05,733-Speed 3395.79 samples/sec Loss 6.1620 LearningRate 0.0522 Epoch: 5 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:08,761-Speed 3382.46 samples/sec Loss 6.2059 LearningRate 0.0522 Epoch: 5 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:11,785-Speed 3386.81 samples/sec Loss 6.2293 LearningRate 0.0522 Epoch: 5 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:14,812-Speed 3383.84 samples/sec Loss 6.2468 LearningRate 0.0522 Epoch: 5 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:17,821-Speed 3404.68 samples/sec Loss 6.2336 LearningRate 0.0521 Epoch: 5 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:20,842-Speed 3390.47 samples/sec Loss 6.2011 LearningRate 0.0521 Epoch: 5 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:23,874-Speed 3377.20 samples/sec Loss 6.2511 LearningRate 0.0521 Epoch: 5 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:26,912-Speed 3372.43 samples/sec Loss 6.3350 LearningRate 0.0521 Epoch: 5 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:29,935-Speed 3388.09 samples/sec Loss 6.0808 LearningRate 0.0521 Epoch: 5 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:32,961-Speed 3384.74 samples/sec 
Loss 6.1584 LearningRate 0.0521 Epoch: 5 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:36,006-Speed 3363.96 samples/sec Loss 6.2185 LearningRate 0.0521 Epoch: 5 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:39,032-Speed 3384.66 samples/sec Loss 6.1322 LearningRate 0.0521 Epoch: 5 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:42,054-Speed 3388.67 samples/sec Loss 6.3137 LearningRate 0.0520 Epoch: 5 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:45,075-Speed 3390.19 samples/sec Loss 6.1815 LearningRate 0.0520 Epoch: 5 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:41:48,112-Speed 3372.41 samples/sec Loss 6.1224 LearningRate 0.0520 Epoch: 5 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:51,142-Speed 3380.95 samples/sec Loss 6.2992 LearningRate 0.0520 Epoch: 5 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:54,175-Speed 3377.00 samples/sec Loss 6.3012 LearningRate 0.0520 Epoch: 5 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:41:57,266-Speed 3313.69 samples/sec Loss 6.2695 LearningRate 0.0520 Epoch: 5 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:00,290-Speed 3386.58 samples/sec Loss 6.1316 LearningRate 0.0520 Epoch: 5 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:03,316-Speed 3385.47 samples/sec Loss 6.2975 LearningRate 0.0520 Epoch: 5 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:06,359-Speed 3366.21 samples/sec Loss 6.2388 LearningRate 0.0519 Epoch: 5 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:09,380-Speed 3390.09 samples/sec Loss 6.2437 LearningRate 0.0519 Epoch: 5 Global Step: 31770 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:12,384-Speed 3409.28 samples/sec Loss 6.2108 LearningRate 0.0519 Epoch: 5 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:15,429-Speed 3363.57 samples/sec Loss 6.2095 LearningRate 0.0519 Epoch: 5 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:18,460-Speed 3379.70 samples/sec Loss 6.2419 LearningRate 0.0519 Epoch: 5 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:21,499-Speed 3370.87 samples/sec Loss 6.2223 LearningRate 0.0519 Epoch: 5 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:24,524-Speed 3385.37 samples/sec Loss 6.2656 LearningRate 0.0519 Epoch: 5 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:27,551-Speed 3383.69 samples/sec Loss 6.3823 LearningRate 0.0519 Epoch: 5 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:30,626-Speed 3330.69 samples/sec Loss 6.1462 LearningRate 0.0518 Epoch: 5 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:33,653-Speed 3384.01 samples/sec Loss 6.1921 LearningRate 0.0518 Epoch: 5 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:36,676-Speed 3388.15 samples/sec Loss 6.0471 LearningRate 0.0518 Epoch: 5 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:39,700-Speed 3386.34 samples/sec Loss 6.2360 LearningRate 0.0518 Epoch: 5 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:42:42,730-Speed 3380.35 samples/sec Loss 6.2000 LearningRate 0.0518 Epoch: 5 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:45,819-Speed 3315.86 samples/sec Loss 6.1980 LearningRate 0.0518 Epoch: 5 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 
04:42:48,872-Speed 3355.40 samples/sec Loss 6.2666 LearningRate 0.0518 Epoch: 5 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:51,897-Speed 3386.08 samples/sec Loss 6.2743 LearningRate 0.0518 Epoch: 5 Global Step: 31910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:54,925-Speed 3383.63 samples/sec Loss 6.2399 LearningRate 0.0517 Epoch: 5 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:42:57,948-Speed 3387.24 samples/sec Loss 6.3501 LearningRate 0.0517 Epoch: 5 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:43:00,956-Speed 3405.48 samples/sec Loss 6.2364 LearningRate 0.0517 Epoch: 5 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:03,998-Speed 3366.14 samples/sec Loss 6.3474 LearningRate 0.0517 Epoch: 5 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:07,031-Speed 3377.10 samples/sec Loss 6.1911 LearningRate 0.0517 Epoch: 5 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:10,077-Speed 3362.75 samples/sec Loss 6.1721 LearningRate 0.0517 Epoch: 5 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:13,159-Speed 3323.88 samples/sec Loss 6.2714 LearningRate 0.0517 Epoch: 5 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:16,184-Speed 3385.50 samples/sec Loss 6.2713 LearningRate 0.0517 Epoch: 5 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:43:19,211-Speed 3383.96 samples/sec Loss 6.2355 LearningRate 0.0516 Epoch: 5 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:44:02,670-[lfw][32000]XNorm: 21.457548 Training: 2022-04-27 04:44:02,671-[lfw][32000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-27 04:44:02,671-[lfw][32000]Accuracy-Highest: 0.99817 Training: 2022-04-27 
04:44:53,165-[cfp_fp][32000]XNorm: 18.387733 Training: 2022-04-27 04:44:53,166-[cfp_fp][32000]Accuracy-Flip: 0.95171+-0.00797 Training: 2022-04-27 04:44:53,166-[cfp_fp][32000]Accuracy-Highest: 0.95171 Training: 2022-04-27 04:45:36,480-[agedb_30][32000]XNorm: 20.998163 Training: 2022-04-27 04:45:36,480-[agedb_30][32000]Accuracy-Flip: 0.97117+-0.00940 Training: 2022-04-27 04:45:36,481-[agedb_30][32000]Accuracy-Highest: 0.97233 Training: 2022-04-27 04:45:39,499-Speed 72.99 samples/sec Loss 6.2059 LearningRate 0.0516 Epoch: 5 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:45:42,501-Speed 3411.60 samples/sec Loss 6.0726 LearningRate 0.0516 Epoch: 5 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:45:45,563-Speed 3345.65 samples/sec Loss 6.2340 LearningRate 0.0516 Epoch: 5 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:45:48,587-Speed 3387.21 samples/sec Loss 6.1384 LearningRate 0.0516 Epoch: 5 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:45:51,600-Speed 3398.67 samples/sec Loss 6.0497 LearningRate 0.0516 Epoch: 5 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:45:54,614-Speed 3398.40 samples/sec Loss 6.2240 LearningRate 0.0516 Epoch: 5 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:45:57,627-Speed 3399.75 samples/sec Loss 6.2498 LearningRate 0.0516 Epoch: 5 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:46:00,638-Speed 3401.15 samples/sec Loss 6.1331 LearningRate 0.0515 Epoch: 5 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:46:03,659-Speed 3390.71 samples/sec Loss 6.2700 LearningRate 0.0515 Epoch: 5 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:46:06,674-Speed 3396.08 samples/sec Loss 6.2355 LearningRate 0.0515 Epoch: 5 Global Step: 
32100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:46:09,707-Speed 3377.60 samples/sec Loss 6.0720 LearningRate 0.0515 Epoch: 5 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:46:12,718-Speed 3404.02 samples/sec Loss 6.0631 LearningRate 0.0515 Epoch: 5 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:46:15,735-Speed 3394.59 samples/sec Loss 6.1817 LearningRate 0.0515 Epoch: 5 Global Step: 32130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:18,766-Speed 3378.87 samples/sec Loss 6.2925 LearningRate 0.0515 Epoch: 5 Global Step: 32140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:21,793-Speed 3383.77 samples/sec Loss 6.1453 LearningRate 0.0515 Epoch: 5 Global Step: 32150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:24,817-Speed 3387.19 samples/sec Loss 6.3270 LearningRate 0.0514 Epoch: 5 Global Step: 32160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:27,832-Speed 3397.19 samples/sec Loss 6.4171 LearningRate 0.0514 Epoch: 5 Global Step: 32170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:30,857-Speed 3386.16 samples/sec Loss 6.2139 LearningRate 0.0514 Epoch: 5 Global Step: 32180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:33,890-Speed 3375.95 samples/sec Loss 6.2741 LearningRate 0.0514 Epoch: 5 Global Step: 32190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:36,910-Speed 3391.91 samples/sec Loss 6.2776 LearningRate 0.0514 Epoch: 5 Global Step: 32200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:39,931-Speed 3391.14 samples/sec Loss 6.3246 LearningRate 0.0514 Epoch: 5 Global Step: 32210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 04:46:42,942-Speed 3401.23 samples/sec Loss 6.1987 LearningRate 0.0514 Epoch: 5 Global Step: 32220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-04-27 04:46:45,954-Speed 3400.09 samples/sec Loss 6.3533 LearningRate 0.0513 Epoch: 5 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:46:48,968-Speed 3398.42 samples/sec Loss 6.1270 LearningRate 0.0513 Epoch: 5 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:46:51,977-Speed 3404.48 samples/sec Loss 6.2378 LearningRate 0.0513 Epoch: 5 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:46:54,996-Speed 3392.43 samples/sec Loss 6.2033 LearningRate 0.0513 Epoch: 5 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:46:58,030-Speed 3375.30 samples/sec Loss 6.2052 LearningRate 0.0513 Epoch: 5 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:01,048-Speed 3393.91 samples/sec Loss 6.2091 LearningRate 0.0513 Epoch: 5 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:04,058-Speed 3402.29 samples/sec Loss 6.2485 LearningRate 0.0513 Epoch: 5 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:07,065-Speed 3406.67 samples/sec Loss 6.1935 LearningRate 0.0513 Epoch: 5 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:10,075-Speed 3403.10 samples/sec Loss 6.1858 LearningRate 0.0512 Epoch: 5 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:13,085-Speed 3403.10 samples/sec Loss 6.3114 LearningRate 0.0512 Epoch: 5 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:16,075-Speed 3425.16 samples/sec Loss 6.2041 LearningRate 0.0512 Epoch: 5 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:19,091-Speed 3396.27 samples/sec Loss 6.1830 LearningRate 0.0512 Epoch: 5 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:22,123-Speed 3378.33 samples/sec Loss 6.2069 
LearningRate 0.0512 Epoch: 5 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:25,152-Speed 3380.61 samples/sec Loss 6.1634 LearningRate 0.0512 Epoch: 5 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:28,178-Speed 3384.63 samples/sec Loss 6.2522 LearningRate 0.0512 Epoch: 5 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:31,207-Speed 3382.80 samples/sec Loss 6.0712 LearningRate 0.0512 Epoch: 5 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:34,223-Speed 3395.03 samples/sec Loss 6.1727 LearningRate 0.0511 Epoch: 5 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:37,234-Speed 3401.91 samples/sec Loss 6.2431 LearningRate 0.0511 Epoch: 5 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:40,239-Speed 3408.76 samples/sec Loss 6.2787 LearningRate 0.0511 Epoch: 5 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:43,324-Speed 3320.07 samples/sec Loss 6.3322 LearningRate 0.0511 Epoch: 5 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:47:46,342-Speed 3393.85 samples/sec Loss 6.2531 LearningRate 0.0511 Epoch: 5 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:47:49,423-Speed 3324.09 samples/sec Loss 6.0399 LearningRate 0.0511 Epoch: 5 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:47:52,460-Speed 3372.57 samples/sec Loss 6.1026 LearningRate 0.0511 Epoch: 5 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:47:55,473-Speed 3399.12 samples/sec Loss 6.1691 LearningRate 0.0511 Epoch: 5 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:47:58,543-Speed 3336.71 samples/sec Loss 6.0900 LearningRate 0.0510 Epoch: 5 Global Step: 32470 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:01,544-Speed 3413.49 samples/sec Loss 6.1612 LearningRate 0.0510 Epoch: 5 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:04,576-Speed 3377.80 samples/sec Loss 6.1300 LearningRate 0.0510 Epoch: 5 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:07,608-Speed 3378.28 samples/sec Loss 6.1521 LearningRate 0.0510 Epoch: 5 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:10,638-Speed 3379.77 samples/sec Loss 6.2826 LearningRate 0.0510 Epoch: 5 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:13,809-Speed 3230.68 samples/sec Loss 6.2813 LearningRate 0.0510 Epoch: 5 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:16,840-Speed 3378.94 samples/sec Loss 6.1274 LearningRate 0.0510 Epoch: 5 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:19,849-Speed 3404.13 samples/sec Loss 6.1567 LearningRate 0.0510 Epoch: 5 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:22,865-Speed 3395.03 samples/sec Loss 6.0872 LearningRate 0.0509 Epoch: 5 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:25,875-Speed 3403.66 samples/sec Loss 6.1980 LearningRate 0.0509 Epoch: 5 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:28,890-Speed 3397.10 samples/sec Loss 6.1691 LearningRate 0.0509 Epoch: 5 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:31,905-Speed 3397.42 samples/sec Loss 6.2172 LearningRate 0.0509 Epoch: 5 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:34,917-Speed 3400.61 samples/sec Loss 6.2333 LearningRate 0.0509 Epoch: 5 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 
04:48:37,932-Speed 3396.90 samples/sec Loss 5.9726 LearningRate 0.0509 Epoch: 5 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:40,968-Speed 3372.89 samples/sec Loss 6.2149 LearningRate 0.0509 Epoch: 5 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:43,987-Speed 3393.25 samples/sec Loss 6.3014 LearningRate 0.0509 Epoch: 5 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:47,013-Speed 3384.72 samples/sec Loss 6.1339 LearningRate 0.0508 Epoch: 5 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:50,031-Speed 3394.27 samples/sec Loss 6.1699 LearningRate 0.0508 Epoch: 5 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:48:53,024-Speed 3421.79 samples/sec Loss 6.2405 LearningRate 0.0508 Epoch: 5 Global Step: 32650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:56,032-Speed 3404.61 samples/sec Loss 6.1324 LearningRate 0.0508 Epoch: 5 Global Step: 32660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:48:59,043-Speed 3401.73 samples/sec Loss 6.1960 LearningRate 0.0508 Epoch: 5 Global Step: 32670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:02,057-Speed 3398.79 samples/sec Loss 6.2137 LearningRate 0.0508 Epoch: 5 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:05,088-Speed 3379.60 samples/sec Loss 6.0766 LearningRate 0.0508 Epoch: 5 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:08,099-Speed 3401.50 samples/sec Loss 6.2985 LearningRate 0.0508 Epoch: 5 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:11,113-Speed 3397.24 samples/sec Loss 6.3565 LearningRate 0.0507 Epoch: 5 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:14,132-Speed 3393.60 samples/sec Loss 6.0414 
LearningRate 0.0507 Epoch: 5 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:17,148-Speed 3395.13 samples/sec Loss 6.1712 LearningRate 0.0507 Epoch: 5 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:20,206-Speed 3350.32 samples/sec Loss 6.1859 LearningRate 0.0507 Epoch: 5 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:23,417-Speed 3188.96 samples/sec Loss 6.1828 LearningRate 0.0507 Epoch: 5 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:49:26,455-Speed 3371.65 samples/sec Loss 6.1083 LearningRate 0.0507 Epoch: 5 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:49:29,481-Speed 3385.37 samples/sec Loss 6.1219 LearningRate 0.0507 Epoch: 5 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:49:32,478-Speed 3417.03 samples/sec Loss 6.2003 LearningRate 0.0507 Epoch: 5 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:35,493-Speed 3397.65 samples/sec Loss 6.0735 LearningRate 0.0506 Epoch: 5 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:38,505-Speed 3400.04 samples/sec Loss 6.1640 LearningRate 0.0506 Epoch: 5 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:41,528-Speed 3388.46 samples/sec Loss 6.1042 LearningRate 0.0506 Epoch: 5 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:44,581-Speed 3355.00 samples/sec Loss 6.0930 LearningRate 0.0506 Epoch: 5 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:47,610-Speed 3381.35 samples/sec Loss 6.1204 LearningRate 0.0506 Epoch: 5 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:50,634-Speed 3386.95 samples/sec Loss 6.1264 LearningRate 0.0506 Epoch: 5 Global Step: 32840 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-27 04:49:53,651-Speed 3394.86 samples/sec Loss 6.1188 LearningRate 0.0506 Epoch: 5 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:56,663-Speed 3400.53 samples/sec Loss 6.1563 LearningRate 0.0506 Epoch: 5 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:49:59,685-Speed 3389.38 samples/sec Loss 6.1045 LearningRate 0.0505 Epoch: 5 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:50:02,702-Speed 3394.98 samples/sec Loss 6.0435 LearningRate 0.0505 Epoch: 5 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:05,719-Speed 3395.14 samples/sec Loss 6.2656 LearningRate 0.0505 Epoch: 5 Global Step: 32890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:08,745-Speed 3384.56 samples/sec Loss 6.2224 LearningRate 0.0505 Epoch: 5 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:11,769-Speed 3386.35 samples/sec Loss 6.0492 LearningRate 0.0505 Epoch: 5 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:14,795-Speed 3384.66 samples/sec Loss 6.1675 LearningRate 0.0505 Epoch: 5 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:17,819-Speed 3387.54 samples/sec Loss 6.1074 LearningRate 0.0505 Epoch: 5 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:20,838-Speed 3393.47 samples/sec Loss 6.1582 LearningRate 0.0505 Epoch: 5 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:23,854-Speed 3395.03 samples/sec Loss 5.9896 LearningRate 0.0504 Epoch: 5 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:26,875-Speed 3391.36 samples/sec Loss 6.2978 LearningRate 0.0504 Epoch: 5 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 
04:50:29,891-Speed 3395.56 samples/sec Loss 6.1517 LearningRate 0.0504 Epoch: 5 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:32,895-Speed 3409.78 samples/sec Loss 6.1472 LearningRate 0.0504 Epoch: 5 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:35,912-Speed 3394.55 samples/sec Loss 6.1927 LearningRate 0.0504 Epoch: 5 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:38,943-Speed 3380.26 samples/sec Loss 5.9356 LearningRate 0.0504 Epoch: 5 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:41,962-Speed 3391.54 samples/sec Loss 6.0415 LearningRate 0.0504 Epoch: 5 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:44,989-Speed 3384.42 samples/sec Loss 6.1581 LearningRate 0.0504 Epoch: 5 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:48,008-Speed 3392.98 samples/sec Loss 6.1084 LearningRate 0.0503 Epoch: 5 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:51,029-Speed 3390.20 samples/sec Loss 6.1191 LearningRate 0.0503 Epoch: 5 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:54,058-Speed 3382.04 samples/sec Loss 6.1197 LearningRate 0.0503 Epoch: 5 Global Step: 33050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:50:57,077-Speed 3392.57 samples/sec Loss 6.1501 LearningRate 0.0503 Epoch: 5 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:00,116-Speed 3369.56 samples/sec Loss 6.1621 LearningRate 0.0503 Epoch: 5 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:03,121-Speed 3407.83 samples/sec Loss 6.1737 LearningRate 0.0503 Epoch: 5 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:06,125-Speed 3409.89 samples/sec Loss 6.3361 
LearningRate 0.0503 Epoch: 5 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:09,148-Speed 3388.80 samples/sec Loss 6.0831 LearningRate 0.0503 Epoch: 5 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:12,163-Speed 3396.30 samples/sec Loss 6.3007 LearningRate 0.0502 Epoch: 5 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:15,182-Speed 3393.74 samples/sec Loss 5.9419 LearningRate 0.0502 Epoch: 5 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:18,196-Speed 3397.54 samples/sec Loss 6.1155 LearningRate 0.0502 Epoch: 5 Global Step: 33130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:21,216-Speed 3391.72 samples/sec Loss 6.0241 LearningRate 0.0502 Epoch: 5 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:24,238-Speed 3389.41 samples/sec Loss 6.0593 LearningRate 0.0502 Epoch: 5 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:27,325-Speed 3318.12 samples/sec Loss 6.1058 LearningRate 0.0502 Epoch: 5 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:30,342-Speed 3395.37 samples/sec Loss 6.0813 LearningRate 0.0502 Epoch: 5 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:33,366-Speed 3386.03 samples/sec Loss 5.9721 LearningRate 0.0502 Epoch: 5 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:51:36,386-Speed 3391.51 samples/sec Loss 6.1050 LearningRate 0.0501 Epoch: 5 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:39,405-Speed 3393.38 samples/sec Loss 6.1679 LearningRate 0.0501 Epoch: 5 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:42,422-Speed 3394.58 samples/sec Loss 6.0108 LearningRate 0.0501 Epoch: 5 Global Step: 33210 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-27 04:51:45,440-Speed 3393.84 samples/sec Loss 6.2523 LearningRate 0.0501 Epoch: 5 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:48,467-Speed 3384.19 samples/sec Loss 6.1524 LearningRate 0.0501 Epoch: 5 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:51,512-Speed 3363.95 samples/sec Loss 6.0479 LearningRate 0.0501 Epoch: 5 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:54,531-Speed 3392.02 samples/sec Loss 5.9221 LearningRate 0.0501 Epoch: 5 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:51:57,555-Speed 3386.56 samples/sec Loss 6.2193 LearningRate 0.0501 Epoch: 5 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:52:00,576-Speed 3390.74 samples/sec Loss 6.0675 LearningRate 0.0500 Epoch: 5 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:52:03,598-Speed 3389.77 samples/sec Loss 6.1969 LearningRate 0.0500 Epoch: 5 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:52:06,598-Speed 3413.57 samples/sec Loss 5.9814 LearningRate 0.0500 Epoch: 5 Global Step: 33290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:09,621-Speed 3388.05 samples/sec Loss 6.0241 LearningRate 0.0500 Epoch: 5 Global Step: 33300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:12,702-Speed 3324.55 samples/sec Loss 6.0138 LearningRate 0.0500 Epoch: 5 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:15,722-Speed 3391.52 samples/sec Loss 6.2496 LearningRate 0.0500 Epoch: 5 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:18,743-Speed 3391.22 samples/sec Loss 6.0117 LearningRate 0.0500 Epoch: 5 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
04:52:21,767-Speed 3386.09 samples/sec Loss 6.0778 LearningRate 0.0500 Epoch: 5 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:24,789-Speed 3390.40 samples/sec Loss 6.1469 LearningRate 0.0499 Epoch: 5 Global Step: 33350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:27,810-Speed 3390.11 samples/sec Loss 5.9585 LearningRate 0.0499 Epoch: 5 Global Step: 33360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:30,832-Speed 3388.97 samples/sec Loss 6.0808 LearningRate 0.0499 Epoch: 5 Global Step: 33370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:33,865-Speed 3376.26 samples/sec Loss 6.1905 LearningRate 0.0499 Epoch: 5 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:36,874-Speed 3404.53 samples/sec Loss 6.0361 LearningRate 0.0499 Epoch: 5 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:39,892-Speed 3393.99 samples/sec Loss 6.1056 LearningRate 0.0499 Epoch: 5 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:42,928-Speed 3373.86 samples/sec Loss 6.0759 LearningRate 0.0499 Epoch: 5 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:45,949-Speed 3389.55 samples/sec Loss 6.1241 LearningRate 0.0499 Epoch: 5 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:48,974-Speed 3386.53 samples/sec Loss 6.0635 LearningRate 0.0498 Epoch: 5 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:51,996-Speed 3389.51 samples/sec Loss 6.0699 LearningRate 0.0498 Epoch: 5 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:55,030-Speed 3375.08 samples/sec Loss 6.2127 LearningRate 0.0498 Epoch: 5 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:52:58,055-Speed 3386.23 samples/sec Loss 6.1351 LearningRate 
0.0498 Epoch: 5 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:01,073-Speed 3393.22 samples/sec Loss 6.0879 LearningRate 0.0498 Epoch: 5 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:04,096-Speed 3388.39 samples/sec Loss 6.0934 LearningRate 0.0498 Epoch: 5 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:07,120-Speed 3387.64 samples/sec Loss 6.0915 LearningRate 0.0498 Epoch: 5 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:10,142-Speed 3389.91 samples/sec Loss 6.1928 LearningRate 0.0498 Epoch: 5 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:13,162-Speed 3390.91 samples/sec Loss 6.2144 LearningRate 0.0497 Epoch: 5 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:16,183-Speed 3390.55 samples/sec Loss 6.1384 LearningRate 0.0497 Epoch: 5 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:19,205-Speed 3389.69 samples/sec Loss 6.0630 LearningRate 0.0497 Epoch: 5 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:22,234-Speed 3380.97 samples/sec Loss 6.2538 LearningRate 0.0497 Epoch: 5 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:25,264-Speed 3380.15 samples/sec Loss 6.0691 LearningRate 0.0497 Epoch: 5 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:28,312-Speed 3360.54 samples/sec Loss 6.1835 LearningRate 0.0497 Epoch: 5 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:31,337-Speed 3386.20 samples/sec Loss 5.9254 LearningRate 0.0497 Epoch: 5 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:34,358-Speed 3390.40 samples/sec Loss 6.1753 LearningRate 0.0497 Epoch: 5 Global Step: 33580 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-27 04:53:37,377-Speed 3392.68 samples/sec Loss 6.1465 LearningRate 0.0496 Epoch: 5 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:40,418-Speed 3368.82 samples/sec Loss 6.0054 LearningRate 0.0496 Epoch: 5 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:43,441-Speed 3387.12 samples/sec Loss 6.0156 LearningRate 0.0496 Epoch: 5 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:46,466-Speed 3386.26 samples/sec Loss 6.0353 LearningRate 0.0496 Epoch: 5 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:53:49,574-Speed 3295.96 samples/sec Loss 6.0509 LearningRate 0.0496 Epoch: 5 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:52,627-Speed 3354.79 samples/sec Loss 6.1820 LearningRate 0.0496 Epoch: 5 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:55,649-Speed 3389.91 samples/sec Loss 6.2024 LearningRate 0.0496 Epoch: 5 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:53:58,701-Speed 3355.64 samples/sec Loss 6.1116 LearningRate 0.0496 Epoch: 5 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:01,716-Speed 3397.48 samples/sec Loss 6.0759 LearningRate 0.0496 Epoch: 5 Global Step: 33670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:04,758-Speed 3366.73 samples/sec Loss 6.1399 LearningRate 0.0495 Epoch: 5 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:07,786-Speed 3382.66 samples/sec Loss 6.0722 LearningRate 0.0495 Epoch: 5 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:10,809-Speed 3388.53 samples/sec Loss 6.1076 LearningRate 0.0495 Epoch: 5 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:13,835-Speed 
3385.12 samples/sec Loss 5.9953 LearningRate 0.0495 Epoch: 5 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:16,867-Speed 3377.49 samples/sec Loss 5.9168 LearningRate 0.0495 Epoch: 5 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:20,169-Speed 3101.25 samples/sec Loss 6.0557 LearningRate 0.0495 Epoch: 5 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:54:23,190-Speed 3391.06 samples/sec Loss 6.0781 LearningRate 0.0495 Epoch: 5 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:54:26,219-Speed 3381.54 samples/sec Loss 5.9841 LearningRate 0.0495 Epoch: 5 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:54:29,241-Speed 3389.38 samples/sec Loss 5.9945 LearningRate 0.0494 Epoch: 5 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:54:32,239-Speed 3415.94 samples/sec Loss 5.9728 LearningRate 0.0494 Epoch: 5 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:35,316-Speed 3329.35 samples/sec Loss 6.0699 LearningRate 0.0494 Epoch: 5 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:38,345-Speed 3380.65 samples/sec Loss 6.1037 LearningRate 0.0494 Epoch: 5 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:41,376-Speed 3379.96 samples/sec Loss 6.2407 LearningRate 0.0494 Epoch: 5 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:44,412-Speed 3373.72 samples/sec Loss 6.0437 LearningRate 0.0494 Epoch: 5 Global Step: 33810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:47,442-Speed 3380.13 samples/sec Loss 5.9336 LearningRate 0.0494 Epoch: 5 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:50,468-Speed 3384.85 samples/sec Loss 6.0669 LearningRate 0.0494 Epoch: 5 
Global Step: 33830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:53,504-Speed 3374.41 samples/sec Loss 5.9722 LearningRate 0.0493 Epoch: 5 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:56,529-Speed 3385.04 samples/sec Loss 6.0088 LearningRate 0.0493 Epoch: 5 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:54:59,555-Speed 3385.54 samples/sec Loss 5.9421 LearningRate 0.0493 Epoch: 5 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:55:02,597-Speed 3367.08 samples/sec Loss 6.0270 LearningRate 0.0493 Epoch: 5 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:05,625-Speed 3382.75 samples/sec Loss 6.0982 LearningRate 0.0493 Epoch: 5 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:08,643-Speed 3393.31 samples/sec Loss 6.1752 LearningRate 0.0493 Epoch: 5 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:11,665-Speed 3389.70 samples/sec Loss 6.0908 LearningRate 0.0493 Epoch: 5 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:14,697-Speed 3377.88 samples/sec Loss 6.1126 LearningRate 0.0493 Epoch: 5 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:17,722-Speed 3386.95 samples/sec Loss 6.2652 LearningRate 0.0492 Epoch: 5 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:20,748-Speed 3384.05 samples/sec Loss 6.0860 LearningRate 0.0492 Epoch: 5 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:23,792-Speed 3365.46 samples/sec Loss 6.1424 LearningRate 0.0492 Epoch: 5 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:26,842-Speed 3357.86 samples/sec Loss 6.1205 LearningRate 0.0492 Epoch: 5 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-27 04:55:29,881-Speed 3371.01 samples/sec Loss 6.0836 LearningRate 0.0492 Epoch: 5 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:32,888-Speed 3405.63 samples/sec Loss 5.9424 LearningRate 0.0492 Epoch: 5 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:35,933-Speed 3363.47 samples/sec Loss 6.0840 LearningRate 0.0492 Epoch: 5 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:38,957-Speed 3386.85 samples/sec Loss 6.0740 LearningRate 0.0492 Epoch: 5 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:55:41,988-Speed 3379.06 samples/sec Loss 6.1245 LearningRate 0.0491 Epoch: 5 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:56:25,467-[lfw][34000]XNorm: 22.048982 Training: 2022-04-27 04:56:25,468-[lfw][34000]Accuracy-Flip: 0.99650+-0.00241 Training: 2022-04-27 04:56:25,468-[lfw][34000]Accuracy-Highest: 0.99817 Training: 2022-04-27 04:57:15,935-[cfp_fp][34000]XNorm: 19.422187 Training: 2022-04-27 04:57:15,935-[cfp_fp][34000]Accuracy-Flip: 0.94429+-0.01020 Training: 2022-04-27 04:57:15,936-[cfp_fp][34000]Accuracy-Highest: 0.95171 Training: 2022-04-27 04:57:59,388-[agedb_30][34000]XNorm: 21.921999 Training: 2022-04-27 04:57:59,389-[agedb_30][34000]Accuracy-Flip: 0.97467+-0.00733 Training: 2022-04-27 04:57:59,389-[agedb_30][34000]Accuracy-Highest: 0.97467 Training: 2022-04-27 04:58:02,404-Speed 72.93 samples/sec Loss 5.9770 LearningRate 0.0491 Epoch: 5 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:05,407-Speed 3410.35 samples/sec Loss 6.1000 LearningRate 0.0491 Epoch: 5 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:08,418-Speed 3402.04 samples/sec Loss 6.0136 LearningRate 0.0491 Epoch: 5 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 
04:58:11,505-Speed 3318.25 samples/sec Loss 5.9405 LearningRate 0.0491 Epoch: 5 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:14,512-Speed 3405.79 samples/sec Loss 6.0713 LearningRate 0.0491 Epoch: 5 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:17,521-Speed 3403.44 samples/sec Loss 5.9715 LearningRate 0.0491 Epoch: 5 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:20,515-Speed 3421.37 samples/sec Loss 6.0955 LearningRate 0.0491 Epoch: 5 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:23,533-Speed 3393.19 samples/sec Loss 6.1513 LearningRate 0.0490 Epoch: 5 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:26,569-Speed 3373.63 samples/sec Loss 6.0400 LearningRate 0.0490 Epoch: 5 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:29,584-Speed 3397.57 samples/sec Loss 6.0054 LearningRate 0.0490 Epoch: 5 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:32,676-Speed 3312.58 samples/sec Loss 6.1396 LearningRate 0.0490 Epoch: 5 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:45,906-Speed 774.03 samples/sec Loss 5.7697 LearningRate 0.0490 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:48,930-Speed 3387.91 samples/sec Loss 5.4939 LearningRate 0.0490 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:51,948-Speed 3393.59 samples/sec Loss 5.3537 LearningRate 0.0490 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:54,962-Speed 3398.59 samples/sec Loss 5.4388 LearningRate 0.0490 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:58:57,997-Speed 3374.77 samples/sec Loss 5.2846 
LearningRate 0.0489 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:59:01,013-Speed 3395.96 samples/sec Loss 5.4239 LearningRate 0.0489 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 04:59:04,018-Speed 3407.78 samples/sec Loss 5.3882 LearningRate 0.0489 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:59:07,022-Speed 3410.35 samples/sec Loss 5.2876 LearningRate 0.0489 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:10,050-Speed 3382.11 samples/sec Loss 5.4923 LearningRate 0.0489 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:13,074-Speed 3387.82 samples/sec Loss 5.4459 LearningRate 0.0489 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:16,098-Speed 3387.04 samples/sec Loss 5.5158 LearningRate 0.0489 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:19,122-Speed 3386.69 samples/sec Loss 5.5717 LearningRate 0.0489 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:22,157-Speed 3373.90 samples/sec Loss 5.4854 LearningRate 0.0488 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:25,301-Speed 3258.65 samples/sec Loss 5.5217 LearningRate 0.0488 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:28,323-Speed 3389.22 samples/sec Loss 5.6587 LearningRate 0.0488 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:31,517-Speed 3205.84 samples/sec Loss 5.4709 LearningRate 0.0488 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:34,529-Speed 3401.08 samples/sec Loss 5.5684 LearningRate 0.0488 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-27 04:59:37,543-Speed 3398.58 samples/sec Loss 5.4981 LearningRate 0.0488 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:59:40,555-Speed 3400.34 samples/sec Loss 5.5732 LearningRate 0.0488 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 04:59:43,547-Speed 3423.96 samples/sec Loss 5.5709 LearningRate 0.0488 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:46,557-Speed 3401.98 samples/sec Loss 5.6163 LearningRate 0.0487 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:49,579-Speed 3389.77 samples/sec Loss 5.6734 LearningRate 0.0487 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:52,593-Speed 3397.62 samples/sec Loss 5.4294 LearningRate 0.0487 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:55,603-Speed 3404.52 samples/sec Loss 5.5396 LearningRate 0.0487 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 04:59:58,607-Speed 3408.99 samples/sec Loss 5.6087 LearningRate 0.0487 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:00:01,617-Speed 3403.23 samples/sec Loss 5.6839 LearningRate 0.0487 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:00:04,691-Speed 3332.23 samples/sec Loss 5.6933 LearningRate 0.0487 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:00:07,699-Speed 3406.54 samples/sec Loss 5.5984 LearningRate 0.0487 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:00:10,718-Speed 3393.55 samples/sec Loss 5.6267 LearningRate 0.0487 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:00:13,758-Speed 
3369.45 samples/sec Loss 5.5589 LearningRate 0.0486 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:16,768-Speed 3402.28 samples/sec Loss 5.4005 LearningRate 0.0486 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:19,783-Speed 3396.88 samples/sec Loss 5.5804 LearningRate 0.0486 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:22,796-Speed 3400.02 samples/sec Loss 5.6018 LearningRate 0.0486 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:25,821-Speed 3386.19 samples/sec Loss 5.7238 LearningRate 0.0486 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:28,823-Speed 3411.50 samples/sec Loss 5.7600 LearningRate 0.0486 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:31,831-Speed 3405.32 samples/sec Loss 5.6338 LearningRate 0.0486 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:34,867-Speed 3374.03 samples/sec Loss 5.7328 LearningRate 0.0486 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:37,880-Speed 3399.48 samples/sec Loss 5.6890 LearningRate 0.0485 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:40,945-Speed 3341.17 samples/sec Loss 5.4715 LearningRate 0.0485 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:43,933-Speed 3428.22 samples/sec Loss 5.7753 LearningRate 0.0485 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:46,941-Speed 3405.39 samples/sec Loss 5.6055 LearningRate 0.0485 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:00:49,949-Speed 3405.10 samples/sec Loss 5.5892 LearningRate 0.0485 
Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:00:52,958-Speed 3403.22 samples/sec Loss 5.6273 LearningRate 0.0485 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:00:55,981-Speed 3389.88 samples/sec Loss 5.8013 LearningRate 0.0485 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:00:59,059-Speed 3327.67 samples/sec Loss 5.7682 LearningRate 0.0485 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:02,072-Speed 3399.29 samples/sec Loss 5.7473 LearningRate 0.0484 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:05,085-Speed 3398.31 samples/sec Loss 5.6868 LearningRate 0.0484 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:08,104-Speed 3394.03 samples/sec Loss 5.6862 LearningRate 0.0484 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:11,147-Speed 3365.30 samples/sec Loss 5.5688 LearningRate 0.0484 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:14,140-Speed 3423.35 samples/sec Loss 5.7473 LearningRate 0.0484 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:17,146-Speed 3407.59 samples/sec Loss 5.7560 LearningRate 0.0484 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:20,155-Speed 3403.94 samples/sec Loss 5.7248 LearningRate 0.0484 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:23,279-Speed 3278.01 samples/sec Loss 5.6957 LearningRate 0.0484 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:26,303-Speed 3386.72 samples/sec Loss 5.7575 LearningRate 0.0483 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:01:29,317-Speed 3399.08 samples/sec Loss 5.6867 LearningRate 0.0483 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:01:32,315-Speed 3415.90 samples/sec Loss 5.6544 LearningRate 0.0483 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:35,324-Speed 3404.17 samples/sec Loss 5.6425 LearningRate 0.0483 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:38,338-Speed 3398.29 samples/sec Loss 5.6944 LearningRate 0.0483 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:41,353-Speed 3396.57 samples/sec Loss 5.8292 LearningRate 0.0483 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:44,361-Speed 3405.64 samples/sec Loss 5.8080 LearningRate 0.0483 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:47,373-Speed 3401.06 samples/sec Loss 5.6385 LearningRate 0.0483 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:50,415-Speed 3366.65 samples/sec Loss 5.7830 LearningRate 0.0482 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:53,544-Speed 3272.61 samples/sec Loss 5.6905 LearningRate 0.0482 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:56,554-Speed 3403.71 samples/sec Loss 5.8279 LearningRate 0.0482 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:01:59,564-Speed 3401.88 samples/sec Loss 5.8844 LearningRate 0.0482 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2022-04-27 05:02:02,583-Speed 3392.88 samples/sec Loss 5.6930 LearningRate 0.0482 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:05,601-Speed 3393.47 samples/sec Loss 5.7530 LearningRate 0.0482 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:08,619-Speed 3394.13 samples/sec Loss 5.8416 LearningRate 0.0482 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:11,627-Speed 3404.93 samples/sec Loss 5.8031 LearningRate 0.0482 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:14,644-Speed 3395.91 samples/sec Loss 5.6165 LearningRate 0.0481 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:17,655-Speed 3400.73 samples/sec Loss 5.6806 LearningRate 0.0481 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:20,670-Speed 3397.07 samples/sec Loss 5.7098 LearningRate 0.0481 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:23,684-Speed 3398.50 samples/sec Loss 5.7892 LearningRate 0.0481 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:26,709-Speed 3385.77 samples/sec Loss 5.8777 LearningRate 0.0481 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:29,728-Speed 3392.87 samples/sec Loss 5.7131 LearningRate 0.0481 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:02:32,746-Speed 3393.43 samples/sec Loss 5.7259 LearningRate 0.0481 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:35,768-Speed 3390.27 samples/sec Loss 5.7917 LearningRate 0.0481 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:38,809-Speed 3368.14 samples/sec Loss 5.7611 LearningRate 0.0481 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:41,829-Speed 3390.69 samples/sec Loss 5.8061 LearningRate 0.0480 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:44,844-Speed 3397.44 samples/sec Loss 5.6451 LearningRate 0.0480 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:47,880-Speed 3373.60 samples/sec Loss 5.7223 LearningRate 0.0480 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:50,890-Speed 3402.28 samples/sec Loss 5.8015 LearningRate 0.0480 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:53,907-Speed 3394.64 samples/sec Loss 5.7902 LearningRate 0.0480 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:56,944-Speed 3373.64 samples/sec Loss 5.8927 LearningRate 0.0480 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:02:59,960-Speed 3396.02 samples/sec Loss 5.7717 LearningRate 0.0480 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:03,093-Speed 3269.17 samples/sec Loss 5.6472 LearningRate 0.0480 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:06,194-Speed 3302.86 samples/sec Loss 5.8298 LearningRate 0.0479 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:09,209-Speed 3397.27 samples/sec Loss 5.8084 LearningRate 0.0479 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:12,217-Speed 3404.89 samples/sec Loss 5.9709 LearningRate 0.0479 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:15,231-Speed 3398.27 samples/sec Loss 5.7157 LearningRate 0.0479 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:18,243-Speed 3400.32 samples/sec Loss 5.8168 LearningRate 0.0479 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 8 hours
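The LearningRate and Epoch fields in these records can be reproduced from the run configuration in the log header (lr 0.1, num_image 5822653, total_batch_size 1024, num_epoch 20, warmup_step 0). The sketch below assumes a polynomial decay with exponent 2; that exponent is inferred from the logged values, not taken from the training code, and the names `poly_lr` and `epoch_of` are hypothetical.

```python
# Sketch: reproduce the logged LearningRate and Epoch fields from the header config.
# Assumption: polynomial LR decay with power 2 and no warmup, inferred from the log.

BASE_LR = 0.1          # "lr 0.1"
NUM_IMAGE = 5822653    # "num_image 5822653"
TOTAL_BATCH = 1024     # "total_batch_size 1024" (1024 / batch_size 128 = 8 processes)
NUM_EPOCH = 20         # "num_epoch 20"
WARMUP_STEPS = 0       # "warmup_step 0"
POWER = 2              # assumed decay exponent (not stated in the log)

STEPS_PER_EPOCH = NUM_IMAGE // TOTAL_BATCH   # full global batches per epoch: 5686
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCH    # 113720, matches "total_step" in the header


def poly_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps under the assumed poly schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup branch; never taken here because WARMUP_STEPS is 0.
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * (1.0 - progress) ** POWER


def epoch_of(step: int) -> int:
    """Zero-based epoch index that a global step falls into."""
    return step // STEPS_PER_EPOCH


if __name__ == "__main__":
    for step in (20, 34540, 35000, 36000):
        # Format the LR the same way the log does (4 decimal places).
        print(step, epoch_of(step), f"{poly_lr(step):.4f}")
```

Running it prints `20 0 0.1000`, `34540 6 0.0485`, `35000 6 0.0479` and `36000 6 0.0467`, which agree with the LearningRate and Epoch fields logged at those steps. The Speed field is likewise consistent with global throughput: 10 steps of 1024 samples roughly every 3.02 s is about 3390 samples/sec.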
Training: 2022-04-27 05:03:21,268-Speed 3386.13 samples/sec Loss 5.7944 LearningRate 0.0479 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:24,328-Speed 3347.01 samples/sec Loss 5.7330 LearningRate 0.0479 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:27,352-Speed 3387.00 samples/sec Loss 5.6851 LearningRate 0.0479 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:30,367-Speed 3397.14 samples/sec Loss 5.6925 LearningRate 0.0478 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:33,387-Speed 3391.83 samples/sec Loss 5.7454 LearningRate 0.0478 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:36,400-Speed 3399.36 samples/sec Loss 5.7344 LearningRate 0.0478 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:39,419-Speed 3393.99 samples/sec Loss 5.7992 LearningRate 0.0478 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:03:42,436-Speed 3394.64 samples/sec Loss 5.6899 LearningRate 0.0478 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:45,453-Speed 3394.68 samples/sec Loss 5.7623 LearningRate 0.0478 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:48,474-Speed 3390.82 samples/sec Loss 5.9207 LearningRate 0.0478 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:51,503-Speed 3380.74 samples/sec Loss 5.8533 LearningRate 0.0478 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:54,522-Speed 3393.43 samples/sec Loss 5.9527 LearningRate 0.0477 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:03:57,542-Speed 3391.55 samples/sec Loss 5.8365 LearningRate 0.0477 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:04:00,555-Speed 3399.34 samples/sec Loss 5.8006 LearningRate 0.0477 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:04:03,550-Speed 3419.14 samples/sec Loss 5.7763 LearningRate 0.0477 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:06,563-Speed 3400.34 samples/sec Loss 5.7286 LearningRate 0.0477 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:09,572-Speed 3402.97 samples/sec Loss 5.8572 LearningRate 0.0477 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:12,594-Speed 3389.75 samples/sec Loss 5.9129 LearningRate 0.0477 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:15,610-Speed 3395.86 samples/sec Loss 5.8581 LearningRate 0.0477 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:18,626-Speed 3395.80 samples/sec Loss 5.8046 LearningRate 0.0477 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:21,650-Speed 3387.92 samples/sec Loss 5.7919 LearningRate 0.0476 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:24,669-Speed 3392.06 samples/sec Loss 5.8621 LearningRate 0.0476 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:27,692-Speed 3388.40 samples/sec Loss 5.6866 LearningRate 0.0476 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:30,712-Speed 3391.12 samples/sec Loss 5.8798 LearningRate 0.0476 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:33,793-Speed 3324.75 samples/sec Loss 5.9860 LearningRate 0.0476 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:04:36,831-Speed 3371.16 samples/sec Loss 5.8750 LearningRate 0.0476 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:39,850-Speed 3393.34 samples/sec Loss 5.8550 LearningRate 0.0476 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:42,864-Speed 3398.00 samples/sec Loss 5.8450 LearningRate 0.0476 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:45,893-Speed 3381.55 samples/sec Loss 5.9237 LearningRate 0.0475 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:48,910-Speed 3394.75 samples/sec Loss 5.8951 LearningRate 0.0475 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:51,926-Speed 3396.30 samples/sec Loss 5.8699 LearningRate 0.0475 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:54,946-Speed 3391.73 samples/sec Loss 5.8128 LearningRate 0.0475 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:04:57,963-Speed 3395.09 samples/sec Loss 5.8230 LearningRate 0.0475 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:00,980-Speed 3393.78 samples/sec Loss 5.9331 LearningRate 0.0475 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:04,021-Speed 3368.47 samples/sec Loss 5.7978 LearningRate 0.0475 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:07,054-Speed 3377.12 samples/sec Loss 5.7292 LearningRate 0.0475 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:10,068-Speed 3398.06 samples/sec Loss 5.8455 LearningRate 0.0474 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:13,087-Speed 3392.57 samples/sec Loss 5.7203 LearningRate 0.0474 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:16,107-Speed 3392.05 samples/sec Loss 5.6846 LearningRate 0.0474 Epoch: 6 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:19,125-Speed 3394.01 samples/sec Loss 5.8426 LearningRate 0.0474 Epoch: 6 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:22,143-Speed 3393.64 samples/sec Loss 5.9271 LearningRate 0.0474 Epoch: 6 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:25,170-Speed 3384.07 samples/sec Loss 5.6278 LearningRate 0.0474 Epoch: 6 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:28,205-Speed 3374.89 samples/sec Loss 5.7965 LearningRate 0.0474 Epoch: 6 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:31,222-Speed 3394.67 samples/sec Loss 5.7519 LearningRate 0.0474 Epoch: 6 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:34,253-Speed 3379.28 samples/sec Loss 5.8175 LearningRate 0.0473 Epoch: 6 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:37,255-Speed 3411.60 samples/sec Loss 5.9958 LearningRate 0.0473 Epoch: 6 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:40,271-Speed 3395.68 samples/sec Loss 6.0198 LearningRate 0.0473 Epoch: 6 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:05:43,265-Speed 3420.81 samples/sec Loss 5.7564 LearningRate 0.0473 Epoch: 6 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:46,296-Speed 3379.62 samples/sec Loss 5.8371 LearningRate 0.0473 Epoch: 6 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:49,322-Speed 3384.81 samples/sec Loss 5.6950 LearningRate 0.0473 Epoch: 6 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:52,339-Speed 3395.63 samples/sec Loss 5.7423 LearningRate 0.0473 Epoch: 6 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:55,358-Speed 3392.09 samples/sec Loss 5.7073 LearningRate 0.0473 Epoch: 6 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:05:58,392-Speed 3376.29 samples/sec Loss 5.7750 LearningRate 0.0473 Epoch: 6 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:01,406-Speed 3398.05 samples/sec Loss 5.8111 LearningRate 0.0472 Epoch: 6 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:04,424-Speed 3393.76 samples/sec Loss 5.7937 LearningRate 0.0472 Epoch: 6 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:07,448-Speed 3386.99 samples/sec Loss 5.8080 LearningRate 0.0472 Epoch: 6 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:10,469-Speed 3389.36 samples/sec Loss 5.9467 LearningRate 0.0472 Epoch: 6 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:13,492-Speed 3388.97 samples/sec Loss 5.7864 LearningRate 0.0472 Epoch: 6 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:16,510-Speed 3393.86 samples/sec Loss 5.7670 LearningRate 0.0472 Epoch: 6 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:19,527-Speed 3394.97 samples/sec Loss 5.9703 LearningRate 0.0472 Epoch: 6 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:22,548-Speed 3390.90 samples/sec Loss 5.7435 LearningRate 0.0472 Epoch: 6 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:25,568-Speed 3390.74 samples/sec Loss 5.9139 LearningRate 0.0471 Epoch: 6 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:28,590-Speed 3389.56 samples/sec Loss 5.8754 LearningRate 0.0471 Epoch: 6 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:31,611-Speed 3390.14 samples/sec Loss 5.8837 LearningRate 0.0471 Epoch: 6 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:06:34,612-Speed 3413.33 samples/sec Loss 5.7174 LearningRate 0.0471 Epoch: 6 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:37,647-Speed 3374.27 samples/sec Loss 5.8480 LearningRate 0.0471 Epoch: 6 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:40,669-Speed 3390.40 samples/sec Loss 5.8082 LearningRate 0.0471 Epoch: 6 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:43,684-Speed 3396.16 samples/sec Loss 5.8671 LearningRate 0.0471 Epoch: 6 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:46,718-Speed 3376.18 samples/sec Loss 5.7395 LearningRate 0.0471 Epoch: 6 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:49,781-Speed 3344.55 samples/sec Loss 5.7566 LearningRate 0.0470 Epoch: 6 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:52,896-Speed 3288.12 samples/sec Loss 5.9177 LearningRate 0.0470 Epoch: 6 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:55,915-Speed 3392.11 samples/sec Loss 5.8572 LearningRate 0.0470 Epoch: 6 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:06:58,938-Speed 3387.75 samples/sec Loss 5.8007 LearningRate 0.0470 Epoch: 6 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:01,971-Speed 3376.91 samples/sec Loss 5.7616 LearningRate 0.0470 Epoch: 6 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:05,003-Speed 3378.67 samples/sec Loss 5.8462 LearningRate 0.0470 Epoch: 6 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:08,007-Speed 3409.28 samples/sec Loss 5.7981 LearningRate 0.0470 Epoch: 6 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:11,024-Speed 3394.85 samples/sec Loss 5.8075 LearningRate 0.0470 Epoch: 6 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:14,050-Speed 3385.10 samples/sec Loss 5.8851 LearningRate 0.0469 Epoch: 6 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:17,067-Speed 3395.06 samples/sec Loss 5.7339 LearningRate 0.0469 Epoch: 6 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:20,088-Speed 3390.26 samples/sec Loss 5.8677 LearningRate 0.0469 Epoch: 6 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:23,119-Speed 3378.87 samples/sec Loss 5.8479 LearningRate 0.0469 Epoch: 6 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:26,140-Speed 3391.21 samples/sec Loss 5.7574 LearningRate 0.0469 Epoch: 6 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:29,161-Speed 3390.25 samples/sec Loss 5.9462 LearningRate 0.0469 Epoch: 6 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:32,179-Speed 3393.12 samples/sec Loss 5.8646 LearningRate 0.0469 Epoch: 6 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:35,207-Speed 3383.53 samples/sec Loss 5.8880 LearningRate 0.0469 Epoch: 6 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:07:38,237-Speed 3380.22 samples/sec Loss 5.8590 LearningRate 0.0469 Epoch: 6 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:41,270-Speed 3376.49 samples/sec Loss 5.8103 LearningRate 0.0468 Epoch: 6 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:44,296-Speed 3385.07 samples/sec Loss 5.7515 LearningRate 0.0468 Epoch: 6 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:47,322-Speed 3384.08 samples/sec Loss 5.8948 LearningRate 0.0468 Epoch: 6 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:50,468-Speed 3255.76 samples/sec Loss 5.7739 LearningRate 0.0468 Epoch: 6 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:53,497-Speed 3381.43 samples/sec Loss 5.7926 LearningRate 0.0468 Epoch: 6 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:56,527-Speed 3380.23 samples/sec Loss 5.7041 LearningRate 0.0468 Epoch: 6 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:07:59,534-Speed 3406.04 samples/sec Loss 5.6901 LearningRate 0.0468 Epoch: 6 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:02,567-Speed 3377.55 samples/sec Loss 5.8763 LearningRate 0.0468 Epoch: 6 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:05,588-Speed 3390.81 samples/sec Loss 5.7157 LearningRate 0.0467 Epoch: 6 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:08,612-Speed 3386.86 samples/sec Loss 5.8895 LearningRate 0.0467 Epoch: 6 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:11,638-Speed 3385.02 samples/sec Loss 5.7919 LearningRate 0.0467 Epoch: 6 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:14,667-Speed 3380.83 samples/sec Loss 5.7868 LearningRate 0.0467 Epoch: 6 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:08:58,001-[lfw][36000]XNorm: 22.579901
Training: 2022-04-27 05:08:58,002-[lfw][36000]Accuracy-Flip: 0.99633+-0.00323
Training: 2022-04-27 05:08:58,002-[lfw][36000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:09:48,499-[cfp_fp][36000]XNorm: 19.885575
Training: 2022-04-27 05:09:48,500-[cfp_fp][36000]Accuracy-Flip: 0.95186+-0.01214
Training: 2022-04-27 05:09:48,500-[cfp_fp][36000]Accuracy-Highest: 0.95186
Training: 2022-04-27 05:10:31,812-[agedb_30][36000]XNorm: 22.228437
Training: 2022-04-27 05:10:31,812-[agedb_30][36000]Accuracy-Flip: 0.97433+-0.00742
Training: 2022-04-27 05:10:31,813-[agedb_30][36000]Accuracy-Highest: 0.97467
Training: 2022-04-27 05:10:34,832-Speed 73.06 samples/sec Loss 5.8540 LearningRate 0.0467 Epoch: 6 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:10:37,868-Speed 3373.08 samples/sec Loss 5.8007 LearningRate 0.0467 Epoch: 6 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:10:40,890-Speed 3389.31 samples/sec Loss 5.7804 LearningRate 0.0467 Epoch: 6 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:10:43,902-Speed 3400.47 samples/sec Loss 5.7788 LearningRate 0.0467 Epoch: 6 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:10:46,915-Speed 3400.03 samples/sec Loss 5.7710 LearningRate 0.0466 Epoch: 6 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:10:49,925-Speed 3402.44 samples/sec Loss 5.8117 LearningRate 0.0466 Epoch: 6 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:10:52,941-Speed 3395.67 samples/sec Loss 5.9063 LearningRate 0.0466 Epoch: 6 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:10:55,940-Speed 3415.10 samples/sec Loss 5.8764 LearningRate 0.0466 Epoch: 6 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:10:58,966-Speed 3385.47 samples/sec Loss 5.8845 LearningRate 0.0466 Epoch: 6 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:02,013-Speed 3361.59 samples/sec Loss 5.8996 LearningRate 0.0466 Epoch: 6 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:05,032-Speed 3392.83 samples/sec Loss 5.8672 LearningRate 0.0466 Epoch: 6 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:08,064-Speed 3377.26 samples/sec Loss 5.7778 LearningRate 0.0466 Epoch: 6 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:11,082-Speed 3394.53 samples/sec Loss 5.6838 LearningRate 0.0466 Epoch: 6 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:14,101-Speed 3392.41 samples/sec Loss 5.8511 LearningRate 0.0465 Epoch: 6 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:17,121-Speed 3391.65 samples/sec Loss 5.7776 LearningRate 0.0465 Epoch: 6 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:20,146-Speed 3384.93 samples/sec Loss 5.7904 LearningRate 0.0465 Epoch: 6 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:23,164-Speed 3394.82 samples/sec Loss 5.8452 LearningRate 0.0465 Epoch: 6 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:26,239-Speed 3331.17 samples/sec Loss 5.7661 LearningRate 0.0465 Epoch: 6 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:29,255-Speed 3395.32 samples/sec Loss 5.8509 LearningRate 0.0465 Epoch: 6 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:32,272-Speed 3395.00 samples/sec Loss 5.7710 LearningRate 0.0465 Epoch: 6 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:35,295-Speed 3388.16 samples/sec Loss 5.8700 LearningRate 0.0465 Epoch: 6 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:38,316-Speed 3390.84 samples/sec Loss 5.8711 LearningRate 0.0464 Epoch: 6 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:41,338-Speed 3388.70 samples/sec Loss 5.7817 LearningRate 0.0464 Epoch: 6 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:11:44,346-Speed 3404.68 samples/sec Loss 5.7696 LearningRate 0.0464 Epoch: 6 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:47,378-Speed 3378.04 samples/sec Loss 5.8935 LearningRate 0.0464 Epoch: 6 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:50,407-Speed 3382.13 samples/sec Loss 5.8346 LearningRate 0.0464 Epoch: 6 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:53,432-Speed 3386.38 samples/sec Loss 5.7593 LearningRate 0.0464 Epoch: 6 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:56,455-Speed 3387.48 samples/sec Loss 5.9256 LearningRate 0.0464 Epoch: 6 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:11:59,484-Speed 3381.92 samples/sec Loss 5.8543 LearningRate 0.0464 Epoch: 6 Global Step: 36290 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:02,513-Speed 3381.33 samples/sec Loss 5.7610 LearningRate 0.0463 Epoch: 6 Global Step: 36300 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:05,542-Speed 3381.06 samples/sec Loss 5.7439 LearningRate 0.0463 Epoch: 6 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:08,568-Speed 3384.99 samples/sec Loss 5.7668 LearningRate 0.0463 Epoch: 6 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:11,604-Speed 3373.93 samples/sec Loss 5.6248 LearningRate 0.0463 Epoch: 6 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:14,639-Speed 3373.88 samples/sec Loss 5.8668 LearningRate 0.0463 Epoch: 6 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:12:17,693-Speed 3354.85 samples/sec Loss 5.6914 LearningRate 0.0463 Epoch: 6 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:12:20,697-Speed 3409.26 samples/sec Loss 5.8281 LearningRate 0.0463 Epoch: 6 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:23,724-Speed 3383.79 samples/sec Loss 5.6638 LearningRate 0.0463 Epoch: 6 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:26,745-Speed 3392.06 samples/sec Loss 5.8461 LearningRate 0.0463 Epoch: 6 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:29,765-Speed 3391.25 samples/sec Loss 5.8464 LearningRate 0.0462 Epoch: 6 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:32,788-Speed 3387.59 samples/sec Loss 5.7837 LearningRate 0.0462 Epoch: 6 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:35,816-Speed 3383.42 samples/sec Loss 5.7493 LearningRate 0.0462 Epoch: 6 Global Step: 36410 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:38,842-Speed 3383.81 samples/sec Loss 5.8292 LearningRate 0.0462 Epoch: 6 Global Step: 36420 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:41,868-Speed 3385.46 samples/sec Loss 5.7290 LearningRate 0.0462 Epoch: 6 Global Step: 36430 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:44,886-Speed 3394.20 samples/sec Loss 5.7274 LearningRate 0.0462 Epoch: 6 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:47,904-Speed 3393.08 samples/sec Loss 5.8202 LearningRate 0.0462 Epoch: 6 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:50,929-Speed 3385.87 samples/sec Loss 5.8134 LearningRate 0.0462 Epoch: 6 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:12:53,968-Speed 3370.42 samples/sec Loss 5.7076 LearningRate 0.0461 Epoch: 6 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:12:56,993-Speed 3385.96 samples/sec Loss 5.8643 LearningRate 0.0461 Epoch: 6 Global Step: 36480 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:00,013-Speed 3391.71 samples/sec Loss 5.7715 LearningRate 0.0461 Epoch: 6 Global Step: 36490 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:03,037-Speed 3387.23 samples/sec Loss 5.7210 LearningRate 0.0461 Epoch: 6 Global Step: 36500 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:06,061-Speed 3387.68 samples/sec Loss 5.6123 LearningRate 0.0461 Epoch: 6 Global Step: 36510 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:09,089-Speed 3382.73 samples/sec Loss 5.7927 LearningRate 0.0461 Epoch: 6 Global Step: 36520 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:12,110-Speed 3389.80 samples/sec Loss 5.6720 LearningRate 0.0461 Epoch: 6 Global Step: 36530 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:15,135-Speed 3386.10 samples/sec Loss 5.7167 LearningRate 0.0461 Epoch: 6 Global Step: 36540 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:18,158-Speed 3388.65 samples/sec Loss 5.8985 LearningRate 0.0460 Epoch: 6 Global Step: 36550 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:21,198-Speed 3369.51 samples/sec Loss 5.9056 LearningRate 0.0460 Epoch: 6 Global Step: 36560 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:24,221-Speed 3387.72 samples/sec Loss 5.7509 LearningRate 0.0460 Epoch: 6 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:13:27,252-Speed 3379.43 samples/sec Loss 5.7672 LearningRate 0.0460 Epoch: 6 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:13:30,270-Speed 3393.37 samples/sec Loss 5.7194 LearningRate 0.0460 Epoch: 6 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:13:33,293-Speed 3388.30 samples/sec Loss 5.6185 LearningRate 0.0460 Epoch: 6 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:13:36,313-Speed 3391.39 samples/sec Loss 5.7293 LearningRate 0.0460 Epoch: 6 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:13:39,324-Speed 3401.66 samples/sec Loss 5.6936 LearningRate 0.0460 Epoch: 6 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:42,346-Speed 3390.14 samples/sec Loss 5.9220 LearningRate 0.0460 Epoch: 6 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:45,370-Speed 3386.88 samples/sec Loss 5.8367 LearningRate 0.0459 Epoch: 6 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:48,390-Speed 3391.15 samples/sec Loss 5.8442 LearningRate 0.0459 Epoch: 6 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:51,413-Speed 3387.53 samples/sec Loss 5.7795 LearningRate 0.0459 Epoch: 6 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:54,436-Speed 3388.32 samples/sec Loss 5.8565 LearningRate 0.0459 Epoch: 6 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:13:57,468-Speed 3378.69 samples/sec Loss 5.8205 LearningRate 0.0459 Epoch: 6 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:00,492-Speed 3387.14 samples/sec Loss 5.7855 LearningRate 0.0459 Epoch: 6 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:03,520-Speed 3381.93 samples/sec Loss 5.6302 LearningRate 0.0459 Epoch: 6 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:06,570-Speed 3359.15 samples/sec Loss 5.8004 LearningRate 0.0459 Epoch: 6 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:09,575-Speed 3408.17 samples/sec Loss 5.7962 LearningRate 0.0458 Epoch: 6 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:12,613-Speed 3371.60 samples/sec Loss 5.9545 LearningRate 0.0458 Epoch: 6 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:15,731-Speed 3284.68 samples/sec Loss 5.7476 LearningRate 0.0458 Epoch: 6 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:18,763-Speed 3378.31 samples/sec Loss 5.6905 LearningRate 0.0458 Epoch: 6 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:21,793-Speed 3379.58 samples/sec Loss 5.7367 LearningRate 0.0458 Epoch: 6 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:24,823-Speed 3379.94 samples/sec Loss 5.8363 LearningRate 0.0458 Epoch: 6 Global Step: 36770 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:27,850-Speed 3383.43 samples/sec Loss 5.8571 LearningRate 0.0458 Epoch: 6 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:30,880-Speed 3380.91 samples/sec Loss 5.7562 LearningRate 0.0458 Epoch: 6 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:33,911-Speed 3380.12 samples/sec Loss 5.7109 LearningRate 0.0458 Epoch: 6 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:36,938-Speed 3383.81 samples/sec Loss 5.7363 LearningRate 0.0457 Epoch: 6 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:39,981-Speed 3365.60 samples/sec Loss 5.9763 LearningRate 0.0457 Epoch: 6 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:14:42,990-Speed 3403.53 samples/sec Loss 5.7001 LearningRate 0.0457 Epoch: 6 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:46,014-Speed 3387.31 samples/sec Loss 5.6609 LearningRate 0.0457 Epoch: 6 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:49,037-Speed 3388.12 samples/sec Loss 5.8510 LearningRate 0.0457 Epoch: 6 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:52,065-Speed 3382.48 samples/sec Loss 5.7465 LearningRate 0.0457 Epoch: 6 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:55,091-Speed 3384.97 samples/sec Loss 5.8179 LearningRate 0.0457 Epoch: 6 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:14:58,135-Speed 3364.81 samples/sec Loss 5.9444 LearningRate 0.0457 Epoch: 6 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:15:01,202-Speed 3340.01 samples/sec Loss 5.7132 LearningRate 0.0456 Epoch: 6 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:15:04,226-Speed 3386.58 samples/sec Loss 5.8259 LearningRate 0.0456 Epoch: 6 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:15:07,252-Speed 3384.94 samples/sec Loss 5.8894 LearningRate 0.0456 Epoch: 6 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:15:10,278-Speed 3384.38 samples/sec Loss 5.7622 LearningRate 0.0456 Epoch: 6 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2022-04-27 05:15:13,306-Speed 3382.58 samples/sec Loss 5.8176 LearningRate 0.0456 Epoch: 6 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:15:16,369-Speed 3343.70 samples/sec Loss 5.8879 LearningRate 0.0456 Epoch: 6 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:15:19,414-Speed 3363.63 samples/sec Loss 5.8861 LearningRate 0.0456 Epoch: 6 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-04-27 05:15:22,438-Speed 3387.32 samples/sec Loss 5.8824 LearningRate 0.0456 Epoch: 6 Global Step: 36960 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-27 05:15:25,464-Speed 3384.74 samples/sec Loss 5.7370 LearningRate 0.0455 Epoch: 6 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:28,529-Speed 3341.70 samples/sec Loss 5.7442 LearningRate 0.0455 Epoch: 6 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:31,586-Speed 3350.67 samples/sec Loss 5.7037 LearningRate 0.0455 Epoch: 6 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:34,616-Speed 3380.37 samples/sec Loss 5.7712 LearningRate 0.0455 Epoch: 6 Global Step: 37000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:37,641-Speed 3385.40 samples/sec Loss 5.6752 LearningRate 0.0455 Epoch: 6 Global Step: 37010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:40,666-Speed 3386.87 samples/sec Loss 5.8494 LearningRate 0.0455 Epoch: 6 Global Step: 37020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:43,688-Speed 3388.47 samples/sec Loss 5.7004 LearningRate 0.0455 Epoch: 6 Global Step: 37030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:46,723-Speed 3374.33 samples/sec Loss 5.6811 LearningRate 0.0455 Epoch: 6 Global Step: 37040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:49,759-Speed 3373.92 samples/sec Loss 5.8860 LearningRate 0.0455 Epoch: 6 Global Step: 37050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:52,796-Speed 3373.05 samples/sec Loss 5.7164 LearningRate 0.0454 Epoch: 6 Global Step: 37060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:15:55,826-Speed 3379.99 samples/sec Loss 5.6236 LearningRate 0.0454 Epoch: 6 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:15:58,849-Speed 3388.71 samples/sec Loss 5.7367 LearningRate 0.0454 Epoch: 6 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:01,876-Speed 
3383.75 samples/sec Loss 5.7134 LearningRate 0.0454 Epoch: 6 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:04,900-Speed 3386.32 samples/sec Loss 5.7980 LearningRate 0.0454 Epoch: 6 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:07,927-Speed 3384.49 samples/sec Loss 5.7058 LearningRate 0.0454 Epoch: 6 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:10,942-Speed 3396.06 samples/sec Loss 5.7877 LearningRate 0.0454 Epoch: 6 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:13,968-Speed 3384.82 samples/sec Loss 5.8762 LearningRate 0.0454 Epoch: 6 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:17,023-Speed 3352.77 samples/sec Loss 5.7589 LearningRate 0.0453 Epoch: 6 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:20,049-Speed 3385.09 samples/sec Loss 5.6642 LearningRate 0.0453 Epoch: 6 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:23,070-Speed 3390.42 samples/sec Loss 5.5799 LearningRate 0.0453 Epoch: 6 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:26,098-Speed 3383.10 samples/sec Loss 5.6886 LearningRate 0.0453 Epoch: 6 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:29,125-Speed 3382.68 samples/sec Loss 5.7781 LearningRate 0.0453 Epoch: 6 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:32,149-Speed 3387.25 samples/sec Loss 5.7631 LearningRate 0.0453 Epoch: 6 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:35,181-Speed 3378.39 samples/sec Loss 5.8264 LearningRate 0.0453 Epoch: 6 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:38,264-Speed 3322.35 samples/sec Loss 5.8482 LearningRate 0.0453 Epoch: 6 
Global Step: 37210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:41,290-Speed 3384.70 samples/sec Loss 5.6801 LearningRate 0.0453 Epoch: 6 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:44,315-Speed 3385.42 samples/sec Loss 5.7332 LearningRate 0.0452 Epoch: 6 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:47,348-Speed 3377.86 samples/sec Loss 5.7345 LearningRate 0.0452 Epoch: 6 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:50,369-Speed 3389.91 samples/sec Loss 5.5688 LearningRate 0.0452 Epoch: 6 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:16:53,388-Speed 3392.77 samples/sec Loss 5.8011 LearningRate 0.0452 Epoch: 6 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:56,411-Speed 3388.33 samples/sec Loss 5.8093 LearningRate 0.0452 Epoch: 6 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:16:59,437-Speed 3384.27 samples/sec Loss 5.5784 LearningRate 0.0452 Epoch: 6 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:02,460-Speed 3387.84 samples/sec Loss 5.7878 LearningRate 0.0452 Epoch: 6 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:05,494-Speed 3375.96 samples/sec Loss 5.6911 LearningRate 0.0452 Epoch: 6 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:08,522-Speed 3382.81 samples/sec Loss 5.6992 LearningRate 0.0451 Epoch: 6 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:11,549-Speed 3383.45 samples/sec Loss 5.6757 LearningRate 0.0451 Epoch: 6 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:14,588-Speed 3370.91 samples/sec Loss 5.7388 LearningRate 0.0451 Epoch: 6 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-27 05:17:17,614-Speed 3384.86 samples/sec Loss 5.5986 LearningRate 0.0451 Epoch: 6 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:20,642-Speed 3382.21 samples/sec Loss 5.7645 LearningRate 0.0451 Epoch: 6 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:17:23,667-Speed 3387.23 samples/sec Loss 5.7943 LearningRate 0.0451 Epoch: 6 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:26,691-Speed 3387.01 samples/sec Loss 5.7805 LearningRate 0.0451 Epoch: 6 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:29,715-Speed 3385.84 samples/sec Loss 5.7648 LearningRate 0.0451 Epoch: 6 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:32,749-Speed 3376.69 samples/sec Loss 5.8156 LearningRate 0.0451 Epoch: 6 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:35,778-Speed 3381.05 samples/sec Loss 5.8170 LearningRate 0.0450 Epoch: 6 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:38,805-Speed 3383.91 samples/sec Loss 5.6790 LearningRate 0.0450 Epoch: 6 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:41,837-Speed 3377.79 samples/sec Loss 5.5345 LearningRate 0.0450 Epoch: 6 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:44,883-Speed 3362.50 samples/sec Loss 5.7425 LearningRate 0.0450 Epoch: 6 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:47,935-Speed 3356.74 samples/sec Loss 5.6453 LearningRate 0.0450 Epoch: 6 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:50,964-Speed 3381.46 samples/sec Loss 5.8367 LearningRate 0.0450 Epoch: 6 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:17:53,988-Speed 3386.80 
samples/sec Loss 5.7501 LearningRate 0.0450 Epoch: 6 Global Step: 37460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-27 05:17:56,998-Speed 3402.97 samples/sec Loss 5.7739 LearningRate 0.0450 Epoch: 6 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:00,020-Speed 3388.97 samples/sec Loss 5.7578 LearningRate 0.0449 Epoch: 6 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:03,051-Speed 3378.69 samples/sec Loss 5.7489 LearningRate 0.0449 Epoch: 6 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:06,089-Speed 3372.11 samples/sec Loss 5.5716 LearningRate 0.0449 Epoch: 6 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:09,122-Speed 3376.96 samples/sec Loss 5.7275 LearningRate 0.0449 Epoch: 6 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:12,151-Speed 3381.77 samples/sec Loss 5.5893 LearningRate 0.0449 Epoch: 6 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:15,178-Speed 3382.85 samples/sec Loss 5.6617 LearningRate 0.0449 Epoch: 6 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:18,211-Speed 3377.95 samples/sec Loss 5.8451 LearningRate 0.0449 Epoch: 6 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:21,236-Speed 3384.95 samples/sec Loss 5.6534 LearningRate 0.0449 Epoch: 6 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:24,247-Speed 3402.44 samples/sec Loss 5.8194 LearningRate 0.0449 Epoch: 6 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:27,274-Speed 3383.01 samples/sec Loss 5.6440 LearningRate 0.0448 Epoch: 6 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:30,300-Speed 3385.55 samples/sec Loss 5.6582 LearningRate 0.0448 Epoch: 6 
Global Step: 37580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:33,339-Speed 3369.97 samples/sec Loss 5.8522 LearningRate 0.0448 Epoch: 6 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:36,465-Speed 3276.12 samples/sec Loss 5.7505 LearningRate 0.0448 Epoch: 6 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:39,495-Speed 3381.18 samples/sec Loss 5.8234 LearningRate 0.0448 Epoch: 6 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:42,523-Speed 3381.98 samples/sec Loss 5.6967 LearningRate 0.0448 Epoch: 6 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:45,555-Speed 3378.24 samples/sec Loss 5.8116 LearningRate 0.0448 Epoch: 6 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:48,583-Speed 3383.67 samples/sec Loss 5.6538 LearningRate 0.0448 Epoch: 6 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:51,611-Speed 3381.68 samples/sec Loss 5.6800 LearningRate 0.0447 Epoch: 6 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:18:54,643-Speed 3377.76 samples/sec Loss 5.6432 LearningRate 0.0447 Epoch: 6 Global Step: 37660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:18:57,676-Speed 3377.30 samples/sec Loss 5.7838 LearningRate 0.0447 Epoch: 6 Global Step: 37670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:00,743-Speed 3339.74 samples/sec Loss 5.6248 LearningRate 0.0447 Epoch: 6 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:03,864-Speed 3282.32 samples/sec Loss 5.6824 LearningRate 0.0447 Epoch: 6 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:06,901-Speed 3372.27 samples/sec Loss 5.7528 LearningRate 0.0447 Epoch: 6 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-27 05:19:09,931-Speed 3380.07 samples/sec Loss 5.6903 LearningRate 0.0447 Epoch: 6 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:12,959-Speed 3382.40 samples/sec Loss 5.8224 LearningRate 0.0447 Epoch: 6 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:15,986-Speed 3383.95 samples/sec Loss 5.7195 LearningRate 0.0447 Epoch: 6 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:18,997-Speed 3403.85 samples/sec Loss 5.6048 LearningRate 0.0446 Epoch: 6 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:22,043-Speed 3362.56 samples/sec Loss 5.7686 LearningRate 0.0446 Epoch: 6 Global Step: 37750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:25,085-Speed 3366.97 samples/sec Loss 5.6838 LearningRate 0.0446 Epoch: 6 Global Step: 37760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:28,118-Speed 3376.99 samples/sec Loss 5.8582 LearningRate 0.0446 Epoch: 6 Global Step: 37770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:31,140-Speed 3389.57 samples/sec Loss 5.7218 LearningRate 0.0446 Epoch: 6 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:34,173-Speed 3377.10 samples/sec Loss 5.5892 LearningRate 0.0446 Epoch: 6 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:37,201-Speed 3382.12 samples/sec Loss 5.6725 LearningRate 0.0446 Epoch: 6 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:40,257-Speed 3351.70 samples/sec Loss 5.6642 LearningRate 0.0446 Epoch: 6 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:43,333-Speed 3330.44 samples/sec Loss 5.7810 LearningRate 0.0445 Epoch: 6 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:46,364-Speed 3378.74 samples/sec Loss 
5.7816 LearningRate 0.0445 Epoch: 6 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:19:49,449-Speed 3320.79 samples/sec Loss 5.7571 LearningRate 0.0445 Epoch: 6 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:52,484-Speed 3374.09 samples/sec Loss 5.7876 LearningRate 0.0445 Epoch: 6 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:55,525-Speed 3367.94 samples/sec Loss 5.7666 LearningRate 0.0445 Epoch: 6 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:19:58,561-Speed 3374.24 samples/sec Loss 5.5770 LearningRate 0.0445 Epoch: 6 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:01,596-Speed 3374.58 samples/sec Loss 5.6771 LearningRate 0.0445 Epoch: 6 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:04,625-Speed 3381.75 samples/sec Loss 5.7127 LearningRate 0.0445 Epoch: 6 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:07,653-Speed 3381.96 samples/sec Loss 5.7525 LearningRate 0.0445 Epoch: 6 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:10,767-Speed 3289.62 samples/sec Loss 5.7664 LearningRate 0.0444 Epoch: 6 Global Step: 37910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:13,797-Speed 3380.40 samples/sec Loss 5.5331 LearningRate 0.0444 Epoch: 6 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:16,831-Speed 3376.01 samples/sec Loss 5.7212 LearningRate 0.0444 Epoch: 6 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:19,852-Speed 3389.63 samples/sec Loss 5.6906 LearningRate 0.0444 Epoch: 6 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:20:22,866-Speed 3398.19 samples/sec Loss 5.6012 LearningRate 0.0444 Epoch: 6 Global Step: 37950 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:20:25,891-Speed 3386.17 samples/sec Loss 5.6236 LearningRate 0.0444 Epoch: 6 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:20:28,914-Speed 3388.44 samples/sec Loss 5.6469 LearningRate 0.0444 Epoch: 6 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:20:31,941-Speed 3383.40 samples/sec Loss 5.5518 LearningRate 0.0444 Epoch: 6 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:20:35,035-Speed 3310.16 samples/sec Loss 5.7656 LearningRate 0.0443 Epoch: 6 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:20:38,069-Speed 3375.99 samples/sec Loss 5.7858 LearningRate 0.0443 Epoch: 6 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:21:21,356-[lfw][38000]XNorm: 21.401213 Training: 2022-04-27 05:21:21,356-[lfw][38000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-27 05:21:21,357-[lfw][38000]Accuracy-Highest: 0.99817 Training: 2022-04-27 05:22:11,705-[cfp_fp][38000]XNorm: 19.202028 Training: 2022-04-27 05:22:11,706-[cfp_fp][38000]Accuracy-Flip: 0.95300+-0.01039 Training: 2022-04-27 05:22:11,706-[cfp_fp][38000]Accuracy-Highest: 0.95300 Training: 2022-04-27 05:22:55,029-[agedb_30][38000]XNorm: 21.345746 Training: 2022-04-27 05:22:55,029-[agedb_30][38000]Accuracy-Flip: 0.97300+-0.00954 Training: 2022-04-27 05:22:55,030-[agedb_30][38000]Accuracy-Highest: 0.97467 Training: 2022-04-27 05:22:58,047-Speed 73.15 samples/sec Loss 5.7120 LearningRate 0.0443 Epoch: 6 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:01,092-Speed 3364.11 samples/sec Loss 5.6740 LearningRate 0.0443 Epoch: 6 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:04,100-Speed 3405.03 samples/sec Loss 5.6843 LearningRate 0.0443 Epoch: 6 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-27 05:23:07,105-Speed 3408.33 samples/sec Loss 5.7002 LearningRate 0.0443 Epoch: 6 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:10,121-Speed 3395.91 samples/sec Loss 5.5891 LearningRate 0.0443 Epoch: 6 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:23:13,125-Speed 3409.87 samples/sec Loss 5.6677 LearningRate 0.0443 Epoch: 6 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:16,139-Speed 3397.68 samples/sec Loss 5.6948 LearningRate 0.0443 Epoch: 6 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:19,172-Speed 3377.43 samples/sec Loss 5.7874 LearningRate 0.0442 Epoch: 6 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:22,194-Speed 3389.63 samples/sec Loss 5.7445 LearningRate 0.0442 Epoch: 6 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:25,216-Speed 3389.39 samples/sec Loss 5.5880 LearningRate 0.0442 Epoch: 6 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:28,259-Speed 3366.28 samples/sec Loss 5.6339 LearningRate 0.0442 Epoch: 6 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:31,275-Speed 3395.92 samples/sec Loss 5.6732 LearningRate 0.0442 Epoch: 6 Global Step: 38120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:34,302-Speed 3383.39 samples/sec Loss 5.7354 LearningRate 0.0442 Epoch: 6 Global Step: 38130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:37,330-Speed 3382.68 samples/sec Loss 5.7032 LearningRate 0.0442 Epoch: 6 Global Step: 38140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:40,350-Speed 3391.67 samples/sec Loss 5.6990 LearningRate 0.0442 Epoch: 6 Global Step: 38150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:43,364-Speed 3397.84 samples/sec Loss 5.6597 
LearningRate 0.0441 Epoch: 6 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:23:46,384-Speed 3391.38 samples/sec Loss 5.6236 LearningRate 0.0441 Epoch: 6 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:23:49,386-Speed 3411.84 samples/sec Loss 5.7098 LearningRate 0.0441 Epoch: 6 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:52,408-Speed 3389.34 samples/sec Loss 5.6140 LearningRate 0.0441 Epoch: 6 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:55,420-Speed 3400.74 samples/sec Loss 5.7329 LearningRate 0.0441 Epoch: 6 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:23:58,445-Speed 3386.29 samples/sec Loss 5.7605 LearningRate 0.0441 Epoch: 6 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:01,471-Speed 3384.43 samples/sec Loss 5.7490 LearningRate 0.0441 Epoch: 6 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:04,486-Speed 3397.01 samples/sec Loss 5.6088 LearningRate 0.0441 Epoch: 6 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:07,533-Speed 3361.67 samples/sec Loss 5.5270 LearningRate 0.0441 Epoch: 6 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:10,558-Speed 3386.09 samples/sec Loss 5.7682 LearningRate 0.0440 Epoch: 6 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:13,583-Speed 3386.31 samples/sec Loss 5.6151 LearningRate 0.0440 Epoch: 6 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:16,638-Speed 3352.53 samples/sec Loss 5.7346 LearningRate 0.0440 Epoch: 6 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:24:19,658-Speed 3391.32 samples/sec Loss 5.6045 LearningRate 0.0440 Epoch: 6 Global Step: 38280 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-27 05:24:22,677-Speed 3392.44 samples/sec Loss 5.7679 LearningRate 0.0440 Epoch: 6 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:25,699-Speed 3389.41 samples/sec Loss 5.6161 LearningRate 0.0440 Epoch: 6 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:28,717-Speed 3394.04 samples/sec Loss 5.5343 LearningRate 0.0440 Epoch: 6 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:31,735-Speed 3393.27 samples/sec Loss 5.4656 LearningRate 0.0440 Epoch: 6 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:34,747-Speed 3400.07 samples/sec Loss 5.5941 LearningRate 0.0439 Epoch: 6 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:37,757-Speed 3403.01 samples/sec Loss 5.6093 LearningRate 0.0439 Epoch: 6 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:40,779-Speed 3390.20 samples/sec Loss 5.6287 LearningRate 0.0439 Epoch: 6 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:43,791-Speed 3400.20 samples/sec Loss 5.6158 LearningRate 0.0439 Epoch: 6 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:46,804-Speed 3399.02 samples/sec Loss 5.6786 LearningRate 0.0439 Epoch: 6 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:49,803-Speed 3415.84 samples/sec Loss 5.6930 LearningRate 0.0439 Epoch: 6 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:52,814-Speed 3401.28 samples/sec Loss 5.5591 LearningRate 0.0439 Epoch: 6 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:24:55,828-Speed 3398.74 samples/sec Loss 5.5574 LearningRate 0.0439 Epoch: 6 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 
05:24:58,839-Speed 3401.47 samples/sec Loss 5.6032 LearningRate 0.0439 Epoch: 6 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:01,856-Speed 3394.53 samples/sec Loss 5.6341 LearningRate 0.0438 Epoch: 6 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:04,879-Speed 3388.01 samples/sec Loss 5.6145 LearningRate 0.0438 Epoch: 6 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:07,894-Speed 3397.94 samples/sec Loss 5.7074 LearningRate 0.0438 Epoch: 6 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:10,906-Speed 3400.61 samples/sec Loss 5.7100 LearningRate 0.0438 Epoch: 6 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:13,918-Speed 3401.18 samples/sec Loss 5.6578 LearningRate 0.0438 Epoch: 6 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:16,928-Speed 3402.15 samples/sec Loss 5.5616 LearningRate 0.0438 Epoch: 6 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:19,965-Speed 3372.43 samples/sec Loss 5.6416 LearningRate 0.0438 Epoch: 6 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:22,987-Speed 3388.93 samples/sec Loss 5.6620 LearningRate 0.0438 Epoch: 6 Global Step: 38490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:26,007-Speed 3392.31 samples/sec Loss 5.5427 LearningRate 0.0438 Epoch: 6 Global Step: 38500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:29,024-Speed 3394.61 samples/sec Loss 5.7583 LearningRate 0.0437 Epoch: 6 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:25:32,046-Speed 3389.44 samples/sec Loss 5.6147 LearningRate 0.0437 Epoch: 6 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:25:35,061-Speed 3396.94 samples/sec Loss 5.6259 LearningRate 
0.0437 Epoch: 6 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:25:38,059-Speed 3416.69 samples/sec Loss 5.6267 LearningRate 0.0437 Epoch: 6 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:41,070-Speed 3402.41 samples/sec Loss 5.5410 LearningRate 0.0437 Epoch: 6 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:44,083-Speed 3398.82 samples/sec Loss 5.5353 LearningRate 0.0437 Epoch: 6 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:47,096-Speed 3399.62 samples/sec Loss 5.5000 LearningRate 0.0437 Epoch: 6 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:50,116-Speed 3391.46 samples/sec Loss 5.8100 LearningRate 0.0437 Epoch: 6 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:53,132-Speed 3395.95 samples/sec Loss 5.5331 LearningRate 0.0436 Epoch: 6 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:56,148-Speed 3395.36 samples/sec Loss 5.6591 LearningRate 0.0436 Epoch: 6 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:25:59,164-Speed 3396.05 samples/sec Loss 5.7270 LearningRate 0.0436 Epoch: 6 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:02,179-Speed 3397.33 samples/sec Loss 5.6232 LearningRate 0.0436 Epoch: 6 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:05,192-Speed 3399.31 samples/sec Loss 5.5716 LearningRate 0.0436 Epoch: 6 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:08,215-Speed 3388.46 samples/sec Loss 5.5917 LearningRate 0.0436 Epoch: 6 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:11,331-Speed 3286.95 samples/sec Loss 5.7596 LearningRate 0.0436 Epoch: 6 Global Step: 38650 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-27 05:26:14,343-Speed 3400.42 samples/sec Loss 5.5312 LearningRate 0.0436 Epoch: 6 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:17,357-Speed 3398.06 samples/sec Loss 5.6027 LearningRate 0.0436 Epoch: 6 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:20,373-Speed 3396.71 samples/sec Loss 5.5454 LearningRate 0.0435 Epoch: 6 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:23,390-Speed 3394.85 samples/sec Loss 5.6616 LearningRate 0.0435 Epoch: 6 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:26,388-Speed 3416.45 samples/sec Loss 5.5064 LearningRate 0.0435 Epoch: 6 Global Step: 38700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:29,408-Speed 3391.91 samples/sec Loss 5.6382 LearningRate 0.0435 Epoch: 6 Global Step: 38710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:32,423-Speed 3396.65 samples/sec Loss 5.5654 LearningRate 0.0435 Epoch: 6 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:35,474-Speed 3357.25 samples/sec Loss 5.6420 LearningRate 0.0435 Epoch: 6 Global Step: 38730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:38,495-Speed 3389.87 samples/sec Loss 5.7341 LearningRate 0.0435 Epoch: 6 Global Step: 38740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:41,516-Speed 3391.09 samples/sec Loss 5.6000 LearningRate 0.0435 Epoch: 6 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:44,538-Speed 3388.81 samples/sec Loss 5.4583 LearningRate 0.0434 Epoch: 6 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:47,556-Speed 3393.23 samples/sec Loss 5.6728 LearningRate 0.0434 Epoch: 6 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:50,580-Speed 
3387.86 samples/sec Loss 5.7631 LearningRate 0.0434 Epoch: 6 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:53,595-Speed 3396.54 samples/sec Loss 5.6863 LearningRate 0.0434 Epoch: 6 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:26:56,613-Speed 3394.95 samples/sec Loss 5.7100 LearningRate 0.0434 Epoch: 6 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:26:59,639-Speed 3384.16 samples/sec Loss 5.6648 LearningRate 0.0434 Epoch: 6 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:27:02,648-Speed 3404.06 samples/sec Loss 5.5873 LearningRate 0.0434 Epoch: 6 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:05,666-Speed 3393.30 samples/sec Loss 5.5921 LearningRate 0.0434 Epoch: 6 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:08,684-Speed 3394.11 samples/sec Loss 5.5527 LearningRate 0.0434 Epoch: 6 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:11,712-Speed 3382.68 samples/sec Loss 5.5128 LearningRate 0.0433 Epoch: 6 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:14,742-Speed 3380.39 samples/sec Loss 5.6681 LearningRate 0.0433 Epoch: 6 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:17,763-Speed 3390.43 samples/sec Loss 5.5697 LearningRate 0.0433 Epoch: 6 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:20,789-Speed 3385.79 samples/sec Loss 5.6410 LearningRate 0.0433 Epoch: 6 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:23,817-Speed 3382.70 samples/sec Loss 5.7323 LearningRate 0.0433 Epoch: 6 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:26,936-Speed 3284.37 samples/sec Loss 5.6644 LearningRate 0.0433 Epoch: 6 
Global Step: 38900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:29,979-Speed 3365.84 samples/sec Loss 5.4646 LearningRate 0.0433 Epoch: 6 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:33,002-Speed 3388.20 samples/sec Loss 5.6053 LearningRate 0.0433 Epoch: 6 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:27:36,000-Speed 3415.30 samples/sec Loss 5.6353 LearningRate 0.0433 Epoch: 6 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:39,017-Speed 3395.64 samples/sec Loss 5.4846 LearningRate 0.0432 Epoch: 6 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:42,043-Speed 3385.12 samples/sec Loss 5.4924 LearningRate 0.0432 Epoch: 6 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:45,067-Speed 3386.44 samples/sec Loss 5.7338 LearningRate 0.0432 Epoch: 6 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:48,137-Speed 3336.44 samples/sec Loss 5.6570 LearningRate 0.0432 Epoch: 6 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:51,215-Speed 3327.25 samples/sec Loss 5.4595 LearningRate 0.0432 Epoch: 6 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:54,238-Speed 3388.61 samples/sec Loss 5.6142 LearningRate 0.0432 Epoch: 6 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:27:57,262-Speed 3387.03 samples/sec Loss 5.5899 LearningRate 0.0432 Epoch: 6 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:00,286-Speed 3387.85 samples/sec Loss 5.5959 LearningRate 0.0432 Epoch: 6 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:03,311-Speed 3385.47 samples/sec Loss 5.5384 LearningRate 0.0431 Epoch: 6 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-27 05:28:06,332-Speed 3390.53 samples/sec Loss 5.5895 LearningRate 0.0431 Epoch: 6 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:09,400-Speed 3338.74 samples/sec Loss 5.5640 LearningRate 0.0431 Epoch: 6 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:12,436-Speed 3372.64 samples/sec Loss 5.6058 LearningRate 0.0431 Epoch: 6 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:15,561-Speed 3277.87 samples/sec Loss 5.5486 LearningRate 0.0431 Epoch: 6 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:18,583-Speed 3389.55 samples/sec Loss 5.4597 LearningRate 0.0431 Epoch: 6 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:21,606-Speed 3388.85 samples/sec Loss 5.6496 LearningRate 0.0431 Epoch: 6 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:24,628-Speed 3389.49 samples/sec Loss 5.6757 LearningRate 0.0431 Epoch: 6 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:27,647-Speed 3392.08 samples/sec Loss 5.6598 LearningRate 0.0431 Epoch: 6 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:30,675-Speed 3382.62 samples/sec Loss 5.4905 LearningRate 0.0430 Epoch: 6 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:33,698-Speed 3387.92 samples/sec Loss 5.8238 LearningRate 0.0430 Epoch: 6 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:36,713-Speed 3397.10 samples/sec Loss 5.5962 LearningRate 0.0430 Epoch: 6 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:28:39,716-Speed 3410.89 samples/sec Loss 5.5194 LearningRate 0.0430 Epoch: 6 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:42,740-Speed 3386.51 
samples/sec Loss 5.5278 LearningRate 0.0430 Epoch: 6 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:45,765-Speed 3386.34 samples/sec Loss 5.6201 LearningRate 0.0430 Epoch: 6 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:50,078-Speed 2375.61 samples/sec Loss 5.5726 LearningRate 0.0430 Epoch: 6 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:53,112-Speed 3375.84 samples/sec Loss 5.5631 LearningRate 0.0430 Epoch: 6 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:56,138-Speed 3384.55 samples/sec Loss 5.5543 LearningRate 0.0430 Epoch: 6 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:28:59,173-Speed 3375.32 samples/sec Loss 5.7217 LearningRate 0.0429 Epoch: 6 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:02,193-Speed 3390.95 samples/sec Loss 5.5733 LearningRate 0.0429 Epoch: 6 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:05,220-Speed 3384.01 samples/sec Loss 5.5694 LearningRate 0.0429 Epoch: 6 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:08,252-Speed 3377.91 samples/sec Loss 5.6998 LearningRate 0.0429 Epoch: 6 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:11,279-Speed 3384.02 samples/sec Loss 5.6036 LearningRate 0.0429 Epoch: 6 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:14,303-Speed 3387.12 samples/sec Loss 5.6292 LearningRate 0.0429 Epoch: 6 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:17,336-Speed 3377.13 samples/sec Loss 5.5478 LearningRate 0.0429 Epoch: 6 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:20,359-Speed 3387.71 samples/sec Loss 5.4595 LearningRate 0.0429 Epoch: 6 Global 
Step: 39270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:23,415-Speed 3351.20 samples/sec Loss 5.5551 LearningRate 0.0428 Epoch: 6 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:26,485-Speed 3339.19 samples/sec Loss 5.6182 LearningRate 0.0428 Epoch: 6 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:29,528-Speed 3365.53 samples/sec Loss 5.5561 LearningRate 0.0428 Epoch: 6 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:29:32,540-Speed 3401.23 samples/sec Loss 5.5702 LearningRate 0.0428 Epoch: 6 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:35,572-Speed 3377.58 samples/sec Loss 5.5352 LearningRate 0.0428 Epoch: 6 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:38,610-Speed 3371.41 samples/sec Loss 5.5926 LearningRate 0.0428 Epoch: 6 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:41,643-Speed 3377.10 samples/sec Loss 5.4477 LearningRate 0.0428 Epoch: 6 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:44,662-Speed 3393.29 samples/sec Loss 5.5815 LearningRate 0.0428 Epoch: 6 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:47,703-Speed 3368.24 samples/sec Loss 5.7323 LearningRate 0.0428 Epoch: 6 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:50,790-Speed 3317.89 samples/sec Loss 5.5455 LearningRate 0.0427 Epoch: 6 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:53,823-Speed 3376.28 samples/sec Loss 5.4992 LearningRate 0.0427 Epoch: 6 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:29:56,869-Speed 3362.83 samples/sec Loss 5.6810 LearningRate 0.0427 Epoch: 6 Global Step: 39390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-27 05:29:59,912-Speed 3365.73 samples/sec Loss 5.5848 LearningRate 0.0427 Epoch: 6 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:02,965-Speed 3354.86 samples/sec Loss 5.5473 LearningRate 0.0427 Epoch: 6 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:05,991-Speed 3385.40 samples/sec Loss 5.5512 LearningRate 0.0427 Epoch: 6 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:09,018-Speed 3383.33 samples/sec Loss 5.5630 LearningRate 0.0427 Epoch: 6 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:12,093-Speed 3330.66 samples/sec Loss 5.4164 LearningRate 0.0427 Epoch: 6 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:15,149-Speed 3352.29 samples/sec Loss 5.4893 LearningRate 0.0427 Epoch: 6 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:18,176-Speed 3382.47 samples/sec Loss 5.5979 LearningRate 0.0426 Epoch: 6 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:21,200-Speed 3387.43 samples/sec Loss 5.4851 LearningRate 0.0426 Epoch: 6 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:24,242-Speed 3367.39 samples/sec Loss 5.6026 LearningRate 0.0426 Epoch: 6 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:27,283-Speed 3367.37 samples/sec Loss 5.5925 LearningRate 0.0426 Epoch: 6 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:30,304-Speed 3390.79 samples/sec Loss 5.4942 LearningRate 0.0426 Epoch: 6 Global Step: 39500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:33,316-Speed 3401.26 samples/sec Loss 5.5888 LearningRate 0.0426 Epoch: 6 Global Step: 39510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:36,383-Speed 3338.99 samples/sec Loss 
5.5791 LearningRate 0.0426 Epoch: 6 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:30:39,414-Speed 3379.02 samples/sec Loss 5.6264 LearningRate 0.0426 Epoch: 6 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:42,452-Speed 3371.62 samples/sec Loss 5.5402 LearningRate 0.0426 Epoch: 6 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:45,476-Speed 3387.61 samples/sec Loss 5.5738 LearningRate 0.0425 Epoch: 6 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:48,501-Speed 3385.42 samples/sec Loss 5.6077 LearningRate 0.0425 Epoch: 6 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:51,529-Speed 3382.14 samples/sec Loss 5.5230 LearningRate 0.0425 Epoch: 6 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:54,560-Speed 3379.91 samples/sec Loss 5.5419 LearningRate 0.0425 Epoch: 6 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:30:57,582-Speed 3389.30 samples/sec Loss 5.5050 LearningRate 0.0425 Epoch: 6 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:00,625-Speed 3365.24 samples/sec Loss 5.6057 LearningRate 0.0425 Epoch: 6 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:03,661-Speed 3373.68 samples/sec Loss 5.5732 LearningRate 0.0425 Epoch: 6 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:06,691-Speed 3381.05 samples/sec Loss 5.4860 LearningRate 0.0425 Epoch: 6 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:09,712-Speed 3390.06 samples/sec Loss 5.4491 LearningRate 0.0424 Epoch: 6 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:31:12,735-Speed 3388.83 samples/sec Loss 5.5150 LearningRate 0.0424 Epoch: 6 Global Step: 39640 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-27 05:31:15,761-Speed 3384.30 samples/sec Loss 5.5302 LearningRate 0.0424 Epoch: 6 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:31:18,771-Speed 3403.34 samples/sec Loss 5.5761 LearningRate 0.0424 Epoch: 6 Global Step: 39660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:21,809-Speed 3371.01 samples/sec Loss 5.5208 LearningRate 0.0424 Epoch: 6 Global Step: 39670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:24,922-Speed 3290.32 samples/sec Loss 5.4967 LearningRate 0.0424 Epoch: 6 Global Step: 39680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:27,955-Speed 3377.24 samples/sec Loss 5.6878 LearningRate 0.0424 Epoch: 6 Global Step: 39690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:30,981-Speed 3385.97 samples/sec Loss 5.5165 LearningRate 0.0424 Epoch: 6 Global Step: 39700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:34,004-Speed 3387.56 samples/sec Loss 5.5908 LearningRate 0.0424 Epoch: 6 Global Step: 39710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:37,031-Speed 3383.92 samples/sec Loss 5.4702 LearningRate 0.0423 Epoch: 6 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:40,066-Speed 3374.73 samples/sec Loss 5.6229 LearningRate 0.0423 Epoch: 6 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:43,085-Speed 3391.91 samples/sec Loss 5.5897 LearningRate 0.0423 Epoch: 6 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:31:46,103-Speed 3393.70 samples/sec Loss 5.5450 LearningRate 0.0423 Epoch: 6 Global Step: 39750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:31:49,140-Speed 3372.70 samples/sec Loss 5.6290 LearningRate 0.0423 Epoch: 6 Global Step: 39760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
05:31:52,168-Speed 3382.72 samples/sec Loss 5.4820 LearningRate 0.0423 Epoch: 6 Global Step: 39770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:31:55,190-Speed 3389.12 samples/sec Loss 5.4478 LearningRate 0.0423 Epoch: 6 Global Step: 39780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:31:58,212-Speed 3389.20 samples/sec Loss 5.6044 LearningRate 0.0423 Epoch: 6 Global Step: 39790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:01,325-Speed 3290.66 samples/sec Loss 5.5053 LearningRate 0.0423 Epoch: 6 Global Step: 39800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:14,630-Speed 769.68 samples/sec Loss 5.0398 LearningRate 0.0422 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:17,660-Speed 3380.45 samples/sec Loss 4.8569 LearningRate 0.0422 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:20,680-Speed 3391.69 samples/sec Loss 4.8508 LearningRate 0.0422 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:23,757-Speed 3329.13 samples/sec Loss 4.7991 LearningRate 0.0422 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:32:26,799-Speed 3367.13 samples/sec Loss 4.9876 LearningRate 0.0422 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:29,829-Speed 3379.76 samples/sec Loss 5.0945 LearningRate 0.0422 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:32,850-Speed 3390.73 samples/sec Loss 5.0478 LearningRate 0.0422 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:35,886-Speed 3374.02 samples/sec Loss 4.9230 LearningRate 0.0422 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:38,916-Speed 3380.29 samples/sec Loss 5.1305 LearningRate 
0.0421 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:41,939-Speed 3387.31 samples/sec Loss 4.9757 LearningRate 0.0421 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:44,970-Speed 3380.27 samples/sec Loss 4.9383 LearningRate 0.0421 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:47,989-Speed 3391.50 samples/sec Loss 4.9442 LearningRate 0.0421 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:51,013-Speed 3387.67 samples/sec Loss 5.0124 LearningRate 0.0421 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:54,037-Speed 3387.48 samples/sec Loss 5.0744 LearningRate 0.0421 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:32:57,069-Speed 3377.02 samples/sec Loss 4.8463 LearningRate 0.0421 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:00,120-Speed 3357.59 samples/sec Loss 4.9656 LearningRate 0.0421 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:03,191-Speed 3336.05 samples/sec Loss 4.9603 LearningRate 0.0421 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:06,213-Speed 3388.26 samples/sec Loss 5.0118 LearningRate 0.0420 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:09,245-Speed 3378.64 samples/sec Loss 5.2516 LearningRate 0.0420 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:12,271-Speed 3384.28 samples/sec Loss 5.0008 LearningRate 0.0420 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:33:55,883-[lfw][40000]XNorm: 21.172970 Training: 2022-04-27 05:33:55,883-[lfw][40000]Accuracy-Flip: 0.99700+-0.00306 
Training: 2022-04-27 05:33:55,884-[lfw][40000]Accuracy-Highest: 0.99817 Training: 2022-04-27 05:34:46,178-[cfp_fp][40000]XNorm: 18.940806 Training: 2022-04-27 05:34:46,179-[cfp_fp][40000]Accuracy-Flip: 0.96057+-0.00920 Training: 2022-04-27 05:34:46,179-[cfp_fp][40000]Accuracy-Highest: 0.96057 Training: 2022-04-27 05:35:29,440-[agedb_30][40000]XNorm: 20.838208 Training: 2022-04-27 05:35:29,441-[agedb_30][40000]Accuracy-Flip: 0.97533+-0.00812 Training: 2022-04-27 05:35:29,441-[agedb_30][40000]Accuracy-Highest: 0.97533 Training: 2022-04-27 05:35:32,454-Speed 73.05 samples/sec Loss 5.0428 LearningRate 0.0420 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:35:35,467-Speed 3399.76 samples/sec Loss 5.0671 LearningRate 0.0420 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:35:38,479-Speed 3400.95 samples/sec Loss 5.1549 LearningRate 0.0420 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:41,500-Speed 3390.09 samples/sec Loss 5.1226 LearningRate 0.0420 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:44,509-Speed 3403.95 samples/sec Loss 5.0886 LearningRate 0.0420 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:47,517-Speed 3404.14 samples/sec Loss 5.0672 LearningRate 0.0420 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:50,531-Speed 3398.65 samples/sec Loss 4.8689 LearningRate 0.0419 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:53,542-Speed 3401.45 samples/sec Loss 5.0351 LearningRate 0.0419 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:35:56,559-Speed 3395.99 samples/sec Loss 5.2905 LearningRate 0.0419 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
05:35:59,575-Speed 3396.16 samples/sec Loss 5.0529 LearningRate 0.0419 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:02,592-Speed 3394.06 samples/sec Loss 4.9863 LearningRate 0.0419 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:05,607-Speed 3397.94 samples/sec Loss 5.1967 LearningRate 0.0419 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:08,605-Speed 3416.21 samples/sec Loss 5.0403 LearningRate 0.0419 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:11,627-Speed 3389.03 samples/sec Loss 5.1600 LearningRate 0.0419 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:14,763-Speed 3265.50 samples/sec Loss 5.1438 LearningRate 0.0419 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:17,802-Speed 3370.75 samples/sec Loss 5.0758 LearningRate 0.0418 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:20,817-Speed 3397.33 samples/sec Loss 5.3032 LearningRate 0.0418 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:23,828-Speed 3401.41 samples/sec Loss 5.1906 LearningRate 0.0418 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:26,844-Speed 3395.91 samples/sec Loss 5.1433 LearningRate 0.0418 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:29,859-Speed 3397.33 samples/sec Loss 5.0779 LearningRate 0.0418 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:32,875-Speed 3395.95 samples/sec Loss 5.1724 LearningRate 0.0418 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:35,901-Speed 3384.65 samples/sec Loss 5.1266 LearningRate 
0.0418 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:36:38,947-Speed 3363.06 samples/sec Loss 5.1030 LearningRate 0.0418 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:41,977-Speed 3381.56 samples/sec Loss 5.0807 LearningRate 0.0418 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:44,994-Speed 3394.47 samples/sec Loss 5.2011 LearningRate 0.0417 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:48,034-Speed 3369.15 samples/sec Loss 5.1048 LearningRate 0.0417 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:51,068-Speed 3375.89 samples/sec Loss 5.1897 LearningRate 0.0417 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:54,081-Speed 3399.98 samples/sec Loss 5.2099 LearningRate 0.0417 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:36:57,093-Speed 3400.12 samples/sec Loss 5.1365 LearningRate 0.0417 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:00,101-Speed 3404.53 samples/sec Loss 5.1653 LearningRate 0.0417 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:03,178-Speed 3328.83 samples/sec Loss 5.1409 LearningRate 0.0417 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:06,194-Speed 3396.84 samples/sec Loss 5.2287 LearningRate 0.0417 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:09,189-Speed 3419.44 samples/sec Loss 5.1428 LearningRate 0.0416 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:12,208-Speed 3392.65 samples/sec Loss 5.2801 LearningRate 0.0416 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 05:37:15,234-Speed 3384.86 samples/sec Loss 5.1528 LearningRate 0.0416 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:18,255-Speed 3390.64 samples/sec Loss 5.1602 LearningRate 0.0416 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:21,270-Speed 3398.02 samples/sec Loss 5.2740 LearningRate 0.0416 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:24,288-Speed 3393.80 samples/sec Loss 5.3421 LearningRate 0.0416 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:27,298-Speed 3403.09 samples/sec Loss 5.1532 LearningRate 0.0416 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:30,310-Speed 3400.02 samples/sec Loss 5.2033 LearningRate 0.0416 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:33,321-Speed 3401.13 samples/sec Loss 5.2207 LearningRate 0.0416 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:36,341-Speed 3392.06 samples/sec Loss 5.3025 LearningRate 0.0415 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:37:39,359-Speed 3393.73 samples/sec Loss 5.0454 LearningRate 0.0415 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:42,379-Speed 3392.74 samples/sec Loss 5.3343 LearningRate 0.0415 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:45,387-Speed 3404.26 samples/sec Loss 5.1913 LearningRate 0.0415 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:48,403-Speed 3395.88 samples/sec Loss 5.2000 LearningRate 0.0415 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:51,483-Speed 
3325.82 samples/sec Loss 5.3185 LearningRate 0.0415 Epoch: 7 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:54,520-Speed 3372.37 samples/sec Loss 5.2410 LearningRate 0.0415 Epoch: 7 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:37:57,531-Speed 3402.28 samples/sec Loss 5.3269 LearningRate 0.0415 Epoch: 7 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:38:00,528-Speed 3417.00 samples/sec Loss 5.1341 LearningRate 0.0415 Epoch: 7 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:03,538-Speed 3403.02 samples/sec Loss 5.1540 LearningRate 0.0414 Epoch: 7 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:06,552-Speed 3398.06 samples/sec Loss 5.1862 LearningRate 0.0414 Epoch: 7 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:09,572-Speed 3391.66 samples/sec Loss 5.1996 LearningRate 0.0414 Epoch: 7 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:12,594-Speed 3389.13 samples/sec Loss 5.2199 LearningRate 0.0414 Epoch: 7 Global Step: 40540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:15,634-Speed 3369.53 samples/sec Loss 5.2084 LearningRate 0.0414 Epoch: 7 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:18,663-Speed 3381.75 samples/sec Loss 5.2612 LearningRate 0.0414 Epoch: 7 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:21,675-Speed 3400.76 samples/sec Loss 5.4461 LearningRate 0.0414 Epoch: 7 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:24,694-Speed 3391.91 samples/sec Loss 5.1609 LearningRate 0.0414 Epoch: 7 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:27,781-Speed 3317.94 samples/sec Loss 5.3188 LearningRate 0.0414 Epoch: 7 
Global Step: 40590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:30,849-Speed 3339.12 samples/sec Loss 5.1810 LearningRate 0.0413 Epoch: 7 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:38:33,860-Speed 3401.32 samples/sec Loss 5.2394 LearningRate 0.0413 Epoch: 7 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 05:38:36,864-Speed 3409.89 samples/sec Loss 5.2655 LearningRate 0.0413 Epoch: 7 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 05:38:39,889-Speed 3386.77 samples/sec Loss 5.4035 LearningRate 0.0413 Epoch: 7 Global Step: 40630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:42,915-Speed 3384.97 samples/sec Loss 5.4429 LearningRate 0.0413 Epoch: 7 Global Step: 40640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:45,927-Speed 3400.15 samples/sec Loss 5.2148 LearningRate 0.0413 Epoch: 7 Global Step: 40650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:48,939-Speed 3400.67 samples/sec Loss 5.3484 LearningRate 0.0413 Epoch: 7 Global Step: 40660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:51,955-Speed 3395.51 samples/sec Loss 5.2749 LearningRate 0.0413 Epoch: 7 Global Step: 40670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:54,981-Speed 3385.33 samples/sec Loss 5.3446 LearningRate 0.0413 Epoch: 7 Global Step: 40680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 05:38:57,998-Speed 3394.50 samples/sec Loss 5.2075 LearningRate 0.0412 Epoch: 7 Global Step: 40690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:39:01,020-Speed 3388.59 samples/sec Loss 5.2223 LearningRate 0.0412 Epoch: 7 Global Step: 40700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:39:04,085-Speed 3341.81 samples/sec Loss 5.2295 LearningRate 0.0412 Epoch: 7 Global Step: 40710 Fp16 Grad Scale: 32768 Required: 7 hours 
Training: 2022-04-27 05:39:07,111-Speed 3385.60 samples/sec Loss 5.3813 LearningRate 0.0412 Epoch: 7 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:39:10,125-Speed 3397.58 samples/sec Loss 5.3906 LearningRate 0.0412 Epoch: 7 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:13,143-Speed 3394.45 samples/sec Loss 5.2987 LearningRate 0.0412 Epoch: 7 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:16,166-Speed 3387.75 samples/sec Loss 5.2909 LearningRate 0.0412 Epoch: 7 Global Step: 40750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:19,185-Speed 3392.84 samples/sec Loss 5.3629 LearningRate 0.0412 Epoch: 7 Global Step: 40760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:22,208-Speed 3387.98 samples/sec Loss 5.2344 LearningRate 0.0412 Epoch: 7 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:25,231-Speed 3389.24 samples/sec Loss 5.3814 LearningRate 0.0411 Epoch: 7 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:28,257-Speed 3384.18 samples/sec Loss 5.3018 LearningRate 0.0411 Epoch: 7 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:31,295-Speed 3371.42 samples/sec Loss 5.3043 LearningRate 0.0411 Epoch: 7 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:34,327-Speed 3378.26 samples/sec Loss 5.1587 LearningRate 0.0411 Epoch: 7 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:37,342-Speed 3397.29 samples/sec Loss 5.3989 LearningRate 0.0411 Epoch: 7 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:39:40,360-Speed 3393.60 samples/sec Loss 5.3110 LearningRate 0.0411 Epoch: 7 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:43,378-Speed 3394.59 samples/sec Loss 
5.2883 LearningRate 0.0411 Epoch: 7 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:46,410-Speed 3378.22 samples/sec Loss 5.3168 LearningRate 0.0411 Epoch: 7 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:49,465-Speed 3352.13 samples/sec Loss 5.4209 LearningRate 0.0410 Epoch: 7 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:52,502-Speed 3372.15 samples/sec Loss 5.3123 LearningRate 0.0410 Epoch: 7 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:55,520-Speed 3394.49 samples/sec Loss 5.3772 LearningRate 0.0410 Epoch: 7 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:39:58,544-Speed 3386.92 samples/sec Loss 5.2280 LearningRate 0.0410 Epoch: 7 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:01,564-Speed 3392.12 samples/sec Loss 5.4381 LearningRate 0.0410 Epoch: 7 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:04,588-Speed 3387.07 samples/sec Loss 5.4348 LearningRate 0.0410 Epoch: 7 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:07,612-Speed 3386.86 samples/sec Loss 5.2654 LearningRate 0.0410 Epoch: 7 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:10,624-Speed 3400.72 samples/sec Loss 5.2255 LearningRate 0.0410 Epoch: 7 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:13,638-Speed 3397.46 samples/sec Loss 5.3370 LearningRate 0.0410 Epoch: 7 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:16,657-Speed 3393.66 samples/sec Loss 5.1888 LearningRate 0.0409 Epoch: 7 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:19,674-Speed 3394.23 samples/sec Loss 5.2304 LearningRate 0.0409 Epoch: 7 Global Step: 40960 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:22,694-Speed 3392.17 samples/sec Loss 5.2830 LearningRate 0.0409 Epoch: 7 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:25,711-Speed 3394.58 samples/sec Loss 5.2786 LearningRate 0.0409 Epoch: 7 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:28,724-Speed 3399.65 samples/sec Loss 5.3427 LearningRate 0.0409 Epoch: 7 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:31,740-Speed 3395.92 samples/sec Loss 5.3647 LearningRate 0.0409 Epoch: 7 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:40:34,745-Speed 3408.92 samples/sec Loss 5.2449 LearningRate 0.0409 Epoch: 7 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:37,766-Speed 3389.63 samples/sec Loss 5.2530 LearningRate 0.0409 Epoch: 7 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:40,790-Speed 3387.36 samples/sec Loss 5.3864 LearningRate 0.0409 Epoch: 7 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:43,804-Speed 3398.16 samples/sec Loss 5.2309 LearningRate 0.0408 Epoch: 7 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:46,824-Speed 3392.06 samples/sec Loss 5.3567 LearningRate 0.0408 Epoch: 7 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:49,842-Speed 3393.11 samples/sec Loss 5.2320 LearningRate 0.0408 Epoch: 7 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:52,859-Speed 3395.38 samples/sec Loss 5.2927 LearningRate 0.0408 Epoch: 7 Global Step: 41070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:40:55,879-Speed 3391.35 samples/sec Loss 5.4646 LearningRate 0.0408 Epoch: 7 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
05:40:58,908-Speed 3381.89 samples/sec Loss 5.2921 LearningRate 0.0408 Epoch: 7 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:01,929-Speed 3389.88 samples/sec Loss 5.3279 LearningRate 0.0408 Epoch: 7 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:04,951-Speed 3389.14 samples/sec Loss 5.2276 LearningRate 0.0408 Epoch: 7 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:07,978-Speed 3383.83 samples/sec Loss 5.3539 LearningRate 0.0408 Epoch: 7 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:10,996-Speed 3393.49 samples/sec Loss 5.2430 LearningRate 0.0407 Epoch: 7 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:14,021-Speed 3385.72 samples/sec Loss 5.2516 LearningRate 0.0407 Epoch: 7 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:17,053-Speed 3379.06 samples/sec Loss 5.2645 LearningRate 0.0407 Epoch: 7 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:20,051-Speed 3416.09 samples/sec Loss 5.3488 LearningRate 0.0407 Epoch: 7 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:23,069-Speed 3394.63 samples/sec Loss 5.3185 LearningRate 0.0407 Epoch: 7 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:26,089-Speed 3391.21 samples/sec Loss 5.2533 LearningRate 0.0407 Epoch: 7 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:29,112-Speed 3387.41 samples/sec Loss 5.2918 LearningRate 0.0407 Epoch: 7 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:32,130-Speed 3394.10 samples/sec Loss 5.2540 LearningRate 0.0407 Epoch: 7 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:35,148-Speed 3393.34 samples/sec Loss 5.2229 
LearningRate 0.0407 Epoch: 7 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:38,171-Speed 3388.77 samples/sec Loss 5.4404 LearningRate 0.0406 Epoch: 7 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:41,197-Speed 3384.37 samples/sec Loss 5.2475 LearningRate 0.0406 Epoch: 7 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:44,215-Speed 3393.12 samples/sec Loss 5.3928 LearningRate 0.0406 Epoch: 7 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:47,238-Speed 3389.02 samples/sec Loss 5.3483 LearningRate 0.0406 Epoch: 7 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:50,258-Speed 3391.44 samples/sec Loss 5.3338 LearningRate 0.0406 Epoch: 7 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:41:53,257-Speed 3416.21 samples/sec Loss 5.5069 LearningRate 0.0406 Epoch: 7 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:56,274-Speed 3394.63 samples/sec Loss 5.3644 LearningRate 0.0406 Epoch: 7 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:41:59,296-Speed 3389.63 samples/sec Loss 5.3598 LearningRate 0.0406 Epoch: 7 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:02,322-Speed 3384.56 samples/sec Loss 5.3063 LearningRate 0.0406 Epoch: 7 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:05,346-Speed 3387.53 samples/sec Loss 5.1989 LearningRate 0.0405 Epoch: 7 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:08,363-Speed 3394.13 samples/sec Loss 5.4281 LearningRate 0.0405 Epoch: 7 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:11,428-Speed 3342.14 samples/sec Loss 5.3300 LearningRate 0.0405 Epoch: 7 Global Step: 41330 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-27 05:42:14,461-Speed 3376.41 samples/sec Loss 5.2099 LearningRate 0.0405 Epoch: 7 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:17,486-Speed 3385.98 samples/sec Loss 5.3388 LearningRate 0.0405 Epoch: 7 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:20,506-Speed 3392.25 samples/sec Loss 5.2770 LearningRate 0.0405 Epoch: 7 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:23,511-Speed 3408.65 samples/sec Loss 5.3447 LearningRate 0.0405 Epoch: 7 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:26,534-Speed 3387.34 samples/sec Loss 5.2242 LearningRate 0.0405 Epoch: 7 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:29,555-Speed 3390.34 samples/sec Loss 5.2952 LearningRate 0.0405 Epoch: 7 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:32,583-Speed 3383.41 samples/sec Loss 5.2577 LearningRate 0.0404 Epoch: 7 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:35,606-Speed 3388.29 samples/sec Loss 5.3719 LearningRate 0.0404 Epoch: 7 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:38,632-Speed 3384.75 samples/sec Loss 5.2577 LearningRate 0.0404 Epoch: 7 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:41,745-Speed 3290.10 samples/sec Loss 5.1217 LearningRate 0.0404 Epoch: 7 Global Step: 41430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:44,765-Speed 3391.63 samples/sec Loss 5.2532 LearningRate 0.0404 Epoch: 7 Global Step: 41440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:47,785-Speed 3390.95 samples/sec Loss 5.2677 LearningRate 0.0404 Epoch: 7 Global Step: 41450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:50,811-Speed 
3385.09 samples/sec Loss 5.3759 LearningRate 0.0404 Epoch: 7 Global Step: 41460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:42:53,834-Speed 3388.81 samples/sec Loss 5.1643 LearningRate 0.0404 Epoch: 7 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:42:56,857-Speed 3387.02 samples/sec Loss 5.3134 LearningRate 0.0404 Epoch: 7 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:42:59,881-Speed 3387.67 samples/sec Loss 5.3088 LearningRate 0.0403 Epoch: 7 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:02,909-Speed 3382.36 samples/sec Loss 5.2463 LearningRate 0.0403 Epoch: 7 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:05,935-Speed 3385.31 samples/sec Loss 5.2835 LearningRate 0.0403 Epoch: 7 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:08,942-Speed 3405.59 samples/sec Loss 5.2284 LearningRate 0.0403 Epoch: 7 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:11,969-Speed 3383.43 samples/sec Loss 5.3292 LearningRate 0.0403 Epoch: 7 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:14,997-Speed 3382.95 samples/sec Loss 5.3665 LearningRate 0.0403 Epoch: 7 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:18,026-Speed 3382.11 samples/sec Loss 5.3701 LearningRate 0.0403 Epoch: 7 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:21,056-Speed 3380.00 samples/sec Loss 5.3025 LearningRate 0.0403 Epoch: 7 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:24,086-Speed 3380.14 samples/sec Loss 5.2401 LearningRate 0.0403 Epoch: 7 Global Step: 41570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:27,182-Speed 3307.94 samples/sec Loss 5.3171 LearningRate 0.0402 Epoch: 7 
Global Step: 41580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:30,216-Speed 3375.96 samples/sec Loss 5.4217 LearningRate 0.0402 Epoch: 7 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:33,243-Speed 3384.30 samples/sec Loss 5.4705 LearningRate 0.0402 Epoch: 7 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:36,275-Speed 3378.12 samples/sec Loss 5.3292 LearningRate 0.0402 Epoch: 7 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:39,368-Speed 3311.77 samples/sec Loss 5.3149 LearningRate 0.0402 Epoch: 7 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:42,390-Speed 3388.41 samples/sec Loss 5.2027 LearningRate 0.0402 Epoch: 7 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:45,412-Speed 3389.39 samples/sec Loss 5.1316 LearningRate 0.0402 Epoch: 7 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:48,435-Speed 3388.15 samples/sec Loss 5.2285 LearningRate 0.0402 Epoch: 7 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:43:51,453-Speed 3393.64 samples/sec Loss 5.2987 LearningRate 0.0402 Epoch: 7 Global Step: 41660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:54,476-Speed 3388.00 samples/sec Loss 5.3224 LearningRate 0.0401 Epoch: 7 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:43:57,505-Speed 3381.87 samples/sec Loss 5.2207 LearningRate 0.0401 Epoch: 7 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:00,560-Speed 3352.44 samples/sec Loss 5.2909 LearningRate 0.0401 Epoch: 7 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:03,611-Speed 3357.12 samples/sec Loss 5.3418 LearningRate 0.0401 Epoch: 7 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-27 05:44:06,637-Speed 3385.72 samples/sec Loss 5.2066 LearningRate 0.0401 Epoch: 7 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:09,663-Speed 3384.66 samples/sec Loss 5.2379 LearningRate 0.0401 Epoch: 7 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:12,764-Speed 3302.66 samples/sec Loss 5.4159 LearningRate 0.0401 Epoch: 7 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:15,826-Speed 3345.03 samples/sec Loss 5.2950 LearningRate 0.0401 Epoch: 7 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:18,852-Speed 3384.55 samples/sec Loss 5.3261 LearningRate 0.0401 Epoch: 7 Global Step: 41750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:21,882-Speed 3380.24 samples/sec Loss 5.2282 LearningRate 0.0400 Epoch: 7 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:44:24,891-Speed 3404.25 samples/sec Loss 5.2437 LearningRate 0.0400 Epoch: 7 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:27,922-Speed 3378.75 samples/sec Loss 5.2302 LearningRate 0.0400 Epoch: 7 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:30,953-Speed 3379.69 samples/sec Loss 5.4393 LearningRate 0.0400 Epoch: 7 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:33,981-Speed 3382.51 samples/sec Loss 5.2870 LearningRate 0.0400 Epoch: 7 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:37,009-Speed 3382.54 samples/sec Loss 5.4606 LearningRate 0.0400 Epoch: 7 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:40,052-Speed 3365.82 samples/sec Loss 5.4154 LearningRate 0.0400 Epoch: 7 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:43,079-Speed 3383.82 samples/sec Loss 
5.2817 LearningRate 0.0400 Epoch: 7 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:46,099-Speed 3391.49 samples/sec Loss 5.3449 LearningRate 0.0400 Epoch: 7 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:49,126-Speed 3382.86 samples/sec Loss 5.3883 LearningRate 0.0399 Epoch: 7 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:52,149-Speed 3388.88 samples/sec Loss 5.2727 LearningRate 0.0399 Epoch: 7 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:44:55,171-Speed 3388.70 samples/sec Loss 5.2848 LearningRate 0.0399 Epoch: 7 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:44:58,325-Speed 3247.88 samples/sec Loss 5.5014 LearningRate 0.0399 Epoch: 7 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:45:01,388-Speed 3344.05 samples/sec Loss 5.4324 LearningRate 0.0399 Epoch: 7 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:45:04,427-Speed 3370.82 samples/sec Loss 5.3020 LearningRate 0.0399 Epoch: 7 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:45:07,450-Speed 3387.58 samples/sec Loss 5.2393 LearningRate 0.0399 Epoch: 7 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:45:10,474-Speed 3387.49 samples/sec Loss 5.4133 LearningRate 0.0399 Epoch: 7 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:45:13,477-Speed 3410.12 samples/sec Loss 5.3490 LearningRate 0.0399 Epoch: 7 Global Step: 41930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:16,500-Speed 3388.55 samples/sec Loss 5.3885 LearningRate 0.0398 Epoch: 7 Global Step: 41940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:19,526-Speed 3384.17 samples/sec Loss 5.2427 LearningRate 0.0398 Epoch: 7 Global Step: 41950 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:22,552-Speed 3384.94 samples/sec Loss 5.3903 LearningRate 0.0398 Epoch: 7 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:25,582-Speed 3381.07 samples/sec Loss 5.2481 LearningRate 0.0398 Epoch: 7 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:28,612-Speed 3379.82 samples/sec Loss 5.2013 LearningRate 0.0398 Epoch: 7 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:31,634-Speed 3389.76 samples/sec Loss 5.4006 LearningRate 0.0398 Epoch: 7 Global Step: 41990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:45:34,673-Speed 3370.01 samples/sec Loss 5.3380 LearningRate 0.0398 Epoch: 7 Global Step: 42000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:46:18,109-[lfw][42000]XNorm: 21.286959 Training: 2022-04-27 05:46:18,110-[lfw][42000]Accuracy-Flip: 0.99750+-0.00281 Training: 2022-04-27 05:46:18,110-[lfw][42000]Accuracy-Highest: 0.99817 Training: 2022-04-27 05:47:08,563-[cfp_fp][42000]XNorm: 18.559626 Training: 2022-04-27 05:47:08,563-[cfp_fp][42000]Accuracy-Flip: 0.95900+-0.00761 Training: 2022-04-27 05:47:08,564-[cfp_fp][42000]Accuracy-Highest: 0.96057 Training: 2022-04-27 05:47:52,060-[agedb_30][42000]XNorm: 21.462667 Training: 2022-04-27 05:47:52,060-[agedb_30][42000]Accuracy-Flip: 0.97767+-0.00786 Training: 2022-04-27 05:47:52,061-[agedb_30][42000]Accuracy-Highest: 0.97767 Training: 2022-04-27 05:47:55,071-Speed 72.94 samples/sec Loss 5.2957 LearningRate 0.0398 Epoch: 7 Global Step: 42010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:47:58,080-Speed 3404.71 samples/sec Loss 5.2181 LearningRate 0.0398 Epoch: 7 Global Step: 42020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:01,082-Speed 3411.89 samples/sec Loss 5.3289 LearningRate 0.0397 Epoch: 7 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-27 05:48:04,091-Speed 3403.88 samples/sec Loss 5.2807 LearningRate 0.0397 Epoch: 7 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:07,079-Speed 3427.53 samples/sec Loss 5.3586 LearningRate 0.0397 Epoch: 7 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:10,102-Speed 3388.65 samples/sec Loss 5.3848 LearningRate 0.0397 Epoch: 7 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:13,120-Speed 3393.59 samples/sec Loss 5.3208 LearningRate 0.0397 Epoch: 7 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:16,147-Speed 3384.18 samples/sec Loss 5.3173 LearningRate 0.0397 Epoch: 7 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:19,163-Speed 3396.12 samples/sec Loss 5.2076 LearningRate 0.0397 Epoch: 7 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:22,194-Speed 3388.65 samples/sec Loss 5.2705 LearningRate 0.0397 Epoch: 7 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:25,208-Speed 3397.48 samples/sec Loss 5.1888 LearningRate 0.0397 Epoch: 7 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:28,229-Speed 3390.64 samples/sec Loss 5.2676 LearningRate 0.0396 Epoch: 7 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:31,243-Speed 3398.22 samples/sec Loss 5.4252 LearningRate 0.0396 Epoch: 7 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:34,269-Speed 3385.50 samples/sec Loss 5.3550 LearningRate 0.0396 Epoch: 7 Global Step: 42140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:48:37,291-Speed 3389.32 samples/sec Loss 5.3062 LearningRate 0.0396 Epoch: 7 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:40,334-Speed 3364.93 samples/sec Loss 5.2551 
LearningRate 0.0396 Epoch: 7 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:43,357-Speed 3388.30 samples/sec Loss 5.4049 LearningRate 0.0396 Epoch: 7 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:46,383-Speed 3385.39 samples/sec Loss 5.2779 LearningRate 0.0396 Epoch: 7 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:49,409-Speed 3385.23 samples/sec Loss 5.2735 LearningRate 0.0396 Epoch: 7 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:52,433-Speed 3386.82 samples/sec Loss 5.2290 LearningRate 0.0396 Epoch: 7 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:55,475-Speed 3367.62 samples/sec Loss 5.2662 LearningRate 0.0395 Epoch: 7 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:48:58,529-Speed 3353.45 samples/sec Loss 5.2631 LearningRate 0.0395 Epoch: 7 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:01,616-Speed 3317.81 samples/sec Loss 5.3379 LearningRate 0.0395 Epoch: 7 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:04,645-Speed 3381.71 samples/sec Loss 5.3068 LearningRate 0.0395 Epoch: 7 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:07,649-Speed 3409.83 samples/sec Loss 5.2665 LearningRate 0.0395 Epoch: 7 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:10,671-Speed 3389.20 samples/sec Loss 5.3267 LearningRate 0.0395 Epoch: 7 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:13,691-Speed 3391.81 samples/sec Loss 5.3318 LearningRate 0.0395 Epoch: 7 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:16,719-Speed 3383.07 samples/sec Loss 5.0789 LearningRate 0.0395 Epoch: 7 Global Step: 42280 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:19,749-Speed 3379.72 samples/sec Loss 5.3608 LearningRate 0.0395 Epoch: 7 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:22,812-Speed 3344.01 samples/sec Loss 5.1940 LearningRate 0.0394 Epoch: 7 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:25,832-Speed 3392.01 samples/sec Loss 5.3070 LearningRate 0.0394 Epoch: 7 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:28,863-Speed 3379.23 samples/sec Loss 5.2944 LearningRate 0.0394 Epoch: 7 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:31,873-Speed 3402.56 samples/sec Loss 5.3153 LearningRate 0.0394 Epoch: 7 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:34,886-Speed 3399.25 samples/sec Loss 5.4645 LearningRate 0.0394 Epoch: 7 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:37,898-Speed 3401.01 samples/sec Loss 5.4354 LearningRate 0.0394 Epoch: 7 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:40,919-Speed 3390.46 samples/sec Loss 5.3576 LearningRate 0.0394 Epoch: 7 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:43,927-Speed 3405.58 samples/sec Loss 5.1967 LearningRate 0.0394 Epoch: 7 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:46,937-Speed 3403.64 samples/sec Loss 5.3703 LearningRate 0.0394 Epoch: 7 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:49,953-Speed 3395.20 samples/sec Loss 5.3835 LearningRate 0.0393 Epoch: 7 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:49:52,974-Speed 3390.62 samples/sec Loss 5.2729 LearningRate 0.0393 Epoch: 7 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 
05:49:55,992-Speed 3393.73 samples/sec Loss 5.5140 LearningRate 0.0393 Epoch: 7 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:49:59,018-Speed 3385.11 samples/sec Loss 5.2309 LearningRate 0.0393 Epoch: 7 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:02,030-Speed 3400.54 samples/sec Loss 5.2104 LearningRate 0.0393 Epoch: 7 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:05,042-Speed 3400.36 samples/sec Loss 5.2971 LearningRate 0.0393 Epoch: 7 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:08,054-Speed 3400.68 samples/sec Loss 5.2109 LearningRate 0.0393 Epoch: 7 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:11,070-Speed 3396.01 samples/sec Loss 5.2110 LearningRate 0.0393 Epoch: 7 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:14,129-Speed 3349.03 samples/sec Loss 5.1676 LearningRate 0.0393 Epoch: 7 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:17,176-Speed 3360.43 samples/sec Loss 5.1136 LearningRate 0.0392 Epoch: 7 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:20,168-Speed 3424.24 samples/sec Loss 5.4679 LearningRate 0.0392 Epoch: 7 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:23,182-Speed 3397.81 samples/sec Loss 5.3530 LearningRate 0.0392 Epoch: 7 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:26,191-Speed 3404.45 samples/sec Loss 5.2280 LearningRate 0.0392 Epoch: 7 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:29,207-Speed 3395.73 samples/sec Loss 5.1045 LearningRate 0.0392 Epoch: 7 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:32,226-Speed 3391.95 samples/sec Loss 5.1556 
LearningRate 0.0392 Epoch: 7 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:35,232-Speed 3407.38 samples/sec Loss 5.2879 LearningRate 0.0392 Epoch: 7 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:38,260-Speed 3382.98 samples/sec Loss 5.2637 LearningRate 0.0392 Epoch: 7 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:41,264-Speed 3409.71 samples/sec Loss 5.2281 LearningRate 0.0392 Epoch: 7 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:44,273-Speed 3404.02 samples/sec Loss 5.1288 LearningRate 0.0391 Epoch: 7 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:47,289-Speed 3396.19 samples/sec Loss 5.2725 LearningRate 0.0391 Epoch: 7 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:50:50,298-Speed 3403.07 samples/sec Loss 5.2355 LearningRate 0.0391 Epoch: 7 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:53,311-Speed 3399.65 samples/sec Loss 5.2119 LearningRate 0.0391 Epoch: 7 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:56,319-Speed 3406.42 samples/sec Loss 5.2045 LearningRate 0.0391 Epoch: 7 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:50:59,341-Speed 3389.07 samples/sec Loss 5.3271 LearningRate 0.0391 Epoch: 7 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:02,365-Speed 3387.08 samples/sec Loss 5.2736 LearningRate 0.0391 Epoch: 7 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:05,375-Speed 3403.05 samples/sec Loss 5.2486 LearningRate 0.0391 Epoch: 7 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:08,365-Speed 3425.28 samples/sec Loss 5.3191 LearningRate 0.0391 Epoch: 7 Global Step: 42650 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:11,378-Speed 3399.65 samples/sec Loss 5.2043 LearningRate 0.0390 Epoch: 7 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:14,387-Speed 3403.40 samples/sec Loss 5.2337 LearningRate 0.0390 Epoch: 7 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:17,398-Speed 3401.57 samples/sec Loss 5.4323 LearningRate 0.0390 Epoch: 7 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:20,406-Speed 3405.77 samples/sec Loss 5.2526 LearningRate 0.0390 Epoch: 7 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:23,418-Speed 3400.27 samples/sec Loss 5.4054 LearningRate 0.0390 Epoch: 7 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:26,444-Speed 3384.74 samples/sec Loss 5.1991 LearningRate 0.0390 Epoch: 7 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:29,450-Speed 3406.86 samples/sec Loss 5.3375 LearningRate 0.0390 Epoch: 7 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:32,458-Speed 3405.44 samples/sec Loss 5.2327 LearningRate 0.0390 Epoch: 7 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:35,466-Speed 3404.90 samples/sec Loss 5.1729 LearningRate 0.0390 Epoch: 7 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:38,477-Speed 3401.99 samples/sec Loss 5.3363 LearningRate 0.0389 Epoch: 7 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:41,487-Speed 3403.39 samples/sec Loss 5.3317 LearningRate 0.0389 Epoch: 7 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:44,494-Speed 3405.71 samples/sec Loss 5.1145 LearningRate 0.0389 Epoch: 7 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 
05:51:47,509-Speed 3397.46 samples/sec Loss 5.2596 LearningRate 0.0389 Epoch: 7 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:50,542-Speed 3377.33 samples/sec Loss 5.2833 LearningRate 0.0389 Epoch: 7 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:51:53,564-Speed 3388.67 samples/sec Loss 5.2073 LearningRate 0.0389 Epoch: 7 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:56,577-Speed 3399.98 samples/sec Loss 5.3041 LearningRate 0.0389 Epoch: 7 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:51:59,585-Speed 3404.52 samples/sec Loss 5.2779 LearningRate 0.0389 Epoch: 7 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:02,599-Speed 3398.06 samples/sec Loss 5.2097 LearningRate 0.0389 Epoch: 7 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:05,611-Speed 3400.78 samples/sec Loss 5.2681 LearningRate 0.0388 Epoch: 7 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:08,621-Speed 3403.97 samples/sec Loss 5.1583 LearningRate 0.0388 Epoch: 7 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:11,685-Speed 3341.94 samples/sec Loss 5.3946 LearningRate 0.0388 Epoch: 7 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:14,683-Speed 3416.73 samples/sec Loss 5.2311 LearningRate 0.0388 Epoch: 7 Global Step: 42870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:17,706-Speed 3387.65 samples/sec Loss 5.1706 LearningRate 0.0388 Epoch: 7 Global Step: 42880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:20,728-Speed 3390.17 samples/sec Loss 5.4517 LearningRate 0.0388 Epoch: 7 Global Step: 42890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:23,765-Speed 3372.96 samples/sec Loss 5.1993 LearningRate 
0.0388 Epoch: 7 Global Step: 42900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:26,780-Speed 3396.88 samples/sec Loss 5.3851 LearningRate 0.0388 Epoch: 7 Global Step: 42910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:29,790-Speed 3402.45 samples/sec Loss 5.2239 LearningRate 0.0388 Epoch: 7 Global Step: 42920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:32,801-Speed 3401.96 samples/sec Loss 5.2969 LearningRate 0.0387 Epoch: 7 Global Step: 42930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:35,828-Speed 3383.92 samples/sec Loss 5.2097 LearningRate 0.0387 Epoch: 7 Global Step: 42940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:38,845-Speed 3395.46 samples/sec Loss 5.2847 LearningRate 0.0387 Epoch: 7 Global Step: 42950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:41,886-Speed 3368.36 samples/sec Loss 5.2673 LearningRate 0.0387 Epoch: 7 Global Step: 42960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 05:52:44,899-Speed 3399.91 samples/sec Loss 5.2113 LearningRate 0.0387 Epoch: 7 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:47,937-Speed 3370.91 samples/sec Loss 5.1904 LearningRate 0.0387 Epoch: 7 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:50,953-Speed 3396.81 samples/sec Loss 5.2037 LearningRate 0.0387 Epoch: 7 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:53,969-Speed 3396.12 samples/sec Loss 5.4139 LearningRate 0.0387 Epoch: 7 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:56,981-Speed 3399.99 samples/sec Loss 5.2287 LearningRate 0.0387 Epoch: 7 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:52:59,991-Speed 3403.03 samples/sec Loss 5.1582 LearningRate 0.0387 Epoch: 7 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 
7 hours Training: 2022-04-27 05:53:03,059-Speed 3338.62 samples/sec Loss 5.2885 LearningRate 0.0386 Epoch: 7 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:06,076-Speed 3394.62 samples/sec Loss 5.2639 LearningRate 0.0386 Epoch: 7 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:09,094-Speed 3392.98 samples/sec Loss 5.2455 LearningRate 0.0386 Epoch: 7 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:12,106-Speed 3401.05 samples/sec Loss 5.3608 LearningRate 0.0386 Epoch: 7 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:15,122-Speed 3395.39 samples/sec Loss 5.2535 LearningRate 0.0386 Epoch: 7 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:18,145-Speed 3389.34 samples/sec Loss 5.3424 LearningRate 0.0386 Epoch: 7 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:21,161-Speed 3395.62 samples/sec Loss 5.2124 LearningRate 0.0386 Epoch: 7 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:24,181-Speed 3391.87 samples/sec Loss 5.1572 LearningRate 0.0386 Epoch: 7 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:27,239-Speed 3349.61 samples/sec Loss 5.1750 LearningRate 0.0386 Epoch: 7 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:30,263-Speed 3387.00 samples/sec Loss 5.2803 LearningRate 0.0385 Epoch: 7 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:33,285-Speed 3388.98 samples/sec Loss 5.1307 LearningRate 0.0385 Epoch: 7 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:36,300-Speed 3396.15 samples/sec Loss 5.3335 LearningRate 0.0385 Epoch: 7 Global Step: 43140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:53:39,293-Speed 3422.52 
samples/sec Loss 5.1343 LearningRate 0.0385 Epoch: 7 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:42,308-Speed 3397.92 samples/sec Loss 5.3566 LearningRate 0.0385 Epoch: 7 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:45,324-Speed 3395.57 samples/sec Loss 5.2668 LearningRate 0.0385 Epoch: 7 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:48,361-Speed 3372.67 samples/sec Loss 5.2982 LearningRate 0.0385 Epoch: 7 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:51,374-Speed 3399.07 samples/sec Loss 5.3144 LearningRate 0.0385 Epoch: 7 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:54,460-Speed 3318.98 samples/sec Loss 5.2466 LearningRate 0.0385 Epoch: 7 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:53:57,472-Speed 3401.39 samples/sec Loss 5.2197 LearningRate 0.0384 Epoch: 7 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:00,488-Speed 3395.09 samples/sec Loss 5.2331 LearningRate 0.0384 Epoch: 7 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:03,511-Speed 3388.26 samples/sec Loss 5.4197 LearningRate 0.0384 Epoch: 7 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:06,524-Speed 3399.62 samples/sec Loss 5.2321 LearningRate 0.0384 Epoch: 7 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:09,538-Speed 3399.01 samples/sec Loss 5.2456 LearningRate 0.0384 Epoch: 7 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:54:12,536-Speed 3416.01 samples/sec Loss 5.1832 LearningRate 0.0384 Epoch: 7 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:15,547-Speed 3401.26 samples/sec Loss 5.3662 LearningRate 0.0384 Epoch: 7 Global Step: 
43270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:18,560-Speed 3399.31 samples/sec Loss 5.2218 LearningRate 0.0384 Epoch: 7 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:21,577-Speed 3395.19 samples/sec Loss 5.3803 LearningRate 0.0384 Epoch: 7 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:24,594-Speed 3394.61 samples/sec Loss 5.1700 LearningRate 0.0383 Epoch: 7 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:27,614-Speed 3392.46 samples/sec Loss 5.2787 LearningRate 0.0383 Epoch: 7 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:30,629-Speed 3397.24 samples/sec Loss 5.2091 LearningRate 0.0383 Epoch: 7 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:33,647-Speed 3393.16 samples/sec Loss 5.1646 LearningRate 0.0383 Epoch: 7 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:36,672-Speed 3386.81 samples/sec Loss 5.1972 LearningRate 0.0383 Epoch: 7 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:39,691-Speed 3392.60 samples/sec Loss 5.2474 LearningRate 0.0383 Epoch: 7 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:42,705-Speed 3397.77 samples/sec Loss 5.1451 LearningRate 0.0383 Epoch: 7 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:54:45,704-Speed 3415.73 samples/sec Loss 5.1991 LearningRate 0.0383 Epoch: 7 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:48,735-Speed 3379.02 samples/sec Loss 5.1814 LearningRate 0.0383 Epoch: 7 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:51,758-Speed 3387.81 samples/sec Loss 5.2288 LearningRate 0.0382 Epoch: 7 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-27 05:54:54,783-Speed 3386.00 samples/sec Loss 5.1599 LearningRate 0.0382 Epoch: 7 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:54:57,803-Speed 3391.40 samples/sec Loss 5.2812 LearningRate 0.0382 Epoch: 7 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:00,834-Speed 3379.02 samples/sec Loss 5.2537 LearningRate 0.0382 Epoch: 7 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:03,879-Speed 3363.51 samples/sec Loss 5.2344 LearningRate 0.0382 Epoch: 7 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:06,896-Speed 3395.24 samples/sec Loss 5.1352 LearningRate 0.0382 Epoch: 7 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:09,923-Speed 3383.70 samples/sec Loss 5.2507 LearningRate 0.0382 Epoch: 7 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:12,998-Speed 3331.27 samples/sec Loss 5.1207 LearningRate 0.0382 Epoch: 7 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:16,012-Speed 3398.01 samples/sec Loss 5.1133 LearningRate 0.0382 Epoch: 7 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:55:19,016-Speed 3409.47 samples/sec Loss 5.0983 LearningRate 0.0382 Epoch: 7 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:22,032-Speed 3396.02 samples/sec Loss 5.1884 LearningRate 0.0381 Epoch: 7 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:25,068-Speed 3373.08 samples/sec Loss 5.0631 LearningRate 0.0381 Epoch: 7 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:28,102-Speed 3375.93 samples/sec Loss 5.3145 LearningRate 0.0381 Epoch: 7 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:31,132-Speed 3381.35 samples/sec Loss 5.1278 
LearningRate 0.0381 Epoch: 7 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:34,152-Speed 3390.86 samples/sec Loss 5.0955 LearningRate 0.0381 Epoch: 7 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:37,179-Speed 3384.42 samples/sec Loss 5.1985 LearningRate 0.0381 Epoch: 7 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:40,201-Speed 3389.01 samples/sec Loss 5.2508 LearningRate 0.0381 Epoch: 7 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:43,228-Speed 3383.59 samples/sec Loss 5.1944 LearningRate 0.0381 Epoch: 7 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:46,250-Speed 3389.54 samples/sec Loss 5.2313 LearningRate 0.0381 Epoch: 7 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:55:49,291-Speed 3367.77 samples/sec Loss 5.2341 LearningRate 0.0380 Epoch: 7 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:55:52,315-Speed 3387.29 samples/sec Loss 5.0491 LearningRate 0.0380 Epoch: 7 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:55:55,340-Speed 3386.06 samples/sec Loss 5.2910 LearningRate 0.0380 Epoch: 7 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:55:58,342-Speed 3411.32 samples/sec Loss 5.1951 LearningRate 0.0380 Epoch: 7 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:01,364-Speed 3390.06 samples/sec Loss 5.2752 LearningRate 0.0380 Epoch: 7 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:04,390-Speed 3384.32 samples/sec Loss 5.1749 LearningRate 0.0380 Epoch: 7 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:07,412-Speed 3389.69 samples/sec Loss 5.1061 LearningRate 0.0380 Epoch: 7 Global Step: 43640 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-27 05:56:10,430-Speed 3393.50 samples/sec Loss 5.1638 LearningRate 0.0380 Epoch: 7 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:13,454-Speed 3386.87 samples/sec Loss 5.2046 LearningRate 0.0380 Epoch: 7 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:16,472-Speed 3393.49 samples/sec Loss 5.1137 LearningRate 0.0379 Epoch: 7 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:19,493-Speed 3391.57 samples/sec Loss 5.1575 LearningRate 0.0379 Epoch: 7 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:22,515-Speed 3388.57 samples/sec Loss 5.1266 LearningRate 0.0379 Epoch: 7 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:25,585-Speed 3336.23 samples/sec Loss 5.2980 LearningRate 0.0379 Epoch: 7 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:28,831-Speed 3155.94 samples/sec Loss 5.1948 LearningRate 0.0379 Epoch: 7 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:56:31,836-Speed 3408.15 samples/sec Loss 5.2958 LearningRate 0.0379 Epoch: 7 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:34,863-Speed 3383.76 samples/sec Loss 5.1913 LearningRate 0.0379 Epoch: 7 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:37,883-Speed 3391.82 samples/sec Loss 5.1602 LearningRate 0.0379 Epoch: 7 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:40,925-Speed 3367.37 samples/sec Loss 5.1690 LearningRate 0.0379 Epoch: 7 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:43,951-Speed 3385.48 samples/sec Loss 5.0702 LearningRate 0.0378 Epoch: 7 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:46,980-Speed 
3380.60 samples/sec Loss 5.2455 LearningRate 0.0378 Epoch: 7 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:50,017-Speed 3372.53 samples/sec Loss 5.2350 LearningRate 0.0378 Epoch: 7 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:53,049-Speed 3378.97 samples/sec Loss 5.1801 LearningRate 0.0378 Epoch: 7 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:56,070-Speed 3390.72 samples/sec Loss 5.2637 LearningRate 0.0378 Epoch: 7 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:56:59,096-Speed 3384.58 samples/sec Loss 5.1943 LearningRate 0.0378 Epoch: 7 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:02,125-Speed 3380.79 samples/sec Loss 5.1500 LearningRate 0.0378 Epoch: 7 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:05,218-Speed 3312.02 samples/sec Loss 5.1465 LearningRate 0.0378 Epoch: 7 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:08,238-Speed 3391.49 samples/sec Loss 5.0586 LearningRate 0.0378 Epoch: 7 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:11,258-Speed 3391.65 samples/sec Loss 5.1166 LearningRate 0.0377 Epoch: 7 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:14,263-Speed 3408.05 samples/sec Loss 5.1890 LearningRate 0.0377 Epoch: 7 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:17,291-Speed 3382.90 samples/sec Loss 5.2393 LearningRate 0.0377 Epoch: 7 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:20,313-Speed 3389.62 samples/sec Loss 5.2218 LearningRate 0.0377 Epoch: 7 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:23,345-Speed 3378.05 samples/sec Loss 5.1940 LearningRate 0.0377 Epoch: 7 
Global Step: 43890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:26,400-Speed 3352.92 samples/sec Loss 5.1100 LearningRate 0.0377 Epoch: 7 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:29,429-Speed 3380.76 samples/sec Loss 5.1151 LearningRate 0.0377 Epoch: 7 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:32,459-Speed 3381.88 samples/sec Loss 5.2035 LearningRate 0.0377 Epoch: 7 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:35,484-Speed 3386.17 samples/sec Loss 5.1044 LearningRate 0.0377 Epoch: 7 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:38,505-Speed 3390.20 samples/sec Loss 5.2978 LearningRate 0.0377 Epoch: 7 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:41,529-Speed 3387.08 samples/sec Loss 5.1828 LearningRate 0.0376 Epoch: 7 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 05:57:44,555-Speed 3385.14 samples/sec Loss 5.1867 LearningRate 0.0376 Epoch: 7 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:47,590-Speed 3374.77 samples/sec Loss 5.2787 LearningRate 0.0376 Epoch: 7 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:50,616-Speed 3385.54 samples/sec Loss 5.1413 LearningRate 0.0376 Epoch: 7 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:53,640-Speed 3386.89 samples/sec Loss 5.2026 LearningRate 0.0376 Epoch: 7 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:57:56,663-Speed 3388.66 samples/sec Loss 5.1870 LearningRate 0.0376 Epoch: 7 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 05:58:40,391-[lfw][44000]XNorm: 21.992907 Training: 2022-04-27 05:58:40,391-[lfw][44000]Accuracy-Flip: 0.99733+-0.00260 Training: 2022-04-27 
05:58:40,392-[lfw][44000]Accuracy-Highest: 0.99817 Training: 2022-04-27 05:59:30,615-[cfp_fp][44000]XNorm: 19.642103 Training: 2022-04-27 05:59:30,615-[cfp_fp][44000]Accuracy-Flip: 0.95957+-0.00878 Training: 2022-04-27 05:59:30,616-[cfp_fp][44000]Accuracy-Highest: 0.96057 Training: 2022-04-27 06:00:14,008-[agedb_30][44000]XNorm: 21.939295 Training: 2022-04-27 06:00:14,009-[agedb_30][44000]Accuracy-Flip: 0.97600+-0.00797 Training: 2022-04-27 06:00:14,009-[agedb_30][44000]Accuracy-Highest: 0.97767 Training: 2022-04-27 06:00:17,036-Speed 72.95 samples/sec Loss 5.2943 LearningRate 0.0376 Epoch: 7 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:00:20,047-Speed 3401.34 samples/sec Loss 5.1001 LearningRate 0.0376 Epoch: 7 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:00:23,060-Speed 3399.93 samples/sec Loss 5.1796 LearningRate 0.0376 Epoch: 7 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:00:26,072-Speed 3400.36 samples/sec Loss 5.1729 LearningRate 0.0375 Epoch: 7 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:00:29,081-Speed 3404.12 samples/sec Loss 5.2761 LearningRate 0.0375 Epoch: 7 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:00:32,057-Speed 3441.59 samples/sec Loss 5.2462 LearningRate 0.0375 Epoch: 7 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:35,065-Speed 3405.26 samples/sec Loss 5.1321 LearningRate 0.0375 Epoch: 7 Global Step: 44070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:38,085-Speed 3391.81 samples/sec Loss 5.1142 LearningRate 0.0375 Epoch: 7 Global Step: 44080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:41,110-Speed 3385.60 samples/sec Loss 5.2094 LearningRate 0.0375 Epoch: 7 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:44,129-Speed 
3392.79 samples/sec Loss 5.1286 LearningRate 0.0375 Epoch: 7 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:47,147-Speed 3394.02 samples/sec Loss 5.1344 LearningRate 0.0375 Epoch: 7 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:50,164-Speed 3394.44 samples/sec Loss 5.1609 LearningRate 0.0375 Epoch: 7 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:53,183-Speed 3393.14 samples/sec Loss 5.3268 LearningRate 0.0374 Epoch: 7 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:56,191-Speed 3405.13 samples/sec Loss 5.1899 LearningRate 0.0374 Epoch: 7 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:00:59,208-Speed 3395.08 samples/sec Loss 5.1545 LearningRate 0.0374 Epoch: 7 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:02,228-Speed 3390.67 samples/sec Loss 5.1715 LearningRate 0.0374 Epoch: 7 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:01:05,225-Speed 3417.76 samples/sec Loss 5.1624 LearningRate 0.0374 Epoch: 7 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:08,238-Speed 3400.26 samples/sec Loss 5.2789 LearningRate 0.0374 Epoch: 7 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:11,258-Speed 3391.52 samples/sec Loss 5.0771 LearningRate 0.0374 Epoch: 7 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:14,271-Speed 3399.48 samples/sec Loss 5.1544 LearningRate 0.0374 Epoch: 7 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:17,288-Speed 3394.71 samples/sec Loss 5.1295 LearningRate 0.0374 Epoch: 7 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:20,306-Speed 3393.23 samples/sec Loss 5.0400 LearningRate 0.0374 Epoch: 7 
Global Step: 44220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:23,327-Speed 3390.28 samples/sec Loss 5.1682 LearningRate 0.0373 Epoch: 7 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:26,343-Speed 3396.43 samples/sec Loss 5.0599 LearningRate 0.0373 Epoch: 7 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:29,354-Speed 3401.41 samples/sec Loss 5.1046 LearningRate 0.0373 Epoch: 7 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:32,367-Speed 3399.65 samples/sec Loss 5.0439 LearningRate 0.0373 Epoch: 7 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:01:35,358-Speed 3424.27 samples/sec Loss 5.1357 LearningRate 0.0373 Epoch: 7 Global Step: 44270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:38,391-Speed 3377.10 samples/sec Loss 5.2062 LearningRate 0.0373 Epoch: 7 Global Step: 44280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:41,417-Speed 3384.92 samples/sec Loss 5.2435 LearningRate 0.0373 Epoch: 7 Global Step: 44290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:44,433-Speed 3397.46 samples/sec Loss 5.1052 LearningRate 0.0373 Epoch: 7 Global Step: 44300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:47,486-Speed 3354.62 samples/sec Loss 5.3187 LearningRate 0.0373 Epoch: 7 Global Step: 44310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:50,515-Speed 3381.09 samples/sec Loss 5.2257 LearningRate 0.0372 Epoch: 7 Global Step: 44320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:53,527-Speed 3399.90 samples/sec Loss 5.0862 LearningRate 0.0372 Epoch: 7 Global Step: 44330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:01:56,542-Speed 3397.38 samples/sec Loss 5.1461 LearningRate 0.0372 Epoch: 7 Global Step: 44340 Fp16 Grad Scale: 32768 Required: 7 hours 
Training: 2022-04-27 06:01:59,553-Speed 3401.43 samples/sec Loss 5.2677 LearningRate 0.0372 Epoch: 7 Global Step: 44350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:02:02,572-Speed 3392.65 samples/sec Loss 5.1379 LearningRate 0.0372 Epoch: 7 Global Step: 44360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:02:05,590-Speed 3394.03 samples/sec Loss 5.0131 LearningRate 0.0372 Epoch: 7 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:08,610-Speed 3391.49 samples/sec Loss 5.1827 LearningRate 0.0372 Epoch: 7 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:11,628-Speed 3394.12 samples/sec Loss 5.1461 LearningRate 0.0372 Epoch: 7 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:14,649-Speed 3390.81 samples/sec Loss 5.0919 LearningRate 0.0372 Epoch: 7 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:17,742-Speed 3311.03 samples/sec Loss 5.1413 LearningRate 0.0371 Epoch: 7 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:20,758-Speed 3396.63 samples/sec Loss 5.1569 LearningRate 0.0371 Epoch: 7 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:23,777-Speed 3391.65 samples/sec Loss 5.1446 LearningRate 0.0371 Epoch: 7 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:26,797-Speed 3391.79 samples/sec Loss 5.1139 LearningRate 0.0371 Epoch: 7 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:29,819-Speed 3389.54 samples/sec Loss 5.1929 LearningRate 0.0371 Epoch: 7 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:32,838-Speed 3392.33 samples/sec Loss 5.1333 LearningRate 0.0371 Epoch: 7 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:02:35,860-Speed 3389.19 samples/sec Loss 
5.0637 LearningRate 0.0371 Epoch: 7 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:38,886-Speed 3385.99 samples/sec Loss 5.1160 LearningRate 0.0371 Epoch: 7 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:41,904-Speed 3393.28 samples/sec Loss 5.1803 LearningRate 0.0371 Epoch: 7 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:44,922-Speed 3394.25 samples/sec Loss 5.1473 LearningRate 0.0371 Epoch: 7 Global Step: 44500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:47,934-Speed 3399.82 samples/sec Loss 5.1951 LearningRate 0.0370 Epoch: 7 Global Step: 44510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:50,955-Speed 3390.47 samples/sec Loss 5.1931 LearningRate 0.0370 Epoch: 7 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:53,967-Speed 3400.44 samples/sec Loss 5.2572 LearningRate 0.0370 Epoch: 7 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:02:56,984-Speed 3394.76 samples/sec Loss 5.0949 LearningRate 0.0370 Epoch: 7 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:00,002-Speed 3394.79 samples/sec Loss 5.1713 LearningRate 0.0370 Epoch: 7 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:03,039-Speed 3372.68 samples/sec Loss 5.1500 LearningRate 0.0370 Epoch: 7 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:06,038-Speed 3414.58 samples/sec Loss 5.1091 LearningRate 0.0370 Epoch: 7 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:09,034-Speed 3419.17 samples/sec Loss 5.2268 LearningRate 0.0370 Epoch: 7 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:12,051-Speed 3394.61 samples/sec Loss 5.3248 LearningRate 0.0370 Epoch: 7 Global Step: 44590 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:15,066-Speed 3397.11 samples/sec Loss 5.1258 LearningRate 0.0369 Epoch: 7 Global Step: 44600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:18,090-Speed 3387.35 samples/sec Loss 5.1012 LearningRate 0.0369 Epoch: 7 Global Step: 44610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:21,103-Speed 3398.63 samples/sec Loss 5.0809 LearningRate 0.0369 Epoch: 7 Global Step: 44620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:24,118-Speed 3397.47 samples/sec Loss 5.1867 LearningRate 0.0369 Epoch: 7 Global Step: 44630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:27,245-Speed 3275.19 samples/sec Loss 5.0877 LearningRate 0.0369 Epoch: 7 Global Step: 44640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:30,278-Speed 3377.90 samples/sec Loss 5.0171 LearningRate 0.0369 Epoch: 7 Global Step: 44650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:33,296-Speed 3393.70 samples/sec Loss 5.1001 LearningRate 0.0369 Epoch: 7 Global Step: 44660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:36,315-Speed 3391.99 samples/sec Loss 5.2856 LearningRate 0.0369 Epoch: 7 Global Step: 44670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:03:39,339-Speed 3387.30 samples/sec Loss 4.9894 LearningRate 0.0369 Epoch: 7 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:42,356-Speed 3395.52 samples/sec Loss 5.2187 LearningRate 0.0368 Epoch: 7 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:45,370-Speed 3397.33 samples/sec Loss 5.1631 LearningRate 0.0368 Epoch: 7 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:48,399-Speed 3381.93 samples/sec Loss 5.1768 LearningRate 0.0368 Epoch: 7 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 
06:03:51,412-Speed 3399.06 samples/sec Loss 5.1500 LearningRate 0.0368 Epoch: 7 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:54,432-Speed 3391.59 samples/sec Loss 5.1149 LearningRate 0.0368 Epoch: 7 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:03:57,473-Speed 3368.54 samples/sec Loss 5.1409 LearningRate 0.0368 Epoch: 7 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:04:00,487-Speed 3398.74 samples/sec Loss 5.1379 LearningRate 0.0368 Epoch: 7 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:04:03,487-Speed 3413.80 samples/sec Loss 5.0979 LearningRate 0.0368 Epoch: 7 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:04:06,483-Speed 3418.90 samples/sec Loss 5.1158 LearningRate 0.0368 Epoch: 7 Global Step: 44770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:09,496-Speed 3399.31 samples/sec Loss 5.2472 LearningRate 0.0368 Epoch: 7 Global Step: 44780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:12,515-Speed 3392.66 samples/sec Loss 5.1892 LearningRate 0.0367 Epoch: 7 Global Step: 44790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:15,529-Speed 3397.74 samples/sec Loss 5.0913 LearningRate 0.0367 Epoch: 7 Global Step: 44800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:18,545-Speed 3396.78 samples/sec Loss 5.0259 LearningRate 0.0367 Epoch: 7 Global Step: 44810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:21,580-Speed 3374.31 samples/sec Loss 5.1515 LearningRate 0.0367 Epoch: 7 Global Step: 44820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:24,611-Speed 3378.99 samples/sec Loss 5.1160 LearningRate 0.0367 Epoch: 7 Global Step: 44830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:04:27,633-Speed 3389.50 samples/sec Loss 5.1227 LearningRate 
0.0367 Epoch: 7 Global Step: 44840 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:04:30,655-Speed 3389.45 samples/sec Loss 5.1131 LearningRate 0.0367 Epoch: 7 Global Step: 44850 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:04:33,679-Speed 3387.09 samples/sec Loss 5.1573 LearningRate 0.0367 Epoch: 7 Global Step: 44860 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:04:36,699-Speed 3391.23 samples/sec Loss 5.1525 LearningRate 0.0367 Epoch: 7 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:39,731-Speed 3378.82 samples/sec Loss 5.0834 LearningRate 0.0366 Epoch: 7 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:42,755-Speed 3386.11 samples/sec Loss 5.0935 LearningRate 0.0366 Epoch: 7 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:45,773-Speed 3394.78 samples/sec Loss 5.1032 LearningRate 0.0366 Epoch: 7 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:48,795-Speed 3388.87 samples/sec Loss 5.1401 LearningRate 0.0366 Epoch: 7 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:51,811-Speed 3395.72 samples/sec Loss 5.1022 LearningRate 0.0366 Epoch: 7 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:54,829-Speed 3394.49 samples/sec Loss 5.0420 LearningRate 0.0366 Epoch: 7 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:04:57,850-Speed 3389.88 samples/sec Loss 5.1015 LearningRate 0.0366 Epoch: 7 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:00,868-Speed 3394.00 samples/sec Loss 5.0721 LearningRate 0.0366 Epoch: 7 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:03,890-Speed 3388.82 samples/sec Loss 5.1045 LearningRate 0.0366 Epoch: 7 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:06,921-Speed 3379.93 samples/sec Loss 5.0340 LearningRate 0.0365 Epoch: 7 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:05:09,919-Speed 3416.52 samples/sec Loss 5.1233 LearningRate 0.0365 Epoch: 7 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:12,945-Speed 3384.55 samples/sec Loss 5.1434 LearningRate 0.0365 Epoch: 7 Global Step: 44990 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:15,967-Speed 3388.74 samples/sec Loss 5.1459 LearningRate 0.0365 Epoch: 7 Global Step: 45000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:18,984-Speed 3395.53 samples/sec Loss 5.0206 LearningRate 0.0365 Epoch: 7 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:22,008-Speed 3386.51 samples/sec Loss 4.9999 LearningRate 0.0365 Epoch: 7 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:25,010-Speed 3411.66 samples/sec Loss 5.0458 LearningRate 0.0365 Epoch: 7 Global Step: 45030 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:28,038-Speed 3383.32 samples/sec Loss 5.0015 LearningRate 0.0365 Epoch: 7 Global Step: 45040 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:31,054-Speed 3395.47 samples/sec Loss 5.1155 LearningRate 0.0365 Epoch: 7 Global Step: 45050 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:34,071-Speed 3395.12 samples/sec Loss 5.1101 LearningRate 0.0365 Epoch: 7 Global Step: 45060 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:37,087-Speed 3396.27 samples/sec Loss 5.1242 LearningRate 0.0364 Epoch: 7 Global Step: 45070 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:40,109-Speed 3389.20 samples/sec Loss 5.3144 LearningRate 0.0364 Epoch: 7 Global Step: 45080 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:43,139-Speed 3380.16 samples/sec Loss 5.0760 LearningRate 0.0364 Epoch: 7 Global Step: 45090 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:46,155-Speed 3396.56 samples/sec Loss 5.0076 LearningRate 0.0364 Epoch: 7 Global Step: 45100 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:49,182-Speed 3383.88 samples/sec Loss 5.0280 LearningRate 0.0364 Epoch: 7 Global Step: 45110 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:52,205-Speed 3388.21 samples/sec Loss 5.1835 LearningRate 0.0364 Epoch: 7 Global Step: 45120 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:05:55,221-Speed 3395.93 samples/sec Loss 5.1563 LearningRate 0.0364 Epoch: 7 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:05:58,240-Speed 3392.19 samples/sec Loss 5.0587 LearningRate 0.0364 Epoch: 7 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:01,256-Speed 3395.65 samples/sec Loss 4.9998 LearningRate 0.0364 Epoch: 7 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:04,280-Speed 3388.15 samples/sec Loss 5.1000 LearningRate 0.0363 Epoch: 7 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:07,299-Speed 3392.12 samples/sec Loss 5.0428 LearningRate 0.0363 Epoch: 7 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:10,326-Speed 3384.17 samples/sec Loss 5.0318 LearningRate 0.0363 Epoch: 7 Global Step: 45180 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:13,346-Speed 3391.33 samples/sec Loss 5.1522 LearningRate 0.0363 Epoch: 7 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:16,363-Speed 3395.34 samples/sec Loss 5.1943 LearningRate 0.0363 Epoch: 7 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:19,383-Speed 3391.39 samples/sec Loss 5.0892 LearningRate 0.0363 Epoch: 7 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:22,403-Speed 3391.40 samples/sec Loss 5.1568 LearningRate 0.0363 Epoch: 7 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:25,426-Speed 3387.92 samples/sec Loss 5.1201 LearningRate 0.0363 Epoch: 7 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:06:28,444-Speed 3393.58 samples/sec Loss 5.0337 LearningRate 0.0363 Epoch: 7 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:06:31,450-Speed 3407.91 samples/sec Loss 5.1607 LearningRate 0.0363 Epoch: 7 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:34,464-Speed 3397.75 samples/sec Loss 4.9249 LearningRate 0.0362 Epoch: 7 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:37,489-Speed 3385.89 samples/sec Loss 5.0041 LearningRate 0.0362 Epoch: 7 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:40,513-Speed 3388.15 samples/sec Loss 5.2072 LearningRate 0.0362 Epoch: 7 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:43,539-Speed 3385.31 samples/sec Loss 4.8852 LearningRate 0.0362 Epoch: 7 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:46,561-Speed 3388.93 samples/sec Loss 5.1858 LearningRate 0.0362 Epoch: 7 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:49,582-Speed 3391.41 samples/sec Loss 4.9577 LearningRate 0.0362 Epoch: 7 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:52,605-Speed 3387.59 samples/sec Loss 5.1564 LearningRate 0.0362 Epoch: 7 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:55,630-Speed 3386.08 samples/sec Loss 5.0257 LearningRate 0.0362 Epoch: 7 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:06:58,656-Speed 3385.29 samples/sec Loss 5.2622 LearningRate 0.0362 Epoch: 7 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:01,678-Speed 3388.85 samples/sec Loss 5.0674 LearningRate 0.0361 Epoch: 7 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:04,726-Speed 3360.44 samples/sec Loss 5.0424 LearningRate 0.0361 Epoch: 7 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:07,749-Speed 3388.18 samples/sec Loss 5.0985 LearningRate 0.0361 Epoch: 7 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:10,768-Speed 3392.76 samples/sec Loss 5.0484 LearningRate 0.0361 Epoch: 7 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:13,791-Speed 3388.28 samples/sec Loss 4.9343 LearningRate 0.0361 Epoch: 7 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:16,813-Speed 3389.93 samples/sec Loss 5.0107 LearningRate 0.0361 Epoch: 7 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:07:19,815-Speed 3410.97 samples/sec Loss 5.0910 LearningRate 0.0361 Epoch: 7 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:22,843-Speed 3382.45 samples/sec Loss 5.0725 LearningRate 0.0361 Epoch: 7 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:25,871-Speed 3383.04 samples/sec Loss 4.9279 LearningRate 0.0361 Epoch: 7 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:28,891-Speed 3391.24 samples/sec Loss 4.9904 LearningRate 0.0361 Epoch: 7 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:31,921-Speed 3380.49 samples/sec Loss 5.1112 LearningRate 0.0360 Epoch: 7 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:34,949-Speed 3383.14 samples/sec Loss 5.0433 LearningRate 0.0360 Epoch: 7 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:37,972-Speed 3388.30 samples/sec Loss 5.1433 LearningRate 0.0360 Epoch: 7 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:41,092-Speed 3282.22 samples/sec Loss 5.1529 LearningRate 0.0360 Epoch: 7 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:54,287-Speed 776.11 samples/sec Loss 5.0772 LearningRate 0.0360 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:07:57,316-Speed 3381.84 samples/sec Loss 4.4966 LearningRate 0.0360 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:00,377-Speed 3346.44 samples/sec Loss 4.4197 LearningRate 0.0360 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:03,415-Speed 3372.00 samples/sec Loss 4.4040 LearningRate 0.0360 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:06,440-Speed 3385.15 samples/sec Loss 4.5389 LearningRate 0.0360 Epoch: 8 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:09,479-Speed 3370.23 samples/sec Loss 4.3896 LearningRate 0.0359 Epoch: 8 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:12,504-Speed 3387.02 samples/sec Loss 4.4930 LearningRate 0.0359 Epoch: 8 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:15,524-Speed 3390.99 samples/sec Loss 4.6510 LearningRate 0.0359 Epoch: 8 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:18,543-Speed 3392.93 samples/sec Loss 4.4506 LearningRate 0.0359 Epoch: 8 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:21,562-Speed 3392.38 samples/sec Loss 4.5857 LearningRate 0.0359 Epoch: 8 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:24,583-Speed 3390.89 samples/sec Loss 4.5107 LearningRate 0.0359 Epoch: 8 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:27,604-Speed 3390.49 samples/sec Loss 4.5890 LearningRate 0.0359 Epoch: 8 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:30,628-Speed 3387.18 samples/sec Loss 4.5959 LearningRate 0.0359 Epoch: 8 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:33,648-Speed 3390.72 samples/sec Loss 4.5256 LearningRate 0.0359 Epoch: 8 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:36,708-Speed 3347.54 samples/sec Loss 4.5092 LearningRate 0.0359 Epoch: 8 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:39,732-Speed 3386.63 samples/sec Loss 4.5112 LearningRate 0.0358 Epoch: 8 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:42,768-Speed 3374.68 samples/sec Loss 4.5772 LearningRate 0.0358 Epoch: 8 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:45,808-Speed 3368.74 samples/sec Loss 4.5435 LearningRate 0.0358 Epoch: 8 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:48,833-Speed 3385.78 samples/sec Loss 4.5448 LearningRate 0.0358 Epoch: 8 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:51,858-Speed 3386.21 samples/sec Loss 4.5815 LearningRate 0.0358 Epoch: 8 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:08:54,865-Speed 3406.35 samples/sec Loss 4.5909 LearningRate 0.0358 Epoch: 8 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:08:57,905-Speed 3368.58 samples/sec Loss 4.6412 LearningRate 0.0358 Epoch: 8 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:00,944-Speed 3370.70 samples/sec Loss 4.6180 LearningRate 0.0358 Epoch: 8 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:03,977-Speed 3377.63 samples/sec Loss 4.6817 LearningRate 0.0358 Epoch: 8 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:07,030-Speed 3354.87 samples/sec Loss 4.5726 LearningRate 0.0357 Epoch: 8 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:10,066-Speed 3373.20 samples/sec Loss 4.6022 LearningRate 0.0357 Epoch: 8 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:13,133-Speed 3339.50 samples/sec Loss 4.5230 LearningRate 0.0357 Epoch: 8 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:16,165-Speed 3378.73 samples/sec Loss 4.4701 LearningRate 0.0357 Epoch: 8 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:19,193-Speed 3382.70 samples/sec Loss 4.5775 LearningRate 0.0357 Epoch: 8 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:22,223-Speed 3379.16 samples/sec Loss 4.6327 LearningRate 0.0357 Epoch: 8 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:25,272-Speed 3360.13 samples/sec Loss 4.6381 LearningRate 0.0357 Epoch: 8 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:09:28,359-Speed 3317.13 samples/sec Loss 4.6373 LearningRate 0.0357 Epoch: 8 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:09:31,391-Speed 3379.17 samples/sec Loss 4.6187 LearningRate 0.0357 Epoch: 8 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:09:34,398-Speed 3406.30 samples/sec Loss 4.4914 LearningRate 0.0357 Epoch: 8 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:37,446-Speed 3360.24 samples/sec Loss 4.7367 LearningRate 0.0356 Epoch: 8 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:40,481-Speed 3374.40 samples/sec Loss 4.6371 LearningRate 0.0356 Epoch: 8 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:43,509-Speed 3382.22 samples/sec Loss 4.8325 LearningRate 0.0356 Epoch: 8 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:46,543-Speed 3376.31 samples/sec Loss 4.6312 LearningRate 0.0356 Epoch: 8 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:49,575-Speed 3378.30 samples/sec Loss 4.6023 LearningRate 0.0356 Epoch: 8 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:52,609-Speed 3375.66 samples/sec Loss 4.7779 LearningRate 0.0356 Epoch: 8 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:55,644-Speed 3374.54 samples/sec Loss 4.6945 LearningRate 0.0356 Epoch: 8 Global Step: 45890 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:09:58,673-Speed 3381.80 samples/sec Loss 4.6404 LearningRate 0.0356 Epoch: 8 Global Step: 45900 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:01,707-Speed 3376.37 samples/sec Loss 4.6914 LearningRate 0.0356 Epoch: 8 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:04,736-Speed 3381.53 samples/sec Loss 4.7571 LearningRate 0.0355 Epoch: 8 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:10:07,759-Speed 3387.94 samples/sec Loss 4.6747 LearningRate 0.0355 Epoch: 8 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:10:10,781-Speed 3389.19 samples/sec Loss 4.7402 LearningRate 0.0355 Epoch: 8 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:13,819-Speed 3371.48 samples/sec Loss 4.6161 LearningRate 0.0355 Epoch: 8 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:16,873-Speed 3353.62 samples/sec Loss 4.7663 LearningRate 0.0355 Epoch: 8 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:19,911-Speed 3370.68 samples/sec Loss 4.7509 LearningRate 0.0355 Epoch: 8 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:22,939-Speed 3382.78 samples/sec Loss 4.6178 LearningRate 0.0355 Epoch: 8 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:25,967-Speed 3383.21 samples/sec Loss 4.8002 LearningRate 0.0355 Epoch: 8 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:10:28,994-Speed 3383.80 samples/sec Loss 4.6673 LearningRate 0.0355 Epoch: 8 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:11:12,874-[lfw][46000]XNorm: 23.118855
Training: 2022-04-27 06:11:12,875-[lfw][46000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 06:11:12,875-[lfw][46000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:12:03,785-[cfp_fp][46000]XNorm: 20.791686
Training: 2022-04-27 06:12:03,786-[cfp_fp][46000]Accuracy-Flip: 0.95686+-0.01188
Training: 2022-04-27 06:12:03,786-[cfp_fp][46000]Accuracy-Highest: 0.96057
Training: 2022-04-27 06:12:47,698-[agedb_30][46000]XNorm: 22.902561
Training: 2022-04-27 06:12:47,699-[agedb_30][46000]Accuracy-Flip: 0.97433+-0.00700
Training: 2022-04-27 06:12:47,699-[agedb_30][46000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:12:50,734-Speed 72.25 samples/sec Loss 4.7250 LearningRate 0.0355 Epoch: 8 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:12:53,741-Speed 3406.18 samples/sec Loss 4.7951 LearningRate 0.0354 Epoch: 8 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:12:56,749-Speed 3405.51 samples/sec Loss 4.5494 LearningRate 0.0354 Epoch: 8 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:12:59,760-Speed 3401.78 samples/sec Loss 4.6590 LearningRate 0.0354 Epoch: 8 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:02,777-Speed 3394.70 samples/sec Loss 4.7455 LearningRate 0.0354 Epoch: 8 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:05,795-Speed 3393.07 samples/sec Loss 4.6984 LearningRate 0.0354 Epoch: 8 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:08,824-Speed 3381.81 samples/sec Loss 4.7139 LearningRate 0.0354 Epoch: 8 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:11,846-Speed 3389.00 samples/sec Loss 4.7503 LearningRate 0.0354 Epoch: 8 Global Step: 46080 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:14,876-Speed 3379.81 samples/sec Loss 4.6169 LearningRate 0.0354 Epoch: 8 Global Step: 46090 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:17,898-Speed 3390.53 samples/sec Loss 4.7400 LearningRate 0.0354 Epoch: 8 Global Step: 46100 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:20,914-Speed 3395.94 samples/sec Loss 4.7357 LearningRate 0.0353 Epoch: 8 Global Step: 46110 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:23,936-Speed 3389.47 samples/sec Loss 4.7998 LearningRate 0.0353 Epoch: 8 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:26,954-Speed 3393.52 samples/sec Loss 4.5651 LearningRate 0.0353 Epoch: 8 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:29,971-Speed 3394.97 samples/sec Loss 4.7330 LearningRate 0.0353 Epoch: 8 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:32,992-Speed 3390.15 samples/sec Loss 4.7007 LearningRate 0.0353 Epoch: 8 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:36,026-Speed 3375.78 samples/sec Loss 4.8386 LearningRate 0.0353 Epoch: 8 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:13:39,055-Speed 3381.11 samples/sec Loss 4.8025 LearningRate 0.0353 Epoch: 8 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:42,076-Speed 3390.40 samples/sec Loss 4.5824 LearningRate 0.0353 Epoch: 8 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:45,092-Speed 3396.42 samples/sec Loss 4.6176 LearningRate 0.0353 Epoch: 8 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:48,114-Speed 3389.91 samples/sec Loss 4.7136 LearningRate 0.0353 Epoch: 8 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:51,183-Speed 3337.01 samples/sec Loss 4.7062 LearningRate 0.0352 Epoch: 8 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:54,198-Speed 3396.57 samples/sec Loss 4.8154 LearningRate 0.0352 Epoch: 8 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:13:57,216-Speed 3393.79 samples/sec Loss 4.7322 LearningRate 0.0352 Epoch: 8 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:14:00,231-Speed 3397.34 samples/sec Loss 4.7865 LearningRate 0.0352 Epoch: 8 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:14:03,246-Speed 3396.88 samples/sec Loss 4.7873 LearningRate 0.0352 Epoch: 8 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:14:06,244-Speed 3416.54 samples/sec Loss 4.8397 LearningRate 0.0352 Epoch: 8 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:09,256-Speed 3400.14 samples/sec Loss 4.6468 LearningRate 0.0352 Epoch: 8 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:12,322-Speed 3340.97 samples/sec Loss 4.7559 LearningRate 0.0352 Epoch: 8 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:15,485-Speed 3238.97 samples/sec Loss 4.7137 LearningRate 0.0352 Epoch: 8 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:18,526-Speed 3367.97 samples/sec Loss 4.8398 LearningRate 0.0351 Epoch: 8 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:21,539-Speed 3399.35 samples/sec Loss 4.8855 LearningRate 0.0351 Epoch: 8 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:24,553-Speed 3397.84 samples/sec Loss 4.7841 LearningRate 0.0351 Epoch: 8 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:27,567-Speed 3397.92 samples/sec Loss 4.6847 LearningRate 0.0351 Epoch: 8 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:30,587-Speed 3391.47 samples/sec Loss 4.7929 LearningRate 0.0351 Epoch: 8 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:33,600-Speed 3399.41 samples/sec Loss 4.7822 LearningRate 0.0351 Epoch: 8 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:36,598-Speed 3416.18 samples/sec Loss 4.8397 LearningRate 0.0351 Epoch: 8 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:39,620-Speed 3390.43 samples/sec Loss 4.6641 LearningRate 0.0351 Epoch: 8 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:42,635-Speed 3396.62 samples/sec Loss 4.8398 LearningRate 0.0351 Epoch: 8 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:45,652-Speed 3394.86 samples/sec Loss 4.7205 LearningRate 0.0351 Epoch: 8 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:48,671-Speed 3393.26 samples/sec Loss 4.7172 LearningRate 0.0350 Epoch: 8 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:51,685-Speed 3398.01 samples/sec Loss 4.7686 LearningRate 0.0350 Epoch: 8 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:54,714-Speed 3381.51 samples/sec Loss 4.8397 LearningRate 0.0350 Epoch: 8 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:14:57,728-Speed 3397.55 samples/sec Loss 4.8008 LearningRate 0.0350 Epoch: 8 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:00,755-Speed 3385.17 samples/sec Loss 4.8169 LearningRate 0.0350 Epoch: 8 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:03,775-Speed 3392.01 samples/sec Loss 4.7772 LearningRate 0.0350 Epoch: 8 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:06,804-Speed 3381.17 samples/sec Loss 4.6895 LearningRate 0.0350 Epoch: 8 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:09,809-Speed 3408.47 samples/sec Loss 4.7561 LearningRate 0.0350 Epoch: 8 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:12,833-Speed 3387.10 samples/sec Loss 4.8481 LearningRate 0.0350 Epoch: 8 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:15,847-Speed 3398.32 samples/sec Loss 4.7991 LearningRate 0.0350 Epoch: 8 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:18,868-Speed 3390.20 samples/sec Loss 4.7659 LearningRate 0.0349 Epoch: 8 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:21,882-Speed 3398.57 samples/sec Loss 4.7290 LearningRate 0.0349 Epoch: 8 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:24,896-Speed 3397.79 samples/sec Loss 4.8323 LearningRate 0.0349 Epoch: 8 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:27,917-Speed 3390.46 samples/sec Loss 4.7320 LearningRate 0.0349 Epoch: 8 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:30,940-Speed 3388.35 samples/sec Loss 4.8321 LearningRate 0.0349 Epoch: 8 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:33,974-Speed 3375.89 samples/sec Loss 4.7893 LearningRate 0.0349 Epoch: 8 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:36,995-Speed 3390.88 samples/sec Loss 4.8224 LearningRate 0.0349 Epoch: 8 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:15:40,011-Speed 3396.18 samples/sec Loss 4.8651 LearningRate 0.0349 Epoch: 8 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:43,028-Speed 3394.21 samples/sec Loss 4.7901 LearningRate 0.0349 Epoch: 8 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:46,044-Speed 3395.77 samples/sec Loss 4.6614 LearningRate 0.0348 Epoch: 8 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:49,065-Speed 3390.41 samples/sec Loss 4.9282 LearningRate 0.0348 Epoch: 8 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:52,086-Speed 3390.33 samples/sec Loss 4.8761 LearningRate 0.0348 Epoch: 8 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:55,106-Speed 3391.86 samples/sec Loss 4.7597 LearningRate 0.0348 Epoch: 8 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:15:58,126-Speed 3391.29 samples/sec Loss 4.8391 LearningRate 0.0348 Epoch: 8 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:01,165-Speed 3371.15 samples/sec Loss 4.7151 LearningRate 0.0348 Epoch: 8 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:04,185-Speed 3391.81 samples/sec Loss 4.7956 LearningRate 0.0348 Epoch: 8 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:07,201-Speed 3395.19 samples/sec Loss 4.7884 LearningRate 0.0348 Epoch: 8 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:10,216-Speed 3396.86 samples/sec Loss 4.7377 LearningRate 0.0348 Epoch: 8 Global Step: 46670 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-04-27 06:16:13,224-Speed 3405.64 samples/sec Loss 4.7551 LearningRate 0.0348 Epoch: 8 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:16,269-Speed 3363.62 samples/sec Loss 4.7393 LearningRate 0.0347 Epoch: 8 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:19,290-Speed 3390.33 samples/sec Loss 4.8593 LearningRate 0.0347 Epoch: 8 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:22,294-Speed 3409.23 samples/sec Loss 4.6263 LearningRate 0.0347 Epoch: 8 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:25,316-Speed 3389.41 samples/sec Loss 4.7867 LearningRate 0.0347 Epoch: 8 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:28,331-Speed 3397.82 samples/sec Loss 4.9114 LearningRate 0.0347 Epoch: 8 Global Step: 46730 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:31,347-Speed 3395.61 samples/sec Loss 4.7684 LearningRate 0.0347 Epoch: 8 Global Step: 46740 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:34,375-Speed 3382.26 samples/sec Loss 4.8525 LearningRate 0.0347 Epoch: 8 Global Step: 46750 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:37,406-Speed 3379.57 samples/sec Loss 4.7904 LearningRate 0.0347 Epoch: 8 Global Step: 46760 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:40,444-Speed 3370.59 samples/sec Loss 4.7980 LearningRate 0.0347 Epoch: 8 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:43,471-Speed 3383.81 samples/sec Loss 4.9024 LearningRate 0.0346 Epoch: 8 Global Step: 46780 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:46,496-Speed 3386.07 samples/sec Loss 4.8915 LearningRate 0.0346 Epoch: 8 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:49,518-Speed 3389.17 samples/sec Loss 4.8860 LearningRate 0.0346 Epoch: 8 Global Step: 46800 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:16:52,544-Speed 3385.18 samples/sec Loss 4.7821 LearningRate 0.0346 Epoch: 8 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:55,563-Speed 3392.35 samples/sec Loss 4.8769 LearningRate 0.0346 Epoch: 8 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:16:58,599-Speed 3374.31 samples/sec Loss 4.7758 LearningRate 0.0346 Epoch: 8 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:17:01,605-Speed 3407.06 samples/sec Loss 4.8896 LearningRate 0.0346 Epoch: 8 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:04,624-Speed 3392.50 samples/sec Loss 4.8353 LearningRate 0.0346 Epoch: 8 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:07,642-Speed 3393.73 samples/sec Loss 4.8833 LearningRate 0.0346 Epoch: 8 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:10,664-Speed 3390.03 samples/sec Loss 4.8047 LearningRate 0.0346 Epoch: 8 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:13,683-Speed 3391.89 samples/sec Loss 4.8645 LearningRate 0.0345 Epoch: 8 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:16,723-Speed 3369.98 samples/sec Loss 4.7939 LearningRate 0.0345 Epoch: 8 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:19,741-Speed 3393.78 samples/sec Loss 4.7492 LearningRate 0.0345 Epoch: 8 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:22,764-Speed 3388.30 samples/sec Loss 4.7189 LearningRate 0.0345 Epoch: 8 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:25,802-Speed 3371.00 samples/sec Loss 4.7710 LearningRate 0.0345 Epoch: 8 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:28,824-Speed 3389.52 samples/sec Loss 4.8124 LearningRate 0.0345 Epoch: 8 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:31,825-Speed 3412.60 samples/sec Loss 4.8261 LearningRate 0.0345 Epoch: 8 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:34,863-Speed 3371.38 samples/sec Loss 4.7894 LearningRate 0.0345 Epoch: 8 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:37,885-Speed 3388.99 samples/sec Loss 4.7925 LearningRate 0.0345 Epoch: 8 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:40,904-Speed 3393.58 samples/sec Loss 4.7607 LearningRate 0.0345 Epoch: 8 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:43,933-Speed 3380.96 samples/sec Loss 4.8426 LearningRate 0.0344 Epoch: 8 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:46,954-Speed 3390.98 samples/sec Loss 4.9154 LearningRate 0.0344 Epoch: 8 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:49,976-Speed 3389.44 samples/sec Loss 4.8233 LearningRate 0.0344 Epoch: 8 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:53,001-Speed 3386.68 samples/sec Loss 4.8432 LearningRate 0.0344 Epoch: 8 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:56,023-Speed 3388.60 samples/sec Loss 4.7831 LearningRate 0.0344 Epoch: 8 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:17:59,052-Speed 3381.78 samples/sec Loss 4.9477 LearningRate 0.0344 Epoch: 8 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:02,050-Speed 3415.75 samples/sec Loss 4.8510 LearningRate 0.0344 Epoch: 8 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:05,071-Speed 3390.27 samples/sec Loss 4.8035 LearningRate 0.0344 Epoch: 8 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:08,102-Speed 3379.60 samples/sec Loss 4.8002 LearningRate 0.0344 Epoch: 8 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:11,134-Speed 3377.93 samples/sec Loss 4.7421 LearningRate 0.0343 Epoch: 8 Global Step: 47070 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:14,163-Speed 3381.55 samples/sec Loss 5.0155 LearningRate 0.0343 Epoch: 8 Global Step: 47080 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:17,190-Speed 3383.57 samples/sec Loss 4.9281 LearningRate 0.0343 Epoch: 8 Global Step: 47090 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:20,214-Speed 3387.70 samples/sec Loss 4.7878 LearningRate 0.0343 Epoch: 8 Global Step: 47100 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:23,236-Speed 3388.94 samples/sec Loss 5.0283 LearningRate 0.0343 Epoch: 8 Global Step: 47110 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:26,262-Speed 3385.09 samples/sec Loss 4.8428 LearningRate 0.0343 Epoch: 8 Global Step: 47120 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:29,286-Speed 3386.76 samples/sec Loss 4.8222 LearningRate 0.0343 Epoch: 8 Global Step: 47130 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:32,306-Speed 3390.89 samples/sec Loss 4.8882 LearningRate 0.0343 Epoch: 8 Global Step: 47140 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:35,330-Speed 3387.35 samples/sec Loss 4.7973 LearningRate 0.0343 Epoch: 8 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:38,357-Speed 3384.02 samples/sec Loss 4.8289 LearningRate 0.0343 Epoch: 8 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:18:41,388-Speed 3378.81 samples/sec Loss 4.8268 LearningRate 0.0342 Epoch: 8 Global Step: 47170 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:44,413-Speed 3385.88 samples/sec Loss 4.8214 LearningRate 0.0342 Epoch: 8 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:47,444-Speed 3380.14 samples/sec Loss 4.8230 LearningRate 0.0342 Epoch: 8 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:50,472-Speed 3382.37 samples/sec Loss 4.8960 LearningRate 0.0342 Epoch: 8 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:53,497-Speed 3386.16 samples/sec Loss 4.9256 LearningRate 0.0342 Epoch: 8 Global Step: 47210 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:56,526-Speed 3381.06 samples/sec Loss 4.7586 LearningRate 0.0342 Epoch: 8 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:18:59,562-Speed 3373.46 samples/sec Loss 4.7973 LearningRate 0.0342 Epoch: 8 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:19:02,594-Speed 3378.08 samples/sec Loss 4.7834 LearningRate 0.0342 Epoch: 8 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:19:05,618-Speed 3386.89 samples/sec Loss 4.8609 LearningRate 0.0342 Epoch: 8 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:19:08,640-Speed 3389.16 samples/sec Loss 4.8556 LearningRate 0.0342 Epoch: 8 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:19:11,669-Speed 3381.80 samples/sec Loss 4.7174 LearningRate 0.0341 Epoch: 8 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:14,703-Speed 3376.27 samples/sec Loss 4.7351 LearningRate 0.0341 Epoch: 8 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:17,733-Speed 3379.51 samples/sec Loss 4.9060 LearningRate 0.0341 Epoch: 8 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:20,760-Speed 3384.16 samples/sec Loss 4.8989 LearningRate 0.0341 Epoch: 8 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:23,813-Speed 3354.31 samples/sec Loss 4.8559 LearningRate 0.0341 Epoch: 8 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:26,849-Speed 3374.11 samples/sec Loss 4.9856 LearningRate 0.0341 Epoch: 8 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:19:29,943-Speed 3310.85 samples/sec Loss 4.7822 LearningRate 0.0341 Epoch: 8 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:32,975-Speed 3377.88 samples/sec Loss 4.8126 LearningRate 0.0341 Epoch: 8 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:35,998-Speed 3387.43 samples/sec Loss 4.8426 LearningRate 0.0341 Epoch: 8 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:39,019-Speed 3390.73 samples/sec Loss 4.9461 LearningRate 0.0341 Epoch: 8 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:42,048-Speed 3381.90 samples/sec Loss 4.7726 LearningRate 0.0340 Epoch: 8 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:45,066-Speed 3393.40 samples/sec Loss 4.8044 LearningRate 0.0340 Epoch: 8 Global Step: 47380 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:48,094-Speed 3382.70 samples/sec Loss 4.8305 LearningRate 0.0340 Epoch: 8 Global Step: 47390 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:19:51,122-Speed 3382.93 samples/sec Loss 4.9256
LearningRate 0.0340 Epoch: 8 Global Step: 47400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:19:54,153-Speed 3379.09 samples/sec Loss 4.8226 LearningRate 0.0340 Epoch: 8 Global Step: 47410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:19:57,175-Speed 3389.59 samples/sec Loss 4.7749 LearningRate 0.0340 Epoch: 8 Global Step: 47420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:20:00,209-Speed 3375.61 samples/sec Loss 4.7565 LearningRate 0.0340 Epoch: 8 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:03,245-Speed 3374.26 samples/sec Loss 4.8498 LearningRate 0.0340 Epoch: 8 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:06,268-Speed 3387.18 samples/sec Loss 4.7748 LearningRate 0.0340 Epoch: 8 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:09,291-Speed 3388.98 samples/sec Loss 4.8300 LearningRate 0.0339 Epoch: 8 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:12,318-Speed 3382.70 samples/sec Loss 4.8542 LearningRate 0.0339 Epoch: 8 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:15,350-Speed 3378.19 samples/sec Loss 4.9235 LearningRate 0.0339 Epoch: 8 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:18,383-Speed 3377.71 samples/sec Loss 4.9030 LearningRate 0.0339 Epoch: 8 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:21,411-Speed 3382.12 samples/sec Loss 4.7931 LearningRate 0.0339 Epoch: 8 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:24,444-Speed 3376.78 samples/sec Loss 4.7438 LearningRate 0.0339 Epoch: 8 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:27,464-Speed 3391.70 samples/sec Loss 4.8226 LearningRate 0.0339 Epoch: 8 Global Step: 47520 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-27 06:20:30,489-Speed 3385.68 samples/sec Loss 4.8273 LearningRate 0.0339 Epoch: 8 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:20:33,512-Speed 3388.77 samples/sec Loss 4.7734 LearningRate 0.0339 Epoch: 8 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:20:36,537-Speed 3385.59 samples/sec Loss 4.8034 LearningRate 0.0339 Epoch: 8 Global Step: 47550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:20:39,557-Speed 3391.66 samples/sec Loss 5.0144 LearningRate 0.0338 Epoch: 8 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:20:42,586-Speed 3380.84 samples/sec Loss 4.8389 LearningRate 0.0338 Epoch: 8 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:20:45,641-Speed 3353.74 samples/sec Loss 4.7879 LearningRate 0.0338 Epoch: 8 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:48,669-Speed 3381.94 samples/sec Loss 4.8649 LearningRate 0.0338 Epoch: 8 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:51,696-Speed 3383.25 samples/sec Loss 4.7930 LearningRate 0.0338 Epoch: 8 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:54,726-Speed 3380.08 samples/sec Loss 4.8867 LearningRate 0.0338 Epoch: 8 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:20:57,748-Speed 3390.02 samples/sec Loss 4.9077 LearningRate 0.0338 Epoch: 8 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:00,775-Speed 3383.91 samples/sec Loss 5.0172 LearningRate 0.0338 Epoch: 8 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:03,810-Speed 3374.33 samples/sec Loss 4.8605 LearningRate 0.0338 Epoch: 8 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
06:21:06,840-Speed 3380.33 samples/sec Loss 4.8614 LearningRate 0.0338 Epoch: 8 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:09,867-Speed 3384.12 samples/sec Loss 4.7660 LearningRate 0.0337 Epoch: 8 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:12,900-Speed 3376.47 samples/sec Loss 4.8332 LearningRate 0.0337 Epoch: 8 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:15,916-Speed 3396.76 samples/sec Loss 4.8318 LearningRate 0.0337 Epoch: 8 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:18,950-Speed 3375.68 samples/sec Loss 4.7433 LearningRate 0.0337 Epoch: 8 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:21,987-Speed 3372.36 samples/sec Loss 4.8721 LearningRate 0.0337 Epoch: 8 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:25,043-Speed 3351.47 samples/sec Loss 4.9758 LearningRate 0.0337 Epoch: 8 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:21:28,058-Speed 3397.40 samples/sec Loss 4.7061 LearningRate 0.0337 Epoch: 8 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:31,092-Speed 3375.62 samples/sec Loss 4.7730 LearningRate 0.0337 Epoch: 8 Global Step: 47730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:34,115-Speed 3388.19 samples/sec Loss 4.8337 LearningRate 0.0337 Epoch: 8 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:37,142-Speed 3383.68 samples/sec Loss 4.9189 LearningRate 0.0337 Epoch: 8 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:40,172-Speed 3380.74 samples/sec Loss 4.7730 LearningRate 0.0336 Epoch: 8 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:43,230-Speed 3349.18 samples/sec Loss 4.8492 LearningRate 
0.0336 Epoch: 8 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:46,524-Speed 3109.42 samples/sec Loss 4.9662 LearningRate 0.0336 Epoch: 8 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:49,561-Speed 3372.27 samples/sec Loss 4.8313 LearningRate 0.0336 Epoch: 8 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:52,594-Speed 3377.24 samples/sec Loss 4.7942 LearningRate 0.0336 Epoch: 8 Global Step: 47800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:55,628-Speed 3375.33 samples/sec Loss 4.8597 LearningRate 0.0336 Epoch: 8 Global Step: 47810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 06:21:58,656-Speed 3383.02 samples/sec Loss 4.8036 LearningRate 0.0336 Epoch: 8 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:01,686-Speed 3380.26 samples/sec Loss 4.8270 LearningRate 0.0336 Epoch: 8 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:04,713-Speed 3383.78 samples/sec Loss 4.8489 LearningRate 0.0336 Epoch: 8 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:07,746-Speed 3376.87 samples/sec Loss 4.8760 LearningRate 0.0336 Epoch: 8 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:10,789-Speed 3365.80 samples/sec Loss 4.8593 LearningRate 0.0335 Epoch: 8 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:13,824-Speed 3375.14 samples/sec Loss 4.8315 LearningRate 0.0335 Epoch: 8 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:16,865-Speed 3368.59 samples/sec Loss 4.7942 LearningRate 0.0335 Epoch: 8 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:19,894-Speed 3381.18 samples/sec Loss 4.7924 LearningRate 0.0335 Epoch: 8 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 
7 hours Training: 2022-04-27 06:22:22,949-Speed 3352.52 samples/sec Loss 4.7469 LearningRate 0.0335 Epoch: 8 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:26,036-Speed 3318.46 samples/sec Loss 4.8281 LearningRate 0.0335 Epoch: 8 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:29,065-Speed 3381.75 samples/sec Loss 4.8939 LearningRate 0.0335 Epoch: 8 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:22:32,100-Speed 3374.26 samples/sec Loss 4.6642 LearningRate 0.0335 Epoch: 8 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:35,137-Speed 3372.07 samples/sec Loss 4.9551 LearningRate 0.0335 Epoch: 8 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:38,208-Speed 3335.62 samples/sec Loss 4.7621 LearningRate 0.0334 Epoch: 8 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:41,254-Speed 3362.71 samples/sec Loss 4.8003 LearningRate 0.0334 Epoch: 8 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:44,285-Speed 3378.98 samples/sec Loss 4.8443 LearningRate 0.0334 Epoch: 8 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:47,322-Speed 3372.52 samples/sec Loss 4.8526 LearningRate 0.0334 Epoch: 8 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:50,354-Speed 3378.49 samples/sec Loss 4.8608 LearningRate 0.0334 Epoch: 8 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:22:53,385-Speed 3379.12 samples/sec Loss 4.8231 LearningRate 0.0334 Epoch: 8 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:23:37,027-[lfw][48000]XNorm: 20.882258 Training: 2022-04-27 06:23:37,028-[lfw][48000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-27 06:23:37,028-[lfw][48000]Accuracy-Highest: 0.99817 
Training: 2022-04-27 06:24:27,354-[cfp_fp][48000]XNorm: 18.910019
Training: 2022-04-27 06:24:27,354-[cfp_fp][48000]Accuracy-Flip: 0.96414+-0.00849
Training: 2022-04-27 06:24:27,355-[cfp_fp][48000]Accuracy-Highest: 0.96414
Training: 2022-04-27 06:25:10,627-[agedb_30][48000]XNorm: 21.358148
Training: 2022-04-27 06:25:10,628-[agedb_30][48000]Accuracy-Flip: 0.97583+-0.00883
Training: 2022-04-27 06:25:10,629-[agedb_30][48000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:25:13,643-Speed 73.01 samples/sec Loss 4.8041 LearningRate 0.0334 Epoch: 8 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:16,650-Speed 3405.66 samples/sec Loss 4.9085 LearningRate 0.0334 Epoch: 8 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:19,662-Speed 3400.89 samples/sec Loss 4.7279 LearningRate 0.0334 Epoch: 8 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:22,696-Speed 3375.65 samples/sec Loss 4.6427 LearningRate 0.0334 Epoch: 8 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:25,706-Speed 3402.87 samples/sec Loss 4.6844 LearningRate 0.0333 Epoch: 8 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:28,719-Speed 3399.02 samples/sec Loss 4.8449 LearningRate 0.0333 Epoch: 8 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:31,730-Speed 3401.76 samples/sec Loss 4.8948 LearningRate 0.0333 Epoch: 8 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:34,742-Speed 3400.69 samples/sec Loss 4.7774 LearningRate 0.0333 Epoch: 8 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:25:37,748-Speed 3407.63 samples/sec Loss 4.7377 LearningRate 0.0333 Epoch: 8 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:40,767-Speed 3393.01 samples/sec Loss 4.8427 LearningRate 0.0333 Epoch: 8 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:43,785-Speed 3393.23 samples/sec Loss 4.7641 LearningRate 0.0333 Epoch: 8 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:46,807-Speed 3388.98 samples/sec Loss 4.8331 LearningRate 0.0333 Epoch: 8 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:49,830-Speed 3387.99 samples/sec Loss 4.8132 LearningRate 0.0333 Epoch: 8 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:52,861-Speed 3379.35 samples/sec Loss 4.8386 LearningRate 0.0333 Epoch: 8 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:55,888-Speed 3384.33 samples/sec Loss 4.7358 LearningRate 0.0332 Epoch: 8 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:25:58,923-Speed 3374.94 samples/sec Loss 4.7395 LearningRate 0.0332 Epoch: 8 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:01,959-Speed 3372.75 samples/sec Loss 4.7458 LearningRate 0.0332 Epoch: 8 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:04,979-Speed 3391.82 samples/sec Loss 4.9683 LearningRate 0.0332 Epoch: 8 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:08,017-Speed 3372.01 samples/sec Loss 5.0165 LearningRate 0.0332 Epoch: 8 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:11,047-Speed 3380.37 samples/sec Loss 4.7544 LearningRate 0.0332 Epoch: 8 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:14,076-Speed 3381.20 samples/sec Loss 4.9025 LearningRate 0.0332 Epoch: 8 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:17,107-Speed 3379.55 samples/sec Loss 4.8440 LearningRate 0.0332 Epoch: 8 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:20,130-Speed 3388.09 samples/sec Loss 4.8590 LearningRate 0.0332 Epoch: 8 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:23,173-Speed 3365.85 samples/sec Loss 4.8387 LearningRate 0.0332 Epoch: 8 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:26,213-Speed 3369.30 samples/sec Loss 4.8580 LearningRate 0.0331 Epoch: 8 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:29,230-Speed 3394.58 samples/sec Loss 4.8183 LearningRate 0.0331 Epoch: 8 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:26:32,238-Speed 3404.80 samples/sec Loss 4.8211 LearningRate 0.0331 Epoch: 8 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:35,252-Speed 3399.40 samples/sec Loss 4.7864 LearningRate 0.0331 Epoch: 8 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:38,295-Speed 3364.84 samples/sec Loss 4.7926 LearningRate 0.0331 Epoch: 8 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:41,315-Speed 3392.08 samples/sec Loss 4.8484 LearningRate 0.0331 Epoch: 8 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:44,330-Speed 3396.79 samples/sec Loss 4.8009 LearningRate 0.0331 Epoch: 8 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:47,339-Speed 3403.50 samples/sec Loss 4.8516 LearningRate 0.0331 Epoch: 8 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:50,351-Speed 3400.69 samples/sec Loss 4.8549 LearningRate 0.0331 Epoch: 8 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:26:53,345-Speed 3420.79 samples/sec Loss 4.7229 LearningRate 0.0331 Epoch: 8 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:26:56,361-Speed 3396.49 samples/sec Loss 4.7837 LearningRate 0.0330 Epoch: 8 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:26:59,415-Speed 3353.53 samples/sec Loss 4.6946 LearningRate 0.0330 Epoch: 8 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:02,471-Speed 3351.29 samples/sec Loss 4.8831 LearningRate 0.0330 Epoch: 8 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:05,484-Speed 3399.79 samples/sec Loss 4.8546 LearningRate 0.0330 Epoch: 8 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:08,495-Speed 3402.05 samples/sec Loss 4.8652 LearningRate 0.0330 Epoch: 8 Global Step: 48390 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:11,507-Speed 3400.46 samples/sec Loss 4.8407 LearningRate 0.0330 Epoch: 8 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:14,522-Speed 3397.27 samples/sec Loss 4.8139 LearningRate 0.0330 Epoch: 8 Global Step: 48410 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:17,531-Speed 3403.11 samples/sec Loss 4.7016 LearningRate 0.0330 Epoch: 8 Global Step: 48420 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:20,540-Speed 3404.74 samples/sec Loss 4.8673 LearningRate 0.0330 Epoch: 8 Global Step: 48430 Fp16 Grad Scale: 32768 Required: 7 hours
Training: 2022-04-27 06:27:23,552-Speed 3400.79 samples/sec Loss 4.8383 LearningRate 0.0330 Epoch: 8 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:26,585-Speed 3376.61 samples/sec Loss 4.8428 LearningRate 0.0329 Epoch: 8 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:29,589-Speed 3409.44 samples/sec Loss 4.7441 LearningRate 0.0329 Epoch: 8 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:32,599-Speed 3402.56 samples/sec Loss 4.8107 LearningRate 0.0329 Epoch: 8 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:35,612-Speed 3400.05 samples/sec Loss 4.7637 LearningRate 0.0329 Epoch: 8 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:38,627-Speed 3396.67 samples/sec Loss 4.7374 LearningRate 0.0329 Epoch: 8 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:41,637-Speed 3402.75 samples/sec Loss 4.8213 LearningRate 0.0329 Epoch: 8 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:44,647-Speed 3402.99 samples/sec Loss 4.8372 LearningRate 0.0329 Epoch: 8 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:47,661-Speed 3398.02 samples/sec Loss 4.8149 LearningRate 0.0329 Epoch: 8 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:50,678-Speed 3395.25 samples/sec Loss 4.8261 LearningRate 0.0329 Epoch: 8 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:27:53,696-Speed 3394.18 samples/sec Loss 4.7824 LearningRate 0.0329 Epoch: 8 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:27:56,709-Speed 3398.58 samples/sec Loss 4.9195 LearningRate 0.0328 Epoch: 8 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:27:59,719-Speed 3403.24 samples/sec Loss 4.8994 LearningRate 0.0328 Epoch: 8 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:28:02,745-Speed 3385.30 samples/sec Loss 4.7955 LearningRate 0.0328 Epoch: 8 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:28:05,739-Speed 3421.17 samples/sec Loss 4.7281 LearningRate 0.0328 Epoch: 8 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:08,753-Speed 3397.09 samples/sec Loss 4.7913 LearningRate 0.0328 Epoch: 8 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:11,808-Speed 3353.60 samples/sec Loss 4.7609 LearningRate 0.0328 Epoch: 8 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:14,905-Speed 3306.30 samples/sec Loss 4.6911 LearningRate 0.0328 Epoch: 8 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:17,935-Speed 3380.43 samples/sec Loss 4.8425 LearningRate 0.0328 Epoch: 8 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:20,949-Speed 3398.54 samples/sec Loss 4.8620 LearningRate 0.0328 Epoch: 8 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:23,992-Speed 3366.00 samples/sec Loss 4.7975 LearningRate 0.0328 Epoch: 8 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:27,010-Speed 3394.17 samples/sec Loss 4.7880 LearningRate 0.0327 Epoch: 8 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:30,027-Speed 3394.68 samples/sec Loss 4.8811 LearningRate 0.0327 Epoch: 8 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:33,043-Speed 3396.32 samples/sec Loss 4.7324 LearningRate 0.0327 Epoch: 8 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:36,121-Speed 3327.80 samples/sec Loss 4.8460 LearningRate 0.0327 Epoch: 8 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:28:39,163-Speed 3366.01 samples/sec Loss 4.9830 LearningRate 0.0327 Epoch: 8 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:28:42,183-Speed 3392.17 samples/sec Loss 4.5895 LearningRate 0.0327 Epoch: 8 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:28:45,188-Speed 3408.49 samples/sec Loss 4.7491 LearningRate 0.0327 Epoch: 8 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:48,205-Speed 3395.33 samples/sec Loss 4.8440 LearningRate 0.0327 Epoch: 8 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:51,215-Speed 3402.53 samples/sec Loss 4.7299 LearningRate 0.0327 Epoch: 8 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:54,227-Speed 3400.75 samples/sec Loss 4.7640 LearningRate 0.0327 Epoch: 8 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:28:57,243-Speed 3395.77 samples/sec Loss 4.7407 LearningRate 0.0326 Epoch: 8 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:00,275-Speed 3377.61 samples/sec Loss 4.6928 LearningRate 0.0326 Epoch: 8 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:03,291-Speed 3396.73 samples/sec Loss 4.7325 LearningRate 0.0326 Epoch: 8 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:06,300-Speed 3403.79 samples/sec Loss 4.7766 LearningRate 0.0326 Epoch: 8 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:09,321-Speed 3390.23 samples/sec Loss 4.8664 LearningRate 0.0326 Epoch: 8 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:12,337-Speed 3396.51 samples/sec Loss 4.7851 LearningRate 0.0326 Epoch: 8 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:15,359-Speed 3388.58 samples/sec Loss 4.7981 LearningRate 0.0326 Epoch: 8 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:18,372-Speed 3400.13 samples/sec Loss 4.8202 LearningRate 0.0326 Epoch: 8 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:21,388-Speed 3395.57 samples/sec Loss 4.8316 LearningRate 0.0326 Epoch: 8 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:24,405-Speed 3394.76 samples/sec Loss 4.7582 LearningRate 0.0325 Epoch: 8 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:27,429-Speed 3387.77 samples/sec Loss 4.8365 LearningRate 0.0325 Epoch: 8 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:30,446-Speed 3394.66 samples/sec Loss 4.8131 LearningRate 0.0325 Epoch: 8 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:33,460-Speed 3398.11 samples/sec Loss 4.8028 LearningRate 0.0325 Epoch: 8 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:36,475-Speed 3397.25 samples/sec Loss 4.7390 LearningRate 0.0325 Epoch: 8 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:29:39,475-Speed 3413.67 samples/sec Loss 4.8415 LearningRate 0.0325 Epoch: 8 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:42,499-Speed 3387.80 samples/sec Loss 4.7887 LearningRate 0.0325 Epoch: 8 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:45,515-Speed 3395.98 samples/sec Loss 4.8556 LearningRate 0.0325 Epoch: 8 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:48,533-Speed 3393.49 samples/sec Loss 4.7578 LearningRate 0.0325 Epoch: 8 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:51,590-Speed 3350.34 samples/sec Loss 4.8804 LearningRate 0.0325 Epoch: 8 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:54,604-Speed 3398.51 samples/sec Loss 4.7342 LearningRate 0.0324 Epoch: 8 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:29:57,623-Speed 3392.88 samples/sec Loss 4.8432 LearningRate 0.0324 Epoch: 8 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:00,639-Speed 3396.12 samples/sec Loss 4.8629 LearningRate 0.0324 Epoch: 8 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:03,654-Speed 3396.88 samples/sec Loss 4.7835 LearningRate 0.0324 Epoch: 8 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:06,668-Speed 3398.17 samples/sec Loss 4.8631 LearningRate 0.0324 Epoch: 8 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:09,687-Speed 3392.37 samples/sec Loss 4.6737 LearningRate 0.0324 Epoch: 8 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:12,689-Speed 3412.27 samples/sec Loss 4.6552 LearningRate 0.0324 Epoch: 8 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:15,701-Speed 3400.41 samples/sec Loss 4.9467 LearningRate 0.0324 Epoch: 8 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:18,721-Speed 3391.87 samples/sec Loss 4.7715 LearningRate 0.0324 Epoch: 8 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:21,745-Speed 3386.93 samples/sec Loss 4.6908 LearningRate 0.0324 Epoch: 8 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:24,766-Speed 3390.71 samples/sec Loss 4.7301 LearningRate 0.0323 Epoch: 8 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:27,788-Speed 3389.52 samples/sec Loss 4.7190 LearningRate 0.0323 Epoch: 8 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:30,804-Speed 3395.80 samples/sec Loss 4.6993 LearningRate 0.0323 Epoch: 8 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:33,820-Speed 3395.87 samples/sec Loss 4.7347 LearningRate 0.0323 Epoch: 8 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:36,841-Speed 3390.20 samples/sec Loss 4.7769 LearningRate 0.0323 Epoch: 8 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:39,916-Speed 3330.99 samples/sec Loss 4.7853 LearningRate 0.0323 Epoch: 8 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:30:42,948-Speed 3378.94 samples/sec Loss 4.7374 LearningRate 0.0323 Epoch: 8 Global Step: 49100 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:45,963-Speed 3397.13 samples/sec Loss 4.6284 LearningRate 0.0323 Epoch: 8 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:48,978-Speed 3397.15 samples/sec Loss 4.6635 LearningRate 0.0323 Epoch: 8 Global Step: 49120 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:51,995-Speed 3394.35 samples/sec Loss 4.9213 LearningRate 0.0323 Epoch: 8 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:55,012-Speed 3395.48 samples/sec Loss 4.7323 LearningRate 0.0322 Epoch: 8 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:30:58,035-Speed 3387.34 samples/sec Loss 4.7680 LearningRate 0.0322 Epoch: 8 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:01,060-Speed 3386.04 samples/sec Loss 4.7858 LearningRate 0.0322 Epoch: 8 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:04,069-Speed 3404.43 samples/sec Loss 4.6681 LearningRate 0.0322 Epoch: 8 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:07,089-Speed 3391.55 samples/sec Loss 4.8283 LearningRate 0.0322 Epoch: 8 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:10,106-Speed 3394.79 samples/sec Loss 4.8086 LearningRate 0.0322 Epoch: 8 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:13,123-Speed 3395.31 samples/sec Loss 4.7775 LearningRate 0.0322 Epoch: 8 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:16,137-Speed 3398.26 samples/sec Loss 4.8285 LearningRate 0.0322 Epoch: 8 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:19,155-Speed 3393.68 samples/sec Loss 4.7540 LearningRate 0.0322 Epoch: 8 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:22,171-Speed 3395.59 samples/sec Loss 4.6592 LearningRate 0.0322 Epoch: 8 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:25,192-Speed 3390.16 samples/sec Loss 4.7869 LearningRate 0.0321 Epoch: 8 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:28,216-Speed 3387.73 samples/sec Loss 4.8014 LearningRate 0.0321 Epoch: 8 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:31,240-Speed 3386.53 samples/sec Loss 4.8310 LearningRate 0.0321 Epoch: 8 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:34,271-Speed 3380.02 samples/sec Loss 4.7806 LearningRate 0.0321 Epoch: 8 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:37,287-Speed 3395.74 samples/sec Loss 4.7900 LearningRate 0.0321 Epoch: 8 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:40,306-Speed 3392.68 samples/sec Loss 4.7026 LearningRate 0.0321 Epoch: 8 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:43,334-Speed 3382.02 samples/sec Loss 4.7559 LearningRate 0.0321 Epoch: 8 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:46,372-Speed 3371.65 samples/sec Loss 4.7013 LearningRate 0.0321 Epoch: 8 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:49,400-Speed 3382.98 samples/sec Loss 4.6581 LearningRate 0.0321 Epoch: 8 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:31:52,399-Speed 3414.83 samples/sec Loss 4.8434 LearningRate 0.0321 Epoch: 8 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:55,418-Speed 3393.06 samples/sec Loss 4.8152 LearningRate 0.0321 Epoch: 8 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:31:58,436-Speed 3394.25 samples/sec Loss 4.7830 LearningRate 0.0320 Epoch: 8 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:01,529-Speed 3311.74 samples/sec Loss 4.7346 LearningRate 0.0320 Epoch: 8 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:04,544-Speed 3396.44 samples/sec Loss 4.7142 LearningRate 0.0320 Epoch: 8 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:07,566-Speed 3389.95 samples/sec Loss 4.6754 LearningRate 0.0320 Epoch: 8 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:10,581-Speed 3396.98 samples/sec Loss 4.6946 LearningRate 0.0320 Epoch: 8 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:13,599-Speed 3394.01 samples/sec Loss 4.6930 LearningRate 0.0320 Epoch: 8 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:16,616-Speed 3394.41 samples/sec Loss 4.6472 LearningRate 0.0320 Epoch: 8 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:19,636-Speed 3391.72 samples/sec Loss 4.6986 LearningRate 0.0320 Epoch: 8 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:22,653-Speed 3395.03 samples/sec Loss 4.6635 LearningRate 0.0320 Epoch: 8 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:32:25,654-Speed 3412.96 samples/sec Loss 4.6296 LearningRate 0.0320 Epoch: 8 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:28,671-Speed 3394.69 samples/sec Loss 4.8343 LearningRate 0.0319 Epoch: 8 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:32:31,686-Speed 3396.86 samples/sec Loss 4.7525 LearningRate 0.0319 Epoch: 8 
Global Step: 49460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:34,706-Speed 3392.19 samples/sec Loss 4.6857 LearningRate 0.0319 Epoch: 8 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:37,750-Speed 3363.87 samples/sec Loss 4.7087 LearningRate 0.0319 Epoch: 8 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:40,805-Speed 3353.47 samples/sec Loss 4.7053 LearningRate 0.0319 Epoch: 8 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:43,825-Speed 3391.24 samples/sec Loss 4.8096 LearningRate 0.0319 Epoch: 8 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:46,845-Speed 3391.65 samples/sec Loss 4.7256 LearningRate 0.0319 Epoch: 8 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:49,863-Speed 3393.52 samples/sec Loss 4.6613 LearningRate 0.0319 Epoch: 8 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:52,882-Speed 3392.93 samples/sec Loss 4.7200 LearningRate 0.0319 Epoch: 8 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 06:32:55,900-Speed 3393.37 samples/sec Loss 4.7488 LearningRate 0.0319 Epoch: 8 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:32:58,939-Speed 3371.33 samples/sec Loss 4.8223 LearningRate 0.0318 Epoch: 8 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:33:01,957-Speed 3393.62 samples/sec Loss 4.7936 LearningRate 0.0318 Epoch: 8 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:33:04,982-Speed 3385.70 samples/sec Loss 4.6257 LearningRate 0.0318 Epoch: 8 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 06:33:07,980-Speed 3416.36 samples/sec Loss 4.7419 LearningRate 0.0318 Epoch: 8 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-27 06:33:11,004-Speed 3386.66 samples/sec Loss 4.8226 LearningRate 0.0318 Epoch: 8 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:14,045-Speed 3367.54 samples/sec Loss 4.7822 LearningRate 0.0318 Epoch: 8 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:17,066-Speed 3390.87 samples/sec Loss 4.7667 LearningRate 0.0318 Epoch: 8 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:20,087-Speed 3391.11 samples/sec Loss 4.6418 LearningRate 0.0318 Epoch: 8 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:23,135-Speed 3360.13 samples/sec Loss 4.6904 LearningRate 0.0318 Epoch: 8 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:26,172-Speed 3372.59 samples/sec Loss 4.7090 LearningRate 0.0318 Epoch: 8 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:29,231-Speed 3349.20 samples/sec Loss 4.7554 LearningRate 0.0317 Epoch: 8 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:32,262-Speed 3379.27 samples/sec Loss 4.7888 LearningRate 0.0317 Epoch: 8 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:35,286-Speed 3386.86 samples/sec Loss 4.7196 LearningRate 0.0317 Epoch: 8 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:38,305-Speed 3392.28 samples/sec Loss 4.6182 LearningRate 0.0317 Epoch: 8 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:33:41,326-Speed 3390.90 samples/sec Loss 4.6235 LearningRate 0.0317 Epoch: 8 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:33:44,359-Speed 3376.58 samples/sec Loss 4.7472 LearningRate 0.0317 Epoch: 8 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:33:47,396-Speed 3372.80 samples/sec Loss 4.7528 LearningRate 0.0317 Epoch: 8 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:33:50,523-Speed 3275.46 samples/sec Loss 4.7061 LearningRate 0.0317 Epoch: 8 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:33:53,525-Speed 3411.97 samples/sec Loss 4.7834 LearningRate 0.0317 Epoch: 8 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:56,545-Speed 3391.13 samples/sec Loss 4.7384 LearningRate 0.0317 Epoch: 8 Global Step: 49740 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:33:59,599-Speed 3353.71 samples/sec Loss 4.6823 LearningRate 0.0316 Epoch: 8 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:02,656-Speed 3350.56 samples/sec Loss 4.6377 LearningRate 0.0316 Epoch: 8 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:05,682-Speed 3384.87 samples/sec Loss 4.6674 LearningRate 0.0316 Epoch: 8 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:08,703-Speed 3389.63 samples/sec Loss 4.8502 LearningRate 0.0316 Epoch: 8 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:11,847-Speed 3258.54 samples/sec Loss 4.6325 LearningRate 0.0316 Epoch: 8 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:14,933-Speed 3319.05 samples/sec Loss 4.7713 LearningRate 0.0316 Epoch: 8 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:17,964-Speed 3379.81 samples/sec Loss 4.6500 LearningRate 0.0316 Epoch: 8 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:20,990-Speed 3384.18 samples/sec Loss 4.7359 LearningRate 0.0316 Epoch: 8 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:24,009-Speed 3392.18 samples/sec Loss 4.6891 LearningRate 0.0316 Epoch: 8 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:34:27,029-Speed 3391.39 samples/sec Loss 4.7573 LearningRate 0.0316 Epoch: 8 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:34:30,036-Speed 3406.48 samples/sec Loss 4.6573 LearningRate 0.0315 Epoch: 8 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:33,054-Speed 3393.84 samples/sec Loss 4.7032 LearningRate 0.0315 Epoch: 8 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:36,085-Speed 3378.91 samples/sec Loss 4.6792 LearningRate 0.0315 Epoch: 8 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:39,130-Speed 3363.59 samples/sec Loss 4.6271 LearningRate 0.0315 Epoch: 8 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:42,156-Speed 3384.55 samples/sec Loss 4.7237 LearningRate 0.0315 Epoch: 8 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:45,180-Speed 3387.72 samples/sec Loss 4.7349 LearningRate 0.0315 Epoch: 8 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:48,213-Speed 3377.50 samples/sec Loss 4.8406 LearningRate 0.0315 Epoch: 8 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:51,242-Speed 3381.69 samples/sec Loss 4.6009 LearningRate 0.0315 Epoch: 8 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:54,271-Speed 3380.91 samples/sec Loss 4.8316 LearningRate 0.0315 Epoch: 8 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:34:57,293-Speed 3389.56 samples/sec Loss 4.7187 LearningRate 0.0315 Epoch: 8 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:35:00,314-Speed 3390.05 samples/sec Loss 4.7797 LearningRate 0.0314 Epoch: 8 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:35:03,353-Speed 3370.49 samples/sec Loss 4.6880 LearningRate 0.0314 Epoch: 8 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:35:06,372-Speed 3391.83 samples/sec Loss 4.6964 LearningRate 0.0314 Epoch: 8 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:35:09,395-Speed 3387.98 samples/sec Loss 4.6692 LearningRate 0.0314 Epoch: 8 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:35:12,421-Speed 3385.70 samples/sec Loss 4.5567 LearningRate 0.0314 Epoch: 8 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:35:15,445-Speed 3387.62 samples/sec Loss 4.6905 LearningRate 0.0314 Epoch: 8 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:35:58,954-[lfw][50000]XNorm: 21.875312
Training: 2022-04-27 06:35:58,955-[lfw][50000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 06:35:58,955-[lfw][50000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:36:49,337-[cfp_fp][50000]XNorm: 19.527867
Training: 2022-04-27 06:36:49,337-[cfp_fp][50000]Accuracy-Flip: 0.95843+-0.00895
Training: 2022-04-27 06:36:49,338-[cfp_fp][50000]Accuracy-Highest: 0.96414
Training: 2022-04-27 06:37:32,889-[agedb_30][50000]XNorm: 21.759556
Training: 2022-04-27 06:37:32,889-[agedb_30][50000]Accuracy-Flip: 0.97567+-0.00814
Training: 2022-04-27 06:37:32,890-[agedb_30][50000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:37:35,905-Speed 72.90 samples/sec Loss 4.6234 LearningRate 0.0314 Epoch: 8 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:38,912-Speed 3406.70 samples/sec Loss 4.6668 LearningRate 0.0314 Epoch: 8 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:41,923-Speed 3401.48 samples/sec Loss 4.6183 LearningRate 0.0314 Epoch: 8 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:44,942-Speed 3392.14 samples/sec Loss 4.7282 LearningRate 0.0314 Epoch: 8 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:47,965-Speed 3388.77 samples/sec Loss 4.7514 LearningRate 0.0313 Epoch: 8 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:51,011-Speed 3362.24 samples/sec Loss 4.7674 LearningRate 0.0313 Epoch: 8 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:37:54,048-Speed 3374.05 samples/sec Loss 4.6597 LearningRate 0.0313 Epoch: 8 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:37:57,066-Speed 3392.98 samples/sec Loss 4.6274 LearningRate 0.0313 Epoch: 8 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:38:00,094-Speed 3383.73 samples/sec Loss 4.7048 LearningRate 0.0313 Epoch: 8 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:38:03,131-Speed 3372.39 samples/sec Loss 4.6003 LearningRate 0.0313 Epoch: 8 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:38:06,249-Speed 3284.24 samples/sec Loss 4.7412 LearningRate 0.0313 Epoch: 8 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:38:09,254-Speed 3409.49 samples/sec Loss 4.6597 LearningRate 0.0313 Epoch: 8 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:12,275-Speed 3390.48 samples/sec Loss 4.6976 LearningRate 0.0313 Epoch: 8 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:15,295-Speed 3390.75 samples/sec Loss 4.7655 LearningRate 0.0313 Epoch: 8 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:18,321-Speed 3385.11 samples/sec Loss 4.6773 LearningRate 0.0312 Epoch: 8 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:21,352-Speed 3379.38 samples/sec Loss 4.6825 LearningRate 0.0312 Epoch: 8 Global Step: 50160 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:24,380-Speed 3381.52 samples/sec Loss 4.7678 LearningRate 0.0312 Epoch: 8 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:27,402-Speed 3390.11 samples/sec Loss 4.6719 LearningRate 0.0312 Epoch: 8 Global Step: 50180 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:30,425-Speed 3388.22 samples/sec Loss 4.7778 LearningRate 0.0312 Epoch: 8 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:33,450-Speed 3386.15 samples/sec Loss 4.7049 LearningRate 0.0312 Epoch: 8 Global Step: 50200 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:36,477-Speed 3383.85 samples/sec Loss 4.8028 LearningRate 0.0312 Epoch: 8 Global Step: 50210 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:39,517-Speed 3368.73 samples/sec Loss 4.6759 LearningRate 0.0312 Epoch: 8 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:38:42,527-Speed 3403.50 samples/sec Loss 4.6698 LearningRate 0.0312 Epoch: 8 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:45,574-Speed 3361.48 samples/sec Loss 4.6847 LearningRate 0.0312 Epoch: 8 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:48,606-Speed 3377.96 samples/sec Loss 4.6167 LearningRate 0.0312 Epoch: 8 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:51,641-Speed 3374.90 samples/sec Loss 4.7121 LearningRate 0.0311 Epoch: 8 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:54,682-Speed 3367.33 samples/sec Loss 4.5617 LearningRate 0.0311 Epoch: 8 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:38:57,731-Speed 3360.46 samples/sec Loss 4.7345 LearningRate 0.0311 Epoch: 8 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:00,778-Speed 3360.52 samples/sec Loss 4.7167 LearningRate 0.0311 Epoch: 8 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:03,831-Speed 3354.91 samples/sec Loss 4.6163 LearningRate 0.0311 Epoch: 8 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:06,861-Speed 3380.64 samples/sec Loss 4.8213 LearningRate 0.0311 Epoch: 8 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:09,891-Speed 3380.62 samples/sec Loss 4.6092 LearningRate 0.0311 Epoch: 8 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:12,916-Speed 3385.46 samples/sec Loss 4.5648 LearningRate 0.0311 Epoch: 8 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:39:15,932-Speed 3396.21 samples/sec Loss 4.6186 LearningRate 0.0311 Epoch: 8 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:18,977-Speed 3363.83 samples/sec Loss 4.6097 LearningRate 0.0311 Epoch: 8 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:22,004-Speed 3383.03 samples/sec Loss 4.6959 LearningRate 0.0310 Epoch: 8 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:25,035-Speed 3380.13 samples/sec Loss 4.7267 LearningRate 0.0310 Epoch: 8 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:28,064-Speed 3381.00 samples/sec Loss 4.6863 LearningRate 0.0310 Epoch: 8 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:31,092-Speed 3382.90 samples/sec Loss 4.6314 LearningRate 0.0310 Epoch: 8 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:34,112-Speed 3391.18 samples/sec Loss 4.5820 LearningRate 0.0310 Epoch: 8 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:37,139-Speed 3382.99 samples/sec Loss 4.6503 LearningRate 0.0310 Epoch: 8 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:40,161-Speed 3390.14 samples/sec Loss 4.6698 LearningRate 0.0310 Epoch: 8 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:43,187-Speed 3383.87 samples/sec Loss 4.6380 LearningRate 0.0310 Epoch: 8 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:39:46,208-Speed 3391.34 samples/sec Loss 4.6122 LearningRate 0.0310 Epoch: 8 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:39:49,238-Speed 3380.35 samples/sec Loss 4.7358 LearningRate 0.0310 Epoch: 8 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:39:52,272-Speed 3375.90 samples/sec Loss 4.6766 LearningRate 0.0309 Epoch: 8 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:39:55,295-Speed 3387.46 samples/sec Loss 4.7118 LearningRate 0.0309 Epoch: 8 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:39:58,322-Speed 3383.79 samples/sec Loss 4.6797 LearningRate 0.0309 Epoch: 8 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-27 06:40:01,351-Speed 3381.24 samples/sec Loss 4.7154 LearningRate 0.0309 Epoch: 8 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:04,377-Speed 3385.59 samples/sec Loss 4.6413 LearningRate 0.0309 Epoch: 8 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:07,398-Speed 3390.22 samples/sec Loss 4.6428 LearningRate 0.0309 Epoch: 8 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:10,427-Speed 3381.70 samples/sec Loss 4.7082 LearningRate 0.0309 Epoch: 8 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:13,452-Speed 3385.94 samples/sec Loss 4.6826 LearningRate 0.0309 Epoch: 8 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:16,529-Speed 3328.78 samples/sec Loss 4.8028 LearningRate 0.0309 Epoch: 8 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-27 06:40:19,553-Speed 3386.94 samples/sec Loss 4.7434 LearningRate 0.0309 Epoch: 8 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:22,574-Speed 3390.68 samples/sec Loss 4.6249 LearningRate 0.0308 Epoch: 8 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:25,593-Speed 3391.49 samples/sec Loss 4.5369 LearningRate 0.0308 Epoch: 8 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:28,615-Speed 3389.59 samples/sec Loss 4.6986 LearningRate 0.0308 Epoch: 8 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:31,635-Speed 3391.39 samples/sec Loss 4.6292 LearningRate 0.0308 Epoch: 8 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:40:34,665-Speed 3380.95 samples/sec Loss 4.6159 LearningRate 0.0308 Epoch: 8 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:40:37,701-Speed 3372.65 samples/sec Loss 4.6270 LearningRate 0.0308 Epoch: 8 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:40:40,713-Speed 3402.37 samples/sec Loss 4.8218 LearningRate 0.0308 Epoch: 8 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:43,740-Speed 3383.55 samples/sec Loss 4.5479 LearningRate 0.0308 Epoch: 8 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:46,762-Speed 3389.48 samples/sec Loss 4.6720 LearningRate 0.0308 Epoch: 8 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:49,797-Speed 3374.43 samples/sec Loss 4.6844 LearningRate 0.0308 Epoch: 8 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:52,821-Speed 3386.53 samples/sec Loss 4.6535 LearningRate 0.0307 Epoch: 8 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:55,860-Speed 3370.34 samples/sec Loss 4.5804 LearningRate 0.0307 Epoch: 8 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:40:58,908-Speed 3360.38 samples/sec Loss 4.7277 LearningRate 0.0307 Epoch: 8 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:01,934-Speed 3385.14 samples/sec Loss 4.6006 LearningRate 0.0307 Epoch: 8 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:04,964-Speed 3380.23 samples/sec Loss 4.7511 LearningRate 0.0307 Epoch: 8 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:07,996-Speed 3377.86 samples/sec Loss 4.7078 LearningRate 0.0307 Epoch: 8 Global Step: 50710 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:11,002-Speed 3407.63 samples/sec Loss 4.7479 LearningRate 0.0307 Epoch: 8 Global Step: 50720 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:14,028-Speed 3384.71 samples/sec Loss 4.6523 LearningRate 0.0307 Epoch: 8 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:17,055-Speed 3384.56 samples/sec Loss 4.6479 LearningRate 0.0307 Epoch: 8 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:20,080-Speed 3386.04 samples/sec Loss 4.5551 LearningRate 0.0307 Epoch: 8 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:23,109-Speed 3381.32 samples/sec Loss 4.5644 LearningRate 0.0307 Epoch: 8 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:26,132-Speed 3388.04 samples/sec Loss 4.6623 LearningRate 0.0306 Epoch: 8 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:29,158-Speed 3384.64 samples/sec Loss 4.6471 LearningRate 0.0306 Epoch: 8 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:32,185-Speed 3383.63 samples/sec Loss 4.6611 LearningRate 0.0306 Epoch: 8 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:35,229-Speed 3365.28 samples/sec Loss 4.5634 LearningRate 0.0306 Epoch: 8 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:38,263-Speed 3375.09 samples/sec Loss 4.5877 LearningRate 0.0306 Epoch: 8 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:41,294-Speed 3380.14 samples/sec Loss 4.7116 LearningRate 0.0306 Epoch: 8 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:41:44,299-Speed 3408.73 samples/sec Loss 4.6015 LearningRate 0.0306 Epoch: 8 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:47,327-Speed 3382.30 samples/sec Loss 4.6773 LearningRate 0.0306 Epoch: 8 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:50,360-Speed 3377.09 samples/sec Loss 4.6843 LearningRate 0.0306 Epoch: 8 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:53,384-Speed 3386.82 samples/sec Loss 4.5665 LearningRate 0.0306 Epoch: 8 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:56,408-Speed 3386.23 samples/sec Loss 4.5831 LearningRate 0.0305 Epoch: 8 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:41:59,431-Speed 3388.80 samples/sec Loss 4.5979 LearningRate 0.0305 Epoch: 8 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:02,453-Speed 3389.32 samples/sec Loss 4.7281 LearningRate 0.0305 Epoch: 8 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:05,478-Speed 3385.70 samples/sec Loss 4.6249 LearningRate 0.0305 Epoch: 8 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:08,506-Speed 3382.37 samples/sec Loss 4.8049 LearningRate 0.0305 Epoch: 8 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:11,594-Speed 3317.60 samples/sec Loss 4.6824 LearningRate 0.0305 Epoch: 8 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:14,652-Speed 3350.26 samples/sec Loss 4.5421 LearningRate 0.0305 Epoch: 8 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:42:17,687-Speed 3374.75 samples/sec Loss 4.5820 LearningRate 0.0305 Epoch: 8 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:42:20,704-Speed 3394.75 samples/sec Loss 4.6236 LearningRate 0.0305 Epoch: 8 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:23,730-Speed 3383.99 samples/sec Loss 4.6677 LearningRate 0.0305 Epoch: 8 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:26,751-Speed 3390.14 samples/sec Loss 4.6285 LearningRate 0.0304 Epoch: 8 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:29,781-Speed 3380.19 samples/sec Loss 4.7144 LearningRate 0.0304 Epoch: 8 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:32,812-Speed 3380.21 samples/sec Loss 4.5745 LearningRate 0.0304 Epoch: 8 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:35,928-Speed 3287.54 samples/sec Loss 4.5936 LearningRate 0.0304 Epoch: 8 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:38,963-Speed 3374.71 samples/sec Loss 4.4620 LearningRate 0.0304 Epoch: 8 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:41,995-Speed 3377.08 samples/sec Loss 4.5882 LearningRate 0.0304 Epoch: 8 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:45,021-Speed 3385.24 samples/sec Loss 4.5034 LearningRate 0.0304 Epoch: 8 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:48,045-Speed 3386.98 samples/sec Loss 4.4007 LearningRate 0.0304 Epoch: 8 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:51,073-Speed 3383.29 samples/sec Loss 4.5396 LearningRate 0.0304 Epoch: 8 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:42:54,103-Speed 3380.01 samples/sec Loss 4.6591 LearningRate 0.0304 Epoch: 8 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:42:57,131-Speed 3382.25 samples/sec Loss 4.7308 LearningRate 0.0304 Epoch: 8 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:00,167-Speed 3373.51 samples/sec Loss 4.6166 LearningRate 0.0303 Epoch: 8 Global Step: 51080 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:03,214-Speed 3363.03 samples/sec Loss 4.6559 LearningRate 0.0303 Epoch: 8 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:06,241-Speed 3382.73 samples/sec Loss 4.6858 LearningRate 0.0303 Epoch: 8 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:09,287-Speed 3363.25 samples/sec Loss 4.5644 LearningRate 0.0303 Epoch: 8 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:12,316-Speed 3381.18 samples/sec Loss 4.5872 LearningRate 0.0303 Epoch: 8 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:15,341-Speed 3385.98 samples/sec Loss 4.5695 LearningRate 0.0303 Epoch: 8 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:18,373-Speed 3377.59 samples/sec Loss 4.6471 LearningRate 0.0303 Epoch: 8 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:21,405-Speed 3378.20 samples/sec Loss 4.4873 LearningRate 0.0303 Epoch: 8 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:24,528-Speed 3279.88 samples/sec Loss 4.6184 LearningRate 0.0303 Epoch: 8 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:27,788-Speed 3141.50 samples/sec Loss 4.6812 LearningRate 0.0303 Epoch: 8 Global Step: 51170 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:41,358-Speed 754.69 samples/sec Loss 4.2444 LearningRate 0.0302 Epoch: 9 Global Step: 51180 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:44,389-Speed 3378.97 samples/sec Loss 3.9491 LearningRate 0.0302 Epoch: 9 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:47,420-Speed 3380.21 samples/sec Loss 3.9316 LearningRate 0.0302 Epoch: 9 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:50,447-Speed 3382.62 samples/sec Loss 3.9513 LearningRate 0.0302 Epoch: 9 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:53,483-Speed 3374.14 samples/sec Loss 4.0236 LearningRate 0.0302 Epoch: 9 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:43:56,488-Speed 3408.14 samples/sec Loss 3.9332 LearningRate 0.0302 Epoch: 9 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:43:59,521-Speed 3377.69 samples/sec Loss 4.0510 LearningRate 0.0302 Epoch: 9 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:02,548-Speed 3383.27 samples/sec Loss 3.9494 LearningRate 0.0302 Epoch: 9 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:05,578-Speed 3381.00 samples/sec Loss 4.0751 LearningRate 0.0302 Epoch: 9 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:08,616-Speed 3372.02 samples/sec Loss 4.1238 LearningRate 0.0302 Epoch: 9 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:11,807-Speed 3209.66 samples/sec Loss 4.0182 LearningRate 0.0301 Epoch: 9 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:14,821-Speed 3398.34 samples/sec Loss 4.0171 LearningRate 0.0301 Epoch: 9 Global Step: 51290 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:17,857-Speed 3372.85 samples/sec Loss 3.8868 LearningRate 0.0301 Epoch: 9 Global Step: 51300 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:20,890-Speed 3377.57 samples/sec Loss 4.1247 LearningRate 0.0301 Epoch: 9 Global Step: 51310 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:23,937-Speed 3361.71 samples/sec Loss 4.1162 LearningRate 0.0301 Epoch: 9 Global Step: 51320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:27,001-Speed 3342.62 samples/sec Loss 4.1366 LearningRate 0.0301 Epoch: 9 Global Step: 51330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:30,064-Speed 3343.58 samples/sec Loss 4.0292 LearningRate 0.0301 Epoch: 9 Global Step: 51340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:33,099-Speed 3374.68 samples/sec Loss 4.0876 LearningRate 0.0301 Epoch: 9 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:36,161-Speed 3344.92 samples/sec Loss 3.9444 LearningRate 0.0301 Epoch: 9 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:39,199-Speed 3372.14 samples/sec Loss 4.1814 LearningRate 0.0301 Epoch: 9 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:42,234-Speed 3374.42 samples/sec Loss 4.1649 LearningRate 0.0301 Epoch: 9 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 06:44:45,272-Speed 3371.39 samples/sec Loss 4.1309 LearningRate 0.0300 Epoch: 9 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:48,310-Speed 3370.88 samples/sec Loss 4.0589 LearningRate 0.0300 Epoch: 9 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:51,351-Speed 3368.55 samples/sec Loss 4.1332 LearningRate 0.0300 Epoch: 9 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:54,396-Speed 3363.86 samples/sec Loss 4.1146 LearningRate 0.0300 Epoch: 9 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:44:57,440-Speed 3365.00 samples/sec Loss 4.0904 LearningRate 0.0300 Epoch: 9 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:00,477-Speed 3372.25 samples/sec Loss 4.2602 LearningRate 0.0300 Epoch: 9 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:03,526-Speed 3359.65 samples/sec Loss 4.3176 LearningRate 0.0300 Epoch: 9 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:06,558-Speed 3377.32 samples/sec Loss 4.0927 LearningRate 0.0300 Epoch: 9 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:09,590-Speed 3377.97 samples/sec Loss 4.1932 LearningRate 0.0300 Epoch: 9 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:12,619-Speed 3381.63 samples/sec Loss 4.0898 LearningRate 0.0300 Epoch: 9 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 06:45:15,651-Speed 3377.99 samples/sec Loss 4.1638 LearningRate 0.0299 Epoch: 9 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:45:18,681-Speed 3380.90 samples/sec Loss 4.0821 LearningRate 0.0299 Epoch: 9 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:45:21,707-Speed 3384.89 samples/sec Loss 4.1142 LearningRate 0.0299 Epoch: 9 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 06:45:24,751-Speed 3364.70 samples/sec Loss 4.1846 LearningRate 0.0299 Epoch: 9 Global Step: 51520 Fp16 Grad Scale: 131072
Required: 6 hours Training: 2022-04-27 06:45:27,781-Speed 3379.87 samples/sec Loss 4.1571 LearningRate 0.0299 Epoch: 9 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:45:30,806-Speed 3386.74 samples/sec Loss 4.1864 LearningRate 0.0299 Epoch: 9 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:45:33,836-Speed 3379.89 samples/sec Loss 4.2325 LearningRate 0.0299 Epoch: 9 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:45:36,846-Speed 3402.63 samples/sec Loss 4.2154 LearningRate 0.0299 Epoch: 9 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:39,873-Speed 3383.40 samples/sec Loss 4.1080 LearningRate 0.0299 Epoch: 9 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:42,894-Speed 3390.79 samples/sec Loss 4.3035 LearningRate 0.0299 Epoch: 9 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:45,929-Speed 3375.34 samples/sec Loss 4.0810 LearningRate 0.0298 Epoch: 9 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:48,969-Speed 3369.27 samples/sec Loss 4.1443 LearningRate 0.0298 Epoch: 9 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:51,991-Speed 3389.33 samples/sec Loss 4.1801 LearningRate 0.0298 Epoch: 9 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:55,012-Speed 3389.55 samples/sec Loss 4.2547 LearningRate 0.0298 Epoch: 9 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:45:58,042-Speed 3380.63 samples/sec Loss 4.2537 LearningRate 0.0298 Epoch: 9 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:01,068-Speed 3384.86 samples/sec Loss 4.1832 LearningRate 0.0298 Epoch: 9 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:04,088-Speed 
3391.39 samples/sec Loss 4.2986 LearningRate 0.0298 Epoch: 9 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:07,113-Speed 3386.09 samples/sec Loss 4.3070 LearningRate 0.0298 Epoch: 9 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:10,144-Speed 3379.06 samples/sec Loss 4.1961 LearningRate 0.0298 Epoch: 9 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:13,216-Speed 3335.00 samples/sec Loss 4.3117 LearningRate 0.0298 Epoch: 9 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:16,259-Speed 3365.04 samples/sec Loss 4.2540 LearningRate 0.0298 Epoch: 9 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:19,284-Speed 3386.28 samples/sec Loss 4.3453 LearningRate 0.0297 Epoch: 9 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:22,320-Speed 3373.55 samples/sec Loss 4.3162 LearningRate 0.0297 Epoch: 9 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:25,342-Speed 3389.11 samples/sec Loss 4.3037 LearningRate 0.0297 Epoch: 9 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:28,366-Speed 3386.71 samples/sec Loss 4.1786 LearningRate 0.0297 Epoch: 9 Global Step: 51730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:31,392-Speed 3385.16 samples/sec Loss 4.2063 LearningRate 0.0297 Epoch: 9 Global Step: 51740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:34,455-Speed 3344.11 samples/sec Loss 4.3147 LearningRate 0.0297 Epoch: 9 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:37,576-Speed 3281.57 samples/sec Loss 4.2571 LearningRate 0.0297 Epoch: 9 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:40,604-Speed 3383.08 samples/sec Loss 4.3321 LearningRate 0.0297 
Epoch: 9 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:46:43,610-Speed 3406.90 samples/sec Loss 4.2197 LearningRate 0.0297 Epoch: 9 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:46,638-Speed 3382.44 samples/sec Loss 4.3020 LearningRate 0.0297 Epoch: 9 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:49,660-Speed 3389.26 samples/sec Loss 4.1520 LearningRate 0.0296 Epoch: 9 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:52,685-Speed 3386.27 samples/sec Loss 4.1581 LearningRate 0.0296 Epoch: 9 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:55,711-Speed 3384.72 samples/sec Loss 4.3755 LearningRate 0.0296 Epoch: 9 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:46:58,743-Speed 3377.28 samples/sec Loss 4.2931 LearningRate 0.0296 Epoch: 9 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:01,767-Speed 3387.86 samples/sec Loss 4.2173 LearningRate 0.0296 Epoch: 9 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:04,800-Speed 3376.88 samples/sec Loss 4.2718 LearningRate 0.0296 Epoch: 9 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:07,826-Speed 3384.85 samples/sec Loss 4.2098 LearningRate 0.0296 Epoch: 9 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:10,862-Speed 3373.66 samples/sec Loss 4.2455 LearningRate 0.0296 Epoch: 9 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:13,897-Speed 3374.98 samples/sec Loss 4.1613 LearningRate 0.0296 Epoch: 9 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:47:16,920-Speed 3387.46 samples/sec Loss 4.1937 LearningRate 0.0296 Epoch: 9 Global Step: 51890 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 06:47:19,945-Speed 3386.19 samples/sec Loss 4.1868 LearningRate 0.0296 Epoch: 9 Global Step: 51900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:47:22,977-Speed 3378.73 samples/sec Loss 4.3283 LearningRate 0.0295 Epoch: 9 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:47:26,016-Speed 3371.43 samples/sec Loss 4.3324 LearningRate 0.0295 Epoch: 9 Global Step: 51920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:47:29,081-Speed 3341.85 samples/sec Loss 4.3743 LearningRate 0.0295 Epoch: 9 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:47:32,083-Speed 3411.14 samples/sec Loss 4.3347 LearningRate 0.0295 Epoch: 9 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:35,124-Speed 3368.21 samples/sec Loss 4.3834 LearningRate 0.0295 Epoch: 9 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:38,147-Speed 3388.72 samples/sec Loss 4.2834 LearningRate 0.0295 Epoch: 9 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:41,167-Speed 3390.95 samples/sec Loss 4.3462 LearningRate 0.0295 Epoch: 9 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:44,191-Speed 3387.50 samples/sec Loss 4.3469 LearningRate 0.0295 Epoch: 9 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:47,214-Speed 3387.50 samples/sec Loss 4.2689 LearningRate 0.0295 Epoch: 9 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:47:50,264-Speed 3358.52 samples/sec Loss 4.4010 LearningRate 0.0295 Epoch: 9 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:48:33,509-[lfw][52000]XNorm: 21.903268 Training: 2022-04-27 06:48:33,509-[lfw][52000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-27 06:48:33,510-[lfw][52000]Accuracy-Highest: 0.99817 
Training: 2022-04-27 06:49:23,737-[cfp_fp][52000]XNorm: 19.901404 Training: 2022-04-27 06:49:23,738-[cfp_fp][52000]Accuracy-Flip: 0.96114+-0.01006 Training: 2022-04-27 06:49:23,738-[cfp_fp][52000]Accuracy-Highest: 0.96414 Training: 2022-04-27 06:50:06,894-[agedb_30][52000]XNorm: 22.019052 Training: 2022-04-27 06:50:06,894-[agedb_30][52000]Accuracy-Flip: 0.97733+-0.00727 Training: 2022-04-27 06:50:06,895-[agedb_30][52000]Accuracy-Highest: 0.97767 Training: 2022-04-27 06:50:09,907-Speed 73.33 samples/sec Loss 4.2374 LearningRate 0.0294 Epoch: 9 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:12,930-Speed 3387.37 samples/sec Loss 4.3676 LearningRate 0.0294 Epoch: 9 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:15,943-Speed 3399.94 samples/sec Loss 4.3944 LearningRate 0.0294 Epoch: 9 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:18,954-Speed 3401.63 samples/sec Loss 4.2378 LearningRate 0.0294 Epoch: 9 Global Step: 52040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:50:21,950-Speed 3418.91 samples/sec Loss 4.2509 LearningRate 0.0294 Epoch: 9 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:24,963-Speed 3398.99 samples/sec Loss 4.3423 LearningRate 0.0294 Epoch: 9 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:27,996-Speed 3377.08 samples/sec Loss 4.1853 LearningRate 0.0294 Epoch: 9 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:31,024-Speed 3382.74 samples/sec Loss 4.3436 LearningRate 0.0294 Epoch: 9 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:34,051-Speed 3383.31 samples/sec Loss 4.3415 LearningRate 0.0294 Epoch: 9 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:37,069-Speed 3394.39 samples/sec Loss 4.4282 LearningRate 0.0294 Epoch: 9 
Global Step: 52100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:40,094-Speed 3385.51 samples/sec Loss 4.2253 LearningRate 0.0294 Epoch: 9 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:43,111-Speed 3394.74 samples/sec Loss 4.4339 LearningRate 0.0293 Epoch: 9 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:46,126-Speed 3397.47 samples/sec Loss 4.3372 LearningRate 0.0293 Epoch: 9 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:49,147-Speed 3390.74 samples/sec Loss 4.1987 LearningRate 0.0293 Epoch: 9 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:52,161-Speed 3398.33 samples/sec Loss 4.2445 LearningRate 0.0293 Epoch: 9 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:50:55,180-Speed 3391.97 samples/sec Loss 4.1878 LearningRate 0.0293 Epoch: 9 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:50:58,196-Speed 3396.83 samples/sec Loss 4.4711 LearningRate 0.0293 Epoch: 9 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:01,208-Speed 3400.09 samples/sec Loss 4.2545 LearningRate 0.0293 Epoch: 9 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:04,227-Speed 3392.66 samples/sec Loss 4.3952 LearningRate 0.0293 Epoch: 9 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:07,240-Speed 3399.56 samples/sec Loss 4.3538 LearningRate 0.0293 Epoch: 9 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:10,252-Speed 3400.16 samples/sec Loss 4.2654 LearningRate 0.0293 Epoch: 9 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:13,267-Speed 3397.94 samples/sec Loss 4.2496 LearningRate 0.0292 Epoch: 9 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 06:51:16,282-Speed 3397.39 samples/sec Loss 4.2416 LearningRate 0.0292 Epoch: 9 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:19,304-Speed 3389.34 samples/sec Loss 4.3083 LearningRate 0.0292 Epoch: 9 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:22,317-Speed 3398.34 samples/sec Loss 4.3677 LearningRate 0.0292 Epoch: 9 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:25,350-Speed 3377.51 samples/sec Loss 4.3335 LearningRate 0.0292 Epoch: 9 Global Step: 52260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:51:28,386-Speed 3373.61 samples/sec Loss 4.3961 LearningRate 0.0292 Epoch: 9 Global Step: 52270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:51:31,402-Speed 3396.01 samples/sec Loss 4.4022 LearningRate 0.0292 Epoch: 9 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:51:34,410-Speed 3404.94 samples/sec Loss 4.3228 LearningRate 0.0292 Epoch: 9 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:37,440-Speed 3380.87 samples/sec Loss 4.3257 LearningRate 0.0292 Epoch: 9 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:40,463-Speed 3387.85 samples/sec Loss 4.3801 LearningRate 0.0292 Epoch: 9 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:43,487-Speed 3386.57 samples/sec Loss 4.4441 LearningRate 0.0292 Epoch: 9 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:46,539-Speed 3356.47 samples/sec Loss 4.3150 LearningRate 0.0291 Epoch: 9 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:49,565-Speed 3385.31 samples/sec Loss 4.4193 LearningRate 0.0291 Epoch: 9 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:52,597-Speed 3377.79 samples/sec Loss 
4.4371 LearningRate 0.0291 Epoch: 9 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:55,619-Speed 3389.62 samples/sec Loss 4.4522 LearningRate 0.0291 Epoch: 9 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:51:58,652-Speed 3377.46 samples/sec Loss 4.3051 LearningRate 0.0291 Epoch: 9 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:01,671-Speed 3392.14 samples/sec Loss 4.3131 LearningRate 0.0291 Epoch: 9 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:04,668-Speed 3417.63 samples/sec Loss 4.4158 LearningRate 0.0291 Epoch: 9 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:07,681-Speed 3400.08 samples/sec Loss 4.3667 LearningRate 0.0291 Epoch: 9 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:10,701-Speed 3391.56 samples/sec Loss 4.4086 LearningRate 0.0291 Epoch: 9 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:13,734-Speed 3376.76 samples/sec Loss 4.3680 LearningRate 0.0291 Epoch: 9 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:16,763-Speed 3381.72 samples/sec Loss 4.2563 LearningRate 0.0290 Epoch: 9 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:19,777-Speed 3397.90 samples/sec Loss 4.3706 LearningRate 0.0290 Epoch: 9 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:22,797-Speed 3392.48 samples/sec Loss 4.4327 LearningRate 0.0290 Epoch: 9 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:25,818-Speed 3390.16 samples/sec Loss 4.2886 LearningRate 0.0290 Epoch: 9 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:28,831-Speed 3399.16 samples/sec Loss 4.4183 LearningRate 0.0290 Epoch: 9 Global Step: 52470 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:31,857-Speed 3384.64 samples/sec Loss 4.4325 LearningRate 0.0290 Epoch: 9 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:52:34,875-Speed 3394.52 samples/sec Loss 4.3549 LearningRate 0.0290 Epoch: 9 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:37,887-Speed 3399.58 samples/sec Loss 4.4503 LearningRate 0.0290 Epoch: 9 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:40,905-Speed 3394.04 samples/sec Loss 4.3763 LearningRate 0.0290 Epoch: 9 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:43,925-Speed 3391.86 samples/sec Loss 4.3093 LearningRate 0.0290 Epoch: 9 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:46,954-Speed 3381.57 samples/sec Loss 4.2112 LearningRate 0.0290 Epoch: 9 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:49,980-Speed 3384.69 samples/sec Loss 4.3244 LearningRate 0.0289 Epoch: 9 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:53,001-Speed 3390.51 samples/sec Loss 4.2966 LearningRate 0.0289 Epoch: 9 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:56,024-Speed 3388.05 samples/sec Loss 4.4148 LearningRate 0.0289 Epoch: 9 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:52:59,075-Speed 3356.30 samples/sec Loss 4.3399 LearningRate 0.0289 Epoch: 9 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:53:02,076-Speed 3414.06 samples/sec Loss 4.5215 LearningRate 0.0289 Epoch: 9 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:05,130-Speed 3353.92 samples/sec Loss 4.3615 LearningRate 0.0289 Epoch: 9 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
06:53:08,149-Speed 3391.83 samples/sec Loss 4.3311 LearningRate 0.0289 Epoch: 9 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:11,168-Speed 3392.60 samples/sec Loss 4.4031 LearningRate 0.0289 Epoch: 9 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:14,186-Speed 3394.55 samples/sec Loss 4.3658 LearningRate 0.0289 Epoch: 9 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:17,206-Speed 3391.72 samples/sec Loss 4.3588 LearningRate 0.0289 Epoch: 9 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:20,225-Speed 3392.78 samples/sec Loss 4.3728 LearningRate 0.0288 Epoch: 9 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:23,247-Speed 3388.20 samples/sec Loss 4.2872 LearningRate 0.0288 Epoch: 9 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:26,295-Speed 3360.80 samples/sec Loss 4.2443 LearningRate 0.0288 Epoch: 9 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:29,310-Speed 3397.07 samples/sec Loss 4.4560 LearningRate 0.0288 Epoch: 9 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:32,325-Speed 3398.05 samples/sec Loss 4.3554 LearningRate 0.0288 Epoch: 9 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:53:35,343-Speed 3393.76 samples/sec Loss 4.4101 LearningRate 0.0288 Epoch: 9 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:53:38,358-Speed 3396.71 samples/sec Loss 4.2598 LearningRate 0.0288 Epoch: 9 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:53:41,375-Speed 3394.68 samples/sec Loss 4.4112 LearningRate 0.0288 Epoch: 9 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:53:44,375-Speed 3414.25 samples/sec Loss 4.4563 LearningRate 
0.0288 Epoch: 9 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:47,392-Speed 3394.94 samples/sec Loss 4.3732 LearningRate 0.0288 Epoch: 9 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:50,443-Speed 3356.99 samples/sec Loss 4.3742 LearningRate 0.0288 Epoch: 9 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:53:53,454-Speed 3401.43 samples/sec Loss 4.4620 LearningRate 0.0287 Epoch: 9 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:53:56,473-Speed 3393.18 samples/sec Loss 4.2995 LearningRate 0.0287 Epoch: 9 Global Step: 52760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:53:59,506-Speed 3376.93 samples/sec Loss 4.2298 LearningRate 0.0287 Epoch: 9 Global Step: 52770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:02,529-Speed 3388.35 samples/sec Loss 4.4234 LearningRate 0.0287 Epoch: 9 Global Step: 52780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:05,547-Speed 3393.63 samples/sec Loss 4.4280 LearningRate 0.0287 Epoch: 9 Global Step: 52790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:08,565-Speed 3393.55 samples/sec Loss 4.3669 LearningRate 0.0287 Epoch: 9 Global Step: 52800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:11,602-Speed 3373.22 samples/sec Loss 4.4404 LearningRate 0.0287 Epoch: 9 Global Step: 52810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:14,656-Speed 3353.24 samples/sec Loss 4.2111 LearningRate 0.0287 Epoch: 9 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:17,676-Speed 3391.47 samples/sec Loss 4.3757 LearningRate 0.0287 Epoch: 9 Global Step: 52830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:54:20,693-Speed 3395.79 samples/sec Loss 4.3875 LearningRate 0.0287 Epoch: 9 Global Step: 52840 Fp16 Grad Scale: 32768 Required: 
6 hours Training: 2022-04-27 06:54:23,714-Speed 3389.58 samples/sec Loss 4.3108 LearningRate 0.0287 Epoch: 9 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:26,733-Speed 3393.11 samples/sec Loss 4.3022 LearningRate 0.0286 Epoch: 9 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:29,751-Speed 3393.99 samples/sec Loss 4.3955 LearningRate 0.0286 Epoch: 9 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:32,774-Speed 3387.47 samples/sec Loss 4.3262 LearningRate 0.0286 Epoch: 9 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:35,799-Speed 3386.01 samples/sec Loss 4.2047 LearningRate 0.0286 Epoch: 9 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:38,829-Speed 3381.13 samples/sec Loss 4.3353 LearningRate 0.0286 Epoch: 9 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:41,855-Speed 3384.47 samples/sec Loss 4.5362 LearningRate 0.0286 Epoch: 9 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:44,885-Speed 3380.16 samples/sec Loss 4.3583 LearningRate 0.0286 Epoch: 9 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:47,928-Speed 3365.44 samples/sec Loss 4.3489 LearningRate 0.0286 Epoch: 9 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:50,949-Speed 3391.17 samples/sec Loss 4.3882 LearningRate 0.0286 Epoch: 9 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:53,955-Speed 3406.95 samples/sec Loss 4.3880 LearningRate 0.0286 Epoch: 9 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:54:56,987-Speed 3378.07 samples/sec Loss 4.3780 LearningRate 0.0285 Epoch: 9 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:00,025-Speed 3372.44 samples/sec 
Loss 4.3893 LearningRate 0.0285 Epoch: 9 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:03,052-Speed 3383.84 samples/sec Loss 4.3330 LearningRate 0.0285 Epoch: 9 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:06,073-Speed 3389.83 samples/sec Loss 4.3230 LearningRate 0.0285 Epoch: 9 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:09,094-Speed 3390.43 samples/sec Loss 4.3250 LearningRate 0.0285 Epoch: 9 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:12,116-Speed 3389.20 samples/sec Loss 4.4346 LearningRate 0.0285 Epoch: 9 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:15,141-Speed 3386.23 samples/sec Loss 4.3854 LearningRate 0.0285 Epoch: 9 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:18,160-Speed 3392.16 samples/sec Loss 4.3507 LearningRate 0.0285 Epoch: 9 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:21,179-Speed 3393.57 samples/sec Loss 4.3689 LearningRate 0.0285 Epoch: 9 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:24,218-Speed 3370.19 samples/sec Loss 4.3781 LearningRate 0.0285 Epoch: 9 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:55:27,228-Speed 3403.06 samples/sec Loss 4.3407 LearningRate 0.0285 Epoch: 9 Global Step: 53060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:55:30,233-Speed 3407.89 samples/sec Loss 4.3553 LearningRate 0.0284 Epoch: 9 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:33,254-Speed 3390.81 samples/sec Loss 4.2953 LearningRate 0.0284 Epoch: 9 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:36,280-Speed 3384.80 samples/sec Loss 4.3938 LearningRate 0.0284 Epoch: 9 Global Step: 53090 Fp16 
Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:39,302-Speed 3388.31 samples/sec Loss 4.6086 LearningRate 0.0284 Epoch: 9 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:42,328-Speed 3385.05 samples/sec Loss 4.3966 LearningRate 0.0284 Epoch: 9 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:45,351-Speed 3388.84 samples/sec Loss 4.2987 LearningRate 0.0284 Epoch: 9 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:48,383-Speed 3378.23 samples/sec Loss 4.3395 LearningRate 0.0284 Epoch: 9 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:51,414-Speed 3378.62 samples/sec Loss 4.2881 LearningRate 0.0284 Epoch: 9 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:54,446-Speed 3378.22 samples/sec Loss 4.3306 LearningRate 0.0284 Epoch: 9 Global Step: 53150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:55:57,488-Speed 3367.04 samples/sec Loss 4.4146 LearningRate 0.0284 Epoch: 9 Global Step: 53160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 06:56:00,545-Speed 3350.60 samples/sec Loss 4.3530 LearningRate 0.0284 Epoch: 9 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:03,570-Speed 3386.26 samples/sec Loss 4.3009 LearningRate 0.0283 Epoch: 9 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:06,597-Speed 3383.46 samples/sec Loss 4.3934 LearningRate 0.0283 Epoch: 9 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:09,632-Speed 3375.38 samples/sec Loss 4.3763 LearningRate 0.0283 Epoch: 9 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:12,704-Speed 3334.44 samples/sec Loss 4.3837 LearningRate 0.0283 Epoch: 9 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
06:56:15,732-Speed 3382.20 samples/sec Loss 4.4175 LearningRate 0.0283 Epoch: 9 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:18,757-Speed 3386.23 samples/sec Loss 4.2644 LearningRate 0.0283 Epoch: 9 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:21,784-Speed 3382.71 samples/sec Loss 4.3891 LearningRate 0.0283 Epoch: 9 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:24,812-Speed 3382.84 samples/sec Loss 4.4065 LearningRate 0.0283 Epoch: 9 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:27,835-Speed 3388.79 samples/sec Loss 4.4310 LearningRate 0.0283 Epoch: 9 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:30,869-Speed 3375.63 samples/sec Loss 4.4201 LearningRate 0.0283 Epoch: 9 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:56:33,899-Speed 3379.99 samples/sec Loss 4.2845 LearningRate 0.0282 Epoch: 9 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:56:36,907-Speed 3405.49 samples/sec Loss 4.3841 LearningRate 0.0282 Epoch: 9 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:39,928-Speed 3390.87 samples/sec Loss 4.3909 LearningRate 0.0282 Epoch: 9 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:42,949-Speed 3389.88 samples/sec Loss 4.2943 LearningRate 0.0282 Epoch: 9 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:45,992-Speed 3365.88 samples/sec Loss 4.3961 LearningRate 0.0282 Epoch: 9 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:49,016-Speed 3387.35 samples/sec Loss 4.3702 LearningRate 0.0282 Epoch: 9 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:52,044-Speed 3382.32 samples/sec Loss 4.4420 LearningRate 
0.0282 Epoch: 9 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:55,069-Speed 3386.84 samples/sec Loss 4.2676 LearningRate 0.0282 Epoch: 9 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:56:58,098-Speed 3381.02 samples/sec Loss 4.3537 LearningRate 0.0282 Epoch: 9 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:01,123-Speed 3386.15 samples/sec Loss 4.3899 LearningRate 0.0282 Epoch: 9 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:04,158-Speed 3373.95 samples/sec Loss 4.2935 LearningRate 0.0282 Epoch: 9 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:07,167-Speed 3404.75 samples/sec Loss 4.3133 LearningRate 0.0281 Epoch: 9 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:10,199-Speed 3377.60 samples/sec Loss 4.2814 LearningRate 0.0281 Epoch: 9 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:13,349-Speed 3251.84 samples/sec Loss 4.3382 LearningRate 0.0281 Epoch: 9 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:16,450-Speed 3303.24 samples/sec Loss 4.4204 LearningRate 0.0281 Epoch: 9 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:19,480-Speed 3379.43 samples/sec Loss 4.3067 LearningRate 0.0281 Epoch: 9 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:22,507-Speed 3384.10 samples/sec Loss 4.3478 LearningRate 0.0281 Epoch: 9 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:25,565-Speed 3349.78 samples/sec Loss 4.4179 LearningRate 0.0281 Epoch: 9 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:28,608-Speed 3365.24 samples/sec Loss 4.2670 LearningRate 0.0281 Epoch: 9 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-27 06:57:31,639-Speed 3379.46 samples/sec Loss 4.3672 LearningRate 0.0281 Epoch: 9 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:34,660-Speed 3390.22 samples/sec Loss 4.3792 LearningRate 0.0281 Epoch: 9 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:57:37,691-Speed 3379.45 samples/sec Loss 4.2931 LearningRate 0.0281 Epoch: 9 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:40,713-Speed 3389.44 samples/sec Loss 4.4596 LearningRate 0.0280 Epoch: 9 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:43,753-Speed 3370.80 samples/sec Loss 4.2959 LearningRate 0.0280 Epoch: 9 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:46,799-Speed 3362.84 samples/sec Loss 4.3929 LearningRate 0.0280 Epoch: 9 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:49,841-Speed 3366.03 samples/sec Loss 4.3248 LearningRate 0.0280 Epoch: 9 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:52,895-Speed 3353.81 samples/sec Loss 4.3701 LearningRate 0.0280 Epoch: 9 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:55,918-Speed 3388.38 samples/sec Loss 4.3664 LearningRate 0.0280 Epoch: 9 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:57:58,947-Speed 3381.58 samples/sec Loss 4.3561 LearningRate 0.0280 Epoch: 9 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:01,979-Speed 3378.58 samples/sec Loss 4.2904 LearningRate 0.0280 Epoch: 9 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:04,988-Speed 3404.39 samples/sec Loss 4.3477 LearningRate 0.0280 Epoch: 9 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:08,061-Speed 3332.51 
samples/sec Loss 4.3104 LearningRate 0.0280 Epoch: 9 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:11,084-Speed 3388.05 samples/sec Loss 4.4070 LearningRate 0.0279 Epoch: 9 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:14,111-Speed 3383.79 samples/sec Loss 4.3763 LearningRate 0.0279 Epoch: 9 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:17,141-Speed 3380.18 samples/sec Loss 4.2558 LearningRate 0.0279 Epoch: 9 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:20,177-Speed 3374.32 samples/sec Loss 4.2618 LearningRate 0.0279 Epoch: 9 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:23,203-Speed 3384.11 samples/sec Loss 4.4127 LearningRate 0.0279 Epoch: 9 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:26,236-Speed 3376.65 samples/sec Loss 4.3034 LearningRate 0.0279 Epoch: 9 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:29,262-Speed 3385.88 samples/sec Loss 4.4528 LearningRate 0.0279 Epoch: 9 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:32,289-Speed 3385.05 samples/sec Loss 4.2986 LearningRate 0.0279 Epoch: 9 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:58:35,318-Speed 3381.17 samples/sec Loss 4.3308 LearningRate 0.0279 Epoch: 9 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:38,346-Speed 3383.16 samples/sec Loss 4.3597 LearningRate 0.0279 Epoch: 9 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:41,368-Speed 3389.25 samples/sec Loss 4.3236 LearningRate 0.0279 Epoch: 9 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:44,390-Speed 3388.67 samples/sec Loss 4.3179 LearningRate 0.0278 Epoch: 9 Global 
Step: 53710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:47,434-Speed 3364.80 samples/sec Loss 4.4990 LearningRate 0.0278 Epoch: 9 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:50,459-Speed 3386.02 samples/sec Loss 4.3893 LearningRate 0.0278 Epoch: 9 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:53,483-Speed 3387.43 samples/sec Loss 4.3631 LearningRate 0.0278 Epoch: 9 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:56,507-Speed 3386.90 samples/sec Loss 4.2816 LearningRate 0.0278 Epoch: 9 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:58:59,536-Speed 3381.03 samples/sec Loss 4.4005 LearningRate 0.0278 Epoch: 9 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:59:02,576-Speed 3369.28 samples/sec Loss 4.3501 LearningRate 0.0278 Epoch: 9 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:59:05,578-Speed 3411.94 samples/sec Loss 4.3953 LearningRate 0.0278 Epoch: 9 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:59:08,610-Speed 3378.64 samples/sec Loss 4.3797 LearningRate 0.0278 Epoch: 9 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:59:11,617-Speed 3405.62 samples/sec Loss 4.4130 LearningRate 0.0278 Epoch: 9 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:14,641-Speed 3387.22 samples/sec Loss 4.3759 LearningRate 0.0278 Epoch: 9 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:17,666-Speed 3385.53 samples/sec Loss 4.4386 LearningRate 0.0277 Epoch: 9 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:20,712-Speed 3362.73 samples/sec Loss 4.3209 LearningRate 0.0277 Epoch: 9 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 06:59:23,775-Speed 3344.10 samples/sec Loss 4.2804 LearningRate 0.0277 Epoch: 9 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:26,807-Speed 3378.28 samples/sec Loss 4.4310 LearningRate 0.0277 Epoch: 9 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:29,839-Speed 3378.43 samples/sec Loss 4.4246 LearningRate 0.0277 Epoch: 9 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:32,863-Speed 3387.12 samples/sec Loss 4.3882 LearningRate 0.0277 Epoch: 9 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:35,888-Speed 3385.95 samples/sec Loss 4.2767 LearningRate 0.0277 Epoch: 9 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:38,934-Speed 3362.71 samples/sec Loss 4.3118 LearningRate 0.0277 Epoch: 9 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:41,958-Speed 3386.26 samples/sec Loss 4.3524 LearningRate 0.0277 Epoch: 9 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 06:59:44,970-Speed 3401.23 samples/sec Loss 4.3272 LearningRate 0.0277 Epoch: 9 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:48,005-Speed 3374.65 samples/sec Loss 4.3962 LearningRate 0.0277 Epoch: 9 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:51,060-Speed 3352.70 samples/sec Loss 4.3802 LearningRate 0.0276 Epoch: 9 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:54,085-Speed 3386.47 samples/sec Loss 4.2410 LearningRate 0.0276 Epoch: 9 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 06:59:57,104-Speed 3392.12 samples/sec Loss 4.3804 LearningRate 0.0276 Epoch: 9 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:00,129-Speed 3386.30 samples/sec Loss 
4.2374 LearningRate 0.0276 Epoch: 9 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:03,158-Speed 3381.55 samples/sec Loss 4.3346 LearningRate 0.0276 Epoch: 9 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:06,180-Speed 3388.62 samples/sec Loss 4.2922 LearningRate 0.0276 Epoch: 9 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:09,204-Speed 3387.64 samples/sec Loss 4.4302 LearningRate 0.0276 Epoch: 9 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:12,246-Speed 3367.27 samples/sec Loss 4.1646 LearningRate 0.0276 Epoch: 9 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:00:55,857-[lfw][54000]XNorm: 21.680687 Training: 2022-04-27 07:00:55,857-[lfw][54000]Accuracy-Flip: 0.99750+-0.00281 Training: 2022-04-27 07:00:55,858-[lfw][54000]Accuracy-Highest: 0.99817 Training: 2022-04-27 07:01:46,613-[cfp_fp][54000]XNorm: 19.846197 Training: 2022-04-27 07:01:46,614-[cfp_fp][54000]Accuracy-Flip: 0.96171+-0.00805 Training: 2022-04-27 07:01:46,614-[cfp_fp][54000]Accuracy-Highest: 0.96414 Training: 2022-04-27 07:02:30,355-[agedb_30][54000]XNorm: 22.034334 Training: 2022-04-27 07:02:30,356-[agedb_30][54000]Accuracy-Flip: 0.97667+-0.00957 Training: 2022-04-27 07:02:30,356-[agedb_30][54000]Accuracy-Highest: 0.97767 Training: 2022-04-27 07:02:33,380-Speed 72.56 samples/sec Loss 4.2218 LearningRate 0.0276 Epoch: 9 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:36,453-Speed 3332.90 samples/sec Loss 4.3346 LearningRate 0.0276 Epoch: 9 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:39,461-Speed 3406.59 samples/sec Loss 4.3016 LearningRate 0.0276 Epoch: 9 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:42,469-Speed 3404.17 samples/sec Loss 4.3237 LearningRate 0.0275 Epoch: 9 Global 
Step: 54040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:45,480-Speed 3401.60 samples/sec Loss 4.3612 LearningRate 0.0275 Epoch: 9 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:48,494-Speed 3398.34 samples/sec Loss 4.3521 LearningRate 0.0275 Epoch: 9 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:02:51,496-Speed 3412.16 samples/sec Loss 4.3628 LearningRate 0.0275 Epoch: 9 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:02:54,511-Speed 3396.80 samples/sec Loss 4.2999 LearningRate 0.0275 Epoch: 9 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:02:57,525-Speed 3398.26 samples/sec Loss 4.1906 LearningRate 0.0275 Epoch: 9 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:00,570-Speed 3363.68 samples/sec Loss 4.4074 LearningRate 0.0275 Epoch: 9 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:03,596-Speed 3384.62 samples/sec Loss 4.3339 LearningRate 0.0275 Epoch: 9 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:06,616-Speed 3392.02 samples/sec Loss 4.2942 LearningRate 0.0275 Epoch: 9 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:09,638-Speed 3389.49 samples/sec Loss 4.3163 LearningRate 0.0275 Epoch: 9 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:12,716-Speed 3327.66 samples/sec Loss 4.3055 LearningRate 0.0274 Epoch: 9 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:15,742-Speed 3384.36 samples/sec Loss 4.2419 LearningRate 0.0274 Epoch: 9 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:18,766-Speed 3386.58 samples/sec Loss 4.5051 LearningRate 0.0274 Epoch: 9 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-27 07:03:21,772-Speed 3408.31 samples/sec Loss 4.1993 LearningRate 0.0274 Epoch: 9 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:24,797-Speed 3385.52 samples/sec Loss 4.3109 LearningRate 0.0274 Epoch: 9 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:27,817-Speed 3391.70 samples/sec Loss 4.3283 LearningRate 0.0274 Epoch: 9 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:30,848-Speed 3379.35 samples/sec Loss 4.3771 LearningRate 0.0274 Epoch: 9 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:33,866-Speed 3394.41 samples/sec Loss 4.3270 LearningRate 0.0274 Epoch: 9 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:36,893-Speed 3383.53 samples/sec Loss 4.3254 LearningRate 0.0274 Epoch: 9 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:39,908-Speed 3397.09 samples/sec Loss 4.2814 LearningRate 0.0274 Epoch: 9 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:42,923-Speed 3396.81 samples/sec Loss 4.4420 LearningRate 0.0274 Epoch: 9 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:45,935-Speed 3400.44 samples/sec Loss 4.3677 LearningRate 0.0273 Epoch: 9 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:48,977-Speed 3366.50 samples/sec Loss 4.3619 LearningRate 0.0273 Epoch: 9 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:03:52,080-Speed 3301.01 samples/sec Loss 4.3800 LearningRate 0.0273 Epoch: 9 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:03:55,093-Speed 3399.79 samples/sec Loss 4.3213 LearningRate 0.0273 Epoch: 9 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:03:58,113-Speed 3391.30 samples/sec Loss 4.2484 
LearningRate 0.0273 Epoch: 9 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:01,127-Speed 3399.21 samples/sec Loss 4.4379 LearningRate 0.0273 Epoch: 9 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:04,138-Speed 3401.26 samples/sec Loss 4.3128 LearningRate 0.0273 Epoch: 9 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:07,146-Speed 3405.49 samples/sec Loss 4.2982 LearningRate 0.0273 Epoch: 9 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:10,156-Speed 3402.03 samples/sec Loss 4.4614 LearningRate 0.0273 Epoch: 9 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:13,176-Speed 3392.24 samples/sec Loss 4.3644 LearningRate 0.0273 Epoch: 9 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:16,188-Speed 3400.61 samples/sec Loss 4.3456 LearningRate 0.0273 Epoch: 9 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:19,204-Speed 3394.94 samples/sec Loss 4.2833 LearningRate 0.0272 Epoch: 9 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:22,214-Speed 3402.91 samples/sec Loss 4.3343 LearningRate 0.0272 Epoch: 9 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:25,211-Speed 3417.53 samples/sec Loss 4.2058 LearningRate 0.0272 Epoch: 9 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:28,226-Speed 3397.56 samples/sec Loss 4.2919 LearningRate 0.0272 Epoch: 9 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:31,245-Speed 3392.78 samples/sec Loss 4.2817 LearningRate 0.0272 Epoch: 9 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:34,255-Speed 3402.90 samples/sec Loss 4.4432 LearningRate 0.0272 Epoch: 9 Global Step: 54410 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:37,265-Speed 3402.64 samples/sec Loss 4.2983 LearningRate 0.0272 Epoch: 9 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:40,278-Speed 3399.74 samples/sec Loss 4.3382 LearningRate 0.0272 Epoch: 9 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:43,287-Speed 3403.08 samples/sec Loss 4.2588 LearningRate 0.0272 Epoch: 9 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:46,321-Speed 3375.80 samples/sec Loss 4.3614 LearningRate 0.0272 Epoch: 9 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:49,340-Speed 3393.48 samples/sec Loss 4.3979 LearningRate 0.0272 Epoch: 9 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:52,356-Speed 3395.51 samples/sec Loss 4.3758 LearningRate 0.0271 Epoch: 9 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:04:55,367-Speed 3402.77 samples/sec Loss 4.3697 LearningRate 0.0271 Epoch: 9 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:04:58,374-Speed 3405.18 samples/sec Loss 4.3013 LearningRate 0.0271 Epoch: 9 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:01,389-Speed 3397.82 samples/sec Loss 4.3869 LearningRate 0.0271 Epoch: 9 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:04,402-Speed 3399.68 samples/sec Loss 4.4337 LearningRate 0.0271 Epoch: 9 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:07,421-Speed 3392.33 samples/sec Loss 4.2639 LearningRate 0.0271 Epoch: 9 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:10,434-Speed 3399.15 samples/sec Loss 4.3624 LearningRate 0.0271 Epoch: 9 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
07:05:13,450-Speed 3396.30 samples/sec Loss 4.3707 LearningRate 0.0271 Epoch: 9 Global Step: 54540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:16,463-Speed 3399.06 samples/sec Loss 4.3165 LearningRate 0.0271 Epoch: 9 Global Step: 54550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:19,482-Speed 3393.12 samples/sec Loss 4.2707 LearningRate 0.0271 Epoch: 9 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:22,494-Speed 3400.21 samples/sec Loss 4.2454 LearningRate 0.0271 Epoch: 9 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:05:25,497-Speed 3411.02 samples/sec Loss 4.2217 LearningRate 0.0270 Epoch: 9 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:28,570-Speed 3332.89 samples/sec Loss 4.3156 LearningRate 0.0270 Epoch: 9 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:31,584-Speed 3398.84 samples/sec Loss 4.4154 LearningRate 0.0270 Epoch: 9 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:34,608-Speed 3386.52 samples/sec Loss 4.3240 LearningRate 0.0270 Epoch: 9 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:37,624-Speed 3396.19 samples/sec Loss 4.3467 LearningRate 0.0270 Epoch: 9 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:40,640-Speed 3395.04 samples/sec Loss 4.1863 LearningRate 0.0270 Epoch: 9 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:05:43,638-Speed 3416.86 samples/sec Loss 4.2313 LearningRate 0.0270 Epoch: 9 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:05:46,682-Speed 3364.53 samples/sec Loss 4.2856 LearningRate 0.0270 Epoch: 9 Global Step: 54650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:05:49,697-Speed 3397.51 samples/sec Loss 4.3286 LearningRate 
0.0270 Epoch: 9 Global Step: 54660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:05:52,711-Speed 3397.98 samples/sec Loss 4.3271 LearningRate 0.0270 Epoch: 9 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:05:55,740-Speed 3382.73 samples/sec Loss 4.3501 LearningRate 0.0270 Epoch: 9 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:05:58,758-Speed 3393.78 samples/sec Loss 4.3860 LearningRate 0.0269 Epoch: 9 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:06:01,776-Speed 3393.06 samples/sec Loss 4.3512 LearningRate 0.0269 Epoch: 9 Global Step: 54700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:06:04,794-Speed 3393.83 samples/sec Loss 4.3096 LearningRate 0.0269 Epoch: 9 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:06:07,814-Speed 3391.90 samples/sec Loss 4.1899 LearningRate 0.0269 Epoch: 9 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:06:10,839-Speed 3385.27 samples/sec Loss 4.2933 LearningRate 0.0269 Epoch: 9 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:06:13,858-Speed 3393.27 samples/sec Loss 4.3062 LearningRate 0.0269 Epoch: 9 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:16,872-Speed 3398.74 samples/sec Loss 4.4593 LearningRate 0.0269 Epoch: 9 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:19,890-Speed 3394.06 samples/sec Loss 4.2972 LearningRate 0.0269 Epoch: 9 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:22,905-Speed 3396.61 samples/sec Loss 4.3656 LearningRate 0.0269 Epoch: 9 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:25,930-Speed 3385.83 samples/sec Loss 4.2560 LearningRate 0.0269 Epoch: 9 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-27 07:06:28,946-Speed 3396.63 samples/sec Loss 4.2182 LearningRate 0.0269 Epoch: 9 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:31,963-Speed 3394.67 samples/sec Loss 4.3718 LearningRate 0.0268 Epoch: 9 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:34,991-Speed 3382.93 samples/sec Loss 4.2160 LearningRate 0.0268 Epoch: 9 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:38,008-Speed 3394.70 samples/sec Loss 4.2980 LearningRate 0.0268 Epoch: 9 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:41,027-Speed 3392.94 samples/sec Loss 4.3150 LearningRate 0.0268 Epoch: 9 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:44,051-Speed 3387.28 samples/sec Loss 4.2575 LearningRate 0.0268 Epoch: 9 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:06:47,053-Speed 3411.92 samples/sec Loss 4.1367 LearningRate 0.0268 Epoch: 9 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:50,070-Speed 3394.36 samples/sec Loss 4.2408 LearningRate 0.0268 Epoch: 9 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:53,087-Speed 3394.60 samples/sec Loss 4.3069 LearningRate 0.0268 Epoch: 9 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:56,105-Speed 3393.92 samples/sec Loss 4.4732 LearningRate 0.0268 Epoch: 9 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:06:59,121-Speed 3396.01 samples/sec Loss 4.2248 LearningRate 0.0268 Epoch: 9 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:02,150-Speed 3382.12 samples/sec Loss 4.4146 LearningRate 0.0268 Epoch: 9 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:05,179-Speed 3380.97 
samples/sec Loss 4.3455 LearningRate 0.0267 Epoch: 9 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:08,198-Speed 3392.50 samples/sec Loss 4.2570 LearningRate 0.0267 Epoch: 9 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:11,225-Speed 3383.90 samples/sec Loss 4.2080 LearningRate 0.0267 Epoch: 9 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:14,241-Speed 3395.87 samples/sec Loss 4.2680 LearningRate 0.0267 Epoch: 9 Global Step: 54940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:17,242-Speed 3413.87 samples/sec Loss 4.3176 LearningRate 0.0267 Epoch: 9 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:20,268-Speed 3384.75 samples/sec Loss 4.2830 LearningRate 0.0267 Epoch: 9 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:23,294-Speed 3384.63 samples/sec Loss 4.2258 LearningRate 0.0267 Epoch: 9 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:26,312-Speed 3392.98 samples/sec Loss 4.2287 LearningRate 0.0267 Epoch: 9 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:29,332-Speed 3393.46 samples/sec Loss 4.3132 LearningRate 0.0267 Epoch: 9 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:32,355-Speed 3388.24 samples/sec Loss 4.3151 LearningRate 0.0267 Epoch: 9 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:35,384-Speed 3381.66 samples/sec Loss 4.2482 LearningRate 0.0267 Epoch: 9 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:38,401-Speed 3394.07 samples/sec Loss 4.2392 LearningRate 0.0266 Epoch: 9 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:41,424-Speed 3388.65 samples/sec Loss 4.2233 LearningRate 0.0266 Epoch: 9 Global Step: 
55030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:44,437-Speed 3399.41 samples/sec Loss 4.3369 LearningRate 0.0266 Epoch: 9 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:07:47,475-Speed 3371.46 samples/sec Loss 4.2513 LearningRate 0.0266 Epoch: 9 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:07:50,498-Speed 3388.45 samples/sec Loss 4.2596 LearningRate 0.0266 Epoch: 9 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:07:53,541-Speed 3365.35 samples/sec Loss 4.3361 LearningRate 0.0266 Epoch: 9 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:07:56,562-Speed 3390.45 samples/sec Loss 4.3442 LearningRate 0.0266 Epoch: 9 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:07:59,584-Speed 3389.87 samples/sec Loss 4.2456 LearningRate 0.0266 Epoch: 9 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:02,601-Speed 3394.80 samples/sec Loss 4.2497 LearningRate 0.0266 Epoch: 9 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:05,622-Speed 3389.73 samples/sec Loss 4.2890 LearningRate 0.0266 Epoch: 9 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:08,640-Speed 3394.62 samples/sec Loss 4.3624 LearningRate 0.0266 Epoch: 9 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:11,679-Speed 3369.68 samples/sec Loss 4.3754 LearningRate 0.0265 Epoch: 9 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:14,698-Speed 3392.79 samples/sec Loss 4.2531 LearningRate 0.0265 Epoch: 9 Global Step: 55140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:08:17,701-Speed 3411.14 samples/sec Loss 4.3536 LearningRate 0.0265 Epoch: 9 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-27 07:08:20,723-Speed 3389.30 samples/sec Loss 4.3402 LearningRate 0.0265 Epoch: 9 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:23,743-Speed 3390.73 samples/sec Loss 4.4419 LearningRate 0.0265 Epoch: 9 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:27,694-Speed 2592.38 samples/sec Loss 4.2562 LearningRate 0.0265 Epoch: 9 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:30,721-Speed 3388.26 samples/sec Loss 4.3396 LearningRate 0.0265 Epoch: 9 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:33,745-Speed 3386.81 samples/sec Loss 4.1546 LearningRate 0.0265 Epoch: 9 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:36,822-Speed 3328.89 samples/sec Loss 4.2811 LearningRate 0.0265 Epoch: 9 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:39,862-Speed 3368.85 samples/sec Loss 4.2222 LearningRate 0.0265 Epoch: 9 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:42,890-Speed 3382.90 samples/sec Loss 4.2332 LearningRate 0.0265 Epoch: 9 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:45,914-Speed 3387.28 samples/sec Loss 4.2173 LearningRate 0.0264 Epoch: 9 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:48,936-Speed 3389.19 samples/sec Loss 4.2598 LearningRate 0.0264 Epoch: 9 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:08:51,947-Speed 3401.87 samples/sec Loss 4.1608 LearningRate 0.0264 Epoch: 9 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:54,968-Speed 3389.81 samples/sec Loss 4.3696 LearningRate 0.0264 Epoch: 9 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:08:58,047-Speed 3326.63 samples/sec Loss 4.2030 LearningRate 0.0264 Epoch: 9 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:01,102-Speed 3352.45 samples/sec Loss 4.1526 LearningRate 0.0264 Epoch: 9 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:04,110-Speed 3405.27 samples/sec Loss 4.2407 LearningRate 0.0264 Epoch: 9 Global Step: 55300 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:07,135-Speed 3386.24 samples/sec Loss 4.3148 LearningRate 0.0264 Epoch: 9 Global Step: 55310 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:10,154-Speed 3392.51 samples/sec Loss 4.1949 LearningRate 0.0264 Epoch: 9 Global Step: 55320 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:13,187-Speed 3376.90 samples/sec Loss 4.2433 LearningRate 0.0264 Epoch: 9 Global Step: 55330 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:16,207-Speed 3391.60 samples/sec Loss 4.3172 LearningRate 0.0264 Epoch: 9 Global Step: 55340 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:19,228-Speed 3390.26 samples/sec Loss 4.5410 LearningRate 0.0263 Epoch: 9 Global Step: 55350 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:22,245-Speed 3395.13 samples/sec Loss 4.3292 LearningRate 0.0263 Epoch: 9 Global Step: 55360 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:25,265-Speed 3391.95 samples/sec Loss 4.3261 LearningRate 0.0263 Epoch: 9 Global Step: 55370 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:28,283-Speed 3393.34 samples/sec Loss 4.2334 LearningRate 0.0263 Epoch: 9 Global Step: 55380 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:31,302-Speed 3393.22 samples/sec Loss 4.4073 LearningRate 0.0263 Epoch: 9 Global Step: 55390 Fp16 Grad Scale: 32768 Required: 6 hours
Training: 2022-04-27 07:09:34,330-Speed 3382.11 samples/sec Loss 4.2172 LearningRate 0.0263 Epoch: 9 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:37,353-Speed 3388.61 samples/sec Loss 4.2347 LearningRate 0.0263 Epoch: 9 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:40,370-Speed 3394.77 samples/sec Loss 4.2259 LearningRate 0.0263 Epoch: 9 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:43,388-Speed 3393.08 samples/sec Loss 4.2778 LearningRate 0.0263 Epoch: 9 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:46,418-Speed 3381.04 samples/sec Loss 4.4070 LearningRate 0.0263 Epoch: 9 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:49,438-Speed 3390.97 samples/sec Loss 4.1997 LearningRate 0.0263 Epoch: 9 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:52,462-Speed 3388.49 samples/sec Loss 4.2478 LearningRate 0.0262 Epoch: 9 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:55,482-Speed 3390.59 samples/sec Loss 4.1931 LearningRate 0.0262 Epoch: 9 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:09:58,506-Speed 3387.74 samples/sec Loss 4.1862 LearningRate 0.0262 Epoch: 9 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:01,527-Speed 3389.54 samples/sec Loss 4.3900 LearningRate 0.0262 Epoch: 9 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:04,549-Speed 3390.21 samples/sec Loss 4.2472 LearningRate 0.0262 Epoch: 9 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:07,568-Speed 3391.78 samples/sec Loss 4.3728 LearningRate 0.0262 Epoch: 9 Global Step: 55510 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:10,589-Speed 3391.19 samples/sec Loss 4.2294 LearningRate 0.0262 Epoch: 9 Global Step: 55520 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:13,611-Speed 3388.65 samples/sec Loss 4.2370 LearningRate 0.0262 Epoch: 9 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:16,630-Speed 3393.34 samples/sec Loss 4.2422 LearningRate 0.0262 Epoch: 9 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:19,645-Speed 3397.14 samples/sec Loss 4.2133 LearningRate 0.0262 Epoch: 9 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:22,665-Speed 3391.03 samples/sec Loss 4.2878 LearningRate 0.0262 Epoch: 9 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:25,687-Speed 3389.32 samples/sec Loss 4.3010 LearningRate 0.0261 Epoch: 9 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:28,708-Speed 3390.15 samples/sec Loss 4.2868 LearningRate 0.0261 Epoch: 9 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:31,729-Speed 3390.12 samples/sec Loss 4.3082 LearningRate 0.0261 Epoch: 9 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:34,762-Speed 3377.37 samples/sec Loss 4.1932 LearningRate 0.0261 Epoch: 9 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:37,808-Speed 3362.27 samples/sec Loss 4.2725 LearningRate 0.0261 Epoch: 9 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:40,840-Speed 3379.65 samples/sec Loss 4.3660 LearningRate 0.0261 Epoch: 9 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:43,875-Speed 3374.94 samples/sec Loss 4.2369 LearningRate 0.0261 Epoch: 9 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:46,901-Speed 3384.70 samples/sec Loss 4.2026 LearningRate 0.0261 Epoch: 9 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:10:49,925-Speed 3386.24 samples/sec Loss 4.2348 LearningRate 0.0261 Epoch: 9 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:52,948-Speed 3388.09 samples/sec Loss 4.2045 LearningRate 0.0261 Epoch: 9 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:55,968-Speed 3392.34 samples/sec Loss 4.2598 LearningRate 0.0261 Epoch: 9 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:10:58,997-Speed 3381.19 samples/sec Loss 4.1216 LearningRate 0.0260 Epoch: 9 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:11:02,046-Speed 3359.05 samples/sec Loss 4.2848 LearningRate 0.0260 Epoch: 9 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:05,073-Speed 3384.13 samples/sec Loss 4.1652 LearningRate 0.0260 Epoch: 9 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:08,100-Speed 3383.61 samples/sec Loss 4.2009 LearningRate 0.0260 Epoch: 9 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:11,123-Speed 3388.29 samples/sec Loss 4.2483 LearningRate 0.0260 Epoch: 9 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:14,163-Speed 3369.12 samples/sec Loss 4.1949 LearningRate 0.0260 Epoch: 9 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:17,184-Speed 3390.73 samples/sec Loss 4.0933 LearningRate 0.0260 Epoch: 9 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:20,220-Speed 3372.85 samples/sec Loss 4.2545 LearningRate 0.0260 Epoch: 9 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:23,249-Speed 3382.02 samples/sec Loss 4.2018 LearningRate 0.0260 Epoch: 9 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:26,411-Speed 3238.40 samples/sec Loss 4.2323 LearningRate 0.0260 Epoch: 9 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:29,434-Speed 3388.54 samples/sec Loss 4.2454 LearningRate 0.0260 Epoch: 9 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:32,441-Speed 3405.85 samples/sec Loss 4.2969 LearningRate 0.0259 Epoch: 9 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:35,466-Speed 3385.99 samples/sec Loss 4.2035 LearningRate 0.0259 Epoch: 9 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:38,491-Speed 3386.90 samples/sec Loss 4.2964 LearningRate 0.0259 Epoch: 9 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:41,516-Speed 3386.13 samples/sec Loss 4.2012 LearningRate 0.0259 Epoch: 9 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:44,544-Speed 3381.75 samples/sec Loss 4.3337 LearningRate 0.0259 Epoch: 9 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:47,577-Speed 3377.16 samples/sec Loss 4.3152 LearningRate 0.0259 Epoch: 9 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:50,617-Speed 3369.24 samples/sec Loss 4.3306 LearningRate 0.0259 Epoch: 9 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:53,649-Speed 3379.17 samples/sec Loss 4.3810 LearningRate 0.0259 Epoch: 9 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:56,672-Speed 3387.38 samples/sec Loss 4.2192 LearningRate 0.0259 Epoch: 9 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:11:59,697-Speed 3385.99 samples/sec Loss 4.2443 LearningRate 0.0259 Epoch: 9 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:02,730-Speed 3377.16 samples/sec Loss 4.3484 LearningRate 0.0259 Epoch: 9 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:12:05,767-Speed 3373.22 samples/sec Loss 4.4010 LearningRate 0.0259 Epoch: 9 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:12:08,790-Speed 3388.26 samples/sec Loss 4.2443 LearningRate 0.0258 Epoch: 9 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:12:11,792-Speed 3412.19 samples/sec Loss 4.1650 LearningRate 0.0258 Epoch: 9 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:14,820-Speed 3381.53 samples/sec Loss 4.2602 LearningRate 0.0258 Epoch: 9 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:17,852-Speed 3378.61 samples/sec Loss 4.2007 LearningRate 0.0258 Epoch: 9 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:20,875-Speed 3388.13 samples/sec Loss 4.2805 LearningRate 0.0258 Epoch: 9 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:23,899-Speed 3386.45 samples/sec Loss 4.2498 LearningRate 0.0258 Epoch: 9 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:26,927-Speed 3382.61 samples/sec Loss 4.1429 LearningRate 0.0258 Epoch: 9 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:29,955-Speed 3382.80 samples/sec Loss 4.2260 LearningRate 0.0258 Epoch: 9 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:32,979-Speed 3386.48 samples/sec Loss 4.1779 LearningRate 0.0258 Epoch: 9 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:12:36,010-Speed 3380.45 samples/sec Loss 4.1384 LearningRate 0.0258 Epoch: 9 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:13:19,670-[lfw][56000]XNorm: 21.520633
Training: 2022-04-27 07:13:19,671-[lfw][56000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-27 07:13:19,671-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:14:10,094-[cfp_fp][56000]XNorm: 19.650348
Training: 2022-04-27 07:14:10,094-[cfp_fp][56000]Accuracy-Flip: 0.96643+-0.00877
Training: 2022-04-27 07:14:10,095-[cfp_fp][56000]Accuracy-Highest: 0.96643
Training: 2022-04-27 07:14:53,410-[agedb_30][56000]XNorm: 21.535365
Training: 2022-04-27 07:14:53,411-[agedb_30][56000]Accuracy-Flip: 0.97483+-0.00758
Training: 2022-04-27 07:14:53,411-[agedb_30][56000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:14:56,430-Speed 72.92 samples/sec Loss 4.1655 LearningRate 0.0258 Epoch: 9 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:14:59,438-Speed 3404.60 samples/sec Loss 4.3177 LearningRate 0.0257 Epoch: 9 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:15:02,447-Speed 3404.14 samples/sec Loss 4.2868 LearningRate 0.0257 Epoch: 9 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:05,459-Speed 3400.16 samples/sec Loss 4.3269 LearningRate 0.0257 Epoch: 9 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:08,474-Speed 3397.05 samples/sec Loss 4.2151 LearningRate 0.0257 Epoch: 9 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:11,490-Speed 3395.61 samples/sec Loss 4.1578 LearningRate 0.0257 Epoch: 9 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:14,514-Speed 3387.11 samples/sec Loss 4.2625 LearningRate 0.0257 Epoch: 9 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:17,530-Speed 3396.64 samples/sec Loss 4.3349 LearningRate 0.0257 Epoch: 9 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:20,543-Speed 3398.82 samples/sec Loss 4.2505 LearningRate 0.0257 Epoch: 9 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:23,572-Speed 3382.06 samples/sec Loss 4.3015 LearningRate 0.0257 Epoch: 9 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:26,601-Speed 3380.98 samples/sec Loss 4.2353 LearningRate 0.0257 Epoch: 9 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:29,626-Speed 3385.86 samples/sec Loss 4.1429 LearningRate 0.0257 Epoch: 9 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:32,629-Speed 3411.23 samples/sec Loss 4.1938 LearningRate 0.0256 Epoch: 9 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:35,645-Speed 3396.16 samples/sec Loss 4.2319 LearningRate 0.0256 Epoch: 9 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:38,671-Speed 3384.02 samples/sec Loss 4.2353 LearningRate 0.0256 Epoch: 9 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:41,697-Speed 3385.46 samples/sec Loss 4.2183 LearningRate 0.0256 Epoch: 9 Global Step: 56160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:44,720-Speed 3387.66 samples/sec Loss 4.1463 LearningRate 0.0256 Epoch: 9 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:47,760-Speed 3369.19 samples/sec Loss 4.3002 LearningRate 0.0256 Epoch: 9 Global Step: 56180 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:50,796-Speed 3373.50 samples/sec Loss 4.1777 LearningRate 0.0256 Epoch: 9 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:53,829-Speed 3376.96 samples/sec Loss 4.3062 LearningRate 0.0256 Epoch: 9 Global Step: 56200 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:56,874-Speed 3364.42 samples/sec Loss 4.2422 LearningRate 0.0256 Epoch: 9 Global Step: 56210 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:15:59,931-Speed 3350.52 samples/sec Loss 4.1530 LearningRate 0.0256 Epoch: 9 Global Step: 56220 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:02,949-Speed 3393.75 samples/sec Loss 4.2227 LearningRate 0.0256 Epoch: 9 Global Step: 56230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:05,977-Speed 3381.99 samples/sec Loss 4.2943 LearningRate 0.0255 Epoch: 9 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:09,015-Speed 3371.52 samples/sec Loss 4.1586 LearningRate 0.0255 Epoch: 9 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:12,045-Speed 3380.56 samples/sec Loss 4.3724 LearningRate 0.0255 Epoch: 9 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:15,209-Speed 3236.48 samples/sec Loss 4.1935 LearningRate 0.0255 Epoch: 9 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:18,283-Speed 3332.73 samples/sec Loss 4.3450 LearningRate 0.0255 Epoch: 9 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:21,303-Speed 3391.13 samples/sec Loss 4.2210 LearningRate 0.0255 Epoch: 9 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:24,322-Speed 3392.35 samples/sec Loss 4.1472 LearningRate 0.0255 Epoch: 9 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:27,353-Speed 3380.11 samples/sec Loss 4.2186 LearningRate 0.0255 Epoch: 9 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:30,368-Speed 3396.31 samples/sec Loss 4.2245 LearningRate 0.0255 Epoch: 9 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:33,387-Speed 3392.85 samples/sec Loss 4.1886 LearningRate 0.0255 Epoch: 9 Global Step: 56330 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:16:36,409-Speed 3389.30 samples/sec Loss 4.3447 LearningRate 0.0255 Epoch: 9 Global Step: 56340 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:16:39,425-Speed 3396.02 samples/sec Loss 4.2733 LearningRate 0.0255 Epoch: 9 Global Step: 56350 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:16:42,446-Speed 3390.23 samples/sec Loss 4.1211 LearningRate 0.0254 Epoch: 9 Global Step: 56360 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:16:45,437-Speed 3425.01 samples/sec Loss 4.1355 LearningRate 0.0254 Epoch: 9 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:48,451-Speed 3398.16 samples/sec Loss 4.1605 LearningRate 0.0254 Epoch: 9 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:51,464-Speed 3399.11 samples/sec Loss 4.3242 LearningRate 0.0254 Epoch: 9 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:54,475-Speed 3401.68 samples/sec Loss 4.1900 LearningRate 0.0254 Epoch: 9 Global Step: 56400 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:16:57,489-Speed 3398.55 samples/sec Loss 4.1354 LearningRate 0.0254 Epoch: 9 Global Step: 56410 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:00,507-Speed 3395.09 samples/sec Loss 4.2134 LearningRate 0.0254 Epoch: 9 Global Step: 56420 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:03,592-Speed 3319.29 samples/sec Loss 4.1850 LearningRate 0.0254 Epoch: 9 Global Step: 56430 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:06,614-Speed 3390.33 samples/sec Loss 4.1727 LearningRate 0.0254 Epoch: 9 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:09,631-Speed 3393.78 samples/sec Loss 4.2710 LearningRate 0.0254 Epoch: 9 Global Step: 56450 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:12,641-Speed 3403.40 samples/sec Loss 4.1351 LearningRate 0.0254 Epoch: 9 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:15,635-Speed 3421.28 samples/sec Loss 4.2731 LearningRate 0.0253 Epoch: 9 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:18,643-Speed 3404.89 samples/sec Loss 4.3015 LearningRate 0.0253 Epoch: 9 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:21,650-Speed 3405.92 samples/sec Loss 4.1572 LearningRate 0.0253 Epoch: 9 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:24,666-Speed 3396.32 samples/sec Loss 4.2169 LearningRate 0.0253 Epoch: 9 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:27,702-Speed 3373.23 samples/sec Loss 4.0963 LearningRate 0.0253 Epoch: 9 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:30,712-Speed 3403.29 samples/sec Loss 4.0989 LearningRate 0.0253 Epoch: 9 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:33,719-Speed 3405.80 samples/sec Loss 4.1234 LearningRate 0.0253 Epoch: 9 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:36,735-Speed 3396.57 samples/sec Loss 4.4039 LearningRate 0.0253 Epoch: 9 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:39,749-Speed 3398.61 samples/sec Loss 4.1947 LearningRate 0.0253 Epoch: 9 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:42,758-Speed 3403.42 samples/sec Loss 4.2255 LearningRate 0.0253 Epoch: 9 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:45,753-Speed 3419.14 samples/sec Loss 4.1678 LearningRate 0.0253 Epoch: 9 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:48,770-Speed 3395.84 samples/sec Loss 4.3078 LearningRate 0.0252 Epoch: 9 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:51,785-Speed 3397.17 samples/sec Loss 4.1917 LearningRate 0.0252 Epoch: 9 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:54,795-Speed 3402.79 samples/sec Loss 4.3007 LearningRate 0.0252 Epoch: 9 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:17:57,807-Speed 3400.41 samples/sec Loss 4.2258 LearningRate 0.0252 Epoch: 9 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:00,823-Speed 3396.11 samples/sec Loss 4.2284 LearningRate 0.0252 Epoch: 9 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:03,843-Speed 3390.53 samples/sec Loss 4.2570 LearningRate 0.0252 Epoch: 9 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:06,856-Speed 3400.33 samples/sec Loss 4.2265 LearningRate 0.0252 Epoch: 9 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:09,871-Speed 3396.80 samples/sec Loss 4.2004 LearningRate 0.0252 Epoch: 9 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:12,891-Speed 3392.12 samples/sec Loss 4.2698 LearningRate 0.0252 Epoch: 9 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:15,887-Speed 3417.87 samples/sec Loss 4.1600 LearningRate 0.0252 Epoch: 9 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:18,896-Speed 3404.71 samples/sec Loss 4.1167 LearningRate 0.0252 Epoch: 9 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:21,910-Speed 3398.13 samples/sec Loss 4.1383 LearningRate 0.0251 Epoch: 9 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:24,957-Speed 3361.29 samples/sec Loss 4.0728 LearningRate 0.0251 Epoch: 9 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:27,971-Speed 3397.90 samples/sec Loss 4.1141 LearningRate 0.0251 Epoch: 9 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:30,992-Speed 3389.88 samples/sec Loss 4.2662 LearningRate 0.0251 Epoch: 9 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:34,018-Speed 3385.79 samples/sec Loss 4.1999 LearningRate 0.0251 Epoch: 9 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:37,109-Speed 3313.16 samples/sec Loss 4.0538 LearningRate 0.0251 Epoch: 9 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:40,131-Speed 3389.14 samples/sec Loss 4.1648 LearningRate 0.0251 Epoch: 9 Global Step: 56750 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:43,144-Speed 3400.17 samples/sec Loss 4.3162 LearningRate 0.0251 Epoch: 9 Global Step: 56760 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:46,160-Speed 3395.19 samples/sec Loss 4.2573 LearningRate 0.0251 Epoch: 9 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:18:49,193-Speed 3378.01 samples/sec Loss 4.3455 LearningRate 0.0251 Epoch: 9 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:18:52,218-Speed 3384.84 samples/sec Loss 4.0309 LearningRate 0.0251 Epoch: 9 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:18:55,222-Speed 3410.02 samples/sec Loss 4.2170 LearningRate 0.0251 Epoch: 9 Global Step: 56800 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:18:58,243-Speed 3389.75 samples/sec Loss 4.1434 LearningRate 0.0250 Epoch: 9 Global Step: 56810 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:01,258-Speed 3398.66 samples/sec Loss 4.2457 LearningRate 0.0250 Epoch: 9 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:04,273-Speed 3397.14 samples/sec Loss 4.1371 LearningRate 0.0250 Epoch: 9 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:07,287-Speed 3399.00 samples/sec Loss 4.1496 LearningRate 0.0250 Epoch: 9 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:10,437-Speed 3250.65 samples/sec Loss 4.0531 LearningRate 0.0250 Epoch: 9 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:13,447-Speed 3403.46 samples/sec Loss 4.0980 LearningRate 0.0250 Epoch: 9 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:26,777-Speed 768.25 samples/sec Loss 3.6710 LearningRate 0.0250 Epoch: 10 Global Step: 56870 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:29,799-Speed 3389.34 samples/sec Loss 3.5931 LearningRate 0.0250 Epoch: 10 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:32,977-Speed 3223.15 samples/sec Loss 3.6001 LearningRate 0.0250 Epoch: 10 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:36,009-Speed 3378.93 samples/sec Loss 3.6642 LearningRate 0.0250 Epoch: 10 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:19:39,025-Speed 3396.19 samples/sec Loss 3.6384 LearningRate 0.0250 Epoch: 10 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:19:42,044-Speed 3393.04 samples/sec Loss 3.5950 LearningRate 0.0249 Epoch: 10 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:19:45,065-Speed 3390.19 samples/sec Loss 3.5259 LearningRate 0.0249 Epoch: 10 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:19:48,079-Speed 3397.86 samples/sec Loss 3.5942 LearningRate 0.0249 Epoch: 10 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:51,111-Speed 3378.55 samples/sec Loss 3.6945 LearningRate 0.0249 Epoch: 10 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:54,134-Speed 3387.89 samples/sec Loss 3.5189 LearningRate 0.0249 Epoch: 10 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:19:57,158-Speed 3387.41 samples/sec Loss 3.5289 LearningRate 0.0249 Epoch: 10 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:00,189-Speed 3379.06 samples/sec Loss 3.5888 LearningRate 0.0249 Epoch: 10 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:03,218-Speed 3381.63 samples/sec Loss 3.6231 LearningRate 0.0249 Epoch: 10 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:06,246-Speed 3382.74 samples/sec Loss 3.6097 LearningRate 0.0249 Epoch: 10 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:09,260-Speed 3397.99 samples/sec Loss 3.5491 LearningRate 0.0249 Epoch: 10 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:12,298-Speed 3372.20 samples/sec Loss 3.7801 LearningRate 0.0249 Epoch: 10 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:15,399-Speed 3302.72 samples/sec Loss 3.6134 LearningRate 0.0249 Epoch: 10 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:18,414-Speed 3397.25 samples/sec Loss 3.6262 LearningRate 0.0248 Epoch: 10 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:20:21,433-Speed 3392.73 samples/sec Loss 3.5046 LearningRate 0.0248 Epoch: 10 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:20:24,465-Speed 3378.95 samples/sec Loss 3.6727 LearningRate 0.0248 Epoch: 10 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:20:27,461-Speed 3417.73 samples/sec Loss 3.5903 LearningRate 0.0248 Epoch: 10 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:30,489-Speed 3382.42 samples/sec Loss 3.6560 LearningRate 0.0248 Epoch: 10 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:33,510-Speed 3391.34 samples/sec Loss 3.7150 LearningRate 0.0248 Epoch: 10 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:36,532-Speed 3389.67 samples/sec Loss 3.7081 LearningRate 0.0248 Epoch: 10 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:39,551-Speed 3391.63 samples/sec Loss 3.7203 LearningRate 0.0248 Epoch: 10 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:42,579-Speed 3383.53 samples/sec Loss 3.7847 LearningRate 0.0248 Epoch: 10 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:45,600-Speed 3389.69 samples/sec Loss 3.5511 LearningRate 0.0248 Epoch: 10 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:48,627-Speed 3383.80 samples/sec Loss 3.7173 LearningRate 0.0248 Epoch: 10 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:51,651-Speed 3386.57 samples/sec Loss 3.8399 LearningRate 0.0247 Epoch: 10 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:54,681-Speed 3381.10 samples/sec Loss 3.7592 LearningRate 0.0247 Epoch: 10 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:20:57,705-Speed 3386.29 samples/sec Loss 3.6525 LearningRate 0.0247 Epoch: 10 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:00,741-Speed 3374.25 samples/sec Loss 3.7871 LearningRate 0.0247 Epoch: 10 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:03,757-Speed 3396.05 samples/sec Loss 3.5862 LearningRate 0.0247 Epoch: 10 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:06,774-Speed 3395.07 samples/sec Loss 3.6031 LearningRate 0.0247 Epoch: 10 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:09,834-Speed 3346.81 samples/sec Loss 3.6934 LearningRate 0.0247 Epoch: 10 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:12,850-Speed 3396.03 samples/sec Loss 3.8042 LearningRate 0.0247 Epoch: 10 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:15,854-Speed 3409.99 samples/sec Loss 3.7546 LearningRate 0.0247 Epoch: 10 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:18,869-Speed 3397.25 samples/sec Loss 3.7583 LearningRate 0.0247 Epoch: 10 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:21,897-Speed 3382.06 samples/sec Loss 3.8238 LearningRate 0.0247 Epoch: 10 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:24,920-Speed 3388.07 samples/sec Loss 3.7368 LearningRate 0.0246 Epoch: 10 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:27,941-Speed 3390.74 samples/sec Loss 3.7554 LearningRate 0.0246 Epoch: 10 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:30,955-Speed 3397.95 samples/sec Loss 3.7267 LearningRate 0.0246 Epoch: 10 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:33,973-Speed 3394.62 samples/sec Loss 3.7638 LearningRate 0.0246 Epoch: 10 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:36,989-Speed 3396.12 samples/sec Loss 3.7057 LearningRate 0.0246 Epoch: 10 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:40,009-Speed 3390.58 samples/sec Loss 3.6932 LearningRate 0.0246 Epoch: 10 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:43,041-Speed 3378.33 samples/sec Loss 3.7624 LearningRate 0.0246 Epoch: 10 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:46,066-Speed 3385.55 samples/sec Loss 3.7485 LearningRate 0.0246 Epoch: 10 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:21:49,067-Speed 3412.77 samples/sec Loss 3.7592 LearningRate 0.0246 Epoch: 10 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:52,094-Speed 3383.94 samples/sec Loss 3.8219 LearningRate 0.0246 Epoch: 10 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:55,137-Speed 3366.30 samples/sec Loss 3.6456 LearningRate 0.0246 Epoch: 10 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:21:58,163-Speed 3384.84 samples/sec Loss 3.8465 LearningRate 0.0246 Epoch: 10 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:01,182-Speed 3392.75 samples/sec Loss 3.8003 LearningRate 0.0245 Epoch: 10 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:04,200-Speed 3393.14 samples/sec Loss 3.6786 LearningRate 0.0245 Epoch: 10 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:07,217-Speed 3395.94 samples/sec Loss 3.8703 LearningRate 0.0245 Epoch: 10 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:10,257-Speed 3369.64 samples/sec Loss 3.6507 LearningRate 0.0245 Epoch: 10 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:13,297-Speed 3369.52 samples/sec Loss 3.8378 LearningRate 0.0245 Epoch: 10 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:16,352-Speed 3351.78 samples/sec Loss 3.9238 LearningRate 0.0245 Epoch: 10 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:19,364-Speed 3401.63 samples/sec Loss 3.7609 LearningRate 0.0245 Epoch: 10 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:22,388-Speed 3386.77 samples/sec Loss 3.8485 LearningRate 0.0245 Epoch: 10 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:25,440-Speed 3356.35 samples/sec Loss 3.8813 LearningRate 0.0245 Epoch: 10 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:28,468-Speed 3382.68 samples/sec Loss 3.8146 LearningRate 0.0245 Epoch: 10 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:31,483-Speed 3396.55 samples/sec Loss 3.8829 LearningRate 0.0245 Epoch: 10 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:34,500-Speed 3394.60 samples/sec Loss 3.8182 LearningRate 0.0244 Epoch: 10 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:37,519-Speed 3393.00 samples/sec Loss 3.8364 LearningRate 0.0244 Epoch: 10 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:40,537-Speed 3394.37 samples/sec Loss 3.8467 LearningRate 0.0244 Epoch: 10 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:43,582-Speed 3363.46 samples/sec Loss 3.8647 LearningRate 0.0244 Epoch: 10 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:46,616-Speed 3376.56 samples/sec Loss 3.7684 LearningRate 0.0244 Epoch: 10 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:22:49,652-Speed 3373.37 samples/sec Loss 3.7908 LearningRate 0.0244 Epoch: 10 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:22:52,676-Speed 3386.67 samples/sec Loss 3.7933 LearningRate 0.0244 Epoch: 10 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:22:55,708-Speed 3379.55 samples/sec Loss 3.8344 LearningRate 0.0244 Epoch: 10 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:22:58,730-Speed 3388.43 samples/sec Loss 3.8245 LearningRate 0.0244 Epoch: 10 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:23:01,757-Speed 3383.98 samples/sec Loss 3.7720 LearningRate 0.0244 Epoch: 10 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:23:04,760-Speed 3410.80 samples/sec Loss 3.7501 LearningRate 0.0244 Epoch: 10 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:07,778-Speed 3393.58 samples/sec Loss 3.7392 LearningRate 0.0244 Epoch: 10 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:10,798-Speed 3391.84 samples/sec Loss 3.8521 LearningRate 0.0243 Epoch: 10 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:13,820-Speed 3390.02 samples/sec Loss 3.7059 LearningRate 0.0243 Epoch: 10 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:16,845-Speed 3385.35 samples/sec Loss 3.8626 LearningRate 0.0243 Epoch: 10 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:19,881-Speed 3373.68 samples/sec Loss 3.8308 LearningRate 0.0243 Epoch: 10 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:22,900-Speed 3392.67 samples/sec Loss 3.6884 LearningRate 0.0243 Epoch: 10 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:25,924-Speed 3387.04 samples/sec Loss 3.8071 LearningRate 0.0243 Epoch: 10 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:28,957-Speed 3376.45 samples/sec Loss 3.8670 LearningRate 0.0243 Epoch: 10 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:31,983-Speed 3385.39 samples/sec Loss 3.8982 LearningRate 0.0243 Epoch: 10 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 07:23:35,016-Speed 3377.19 samples/sec Loss 3.8372 LearningRate 0.0243 Epoch: 10 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:23:38,054-Speed 3370.85 samples/sec Loss 3.7561 LearningRate 0.0243 Epoch: 10 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 07:23:41,058-Speed 3411.69 samples/sec Loss 3.9922 LearningRate 0.0243 Epoch: 10 
Global Step: 57710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:44,088-Speed 3380.63 samples/sec Loss 3.9160 LearningRate 0.0242 Epoch: 10 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:47,119-Speed 3380.08 samples/sec Loss 3.8566 LearningRate 0.0242 Epoch: 10 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:50,156-Speed 3372.83 samples/sec Loss 3.8199 LearningRate 0.0242 Epoch: 10 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:53,182-Speed 3383.88 samples/sec Loss 3.8199 LearningRate 0.0242 Epoch: 10 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:56,206-Speed 3387.61 samples/sec Loss 3.9197 LearningRate 0.0242 Epoch: 10 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:23:59,234-Speed 3383.00 samples/sec Loss 3.9020 LearningRate 0.0242 Epoch: 10 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:02,258-Speed 3386.32 samples/sec Loss 3.9375 LearningRate 0.0242 Epoch: 10 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:05,288-Speed 3380.68 samples/sec Loss 3.9105 LearningRate 0.0242 Epoch: 10 Global Step: 57790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:08,329-Speed 3368.51 samples/sec Loss 3.8574 LearningRate 0.0242 Epoch: 10 Global Step: 57800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:11,397-Speed 3339.08 samples/sec Loss 3.8713 LearningRate 0.0242 Epoch: 10 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:24:14,464-Speed 3338.78 samples/sec Loss 3.8520 LearningRate 0.0242 Epoch: 10 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:24:17,469-Speed 3408.99 samples/sec Loss 3.9924 LearningRate 0.0242 Epoch: 10 Global Step: 57830 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-27 07:24:20,497-Speed 3382.09 samples/sec Loss 3.8992 LearningRate 0.0241 Epoch: 10 Global Step: 57840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:23,528-Speed 3380.18 samples/sec Loss 3.8412 LearningRate 0.0241 Epoch: 10 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:26,557-Speed 3380.82 samples/sec Loss 3.7888 LearningRate 0.0241 Epoch: 10 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:29,578-Speed 3390.82 samples/sec Loss 3.8404 LearningRate 0.0241 Epoch: 10 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:32,599-Speed 3390.23 samples/sec Loss 3.8542 LearningRate 0.0241 Epoch: 10 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:35,639-Speed 3369.97 samples/sec Loss 3.8602 LearningRate 0.0241 Epoch: 10 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:38,671-Speed 3377.63 samples/sec Loss 3.8687 LearningRate 0.0241 Epoch: 10 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:41,695-Speed 3387.26 samples/sec Loss 3.7574 LearningRate 0.0241 Epoch: 10 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:44,716-Speed 3389.38 samples/sec Loss 4.0000 LearningRate 0.0241 Epoch: 10 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:47,738-Speed 3389.24 samples/sec Loss 3.9402 LearningRate 0.0241 Epoch: 10 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:24:50,764-Speed 3384.75 samples/sec Loss 3.9081 LearningRate 0.0241 Epoch: 10 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:24:53,770-Speed 3408.40 samples/sec Loss 3.7971 LearningRate 0.0241 Epoch: 10 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:56,800-Speed 3380.37 
samples/sec Loss 3.9199 LearningRate 0.0240 Epoch: 10 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:24:59,837-Speed 3372.16 samples/sec Loss 3.7663 LearningRate 0.0240 Epoch: 10 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:25:02,872-Speed 3374.51 samples/sec Loss 3.8508 LearningRate 0.0240 Epoch: 10 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:25:05,899-Speed 3384.38 samples/sec Loss 3.8429 LearningRate 0.0240 Epoch: 10 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:25:08,933-Speed 3375.11 samples/sec Loss 3.8825 LearningRate 0.0240 Epoch: 10 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:25:52,680-[lfw][58000]XNorm: 22.443554 Training: 2022-04-27 07:25:52,680-[lfw][58000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-27 07:25:52,681-[lfw][58000]Accuracy-Highest: 0.99817 Training: 2022-04-27 07:26:43,631-[cfp_fp][58000]XNorm: 20.331181 Training: 2022-04-27 07:26:43,632-[cfp_fp][58000]Accuracy-Flip: 0.96743+-0.00887 Training: 2022-04-27 07:26:43,632-[cfp_fp][58000]Accuracy-Highest: 0.96743 Training: 2022-04-27 07:27:27,177-[agedb_30][58000]XNorm: 22.511959 Training: 2022-04-27 07:27:27,178-[agedb_30][58000]Accuracy-Flip: 0.97633+-0.00632 Training: 2022-04-27 07:27:27,178-[agedb_30][58000]Accuracy-Highest: 0.97767 Training: 2022-04-27 07:27:30,191-Speed 72.49 samples/sec Loss 3.8454 LearningRate 0.0240 Epoch: 10 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:27:33,197-Speed 3406.96 samples/sec Loss 3.8498 LearningRate 0.0240 Epoch: 10 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:27:36,206-Speed 3404.41 samples/sec Loss 3.8957 LearningRate 0.0240 Epoch: 10 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:27:39,215-Speed 3403.54 samples/sec Loss 3.8618 LearningRate 
0.0240 Epoch: 10 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:27:42,227-Speed 3400.44 samples/sec Loss 3.8590 LearningRate 0.0240 Epoch: 10 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:27:45,236-Speed 3404.37 samples/sec Loss 3.7996 LearningRate 0.0240 Epoch: 10 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:27:48,247-Speed 3401.19 samples/sec Loss 3.7528 LearningRate 0.0239 Epoch: 10 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:27:51,268-Speed 3390.23 samples/sec Loss 3.8961 LearningRate 0.0239 Epoch: 10 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:27:54,280-Speed 3400.80 samples/sec Loss 3.8900 LearningRate 0.0239 Epoch: 10 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:27:57,276-Speed 3418.44 samples/sec Loss 3.7875 LearningRate 0.0239 Epoch: 10 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:00,297-Speed 3390.62 samples/sec Loss 3.9066 LearningRate 0.0239 Epoch: 10 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:03,314-Speed 3395.21 samples/sec Loss 3.7729 LearningRate 0.0239 Epoch: 10 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:06,332-Speed 3393.33 samples/sec Loss 3.7660 LearningRate 0.0239 Epoch: 10 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:09,352-Speed 3391.62 samples/sec Loss 3.9529 LearningRate 0.0239 Epoch: 10 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:12,385-Speed 3376.96 samples/sec Loss 3.8812 LearningRate 0.0239 Epoch: 10 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:15,422-Speed 3372.88 samples/sec Loss 3.7812 LearningRate 0.0239 Epoch: 10 Global Step: 58160 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:18,457-Speed 3374.60 samples/sec Loss 3.8370 LearningRate 0.0239 Epoch: 10 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:21,481-Speed 3387.70 samples/sec Loss 4.0274 LearningRate 0.0239 Epoch: 10 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:24,508-Speed 3383.23 samples/sec Loss 3.8645 LearningRate 0.0238 Epoch: 10 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:27,534-Speed 3385.26 samples/sec Loss 3.8003 LearningRate 0.0238 Epoch: 10 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:28:30,548-Speed 3397.87 samples/sec Loss 3.9761 LearningRate 0.0238 Epoch: 10 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:33,583-Speed 3374.20 samples/sec Loss 3.9520 LearningRate 0.0238 Epoch: 10 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:36,619-Speed 3373.94 samples/sec Loss 3.8896 LearningRate 0.0238 Epoch: 10 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:39,656-Speed 3373.37 samples/sec Loss 3.7861 LearningRate 0.0238 Epoch: 10 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:42,689-Speed 3376.29 samples/sec Loss 3.9602 LearningRate 0.0238 Epoch: 10 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:45,719-Speed 3380.03 samples/sec Loss 3.9764 LearningRate 0.0238 Epoch: 10 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:48,746-Speed 3384.22 samples/sec Loss 3.9816 LearningRate 0.0238 Epoch: 10 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:51,769-Speed 3388.09 samples/sec Loss 3.8462 LearningRate 0.0238 Epoch: 10 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
07:28:54,790-Speed 3390.16 samples/sec Loss 3.9275 LearningRate 0.0238 Epoch: 10 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:28:57,813-Speed 3388.93 samples/sec Loss 3.9100 LearningRate 0.0237 Epoch: 10 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:00,839-Speed 3383.73 samples/sec Loss 3.8812 LearningRate 0.0237 Epoch: 10 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:03,866-Speed 3384.44 samples/sec Loss 3.9235 LearningRate 0.0237 Epoch: 10 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:06,890-Speed 3386.46 samples/sec Loss 3.9676 LearningRate 0.0237 Epoch: 10 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:09,910-Speed 3391.90 samples/sec Loss 3.9506 LearningRate 0.0237 Epoch: 10 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:12,928-Speed 3394.30 samples/sec Loss 4.0070 LearningRate 0.0237 Epoch: 10 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:15,928-Speed 3413.64 samples/sec Loss 3.9957 LearningRate 0.0237 Epoch: 10 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:18,942-Speed 3398.02 samples/sec Loss 3.9265 LearningRate 0.0237 Epoch: 10 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:21,962-Speed 3392.43 samples/sec Loss 4.0346 LearningRate 0.0237 Epoch: 10 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:25,005-Speed 3365.57 samples/sec Loss 3.8256 LearningRate 0.0237 Epoch: 10 Global Step: 58390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:28,019-Speed 3397.76 samples/sec Loss 4.0262 LearningRate 0.0237 Epoch: 10 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:31,033-Speed 3397.85 samples/sec Loss 3.9396 
LearningRate 0.0237 Epoch: 10 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:34,053-Speed 3392.65 samples/sec Loss 3.8503 LearningRate 0.0236 Epoch: 10 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:37,085-Speed 3377.71 samples/sec Loss 3.9422 LearningRate 0.0236 Epoch: 10 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:40,122-Speed 3373.07 samples/sec Loss 3.7877 LearningRate 0.0236 Epoch: 10 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:43,140-Speed 3393.87 samples/sec Loss 3.9148 LearningRate 0.0236 Epoch: 10 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:29:46,153-Speed 3398.90 samples/sec Loss 3.9460 LearningRate 0.0236 Epoch: 10 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:49,170-Speed 3395.09 samples/sec Loss 3.7469 LearningRate 0.0236 Epoch: 10 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:52,193-Speed 3387.87 samples/sec Loss 3.9413 LearningRate 0.0236 Epoch: 10 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:55,203-Speed 3402.87 samples/sec Loss 3.8676 LearningRate 0.0236 Epoch: 10 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:29:58,216-Speed 3399.78 samples/sec Loss 3.9209 LearningRate 0.0236 Epoch: 10 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:01,211-Speed 3418.85 samples/sec Loss 4.0160 LearningRate 0.0236 Epoch: 10 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:04,271-Speed 3347.43 samples/sec Loss 3.8813 LearningRate 0.0236 Epoch: 10 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:07,350-Speed 3327.37 samples/sec Loss 3.8512 LearningRate 0.0236 Epoch: 10 Global Step: 58530 
Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:10,360-Speed 3402.26 samples/sec Loss 3.8578 LearningRate 0.0235 Epoch: 10 Global Step: 58540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:13,371-Speed 3402.18 samples/sec Loss 3.8619 LearningRate 0.0235 Epoch: 10 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:16,386-Speed 3396.56 samples/sec Loss 3.9260 LearningRate 0.0235 Epoch: 10 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:19,397-Speed 3401.79 samples/sec Loss 3.8692 LearningRate 0.0235 Epoch: 10 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:22,407-Speed 3403.15 samples/sec Loss 3.8273 LearningRate 0.0235 Epoch: 10 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:25,421-Speed 3397.45 samples/sec Loss 4.0182 LearningRate 0.0235 Epoch: 10 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:28,438-Speed 3395.83 samples/sec Loss 3.9549 LearningRate 0.0235 Epoch: 10 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:31,456-Speed 3393.39 samples/sec Loss 4.0055 LearningRate 0.0235 Epoch: 10 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:34,498-Speed 3366.94 samples/sec Loss 3.8427 LearningRate 0.0235 Epoch: 10 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:37,513-Speed 3397.24 samples/sec Loss 3.8059 LearningRate 0.0235 Epoch: 10 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:40,525-Speed 3400.83 samples/sec Loss 3.8776 LearningRate 0.0235 Epoch: 10 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:43,536-Speed 3401.88 samples/sec Loss 3.8557 LearningRate 0.0235 Epoch: 10 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 07:30:46,551-Speed 3396.89 samples/sec Loss 3.9053 LearningRate 0.0234 Epoch: 10 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:49,572-Speed 3389.85 samples/sec Loss 3.9244 LearningRate 0.0234 Epoch: 10 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:52,584-Speed 3401.53 samples/sec Loss 3.8866 LearningRate 0.0234 Epoch: 10 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:30:55,579-Speed 3418.95 samples/sec Loss 3.8687 LearningRate 0.0234 Epoch: 10 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:30:58,599-Speed 3391.86 samples/sec Loss 3.9559 LearningRate 0.0234 Epoch: 10 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:01,614-Speed 3396.93 samples/sec Loss 3.9364 LearningRate 0.0234 Epoch: 10 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:04,638-Speed 3387.49 samples/sec Loss 3.9923 LearningRate 0.0234 Epoch: 10 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:07,649-Speed 3401.95 samples/sec Loss 3.8781 LearningRate 0.0234 Epoch: 10 Global Step: 58730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:10,662-Speed 3398.93 samples/sec Loss 3.8414 LearningRate 0.0234 Epoch: 10 Global Step: 58740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:13,677-Speed 3396.94 samples/sec Loss 3.8797 LearningRate 0.0234 Epoch: 10 Global Step: 58750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:16,708-Speed 3379.96 samples/sec Loss 3.8890 LearningRate 0.0234 Epoch: 10 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:19,719-Speed 3401.38 samples/sec Loss 3.9640 LearningRate 0.0233 Epoch: 10 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:22,743-Speed 3386.70 samples/sec 
Loss 3.9472 LearningRate 0.0233 Epoch: 10 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:25,778-Speed 3375.42 samples/sec Loss 3.8635 LearningRate 0.0233 Epoch: 10 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:28,795-Speed 3394.53 samples/sec Loss 3.9716 LearningRate 0.0233 Epoch: 10 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:31,814-Speed 3392.58 samples/sec Loss 3.8958 LearningRate 0.0233 Epoch: 10 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:34,830-Speed 3396.67 samples/sec Loss 3.9130 LearningRate 0.0233 Epoch: 10 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:37,860-Speed 3380.19 samples/sec Loss 3.9130 LearningRate 0.0233 Epoch: 10 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:40,876-Speed 3396.46 samples/sec Loss 4.0724 LearningRate 0.0233 Epoch: 10 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:43,893-Speed 3394.68 samples/sec Loss 3.9181 LearningRate 0.0233 Epoch: 10 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:31:46,913-Speed 3391.02 samples/sec Loss 3.9711 LearningRate 0.0233 Epoch: 10 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:49,928-Speed 3397.94 samples/sec Loss 3.8197 LearningRate 0.0233 Epoch: 10 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:52,949-Speed 3389.94 samples/sec Loss 3.8556 LearningRate 0.0233 Epoch: 10 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:55,972-Speed 3388.10 samples/sec Loss 3.9991 LearningRate 0.0232 Epoch: 10 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:31:58,990-Speed 3393.90 samples/sec Loss 4.0184 LearningRate 0.0232 Epoch: 10 Global 
Step: 58900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:02,006-Speed 3396.54 samples/sec Loss 4.0002 LearningRate 0.0232 Epoch: 10 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:05,022-Speed 3395.73 samples/sec Loss 3.8057 LearningRate 0.0232 Epoch: 10 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:08,033-Speed 3401.36 samples/sec Loss 3.9633 LearningRate 0.0232 Epoch: 10 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:11,048-Speed 3397.16 samples/sec Loss 3.9554 LearningRate 0.0232 Epoch: 10 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:14,067-Speed 3393.31 samples/sec Loss 3.8334 LearningRate 0.0232 Epoch: 10 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:17,091-Speed 3386.47 samples/sec Loss 3.8517 LearningRate 0.0232 Epoch: 10 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:32:20,099-Speed 3405.68 samples/sec Loss 3.7908 LearningRate 0.0232 Epoch: 10 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:23,122-Speed 3387.77 samples/sec Loss 3.8686 LearningRate 0.0232 Epoch: 10 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:26,138-Speed 3396.31 samples/sec Loss 3.8723 LearningRate 0.0232 Epoch: 10 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:29,151-Speed 3399.66 samples/sec Loss 3.8794 LearningRate 0.0232 Epoch: 10 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:32,166-Speed 3397.02 samples/sec Loss 3.7364 LearningRate 0.0231 Epoch: 10 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:35,183-Speed 3394.51 samples/sec Loss 3.9656 LearningRate 0.0231 Epoch: 10 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 07:32:38,199-Speed 3395.36 samples/sec Loss 3.8572 LearningRate 0.0231 Epoch: 10 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:41,234-Speed 3375.63 samples/sec Loss 3.8344 LearningRate 0.0231 Epoch: 10 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:44,249-Speed 3396.23 samples/sec Loss 3.8042 LearningRate 0.0231 Epoch: 10 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:47,270-Speed 3390.80 samples/sec Loss 3.8752 LearningRate 0.0231 Epoch: 10 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:50,294-Speed 3387.94 samples/sec Loss 3.9551 LearningRate 0.0231 Epoch: 10 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:32:53,297-Speed 3410.01 samples/sec Loss 3.9593 LearningRate 0.0231 Epoch: 10 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:56,311-Speed 3399.17 samples/sec Loss 3.9394 LearningRate 0.0231 Epoch: 10 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:32:59,328-Speed 3394.31 samples/sec Loss 3.8818 LearningRate 0.0231 Epoch: 10 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:02,349-Speed 3390.97 samples/sec Loss 3.9967 LearningRate 0.0231 Epoch: 10 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:05,374-Speed 3385.04 samples/sec Loss 4.0276 LearningRate 0.0231 Epoch: 10 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:08,394-Speed 3392.16 samples/sec Loss 3.8408 LearningRate 0.0230 Epoch: 10 Global Step: 59130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:11,407-Speed 3398.79 samples/sec Loss 3.8674 LearningRate 0.0230 Epoch: 10 Global Step: 59140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:14,428-Speed 3390.00 
samples/sec Loss 3.9477 LearningRate 0.0230 Epoch: 10 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:17,443-Speed 3399.16 samples/sec Loss 4.0706 LearningRate 0.0230 Epoch: 10 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:20,456-Speed 3399.76 samples/sec Loss 3.9135 LearningRate 0.0230 Epoch: 10 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:23,479-Speed 3387.75 samples/sec Loss 3.7901 LearningRate 0.0230 Epoch: 10 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:33:26,585-Speed 3298.13 samples/sec Loss 3.9167 LearningRate 0.0230 Epoch: 10 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:33:29,583-Speed 3416.29 samples/sec Loss 3.9282 LearningRate 0.0230 Epoch: 10 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:32,599-Speed 3395.95 samples/sec Loss 3.9249 LearningRate 0.0230 Epoch: 10 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:35,614-Speed 3397.03 samples/sec Loss 3.9037 LearningRate 0.0230 Epoch: 10 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:38,636-Speed 3389.10 samples/sec Loss 3.9620 LearningRate 0.0230 Epoch: 10 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:41,651-Speed 3397.13 samples/sec Loss 3.9576 LearningRate 0.0230 Epoch: 10 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:44,670-Speed 3392.90 samples/sec Loss 3.8748 LearningRate 0.0229 Epoch: 10 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:47,690-Speed 3392.23 samples/sec Loss 3.8965 LearningRate 0.0229 Epoch: 10 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:50,708-Speed 3393.57 samples/sec Loss 3.8528 LearningRate 0.0229 Epoch: 10 
Global Step: 59270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:53,731-Speed 3387.86 samples/sec Loss 4.0503 LearningRate 0.0229 Epoch: 10 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:56,749-Speed 3393.38 samples/sec Loss 3.8477 LearningRate 0.0229 Epoch: 10 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:33:59,765-Speed 3395.97 samples/sec Loss 3.9517 LearningRate 0.0229 Epoch: 10 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:34:02,787-Speed 3389.07 samples/sec Loss 4.0532 LearningRate 0.0229 Epoch: 10 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:34:05,788-Speed 3413.34 samples/sec Loss 3.8582 LearningRate 0.0229 Epoch: 10 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:08,811-Speed 3388.32 samples/sec Loss 3.8914 LearningRate 0.0229 Epoch: 10 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:11,835-Speed 3386.68 samples/sec Loss 3.8525 LearningRate 0.0229 Epoch: 10 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:14,855-Speed 3391.59 samples/sec Loss 3.8873 LearningRate 0.0229 Epoch: 10 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:17,873-Speed 3394.51 samples/sec Loss 3.9611 LearningRate 0.0228 Epoch: 10 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:20,891-Speed 3393.44 samples/sec Loss 3.8893 LearningRate 0.0228 Epoch: 10 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:23,910-Speed 3392.81 samples/sec Loss 3.9572 LearningRate 0.0228 Epoch: 10 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:26,932-Speed 3388.86 samples/sec Loss 3.8928 LearningRate 0.0228 Epoch: 10 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-27 07:34:29,953-Speed 3390.72 samples/sec Loss 3.8508 LearningRate 0.0228 Epoch: 10 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:32,976-Speed 3388.04 samples/sec Loss 3.9355 LearningRate 0.0228 Epoch: 10 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:35,997-Speed 3389.91 samples/sec Loss 3.9308 LearningRate 0.0228 Epoch: 10 Global Step: 59420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:34:39,021-Speed 3387.20 samples/sec Loss 3.8803 LearningRate 0.0228 Epoch: 10 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:34:42,024-Speed 3410.82 samples/sec Loss 3.8533 LearningRate 0.0228 Epoch: 10 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:45,046-Speed 3390.31 samples/sec Loss 3.8537 LearningRate 0.0228 Epoch: 10 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:48,069-Speed 3387.59 samples/sec Loss 3.8861 LearningRate 0.0228 Epoch: 10 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:51,095-Speed 3384.86 samples/sec Loss 3.9574 LearningRate 0.0228 Epoch: 10 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:54,116-Speed 3390.27 samples/sec Loss 3.9049 LearningRate 0.0227 Epoch: 10 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:34:57,135-Speed 3392.37 samples/sec Loss 3.9145 LearningRate 0.0227 Epoch: 10 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:00,178-Speed 3365.85 samples/sec Loss 3.8521 LearningRate 0.0227 Epoch: 10 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:03,218-Speed 3368.60 samples/sec Loss 3.8634 LearningRate 0.0227 Epoch: 10 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:06,246-Speed 3382.79 
samples/sec Loss 3.9530 LearningRate 0.0227 Epoch: 10 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:09,270-Speed 3387.62 samples/sec Loss 3.8332 LearningRate 0.0227 Epoch: 10 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:12,296-Speed 3384.88 samples/sec Loss 3.8993 LearningRate 0.0227 Epoch: 10 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:35:15,319-Speed 3388.01 samples/sec Loss 3.8415 LearningRate 0.0227 Epoch: 10 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:35:18,343-Speed 3387.17 samples/sec Loss 4.0042 LearningRate 0.0227 Epoch: 10 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:35:21,350-Speed 3406.19 samples/sec Loss 3.9068 LearningRate 0.0227 Epoch: 10 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:24,375-Speed 3385.74 samples/sec Loss 3.8282 LearningRate 0.0227 Epoch: 10 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:35:27,386-Speed 3401.32 samples/sec Loss 3.9000 LearningRate 0.0227 Epoch: 10 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:30,428-Speed 3367.10 samples/sec Loss 3.8632 LearningRate 0.0226 Epoch: 10 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:33,454-Speed 3384.46 samples/sec Loss 3.8158 LearningRate 0.0226 Epoch: 10 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:36,481-Speed 3384.09 samples/sec Loss 3.8306 LearningRate 0.0226 Epoch: 10 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:39,501-Speed 3391.62 samples/sec Loss 3.8756 LearningRate 0.0226 Epoch: 10 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:42,537-Speed 3374.26 samples/sec Loss 3.8348 LearningRate 0.0226 Epoch: 
10 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:45,563-Speed 3385.09 samples/sec Loss 3.8663 LearningRate 0.0226 Epoch: 10 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:48,593-Speed 3379.16 samples/sec Loss 3.8640 LearningRate 0.0226 Epoch: 10 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:51,623-Speed 3380.99 samples/sec Loss 3.8460 LearningRate 0.0226 Epoch: 10 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:54,653-Speed 3380.13 samples/sec Loss 3.9213 LearningRate 0.0226 Epoch: 10 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 07:35:57,682-Speed 3381.25 samples/sec Loss 3.8545 LearningRate 0.0226 Epoch: 10 Global Step: 59690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:00,718-Speed 3374.24 samples/sec Loss 3.8529 LearningRate 0.0226 Epoch: 10 Global Step: 59700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:03,743-Speed 3385.19 samples/sec Loss 3.9357 LearningRate 0.0226 Epoch: 10 Global Step: 59710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:06,772-Speed 3381.95 samples/sec Loss 3.8500 LearningRate 0.0225 Epoch: 10 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:09,793-Speed 3390.38 samples/sec Loss 3.8728 LearningRate 0.0225 Epoch: 10 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:12,819-Speed 3384.60 samples/sec Loss 3.8765 LearningRate 0.0225 Epoch: 10 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:15,841-Speed 3390.01 samples/sec Loss 3.9396 LearningRate 0.0225 Epoch: 10 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:18,867-Speed 3384.55 samples/sec Loss 3.9261 LearningRate 0.0225 Epoch: 10 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-27 07:36:21,893-Speed 3384.67 samples/sec Loss 3.8890 LearningRate 0.0225 Epoch: 10 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:24,926-Speed 3376.43 samples/sec Loss 3.9090 LearningRate 0.0225 Epoch: 10 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:27,951-Speed 3386.98 samples/sec Loss 3.9473 LearningRate 0.0225 Epoch: 10 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:36:30,976-Speed 3385.25 samples/sec Loss 3.9697 LearningRate 0.0225 Epoch: 10 Global Step: 59800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:36:34,016-Speed 3369.44 samples/sec Loss 3.9342 LearningRate 0.0225 Epoch: 10 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:36:37,023-Speed 3405.59 samples/sec Loss 3.8504 LearningRate 0.0225 Epoch: 10 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:40,067-Speed 3365.18 samples/sec Loss 3.8272 LearningRate 0.0225 Epoch: 10 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:43,095-Speed 3382.52 samples/sec Loss 3.7874 LearningRate 0.0224 Epoch: 10 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:46,120-Speed 3386.16 samples/sec Loss 3.8010 LearningRate 0.0224 Epoch: 10 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:49,145-Speed 3385.40 samples/sec Loss 3.9303 LearningRate 0.0224 Epoch: 10 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:52,169-Speed 3387.96 samples/sec Loss 3.8951 LearningRate 0.0224 Epoch: 10 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:55,206-Speed 3372.47 samples/sec Loss 3.7908 LearningRate 0.0224 Epoch: 10 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:36:58,230-Speed 
3387.13 samples/sec Loss 3.8393 LearningRate 0.0224 Epoch: 10 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:01,266-Speed 3373.51 samples/sec Loss 3.8642 LearningRate 0.0224 Epoch: 10 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:04,291-Speed 3386.16 samples/sec Loss 3.8864 LearningRate 0.0224 Epoch: 10 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:07,300-Speed 3403.19 samples/sec Loss 4.0176 LearningRate 0.0224 Epoch: 10 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:10,327-Speed 3383.77 samples/sec Loss 3.7647 LearningRate 0.0224 Epoch: 10 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:13,353-Speed 3384.20 samples/sec Loss 3.9385 LearningRate 0.0224 Epoch: 10 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:16,378-Speed 3386.63 samples/sec Loss 3.8881 LearningRate 0.0224 Epoch: 10 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:19,402-Speed 3387.01 samples/sec Loss 3.9566 LearningRate 0.0223 Epoch: 10 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:22,430-Speed 3382.57 samples/sec Loss 3.8550 LearningRate 0.0223 Epoch: 10 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:25,516-Speed 3319.58 samples/sec Loss 3.7486 LearningRate 0.0223 Epoch: 10 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:28,569-Speed 3354.60 samples/sec Loss 3.8915 LearningRate 0.0223 Epoch: 10 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:37:31,595-Speed 3384.30 samples/sec Loss 3.8998 LearningRate 0.0223 Epoch: 10 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:38:15,326-[lfw][60000]XNorm: 22.612268 Training: 2022-04-27 
07:38:15,327-[lfw][60000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-04-27 07:38:15,327-[lfw][60000]Accuracy-Highest: 0.99817 Training: 2022-04-27 07:39:06,236-[cfp_fp][60000]XNorm: 20.646764 Training: 2022-04-27 07:39:06,237-[cfp_fp][60000]Accuracy-Flip: 0.97100+-0.00930 Training: 2022-04-27 07:39:06,237-[cfp_fp][60000]Accuracy-Highest: 0.97100 Training: 2022-04-27 07:39:49,518-[agedb_30][60000]XNorm: 22.595011 Training: 2022-04-27 07:39:49,519-[agedb_30][60000]Accuracy-Flip: 0.97500+-0.00830 Training: 2022-04-27 07:39:49,519-[agedb_30][60000]Accuracy-Highest: 0.97767 Training: 2022-04-27 07:39:52,544-Speed 72.65 samples/sec Loss 3.7668 LearningRate 0.0223 Epoch: 10 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:39:55,548-Speed 3408.75 samples/sec Loss 3.8990 LearningRate 0.0223 Epoch: 10 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:39:58,557-Speed 3404.68 samples/sec Loss 3.9034 LearningRate 0.0223 Epoch: 10 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:01,564-Speed 3405.42 samples/sec Loss 3.7317 LearningRate 0.0223 Epoch: 10 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:04,581-Speed 3395.47 samples/sec Loss 3.8061 LearningRate 0.0223 Epoch: 10 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:07,594-Speed 3399.04 samples/sec Loss 3.8756 LearningRate 0.0223 Epoch: 10 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:10,624-Speed 3380.92 samples/sec Loss 3.8022 LearningRate 0.0223 Epoch: 10 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:13,662-Speed 3371.41 samples/sec Loss 3.8833 LearningRate 0.0222 Epoch: 10 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:16,677-Speed 3397.24 samples/sec Loss 3.7955 LearningRate 0.0222 Epoch: 10 Global Step: 
60090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:19,673-Speed 3418.65 samples/sec Loss 3.9954 LearningRate 0.0222 Epoch: 10 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:22,692-Speed 3391.78 samples/sec Loss 3.7805 LearningRate 0.0222 Epoch: 10 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:25,705-Speed 3400.57 samples/sec Loss 3.8500 LearningRate 0.0222 Epoch: 10 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:28,718-Speed 3398.64 samples/sec Loss 3.8510 LearningRate 0.0222 Epoch: 10 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:31,731-Speed 3399.35 samples/sec Loss 3.9704 LearningRate 0.0222 Epoch: 10 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:34,759-Speed 3383.01 samples/sec Loss 3.7537 LearningRate 0.0222 Epoch: 10 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:37,785-Speed 3384.48 samples/sec Loss 4.0206 LearningRate 0.0222 Epoch: 10 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:40,827-Speed 3367.72 samples/sec Loss 3.8718 LearningRate 0.0222 Epoch: 10 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:43,840-Speed 3398.86 samples/sec Loss 3.9862 LearningRate 0.0222 Epoch: 10 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:46,856-Speed 3397.09 samples/sec Loss 3.9536 LearningRate 0.0222 Epoch: 10 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:40:49,881-Speed 3385.05 samples/sec Loss 3.9151 LearningRate 0.0221 Epoch: 10 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:52,899-Speed 3393.91 samples/sec Loss 3.9747 LearningRate 0.0221 Epoch: 10 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 07:40:55,924-Speed 3386.62 samples/sec Loss 3.9086 LearningRate 0.0221 Epoch: 10 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:40:58,987-Speed 3343.16 samples/sec Loss 3.8821 LearningRate 0.0221 Epoch: 10 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:41:02,010-Speed 3388.69 samples/sec Loss 3.8480 LearningRate 0.0221 Epoch: 10 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:41:05,038-Speed 3382.61 samples/sec Loss 3.8725 LearningRate 0.0221 Epoch: 10 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:41:08,039-Speed 3412.60 samples/sec Loss 3.8178 LearningRate 0.0221 Epoch: 10 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:11,054-Speed 3397.51 samples/sec Loss 3.8767 LearningRate 0.0221 Epoch: 10 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:14,076-Speed 3389.53 samples/sec Loss 3.8270 LearningRate 0.0221 Epoch: 10 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:17,099-Speed 3388.18 samples/sec Loss 3.9652 LearningRate 0.0221 Epoch: 10 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:20,118-Speed 3392.37 samples/sec Loss 4.0179 LearningRate 0.0221 Epoch: 10 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:23,132-Speed 3398.19 samples/sec Loss 3.7339 LearningRate 0.0221 Epoch: 10 Global Step: 60310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:26,163-Speed 3379.63 samples/sec Loss 3.8362 LearningRate 0.0221 Epoch: 10 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:29,183-Speed 3390.84 samples/sec Loss 3.8498 LearningRate 0.0220 Epoch: 10 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:32,205-Speed 3390.29 
samples/sec Loss 3.8319 LearningRate 0.0220 Epoch: 10 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:35,246-Speed 3367.95 samples/sec Loss 3.8489 LearningRate 0.0220 Epoch: 10 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 07:41:38,262-Speed 3395.59 samples/sec Loss 4.0056 LearningRate 0.0220 Epoch: 10 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 07:41:41,280-Speed 3393.86 samples/sec Loss 3.8149 LearningRate 0.0220 Epoch: 10 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:41:44,279-Speed 3415.84 samples/sec Loss 3.8222 LearningRate 0.0220 Epoch: 10 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:41:47,303-Speed 3387.07 samples/sec Loss 3.8170 LearningRate 0.0220 Epoch: 10 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:41:50,325-Speed 3389.45 samples/sec Loss 3.7780 LearningRate 0.0220 Epoch: 10 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:41:53,345-Speed 3391.05 samples/sec Loss 3.7261 LearningRate 0.0220 Epoch: 10 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:41:56,373-Speed 3382.35 samples/sec Loss 3.8417 LearningRate 0.0220 Epoch: 10 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:41:59,403-Speed 3380.51 samples/sec Loss 3.8729 LearningRate 0.0220 Epoch: 10 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:02,419-Speed 3396.36 samples/sec Loss 3.7559 LearningRate 0.0220 Epoch: 10 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:05,441-Speed 3388.93 samples/sec Loss 4.0111 LearningRate 0.0219 Epoch: 10 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:08,453-Speed 3400.06 samples/sec Loss 3.8377 LearningRate 0.0219 Epoch: 10 
Global Step: 60460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:11,470-Speed 3395.58 samples/sec Loss 4.0651 LearningRate 0.0219 Epoch: 10 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:14,489-Speed 3392.35 samples/sec Loss 3.7765 LearningRate 0.0219 Epoch: 10 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:17,512-Speed 3388.49 samples/sec Loss 3.9519 LearningRate 0.0219 Epoch: 10 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:20,525-Speed 3399.36 samples/sec Loss 3.7927 LearningRate 0.0219 Epoch: 10 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:23,552-Speed 3383.59 samples/sec Loss 3.7887 LearningRate 0.0219 Epoch: 10 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:26,568-Speed 3396.05 samples/sec Loss 3.7123 LearningRate 0.0219 Epoch: 10 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:29,585-Speed 3395.16 samples/sec Loss 3.8550 LearningRate 0.0219 Epoch: 10 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:32,608-Speed 3388.55 samples/sec Loss 3.8861 LearningRate 0.0219 Epoch: 10 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:35,640-Speed 3377.84 samples/sec Loss 3.9415 LearningRate 0.0219 Epoch: 10 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:38,664-Speed 3387.43 samples/sec Loss 3.8027 LearningRate 0.0219 Epoch: 10 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:41,681-Speed 3394.15 samples/sec Loss 3.8249 LearningRate 0.0218 Epoch: 10 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:44,707-Speed 3385.36 samples/sec Loss 3.8392 LearningRate 0.0218 Epoch: 10 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-27 07:42:47,729-Speed 3388.98 samples/sec Loss 3.8165 LearningRate 0.0218 Epoch: 10 Global Step: 60590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:42:50,750-Speed 3390.11 samples/sec Loss 3.9009 LearningRate 0.0218 Epoch: 10 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:42:53,752-Speed 3411.82 samples/sec Loss 3.8560 LearningRate 0.0218 Epoch: 10 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:56,769-Speed 3395.68 samples/sec Loss 3.7796 LearningRate 0.0218 Epoch: 10 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:42:59,797-Speed 3381.82 samples/sec Loss 3.8844 LearningRate 0.0218 Epoch: 10 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:02,822-Speed 3386.61 samples/sec Loss 3.8849 LearningRate 0.0218 Epoch: 10 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:05,856-Speed 3375.07 samples/sec Loss 3.9268 LearningRate 0.0218 Epoch: 10 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:08,880-Speed 3387.38 samples/sec Loss 3.8695 LearningRate 0.0218 Epoch: 10 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:11,898-Speed 3394.34 samples/sec Loss 3.8109 LearningRate 0.0218 Epoch: 10 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:14,916-Speed 3393.61 samples/sec Loss 3.8844 LearningRate 0.0218 Epoch: 10 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:17,935-Speed 3393.06 samples/sec Loss 3.7881 LearningRate 0.0217 Epoch: 10 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:20,952-Speed 3394.42 samples/sec Loss 3.7662 LearningRate 0.0217 Epoch: 10 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:23,970-Speed 3393.67 
samples/sec Loss 3.8680 LearningRate 0.0217 Epoch: 10 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:43:26,993-Speed 3388.28 samples/sec Loss 3.8226 LearningRate 0.0217 Epoch: 10 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:30,008-Speed 3396.71 samples/sec Loss 3.7919 LearningRate 0.0217 Epoch: 10 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:33,026-Speed 3394.17 samples/sec Loss 3.7610 LearningRate 0.0217 Epoch: 10 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:36,053-Speed 3383.41 samples/sec Loss 3.9512 LearningRate 0.0217 Epoch: 10 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:39,069-Speed 3395.98 samples/sec Loss 3.8744 LearningRate 0.0217 Epoch: 10 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:42,088-Speed 3393.03 samples/sec Loss 3.8084 LearningRate 0.0217 Epoch: 10 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:45,110-Speed 3389.42 samples/sec Loss 3.9749 LearningRate 0.0217 Epoch: 10 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:48,124-Speed 3398.10 samples/sec Loss 3.7852 LearningRate 0.0217 Epoch: 10 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:51,160-Speed 3373.73 samples/sec Loss 3.9183 LearningRate 0.0217 Epoch: 10 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:54,176-Speed 3395.50 samples/sec Loss 3.7733 LearningRate 0.0216 Epoch: 10 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:43:57,205-Speed 3382.00 samples/sec Loss 3.7855 LearningRate 0.0216 Epoch: 10 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:00,226-Speed 3389.76 samples/sec Loss 3.7790 LearningRate 0.0216 Epoch: 10 
Global Step: 60830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:03,246-Speed 3392.76 samples/sec Loss 3.9233 LearningRate 0.0216 Epoch: 10 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:06,272-Speed 3383.93 samples/sec Loss 3.8334 LearningRate 0.0216 Epoch: 10 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:09,291-Speed 3393.47 samples/sec Loss 3.7236 LearningRate 0.0216 Epoch: 10 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:12,313-Speed 3388.65 samples/sec Loss 3.8079 LearningRate 0.0216 Epoch: 10 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:15,351-Speed 3371.80 samples/sec Loss 3.8241 LearningRate 0.0216 Epoch: 10 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:18,376-Speed 3384.96 samples/sec Loss 3.8240 LearningRate 0.0216 Epoch: 10 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:21,394-Speed 3394.89 samples/sec Loss 3.9562 LearningRate 0.0216 Epoch: 10 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:24,411-Speed 3394.57 samples/sec Loss 3.9157 LearningRate 0.0216 Epoch: 10 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:27,435-Speed 3386.47 samples/sec Loss 3.7068 LearningRate 0.0216 Epoch: 10 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:44:30,454-Speed 3392.98 samples/sec Loss 3.8143 LearningRate 0.0215 Epoch: 10 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:33,471-Speed 3395.81 samples/sec Loss 3.7759 LearningRate 0.0215 Epoch: 10 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:36,492-Speed 3389.43 samples/sec Loss 3.8631 LearningRate 0.0215 Epoch: 10 Global Step: 60950 Fp16 Grad Scale: 131072 Required: 
5 hours Training: 2022-04-27 07:44:39,515-Speed 3388.39 samples/sec Loss 3.8064 LearningRate 0.0215 Epoch: 10 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:42,536-Speed 3391.15 samples/sec Loss 3.9672 LearningRate 0.0215 Epoch: 10 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:45,573-Speed 3372.47 samples/sec Loss 3.7307 LearningRate 0.0215 Epoch: 10 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:48,595-Speed 3388.68 samples/sec Loss 3.9150 LearningRate 0.0215 Epoch: 10 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:51,622-Speed 3384.19 samples/sec Loss 3.8938 LearningRate 0.0215 Epoch: 10 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:54,645-Speed 3388.28 samples/sec Loss 3.8666 LearningRate 0.0215 Epoch: 10 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 07:44:57,650-Speed 3408.23 samples/sec Loss 3.7899 LearningRate 0.0215 Epoch: 10 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:00,669-Speed 3392.93 samples/sec Loss 3.7804 LearningRate 0.0215 Epoch: 10 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:03,693-Speed 3386.60 samples/sec Loss 3.9148 LearningRate 0.0215 Epoch: 10 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:06,718-Speed 3386.44 samples/sec Loss 4.0718 LearningRate 0.0215 Epoch: 10 Global Step: 61050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:09,738-Speed 3391.32 samples/sec Loss 3.7491 LearningRate 0.0214 Epoch: 10 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:12,776-Speed 3370.70 samples/sec Loss 3.7158 LearningRate 0.0214 Epoch: 10 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:15,797-Speed 
3390.64 samples/sec Loss 3.9446 LearningRate 0.0214 Epoch: 10 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:18,823-Speed 3384.76 samples/sec Loss 3.8696 LearningRate 0.0214 Epoch: 10 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:21,850-Speed 3383.78 samples/sec Loss 3.7441 LearningRate 0.0214 Epoch: 10 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:24,877-Speed 3383.89 samples/sec Loss 3.7785 LearningRate 0.0214 Epoch: 10 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:45:27,894-Speed 3395.05 samples/sec Loss 3.8427 LearningRate 0.0214 Epoch: 10 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:30,919-Speed 3385.99 samples/sec Loss 3.9272 LearningRate 0.0214 Epoch: 10 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:33,939-Speed 3390.95 samples/sec Loss 3.9062 LearningRate 0.0214 Epoch: 10 Global Step: 61140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:36,965-Speed 3385.36 samples/sec Loss 3.8356 LearningRate 0.0214 Epoch: 10 Global Step: 61150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:39,985-Speed 3390.83 samples/sec Loss 3.9093 LearningRate 0.0214 Epoch: 10 Global Step: 61160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:43,011-Speed 3385.38 samples/sec Loss 3.7703 LearningRate 0.0214 Epoch: 10 Global Step: 61170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:46,038-Speed 3383.49 samples/sec Loss 3.8328 LearningRate 0.0213 Epoch: 10 Global Step: 61180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:49,062-Speed 3386.10 samples/sec Loss 3.8764 LearningRate 0.0213 Epoch: 10 Global Step: 61190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:52,092-Speed 3382.13 samples/sec Loss 3.7074 LearningRate 0.0213 
Epoch: 10 Global Step: 61200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:55,114-Speed 3388.63 samples/sec Loss 3.7718 LearningRate 0.0213 Epoch: 10 Global Step: 61210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 07:45:58,138-Speed 3387.02 samples/sec Loss 3.6873 LearningRate 0.0213 Epoch: 10 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:01,163-Speed 3386.10 samples/sec Loss 3.8604 LearningRate 0.0213 Epoch: 10 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:04,192-Speed 3382.04 samples/sec Loss 3.9360 LearningRate 0.0213 Epoch: 10 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:07,219-Speed 3382.85 samples/sec Loss 3.7157 LearningRate 0.0213 Epoch: 10 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:10,253-Speed 3376.49 samples/sec Loss 3.8579 LearningRate 0.0213 Epoch: 10 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:13,291-Speed 3370.69 samples/sec Loss 3.8946 LearningRate 0.0213 Epoch: 10 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:16,321-Speed 3380.26 samples/sec Loss 3.7829 LearningRate 0.0213 Epoch: 10 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:19,349-Speed 3383.85 samples/sec Loss 3.9771 LearningRate 0.0213 Epoch: 10 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:22,390-Speed 3367.20 samples/sec Loss 3.8960 LearningRate 0.0212 Epoch: 10 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:25,422-Speed 3378.82 samples/sec Loss 3.6986 LearningRate 0.0212 Epoch: 10 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 07:46:28,453-Speed 3378.75 samples/sec Loss 3.7547 LearningRate 0.0212 Epoch: 10 Global Step: 61320 Fp16 Grad Scale: 131072 
Required: 5 hours
Training: 2022-04-27 07:46:31,482-Speed 3381.13 samples/sec Loss 3.8664 LearningRate 0.0212 Epoch: 10 Global Step: 61330 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:46:34,509-Speed 3384.55 samples/sec Loss 3.7552 LearningRate 0.0212 Epoch: 10 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:46:37,526-Speed 3394.43 samples/sec Loss 3.7690 LearningRate 0.0212 Epoch: 10 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:40,553-Speed 3383.84 samples/sec Loss 3.8087 LearningRate 0.0212 Epoch: 10 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:43,575-Speed 3388.81 samples/sec Loss 3.7342 LearningRate 0.0212 Epoch: 10 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:46,606-Speed 3379.97 samples/sec Loss 3.8944 LearningRate 0.0212 Epoch: 10 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:49,636-Speed 3380.12 samples/sec Loss 3.9031 LearningRate 0.0212 Epoch: 10 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:52,662-Speed 3385.56 samples/sec Loss 3.7237 LearningRate 0.0212 Epoch: 10 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:55,694-Speed 3377.94 samples/sec Loss 3.7693 LearningRate 0.0212 Epoch: 10 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:46:58,715-Speed 3390.08 samples/sec Loss 3.6368 LearningRate 0.0212 Epoch: 10 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:01,744-Speed 3381.59 samples/sec Loss 3.8594 LearningRate 0.0211 Epoch: 10 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:04,794-Speed 3358.23 samples/sec Loss 3.8115 LearningRate 0.0211 Epoch: 10 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:07,816-Speed 3388.78 samples/sec Loss 3.8546 LearningRate 0.0211 Epoch: 10 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:47:10,837-Speed 3389.92 samples/sec Loss 3.9443 LearningRate 0.0211 Epoch: 10 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:47:13,860-Speed 3388.42 samples/sec Loss 3.8635 LearningRate 0.0211 Epoch: 10 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:47:16,872-Speed 3400.67 samples/sec Loss 3.7792 LearningRate 0.0211 Epoch: 10 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:19,904-Speed 3379.38 samples/sec Loss 3.8911 LearningRate 0.0211 Epoch: 10 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:22,929-Speed 3385.14 samples/sec Loss 3.7879 LearningRate 0.0211 Epoch: 10 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:25,961-Speed 3378.43 samples/sec Loss 3.8662 LearningRate 0.0211 Epoch: 10 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:28,986-Speed 3385.25 samples/sec Loss 3.7207 LearningRate 0.0211 Epoch: 10 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:32,019-Speed 3377.11 samples/sec Loss 3.7553 LearningRate 0.0211 Epoch: 10 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:35,058-Speed 3370.73 samples/sec Loss 3.8099 LearningRate 0.0211 Epoch: 10 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:38,090-Speed 3378.13 samples/sec Loss 3.8381 LearningRate 0.0210 Epoch: 10 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:41,122-Speed 3378.72 samples/sec Loss 3.7424 LearningRate 0.0210 Epoch: 10 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:44,151-Speed 3381.36 samples/sec Loss 3.8487 LearningRate 0.0210 Epoch: 10 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:47,175-Speed 3386.07 samples/sec Loss 3.8419 LearningRate 0.0210 Epoch: 10 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:47:50,204-Speed 3381.44 samples/sec Loss 3.7879 LearningRate 0.0210 Epoch: 10 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:47:53,215-Speed 3401.91 samples/sec Loss 3.8378 LearningRate 0.0210 Epoch: 10 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:56,244-Speed 3381.88 samples/sec Loss 3.8074 LearningRate 0.0210 Epoch: 10 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:47:59,274-Speed 3379.57 samples/sec Loss 3.8358 LearningRate 0.0210 Epoch: 10 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:02,307-Speed 3377.18 samples/sec Loss 3.6841 LearningRate 0.0210 Epoch: 10 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:05,332-Speed 3385.61 samples/sec Loss 3.8945 LearningRate 0.0210 Epoch: 10 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:08,359-Speed 3384.83 samples/sec Loss 3.8963 LearningRate 0.0210 Epoch: 10 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:11,390-Speed 3378.32 samples/sec Loss 3.6400 LearningRate 0.0210 Epoch: 10 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:14,420-Speed 3381.36 samples/sec Loss 3.6984 LearningRate 0.0209 Epoch: 10 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:17,450-Speed 3379.99 samples/sec Loss 3.8067 LearningRate 0.0209 Epoch: 10 Global Step: 61680 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:20,479-Speed 3380.80 samples/sec Loss 3.7793 LearningRate 0.0209 Epoch: 10 Global Step: 61690 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:23,506-Speed 3383.77 samples/sec Loss 3.7874 LearningRate 0.0209 Epoch: 10 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:48:26,517-Speed 3401.75 samples/sec Loss 3.8134 LearningRate 0.0209 Epoch: 10 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:29,550-Speed 3377.18 samples/sec Loss 3.7575 LearningRate 0.0209 Epoch: 10 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:32,579-Speed 3381.67 samples/sec Loss 3.7263 LearningRate 0.0209 Epoch: 10 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:35,606-Speed 3383.40 samples/sec Loss 3.7534 LearningRate 0.0209 Epoch: 10 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:38,630-Speed 3386.79 samples/sec Loss 3.8225 LearningRate 0.0209 Epoch: 10 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:41,661-Speed 3380.13 samples/sec Loss 3.8908 LearningRate 0.0209 Epoch: 10 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:44,686-Speed 3384.98 samples/sec Loss 3.7132 LearningRate 0.0209 Epoch: 10 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:47,735-Speed 3360.37 samples/sec Loss 3.8565 LearningRate 0.0209 Epoch: 10 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:50,767-Speed 3377.45 samples/sec Loss 3.6829 LearningRate 0.0209 Epoch: 10 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:53,804-Speed 3372.12 samples/sec Loss 3.8195 LearningRate 0.0208 Epoch: 10 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:48:56,832-Speed 3382.96 samples/sec Loss 3.8683 LearningRate 0.0208 Epoch: 10 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:48:59,845-Speed 3399.81 samples/sec Loss 3.7303 LearningRate 0.0208 Epoch: 10 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:02,874-Speed 3382.13 samples/sec Loss 3.6995 LearningRate 0.0208 Epoch: 10 Global Step: 61830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:05,904-Speed 3379.47 samples/sec Loss 3.7766 LearningRate 0.0208 Epoch: 10 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:08,927-Speed 3388.36 samples/sec Loss 3.7710 LearningRate 0.0208 Epoch: 10 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:11,969-Speed 3367.22 samples/sec Loss 3.7800 LearningRate 0.0208 Epoch: 10 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:14,991-Speed 3388.39 samples/sec Loss 3.7434 LearningRate 0.0208 Epoch: 10 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:18,016-Speed 3386.96 samples/sec Loss 3.7315 LearningRate 0.0208 Epoch: 10 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:21,038-Speed 3388.67 samples/sec Loss 3.9349 LearningRate 0.0208 Epoch: 10 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:24,074-Speed 3373.30 samples/sec Loss 3.7081 LearningRate 0.0208 Epoch: 10 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:27,102-Speed 3382.81 samples/sec Loss 3.7553 LearningRate 0.0208 Epoch: 10 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:30,109-Speed 3407.18 samples/sec Loss 3.7804 LearningRate 0.0207 Epoch: 10 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:33,138-Speed 3381.34 samples/sec Loss 3.8382 LearningRate 0.0207 Epoch: 10 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:36,165-Speed 3382.72 samples/sec Loss 3.6931 LearningRate 0.0207 Epoch: 10 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:39,189-Speed 3387.72 samples/sec Loss 3.8754 LearningRate 0.0207 Epoch: 10 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:42,213-Speed 3386.21 samples/sec Loss 3.8765 LearningRate 0.0207 Epoch: 10 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:49:45,237-Speed 3387.30 samples/sec Loss 3.7583 LearningRate 0.0207 Epoch: 10 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:49:48,323-Speed 3319.72 samples/sec Loss 3.7544 LearningRate 0.0207 Epoch: 10 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:49:51,447-Speed 3278.55 samples/sec Loss 3.8248 LearningRate 0.0207 Epoch: 10 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:49:54,514-Speed 3338.98 samples/sec Loss 3.6825 LearningRate 0.0207 Epoch: 10 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:50:37,725-[lfw][62000]XNorm: 23.436172
Training: 2022-04-27 07:50:37,725-[lfw][62000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-04-27 07:50:37,726-[lfw][62000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:51:28,095-[cfp_fp][62000]XNorm: 21.172719
Training: 2022-04-27 07:51:28,095-[cfp_fp][62000]Accuracy-Flip: 0.97243+-0.00688
Training: 2022-04-27 07:51:28,096-[cfp_fp][62000]Accuracy-Highest: 0.97243
Training: 2022-04-27 07:52:11,293-[agedb_30][62000]XNorm: 23.276430
Training: 2022-04-27 07:52:11,294-[agedb_30][62000]Accuracy-Flip: 0.97600+-0.00754
Training: 2022-04-27 07:52:11,294-[agedb_30][62000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:52:14,320-Speed 73.25 samples/sec Loss 3.7732 LearningRate 0.0207 Epoch: 10 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:17,319-Speed 3415.25 samples/sec Loss 3.8230 LearningRate 0.0207 Epoch: 10 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:20,333-Speed 3398.70 samples/sec Loss 3.7938 LearningRate 0.0207 Epoch: 10 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:23,343-Speed 3402.08 samples/sec Loss 3.7792 LearningRate 0.0207 Epoch: 10 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:26,355-Speed 3400.25 samples/sec Loss 3.7866 LearningRate 0.0206 Epoch: 10 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:29,370-Speed 3397.65 samples/sec Loss 3.7990 LearningRate 0.0206 Epoch: 10 Global Step: 62060 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:52:32,378-Speed 3405.45 samples/sec Loss 3.8182 LearningRate 0.0206 Epoch: 10 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:35,411-Speed 3376.01 samples/sec Loss 3.7185 LearningRate 0.0206 Epoch: 10 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:38,424-Speed 3399.20 samples/sec Loss 3.7235 LearningRate 0.0206 Epoch: 10 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:41,456-Speed 3379.00 samples/sec Loss 3.8276 LearningRate 0.0206 Epoch: 10 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:44,474-Speed 3393.50 samples/sec Loss 3.7101 LearningRate 0.0206 Epoch: 10 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:47,490-Speed 3396.64 samples/sec Loss 3.7740 LearningRate 0.0206 Epoch: 10 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:50,508-Speed 3392.99 samples/sec Loss 3.8108 LearningRate 0.0206 Epoch: 10 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:53,528-Speed 3391.86 samples/sec Loss 3.7677 LearningRate 0.0206 Epoch: 10 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:56,593-Speed 3341.61 samples/sec Loss 3.7842 LearningRate 0.0206 Epoch: 10 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:52:59,616-Speed 3388.51 samples/sec Loss 3.7305 LearningRate 0.0206 Epoch: 10 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:02,654-Speed 3371.61 samples/sec Loss 3.8499 LearningRate 0.0205 Epoch: 10 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:53:05,678-Speed 3386.49 samples/sec Loss 3.7167 LearningRate 0.0205 Epoch: 10 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:08,708-Speed 3380.32 samples/sec Loss 3.6843 LearningRate 0.0205 Epoch: 10 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:11,738-Speed 3380.95 samples/sec Loss 3.7741 LearningRate 0.0205 Epoch: 10 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:14,793-Speed 3352.77 samples/sec Loss 3.7577 LearningRate 0.0205 Epoch: 10 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:17,822-Speed 3381.47 samples/sec Loss 3.8189 LearningRate 0.0205 Epoch: 10 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:20,848-Speed 3384.58 samples/sec Loss 3.7015 LearningRate 0.0205 Epoch: 10 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:23,898-Speed 3358.08 samples/sec Loss 3.8283 LearningRate 0.0205 Epoch: 10 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:26,925-Speed 3383.60 samples/sec Loss 3.7482 LearningRate 0.0205 Epoch: 10 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:29,949-Speed 3387.19 samples/sec Loss 3.7481 LearningRate 0.0205 Epoch: 10 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:32,978-Speed 3381.66 samples/sec Loss 3.7854 LearningRate 0.0205 Epoch: 10 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:35,999-Speed 3389.62 samples/sec Loss 3.7112 LearningRate 0.0205 Epoch: 10 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:53:39,019-Speed 3392.09 samples/sec Loss 3.7271 LearningRate 0.0205 Epoch: 10 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:53:42,037-Speed 3393.89 samples/sec Loss 3.8560 LearningRate 0.0204 Epoch: 10 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:53:45,041-Speed 3409.12 samples/sec Loss 3.7716 LearningRate 0.0204 Epoch: 10 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:48,122-Speed 3325.05 samples/sec Loss 3.6914 LearningRate 0.0204 Epoch: 10 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:51,137-Speed 3396.79 samples/sec Loss 3.6915 LearningRate 0.0204 Epoch: 10 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:54,151-Speed 3398.54 samples/sec Loss 3.8926 LearningRate 0.0204 Epoch: 10 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:53:57,162-Speed 3400.82 samples/sec Loss 3.7414 LearningRate 0.0204 Epoch: 10 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:00,182-Speed 3392.18 samples/sec Loss 3.7475 LearningRate 0.0204 Epoch: 10 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:03,209-Speed 3383.27 samples/sec Loss 3.7895 LearningRate 0.0204 Epoch: 10 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:06,223-Speed 3398.89 samples/sec Loss 3.7209 LearningRate 0.0204 Epoch: 10 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:09,238-Speed 3397.50 samples/sec Loss 3.7185 LearningRate 0.0204 Epoch: 10 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:12,251-Speed 3398.25 samples/sec Loss 3.7766 LearningRate 0.0204 Epoch: 10 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:15,278-Speed 3384.57 samples/sec Loss 3.7262 LearningRate 0.0204 Epoch: 10 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:54:18,275-Speed 3416.95 samples/sec Loss 3.9117 LearningRate 0.0203 Epoch: 10 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:21,297-Speed 3389.77 samples/sec Loss 3.7204 LearningRate 0.0203 Epoch: 10 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:24,312-Speed 3396.32 samples/sec Loss 3.7384 LearningRate 0.0203 Epoch: 10 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:27,323-Speed 3401.82 samples/sec Loss 3.6436 LearningRate 0.0203 Epoch: 10 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:30,335-Speed 3400.99 samples/sec Loss 3.7846 LearningRate 0.0203 Epoch: 10 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:33,353-Speed 3393.66 samples/sec Loss 3.6555 LearningRate 0.0203 Epoch: 10 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:36,370-Speed 3395.45 samples/sec Loss 3.8557 LearningRate 0.0203 Epoch: 10 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:39,380-Speed 3402.17 samples/sec Loss 3.7243 LearningRate 0.0203 Epoch: 10 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:42,392-Speed 3400.86 samples/sec Loss 3.8155 LearningRate 0.0203 Epoch: 10 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:45,404-Speed 3400.30 samples/sec Loss 3.7403 LearningRate 0.0203 Epoch: 10 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:54:48,398-Speed 3420.61 samples/sec Loss 3.7612 LearningRate 0.0203 Epoch: 10 Global Step: 62520 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:54:51,413-Speed 3397.49 samples/sec Loss 3.8102 LearningRate 0.0203 Epoch: 10 Global Step: 62530 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:54:54,504-Speed 3313.54 samples/sec Loss 3.5876 LearningRate 0.0203 Epoch: 10 Global Step: 62540 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:07,887-Speed 765.22 samples/sec Loss 3.5473 LearningRate 0.0202 Epoch: 11 Global Step: 62550 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:10,896-Speed 3404.54 samples/sec Loss 3.1289 LearningRate 0.0202 Epoch: 11 Global Step: 62560 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:13,907-Speed 3401.24 samples/sec Loss 3.1046 LearningRate 0.0202 Epoch: 11 Global Step: 62570 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:16,951-Speed 3364.41 samples/sec Loss 3.1197 LearningRate 0.0202 Epoch: 11 Global Step: 62580 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:19,962-Speed 3402.33 samples/sec Loss 3.0536 LearningRate 0.0202 Epoch: 11 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:22,997-Speed 3374.36 samples/sec Loss 3.0800 LearningRate 0.0202 Epoch: 11 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:26,016-Speed 3392.43 samples/sec Loss 3.1922 LearningRate 0.0202 Epoch: 11 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:29,030-Speed 3398.49 samples/sec Loss 3.1146 LearningRate 0.0202 Epoch: 11 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:55:32,037-Speed 3405.62 samples/sec Loss 3.1629 LearningRate 0.0202 Epoch: 11 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:55:35,070-Speed 3377.68 samples/sec Loss 3.2593 LearningRate 0.0202 Epoch: 11 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:55:38,062-Speed 3423.39 samples/sec Loss 3.2655 LearningRate 0.0202 Epoch: 11 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:41,075-Speed 3399.16 samples/sec Loss 3.1322 LearningRate 0.0202 Epoch: 11 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:44,147-Speed 3333.61 samples/sec Loss 3.1653 LearningRate 0.0202 Epoch: 11 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:47,159-Speed 3400.75 samples/sec Loss 3.1240 LearningRate 0.0201 Epoch: 11 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:50,180-Speed 3390.42 samples/sec Loss 3.2560 LearningRate 0.0201 Epoch: 11 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:53,191-Speed 3401.31 samples/sec Loss 3.2585 LearningRate 0.0201 Epoch: 11 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:56,207-Speed 3396.80 samples/sec Loss 3.1824 LearningRate 0.0201 Epoch: 11 Global Step: 62710 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:55:59,224-Speed 3394.47 samples/sec Loss 3.2148 LearningRate 0.0201 Epoch: 11 Global Step: 62720 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:56:02,239-Speed 3397.12 samples/sec Loss 3.1168 LearningRate 0.0201 Epoch: 11 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:56:05,260-Speed 3390.81 samples/sec Loss 3.1792 LearningRate 0.0201 Epoch: 11 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:56:08,278-Speed 3393.68 samples/sec Loss 3.2078 LearningRate 0.0201 Epoch: 11 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:11,308-Speed 3380.62 samples/sec Loss 3.2176 LearningRate 0.0201 Epoch: 11 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:14,336-Speed 3382.61 samples/sec Loss 3.2045 LearningRate 0.0201 Epoch: 11 Global Step: 62770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:17,352-Speed 3395.56 samples/sec Loss 3.2517 LearningRate 0.0201 Epoch: 11 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:20,372-Speed 3391.45 samples/sec Loss 3.1792 LearningRate 0.0201 Epoch: 11 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:23,387-Speed 3396.70 samples/sec Loss 3.1961 LearningRate 0.0200 Epoch: 11 Global Step: 62800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:26,467-Speed 3326.14 samples/sec Loss 3.2413 LearningRate 0.0200 Epoch: 11 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:29,497-Speed 3380.33 samples/sec Loss 3.1004 LearningRate 0.0200 Epoch: 11 Global Step: 62820 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:32,539-Speed 3367.48 samples/sec Loss 3.1933 LearningRate 0.0200 Epoch: 11 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:35,573-Speed 3375.49 samples/sec Loss 3.3266 LearningRate 0.0200 Epoch: 11 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:38,605-Speed 3378.49 samples/sec Loss 3.2982 LearningRate 0.0200 Epoch: 11 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:56:41,830-Speed 3175.83 samples/sec Loss 3.2976 LearningRate 0.0200 Epoch: 11 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:44,855-Speed 3385.41 samples/sec Loss 3.2370 LearningRate 0.0200 Epoch: 11 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:47,889-Speed 3376.16 samples/sec Loss 3.2307 LearningRate 0.0200 Epoch: 11 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:50,907-Speed 3393.63 samples/sec Loss 3.1985 LearningRate 0.0200 Epoch: 11 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:53,929-Speed 3390.17 samples/sec Loss 3.2149 LearningRate 0.0200 Epoch: 11 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:56,947-Speed 3393.33 samples/sec Loss 3.2211 LearningRate 0.0200 Epoch: 11 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:56:59,988-Speed 3367.52 samples/sec Loss 3.2819 LearningRate 0.0200 Epoch: 11 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:03,033-Speed 3364.31 samples/sec Loss 3.2723 LearningRate 0.0199 Epoch: 11 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:06,054-Speed 3390.04 samples/sec Loss 3.2939 LearningRate 0.0199 Epoch: 11 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:09,085-Speed 3379.21 samples/sec Loss 3.1091 LearningRate 0.0199 Epoch: 11 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:12,107-Speed 3389.12 samples/sec Loss 3.2856 LearningRate 0.0199 Epoch: 11 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:57:15,137-Speed 3380.17 samples/sec Loss 3.2208 LearningRate 0.0199 Epoch: 11 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:57:18,153-Speed 3395.91 samples/sec Loss 3.3291 LearningRate 0.0199 Epoch: 11 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:57:21,178-Speed 3386.93 samples/sec Loss 3.2748 LearningRate 0.0199 Epoch: 11 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:57:24,216-Speed 3370.85 samples/sec Loss 3.1623 LearningRate 0.0199 Epoch: 11 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:57:27,222-Speed 3407.41 samples/sec Loss 3.2349 LearningRate 0.0199 Epoch: 11 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:30,252-Speed 3380.21 samples/sec Loss 3.2783 LearningRate 0.0199 Epoch: 11 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:33,271-Speed 3392.35 samples/sec Loss 3.3263 LearningRate 0.0199 Epoch: 11 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:36,299-Speed 3383.30 samples/sec Loss 3.3572 LearningRate 0.0199 Epoch: 11 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:39,322-Speed 3387.25 samples/sec Loss 3.1999 LearningRate 0.0199 Epoch: 11 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:42,340-Speed 3393.68 samples/sec Loss 3.3614 LearningRate 0.0198 Epoch: 11 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:45,357-Speed 3395.45 samples/sec Loss 3.3152 LearningRate 0.0198 Epoch: 11 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:48,391-Speed 3375.94 samples/sec Loss 3.3675 LearningRate 0.0198 Epoch: 11 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:51,438-Speed 3361.49 samples/sec Loss 3.3673 LearningRate 0.0198 Epoch: 11 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:54,456-Speed 3393.92 samples/sec Loss 3.3208 LearningRate 0.0198 Epoch: 11 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:57:57,474-Speed 3393.50 samples/sec Loss 3.2518 LearningRate 0.0198 Epoch: 11 Global Step: 63110 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:00,492-Speed 3394.08 samples/sec Loss 3.3787 LearningRate 0.0198 Epoch: 11 Global Step: 63120 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:03,510-Speed 3393.83 samples/sec Loss 3.2248 LearningRate 0.0198 Epoch: 11 Global Step: 63130 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:06,528-Speed 3393.23 samples/sec Loss 3.3407 LearningRate 0.0198 Epoch: 11 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:09,553-Speed 3385.85 samples/sec Loss 3.4516 LearningRate 0.0198 Epoch: 11 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:12,627-Speed 3332.32 samples/sec Loss 3.2256 LearningRate 0.0198 Epoch: 11 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:15,647-Speed 3391.96 samples/sec Loss 3.3308 LearningRate 0.0198 Epoch: 11 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:18,665-Speed 3393.58 samples/sec Loss 3.3203 LearningRate 0.0198 Epoch: 11 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:21,684-Speed 3392.32 samples/sec Loss 3.2190 LearningRate 0.0197 Epoch: 11 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:24,715-Speed 3379.40 samples/sec Loss 3.2484 LearningRate 0.0197 Epoch: 11 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:27,736-Speed 3390.45 samples/sec Loss 3.3674 LearningRate 0.0197 Epoch: 11 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:30,757-Speed 3389.86 samples/sec Loss 3.2977 LearningRate 0.0197 Epoch: 11 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:33,790-Speed 3376.71 samples/sec Loss 3.4232 LearningRate 0.0197 Epoch: 11 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:36,812-Speed 3390.47 samples/sec Loss 3.3603 LearningRate 0.0197 Epoch: 11 Global Step: 63240 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:39,833-Speed 3391.42 samples/sec Loss 3.3494 LearningRate 0.0197 Epoch: 11 Global Step: 63250 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:42,855-Speed 3389.28 samples/sec Loss 3.3561 LearningRate 0.0197 Epoch: 11 Global Step: 63260 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:45,899-Speed 3364.40 samples/sec Loss 3.3299 LearningRate 0.0197 Epoch: 11 Global Step: 63270 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:48,929-Speed 3380.27 samples/sec Loss 3.3891 LearningRate 0.0197 Epoch: 11 Global Step: 63280 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:58:51,935-Speed 3406.71 samples/sec Loss 3.3943 LearningRate 0.0197 Epoch: 11 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:54,952-Speed 3395.68 samples/sec Loss 3.3838 LearningRate 0.0197 Epoch: 11 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:58:57,971-Speed 3392.35 samples/sec Loss 3.4993 LearningRate 0.0196 Epoch: 11 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:01,094-Speed 3279.66 samples/sec Loss 3.2212 LearningRate 0.0196 Epoch: 11 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:04,137-Speed 3366.33 samples/sec Loss 3.4607 LearningRate 0.0196 Epoch: 11 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:07,155-Speed 3393.41 samples/sec Loss 3.4488 LearningRate 0.0196 Epoch: 11 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:10,191-Speed 3373.81 samples/sec Loss 3.2850 LearningRate 0.0196 Epoch: 11 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:13,213-Speed 3389.14 samples/sec Loss 3.3526 LearningRate 0.0196 Epoch: 11 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:16,245-Speed 3378.53 samples/sec Loss 3.3664 LearningRate 0.0196 Epoch: 11 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:19,273-Speed 3382.09 samples/sec Loss 3.2073 LearningRate 0.0196 Epoch: 11 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:22,290-Speed 3394.56 samples/sec Loss 3.2868 LearningRate 0.0196 Epoch: 11 Global Step: 63390 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 07:59:25,309-Speed 3393.63 samples/sec Loss 3.2843 LearningRate 0.0196 Epoch: 11 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:28,339-Speed 3379.94 samples/sec Loss 3.3097 LearningRate 0.0196 Epoch: 11 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:31,364-Speed 3385.60 samples/sec Loss 3.3947 LearningRate 0.0196 Epoch: 11 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:34,389-Speed 3386.24 samples/sec Loss 3.2192 LearningRate 0.0196 Epoch: 11 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:37,409-Speed 3391.92 samples/sec Loss 3.3444 LearningRate 0.0195 Epoch: 11 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:40,431-Speed 3389.17 samples/sec Loss 3.4113 LearningRate 0.0195 Epoch: 11 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:43,454-Speed 3388.39 samples/sec Loss 3.3516 LearningRate 0.0195 Epoch: 11 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:46,475-Speed 3390.29 samples/sec Loss 3.3270 LearningRate 0.0195 Epoch: 11 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:49,498-Speed 3388.02 samples/sec Loss 3.3147 LearningRate 0.0195 Epoch: 11 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 07:59:52,514-Speed 3395.26 samples/sec Loss 3.2654 LearningRate 0.0195 Epoch: 11 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:59:55,540-Speed 3385.23 samples/sec Loss 3.3780 LearningRate 0.0195 Epoch: 11 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 07:59:58,564-Speed 3386.88 samples/sec Loss 3.3786 LearningRate 0.0195 Epoch: 11 Global Step: 63510 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:01,586-Speed 3389.08 samples/sec Loss 3.4400 LearningRate 0.0195 Epoch: 11 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:04,614-Speed 3383.44 samples/sec Loss 3.4781 LearningRate 0.0195 Epoch: 11 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:07,637-Speed 3387.55 samples/sec Loss 3.4296 LearningRate 0.0195 Epoch: 11 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:10,658-Speed 3390.33 samples/sec Loss 3.4116 LearningRate 0.0195 Epoch: 11 Global Step: 63550 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:13,680-Speed 3389.71 samples/sec Loss 3.3740 LearningRate 0.0195 Epoch: 11 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:16,705-Speed 3385.77 samples/sec Loss 3.3809 LearningRate 0.0194 Epoch: 11 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:19,722-Speed 3394.88 samples/sec Loss 3.2573 LearningRate 0.0194 Epoch: 11 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:22,742-Speed 3391.42 samples/sec Loss 3.4146 LearningRate 0.0194 Epoch: 11 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:00:25,779-Speed 3372.52 samples/sec Loss 3.4550 LearningRate 0.0194 Epoch: 11 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:00:28,792-Speed 3399.37 samples/sec Loss 3.4038 LearningRate 0.0194 Epoch: 11 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:31,815-Speed 3388.55 samples/sec Loss 3.4442 LearningRate 0.0194 Epoch: 11 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:34,843-Speed 3382.72 samples/sec Loss 3.3287 LearningRate 0.0194 Epoch: 11 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:37,988-Speed 3256.46 samples/sec Loss 3.3781 LearningRate 0.0194 Epoch: 11 Global Step: 63640 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:41,017-Speed 3380.60 samples/sec Loss 3.3400 LearningRate 0.0194 Epoch: 11 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:44,039-Speed 3389.70 samples/sec Loss 3.3826 LearningRate 0.0194 Epoch: 11 Global Step: 63660 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:47,063-Speed 3387.05 samples/sec Loss 3.4226 LearningRate 0.0194 Epoch: 11 Global Step: 63670 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:50,089-Speed 3384.34 samples/sec Loss 3.4221 LearningRate 0.0194 Epoch: 11 Global Step: 63680 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:53,112-Speed 3388.68 samples/sec Loss 3.4352 LearningRate 0.0194 Epoch: 11 Global Step: 63690 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:56,132-Speed 3392.75 samples/sec Loss 3.4534 LearningRate 0.0193 Epoch: 11 Global Step: 63700 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:00:59,159-Speed 3383.71 samples/sec Loss 3.4379 LearningRate 0.0193 Epoch: 11 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:02,189-Speed 3380.38 samples/sec Loss 3.3729 LearningRate 0.0193 Epoch: 11 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:05,208-Speed 3392.17 samples/sec Loss 3.4560 LearningRate 0.0193 Epoch: 11 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:08,229-Speed 3389.70 samples/sec Loss 3.3741 LearningRate 0.0193 Epoch: 11 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:11,251-Speed 3390.12 samples/sec Loss 3.3978 LearningRate 0.0193 Epoch: 11 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:14,281-Speed 3380.31 samples/sec Loss 3.4548 LearningRate 0.0193 Epoch: 11 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:17,302-Speed 3389.98 samples/sec Loss 3.4295 LearningRate 0.0193 Epoch: 11 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:20,321-Speed 3392.45 samples/sec Loss 3.4013 LearningRate 0.0193 Epoch: 11 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:23,348-Speed 3384.54 samples/sec Loss 3.4471 LearningRate 0.0193 Epoch: 11 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:26,374-Speed 3384.26 samples/sec Loss 3.3389 LearningRate 0.0193 Epoch: 11 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:29,395-Speed 3390.77 samples/sec Loss 3.3812 LearningRate 0.0193 Epoch: 11 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:01:32,410-Speed 3396.75 samples/sec Loss 3.3366 LearningRate 0.0193 Epoch: 11 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:01:35,412-Speed 3411.39 samples/sec Loss 3.4842 LearningRate 0.0192 Epoch: 11 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:38,435-Speed 3388.28 samples/sec Loss 3.3165 LearningRate 0.0192 Epoch: 11 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:41,461-Speed 3385.51 samples/sec Loss 3.3325 LearningRate 0.0192 Epoch: 11 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:44,480-Speed 3391.85 samples/sec Loss 3.3726 LearningRate 0.0192 Epoch: 11 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:01:47,528-Speed 3360.69 samples/sec 
Loss 3.4189 LearningRate 0.0192 Epoch: 11 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:01:50,552-Speed 3387.50 samples/sec Loss 3.4534 LearningRate 0.0192 Epoch: 11 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:01:53,576-Speed 3388.57 samples/sec Loss 3.3889 LearningRate 0.0192 Epoch: 11 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:01:56,598-Speed 3389.25 samples/sec Loss 3.3781 LearningRate 0.0192 Epoch: 11 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:01:59,624-Speed 3384.72 samples/sec Loss 3.3787 LearningRate 0.0192 Epoch: 11 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:02,651-Speed 3383.61 samples/sec Loss 3.3459 LearningRate 0.0192 Epoch: 11 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:05,676-Speed 3386.41 samples/sec Loss 3.3584 LearningRate 0.0192 Epoch: 11 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:02:08,691-Speed 3396.64 samples/sec Loss 3.5367 LearningRate 0.0192 Epoch: 11 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:11,711-Speed 3391.87 samples/sec Loss 3.2836 LearningRate 0.0192 Epoch: 11 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:14,732-Speed 3389.81 samples/sec Loss 3.3991 LearningRate 0.0191 Epoch: 11 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:17,766-Speed 3376.59 samples/sec Loss 3.4144 LearningRate 0.0191 Epoch: 11 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:20,792-Speed 3384.82 samples/sec Loss 3.4707 LearningRate 0.0191 Epoch: 11 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:23,832-Speed 3369.22 samples/sec Loss 3.4198 LearningRate 0.0191 Epoch: 11 Global Step: 
63990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:02:26,866-Speed 3375.60 samples/sec Loss 3.3848 LearningRate 0.0191 Epoch: 11 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:03:10,168-[lfw][64000]XNorm: 21.939068 Training: 2022-04-27 08:03:10,169-[lfw][64000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-27 08:03:10,169-[lfw][64000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:04:00,531-[cfp_fp][64000]XNorm: 20.491285 Training: 2022-04-27 08:04:00,532-[cfp_fp][64000]Accuracy-Flip: 0.97400+-0.01008 Training: 2022-04-27 08:04:00,532-[cfp_fp][64000]Accuracy-Highest: 0.97400 Training: 2022-04-27 08:04:44,220-[agedb_30][64000]XNorm: 22.025909 Training: 2022-04-27 08:04:44,221-[agedb_30][64000]Accuracy-Flip: 0.97683+-0.00765 Training: 2022-04-27 08:04:44,221-[agedb_30][64000]Accuracy-Highest: 0.97767 Training: 2022-04-27 08:04:47,246-Speed 72.95 samples/sec Loss 3.4210 LearningRate 0.0191 Epoch: 11 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:04:50,251-Speed 3408.32 samples/sec Loss 3.4782 LearningRate 0.0191 Epoch: 11 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:04:53,261-Speed 3403.01 samples/sec Loss 3.4240 LearningRate 0.0191 Epoch: 11 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:04:56,270-Speed 3403.97 samples/sec Loss 3.4292 LearningRate 0.0191 Epoch: 11 Global Step: 64040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:04:59,285-Speed 3397.46 samples/sec Loss 3.5166 LearningRate 0.0191 Epoch: 11 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:02,308-Speed 3387.70 samples/sec Loss 3.4186 LearningRate 0.0191 Epoch: 11 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:05,327-Speed 3392.57 samples/sec Loss 3.5245 LearningRate 0.0191 Epoch: 11 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-27 08:05:08,360-Speed 3377.73 samples/sec Loss 3.3486 LearningRate 0.0191 Epoch: 11 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:11,375-Speed 3396.94 samples/sec Loss 3.4325 LearningRate 0.0190 Epoch: 11 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:14,396-Speed 3390.76 samples/sec Loss 3.3594 LearningRate 0.0190 Epoch: 11 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:17,416-Speed 3391.07 samples/sec Loss 3.4261 LearningRate 0.0190 Epoch: 11 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:20,438-Speed 3389.21 samples/sec Loss 3.5156 LearningRate 0.0190 Epoch: 11 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:23,466-Speed 3382.55 samples/sec Loss 3.4605 LearningRate 0.0190 Epoch: 11 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:26,497-Speed 3379.46 samples/sec Loss 3.4187 LearningRate 0.0190 Epoch: 11 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:29,542-Speed 3364.18 samples/sec Loss 3.5057 LearningRate 0.0190 Epoch: 11 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:05:32,550-Speed 3404.59 samples/sec Loss 3.3230 LearningRate 0.0190 Epoch: 11 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:35,582-Speed 3377.47 samples/sec Loss 3.4600 LearningRate 0.0190 Epoch: 11 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:38,615-Speed 3377.22 samples/sec Loss 3.6355 LearningRate 0.0190 Epoch: 11 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:41,638-Speed 3388.37 samples/sec Loss 3.2765 LearningRate 0.0190 Epoch: 11 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:44,667-Speed 3381.04 
samples/sec Loss 3.4356 LearningRate 0.0190 Epoch: 11 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:47,755-Speed 3316.98 samples/sec Loss 3.4831 LearningRate 0.0190 Epoch: 11 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:50,781-Speed 3385.63 samples/sec Loss 3.4699 LearningRate 0.0189 Epoch: 11 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:53,810-Speed 3380.55 samples/sec Loss 3.4709 LearningRate 0.0189 Epoch: 11 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:56,840-Speed 3380.95 samples/sec Loss 3.4988 LearningRate 0.0189 Epoch: 11 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:05:59,877-Speed 3372.84 samples/sec Loss 3.4143 LearningRate 0.0189 Epoch: 11 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:02,909-Speed 3377.00 samples/sec Loss 3.4787 LearningRate 0.0189 Epoch: 11 Global Step: 64260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:06:05,945-Speed 3373.98 samples/sec Loss 3.4285 LearningRate 0.0189 Epoch: 11 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:06:08,961-Speed 3395.79 samples/sec Loss 3.4047 LearningRate 0.0189 Epoch: 11 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:11,998-Speed 3372.46 samples/sec Loss 3.4251 LearningRate 0.0189 Epoch: 11 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:15,024-Speed 3385.85 samples/sec Loss 3.5757 LearningRate 0.0189 Epoch: 11 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:18,051-Speed 3383.70 samples/sec Loss 3.3585 LearningRate 0.0189 Epoch: 11 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:21,071-Speed 3391.10 samples/sec Loss 3.5725 LearningRate 0.0189 Epoch: 11 
Global Step: 64320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:24,093-Speed 3389.53 samples/sec Loss 3.5105 LearningRate 0.0189 Epoch: 11 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:27,122-Speed 3380.52 samples/sec Loss 3.4318 LearningRate 0.0189 Epoch: 11 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:30,139-Speed 3395.32 samples/sec Loss 3.3756 LearningRate 0.0188 Epoch: 11 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:33,165-Speed 3385.12 samples/sec Loss 3.4451 LearningRate 0.0188 Epoch: 11 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:36,219-Speed 3353.15 samples/sec Loss 3.3952 LearningRate 0.0188 Epoch: 11 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:39,257-Speed 3371.62 samples/sec Loss 3.4092 LearningRate 0.0188 Epoch: 11 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:06:42,258-Speed 3412.83 samples/sec Loss 3.3987 LearningRate 0.0188 Epoch: 11 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:45,272-Speed 3399.48 samples/sec Loss 3.4100 LearningRate 0.0188 Epoch: 11 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:48,290-Speed 3392.73 samples/sec Loss 3.5642 LearningRate 0.0188 Epoch: 11 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:51,311-Speed 3390.53 samples/sec Loss 3.4788 LearningRate 0.0188 Epoch: 11 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:54,336-Speed 3385.55 samples/sec Loss 3.3885 LearningRate 0.0188 Epoch: 11 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:06:57,359-Speed 3389.07 samples/sec Loss 3.3967 LearningRate 0.0188 Epoch: 11 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-27 08:07:00,383-Speed 3386.03 samples/sec Loss 3.4664 LearningRate 0.0188 Epoch: 11 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:03,446-Speed 3344.10 samples/sec Loss 3.4631 LearningRate 0.0188 Epoch: 11 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:06,461-Speed 3397.64 samples/sec Loss 3.4686 LearningRate 0.0188 Epoch: 11 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:09,487-Speed 3384.99 samples/sec Loss 3.5016 LearningRate 0.0187 Epoch: 11 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:12,481-Speed 3420.73 samples/sec Loss 3.5109 LearningRate 0.0187 Epoch: 11 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:15,499-Speed 3393.34 samples/sec Loss 3.4019 LearningRate 0.0187 Epoch: 11 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:18,517-Speed 3393.79 samples/sec Loss 3.5207 LearningRate 0.0187 Epoch: 11 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:21,526-Speed 3404.61 samples/sec Loss 3.4331 LearningRate 0.0187 Epoch: 11 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:24,546-Speed 3390.79 samples/sec Loss 3.5217 LearningRate 0.0187 Epoch: 11 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:27,566-Speed 3392.21 samples/sec Loss 3.3339 LearningRate 0.0187 Epoch: 11 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:30,585-Speed 3392.05 samples/sec Loss 3.5686 LearningRate 0.0187 Epoch: 11 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:33,596-Speed 3402.37 samples/sec Loss 3.4936 LearningRate 0.0187 Epoch: 11 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:36,634-Speed 3370.73 
samples/sec Loss 3.4961 LearningRate 0.0187 Epoch: 11 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:39,647-Speed 3400.15 samples/sec Loss 3.4380 LearningRate 0.0187 Epoch: 11 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:42,673-Speed 3384.33 samples/sec Loss 3.4982 LearningRate 0.0187 Epoch: 11 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:07:45,687-Speed 3398.46 samples/sec Loss 3.4427 LearningRate 0.0187 Epoch: 11 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:07:48,684-Speed 3418.29 samples/sec Loss 3.4104 LearningRate 0.0186 Epoch: 11 Global Step: 64610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:51,699-Speed 3396.75 samples/sec Loss 3.4377 LearningRate 0.0186 Epoch: 11 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:54,717-Speed 3393.96 samples/sec Loss 3.5225 LearningRate 0.0186 Epoch: 11 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:07:57,741-Speed 3386.57 samples/sec Loss 3.4617 LearningRate 0.0186 Epoch: 11 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:00,754-Speed 3399.65 samples/sec Loss 3.4230 LearningRate 0.0186 Epoch: 11 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:03,770-Speed 3396.57 samples/sec Loss 3.3660 LearningRate 0.0186 Epoch: 11 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:06,783-Speed 3399.00 samples/sec Loss 3.3653 LearningRate 0.0186 Epoch: 11 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:09,801-Speed 3394.12 samples/sec Loss 3.5638 LearningRate 0.0186 Epoch: 11 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:12,828-Speed 3383.45 samples/sec Loss 3.4554 LearningRate 0.0186 Epoch: 11 
Global Step: 64690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:15,877-Speed 3358.97 samples/sec Loss 3.4276 LearningRate 0.0186 Epoch: 11 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:18,897-Speed 3391.20 samples/sec Loss 3.5188 LearningRate 0.0186 Epoch: 11 Global Step: 64710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:08:21,910-Speed 3399.42 samples/sec Loss 3.4045 LearningRate 0.0186 Epoch: 11 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:08:24,929-Speed 3392.43 samples/sec Loss 3.4417 LearningRate 0.0186 Epoch: 11 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:08:28,003-Speed 3332.11 samples/sec Loss 3.3845 LearningRate 0.0186 Epoch: 11 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:08:31,017-Speed 3398.56 samples/sec Loss 3.3812 LearningRate 0.0185 Epoch: 11 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:08:34,013-Speed 3419.02 samples/sec Loss 3.4616 LearningRate 0.0185 Epoch: 11 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:37,026-Speed 3399.14 samples/sec Loss 3.4914 LearningRate 0.0185 Epoch: 11 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:40,049-Speed 3387.58 samples/sec Loss 3.3656 LearningRate 0.0185 Epoch: 11 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:43,063-Speed 3398.89 samples/sec Loss 3.4648 LearningRate 0.0185 Epoch: 11 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:46,091-Speed 3382.67 samples/sec Loss 3.4156 LearningRate 0.0185 Epoch: 11 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:49,120-Speed 3381.15 samples/sec Loss 3.4891 LearningRate 0.0185 Epoch: 11 Global Step: 64810 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 08:08:52,138-Speed 3393.50 samples/sec Loss 3.4752 LearningRate 0.0185 Epoch: 11 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:08:55,142-Speed 3410.04 samples/sec Loss 3.5096 LearningRate 0.0185 Epoch: 11 Global Step: 64830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:08:58,162-Speed 3391.60 samples/sec Loss 3.4554 LearningRate 0.0185 Epoch: 11 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:01,187-Speed 3386.61 samples/sec Loss 3.4821 LearningRate 0.0185 Epoch: 11 Global Step: 64850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:04,223-Speed 3373.45 samples/sec Loss 3.5652 LearningRate 0.0185 Epoch: 11 Global Step: 64860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:07,242-Speed 3391.92 samples/sec Loss 3.3746 LearningRate 0.0185 Epoch: 11 Global Step: 64870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:10,261-Speed 3393.17 samples/sec Loss 3.4499 LearningRate 0.0184 Epoch: 11 Global Step: 64880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:13,286-Speed 3385.08 samples/sec Loss 3.4800 LearningRate 0.0184 Epoch: 11 Global Step: 64890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:16,309-Speed 3388.64 samples/sec Loss 3.5469 LearningRate 0.0184 Epoch: 11 Global Step: 64900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:19,336-Speed 3384.26 samples/sec Loss 3.4200 LearningRate 0.0184 Epoch: 11 Global Step: 64910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:22,404-Speed 3337.78 samples/sec Loss 3.3706 LearningRate 0.0184 Epoch: 11 Global Step: 64920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:09:25,602-Speed 3202.91 samples/sec Loss 3.5145 LearningRate 0.0184 Epoch: 11 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
08:09:28,670-Speed 3338.44 samples/sec Loss 3.4667 LearningRate 0.0184 Epoch: 11 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:31,684-Speed 3398.29 samples/sec Loss 3.4585 LearningRate 0.0184 Epoch: 11 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:34,716-Speed 3378.25 samples/sec Loss 3.4687 LearningRate 0.0184 Epoch: 11 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:37,752-Speed 3373.10 samples/sec Loss 3.3989 LearningRate 0.0184 Epoch: 11 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:40,769-Speed 3395.02 samples/sec Loss 3.4532 LearningRate 0.0184 Epoch: 11 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:43,791-Speed 3389.79 samples/sec Loss 3.3398 LearningRate 0.0184 Epoch: 11 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:46,811-Speed 3390.65 samples/sec Loss 3.4506 LearningRate 0.0184 Epoch: 11 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:49,871-Speed 3348.25 samples/sec Loss 3.4219 LearningRate 0.0183 Epoch: 11 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:52,888-Speed 3394.69 samples/sec Loss 3.4908 LearningRate 0.0183 Epoch: 11 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:09:55,907-Speed 3392.88 samples/sec Loss 3.4340 LearningRate 0.0183 Epoch: 11 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:09:58,936-Speed 3381.23 samples/sec Loss 3.5767 LearningRate 0.0183 Epoch: 11 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:01,965-Speed 3381.84 samples/sec Loss 3.3924 LearningRate 0.0183 Epoch: 11 Global Step: 65050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:04,981-Speed 3394.99 samples/sec Loss 3.3737 
LearningRate 0.0183 Epoch: 11 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:07,996-Speed 3397.86 samples/sec Loss 3.4644 LearningRate 0.0183 Epoch: 11 Global Step: 65070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:11,027-Speed 3378.38 samples/sec Loss 3.4166 LearningRate 0.0183 Epoch: 11 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:14,043-Speed 3397.30 samples/sec Loss 3.3208 LearningRate 0.0183 Epoch: 11 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:10:17,043-Speed 3414.16 samples/sec Loss 3.5615 LearningRate 0.0183 Epoch: 11 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:20,070-Speed 3383.76 samples/sec Loss 3.5209 LearningRate 0.0183 Epoch: 11 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:23,108-Speed 3371.04 samples/sec Loss 3.4330 LearningRate 0.0183 Epoch: 11 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:26,128-Speed 3391.86 samples/sec Loss 3.4963 LearningRate 0.0183 Epoch: 11 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:29,152-Speed 3386.48 samples/sec Loss 3.3363 LearningRate 0.0182 Epoch: 11 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:32,169-Speed 3394.96 samples/sec Loss 3.3569 LearningRate 0.0182 Epoch: 11 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:35,186-Speed 3395.25 samples/sec Loss 3.4424 LearningRate 0.0182 Epoch: 11 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:38,216-Speed 3380.15 samples/sec Loss 3.3864 LearningRate 0.0182 Epoch: 11 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:41,246-Speed 3379.70 samples/sec Loss 3.5329 LearningRate 0.0182 Epoch: 11 Global Step: 65180 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:44,262-Speed 3396.92 samples/sec Loss 3.5377 LearningRate 0.0182 Epoch: 11 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:47,267-Speed 3408.01 samples/sec Loss 3.4704 LearningRate 0.0182 Epoch: 11 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:50,284-Speed 3394.89 samples/sec Loss 3.4394 LearningRate 0.0182 Epoch: 11 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:53,307-Speed 3388.25 samples/sec Loss 3.3760 LearningRate 0.0182 Epoch: 11 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:56,333-Speed 3385.36 samples/sec Loss 3.5616 LearningRate 0.0182 Epoch: 11 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:10:59,395-Speed 3344.14 samples/sec Loss 3.5576 LearningRate 0.0182 Epoch: 11 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:02,446-Speed 3357.12 samples/sec Loss 3.4528 LearningRate 0.0182 Epoch: 11 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:05,472-Speed 3385.24 samples/sec Loss 3.5467 LearningRate 0.0182 Epoch: 11 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:08,519-Speed 3360.87 samples/sec Loss 3.3904 LearningRate 0.0182 Epoch: 11 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:11,538-Speed 3393.08 samples/sec Loss 3.4772 LearningRate 0.0181 Epoch: 11 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:14,589-Speed 3356.91 samples/sec Loss 3.3551 LearningRate 0.0181 Epoch: 11 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:17,609-Speed 3391.32 samples/sec Loss 3.5364 LearningRate 0.0181 Epoch: 11 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-27 08:11:20,632-Speed 3388.30 samples/sec Loss 3.4532 LearningRate 0.0181 Epoch: 11 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:11:23,676-Speed 3364.80 samples/sec Loss 3.4342 LearningRate 0.0181 Epoch: 11 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:26,818-Speed 3260.01 samples/sec Loss 3.4332 LearningRate 0.0181 Epoch: 11 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:29,852-Speed 3375.51 samples/sec Loss 3.4100 LearningRate 0.0181 Epoch: 11 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:32,876-Speed 3387.13 samples/sec Loss 3.4764 LearningRate 0.0181 Epoch: 11 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:35,892-Speed 3397.86 samples/sec Loss 3.3992 LearningRate 0.0181 Epoch: 11 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:38,928-Speed 3372.68 samples/sec Loss 3.5752 LearningRate 0.0181 Epoch: 11 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:41,947-Speed 3393.36 samples/sec Loss 3.3902 LearningRate 0.0181 Epoch: 11 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:44,978-Speed 3379.59 samples/sec Loss 3.5452 LearningRate 0.0181 Epoch: 11 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:48,000-Speed 3388.85 samples/sec Loss 3.3700 LearningRate 0.0181 Epoch: 11 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:51,031-Speed 3379.46 samples/sec Loss 3.4515 LearningRate 0.0180 Epoch: 11 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:11:54,056-Speed 3386.19 samples/sec Loss 3.4369 LearningRate 0.0180 Epoch: 11 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:11:57,082-Speed 3384.07 samples/sec 
Loss 3.3743 LearningRate 0.0180 Epoch: 11 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:12:00,102-Speed 3391.53 samples/sec Loss 3.3900 LearningRate 0.0180 Epoch: 11 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:12:03,108-Speed 3406.89 samples/sec Loss 3.5109 LearningRate 0.0180 Epoch: 11 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:06,130-Speed 3389.62 samples/sec Loss 3.4523 LearningRate 0.0180 Epoch: 11 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:09,148-Speed 3393.82 samples/sec Loss 3.4967 LearningRate 0.0180 Epoch: 11 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:12,168-Speed 3392.38 samples/sec Loss 3.4280 LearningRate 0.0180 Epoch: 11 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:15,204-Speed 3372.60 samples/sec Loss 3.3916 LearningRate 0.0180 Epoch: 11 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:18,233-Speed 3381.30 samples/sec Loss 3.3491 LearningRate 0.0180 Epoch: 11 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:21,251-Speed 3393.76 samples/sec Loss 3.4870 LearningRate 0.0180 Epoch: 11 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:24,287-Speed 3374.46 samples/sec Loss 3.4487 LearningRate 0.0180 Epoch: 11 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:27,309-Speed 3388.52 samples/sec Loss 3.4025 LearningRate 0.0180 Epoch: 11 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:30,328-Speed 3393.34 samples/sec Loss 3.5256 LearningRate 0.0179 Epoch: 11 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:33,352-Speed 3387.23 samples/sec Loss 3.5010 LearningRate 0.0179 Epoch: 11 Global 
Step: 65550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:12:36,378-Speed 3385.00 samples/sec Loss 3.4879 LearningRate 0.0179 Epoch: 11 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:12:39,390-Speed 3401.19 samples/sec Loss 3.4180 LearningRate 0.0179 Epoch: 11 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:42,411-Speed 3390.50 samples/sec Loss 3.4313 LearningRate 0.0179 Epoch: 11 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:45,433-Speed 3388.84 samples/sec Loss 3.2947 LearningRate 0.0179 Epoch: 11 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:48,457-Speed 3386.86 samples/sec Loss 3.3678 LearningRate 0.0179 Epoch: 11 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:51,481-Speed 3387.42 samples/sec Loss 3.4665 LearningRate 0.0179 Epoch: 11 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:12:54,490-Speed 3403.50 samples/sec Loss 3.4612 LearningRate 0.0179 Epoch: 11 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:12:57,513-Speed 3388.36 samples/sec Loss 3.4294 LearningRate 0.0179 Epoch: 11 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:00,536-Speed 3388.66 samples/sec Loss 3.4425 LearningRate 0.0179 Epoch: 11 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:03,571-Speed 3374.08 samples/sec Loss 3.4962 LearningRate 0.0179 Epoch: 11 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:06,611-Speed 3369.65 samples/sec Loss 3.3940 LearningRate 0.0179 Epoch: 11 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:09,630-Speed 3392.55 samples/sec Loss 3.5260 LearningRate 0.0179 Epoch: 11 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 5 hours 
Training: 2022-04-27 08:13:12,672-Speed 3367.60 samples/sec Loss 3.4316 LearningRate 0.0178 Epoch: 11 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:15,699-Speed 3382.86 samples/sec Loss 3.3968 LearningRate 0.0178 Epoch: 11 Global Step: 65690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:18,720-Speed 3390.78 samples/sec Loss 3.5084 LearningRate 0.0178 Epoch: 11 Global Step: 65700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:21,741-Speed 3390.44 samples/sec Loss 3.4902 LearningRate 0.0178 Epoch: 11 Global Step: 65710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:13:24,763-Speed 3389.83 samples/sec Loss 3.3969 LearningRate 0.0178 Epoch: 11 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:27,786-Speed 3387.74 samples/sec Loss 3.3948 LearningRate 0.0178 Epoch: 11 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:30,808-Speed 3389.12 samples/sec Loss 3.4558 LearningRate 0.0178 Epoch: 11 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:33,834-Speed 3384.79 samples/sec Loss 3.4086 LearningRate 0.0178 Epoch: 11 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:36,862-Speed 3382.85 samples/sec Loss 3.3664 LearningRate 0.0178 Epoch: 11 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:39,883-Speed 3389.78 samples/sec Loss 3.4947 LearningRate 0.0178 Epoch: 11 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:42,905-Speed 3390.03 samples/sec Loss 3.5046 LearningRate 0.0178 Epoch: 11 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:45,931-Speed 3384.36 samples/sec Loss 3.5199 LearningRate 0.0178 Epoch: 11 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:48,956-Speed 3386.39 
samples/sec Loss 3.4377 LearningRate 0.0178 Epoch: 11 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:51,982-Speed 3384.93 samples/sec Loss 3.3442 LearningRate 0.0177 Epoch: 11 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:13:55,014-Speed 3377.72 samples/sec Loss 3.4356 LearningRate 0.0177 Epoch: 11 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:13:58,036-Speed 3389.53 samples/sec Loss 3.4360 LearningRate 0.0177 Epoch: 11 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:01,061-Speed 3387.72 samples/sec Loss 3.4667 LearningRate 0.0177 Epoch: 11 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:04,084-Speed 3388.28 samples/sec Loss 3.5611 LearningRate 0.0177 Epoch: 11 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:07,109-Speed 3385.83 samples/sec Loss 3.5120 LearningRate 0.0177 Epoch: 11 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:10,113-Speed 3409.61 samples/sec Loss 3.4025 LearningRate 0.0177 Epoch: 11 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:13,144-Speed 3379.94 samples/sec Loss 3.2493 LearningRate 0.0177 Epoch: 11 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:16,167-Speed 3388.35 samples/sec Loss 3.5670 LearningRate 0.0177 Epoch: 11 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:19,186-Speed 3392.36 samples/sec Loss 3.4389 LearningRate 0.0177 Epoch: 11 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:22,208-Speed 3389.81 samples/sec Loss 3.3970 LearningRate 0.0177 Epoch: 11 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:25,239-Speed 3378.82 samples/sec Loss 3.4251 LearningRate 0.0177 Epoch: 
11 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:28,269-Speed 3379.75 samples/sec Loss 3.2980 LearningRate 0.0177 Epoch: 11 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:31,290-Speed 3390.81 samples/sec Loss 3.4584 LearningRate 0.0177 Epoch: 11 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:34,311-Speed 3390.87 samples/sec Loss 3.4434 LearningRate 0.0176 Epoch: 11 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:37,334-Speed 3388.00 samples/sec Loss 3.3593 LearningRate 0.0176 Epoch: 11 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:14:40,359-Speed 3386.17 samples/sec Loss 3.4714 LearningRate 0.0176 Epoch: 11 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:43,379-Speed 3391.52 samples/sec Loss 3.3870 LearningRate 0.0176 Epoch: 11 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:46,402-Speed 3387.85 samples/sec Loss 3.4181 LearningRate 0.0176 Epoch: 11 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:14:49,424-Speed 3389.65 samples/sec Loss 3.3979 LearningRate 0.0176 Epoch: 11 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:15:32,974-[lfw][66000]XNorm: 23.685091 Training: 2022-04-27 08:15:32,975-[lfw][66000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-27 08:15:32,975-[lfw][66000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:16:23,612-[cfp_fp][66000]XNorm: 21.866582 Training: 2022-04-27 08:16:23,612-[cfp_fp][66000]Accuracy-Flip: 0.97529+-0.00676 Training: 2022-04-27 08:16:23,613-[cfp_fp][66000]Accuracy-Highest: 0.97529 Training: 2022-04-27 08:17:07,162-[agedb_30][66000]XNorm: 23.529873 Training: 2022-04-27 08:17:07,162-[agedb_30][66000]Accuracy-Flip: 0.97883+-0.00637 Training: 2022-04-27 
08:17:07,163-[agedb_30][66000]Accuracy-Highest: 0.97883 Training: 2022-04-27 08:17:10,179-Speed 72.75 samples/sec Loss 3.3346 LearningRate 0.0176 Epoch: 11 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:17:13,183-Speed 3409.43 samples/sec Loss 3.3210 LearningRate 0.0176 Epoch: 11 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:16,194-Speed 3402.06 samples/sec Loss 3.4365 LearningRate 0.0176 Epoch: 11 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:19,203-Speed 3403.96 samples/sec Loss 3.4645 LearningRate 0.0176 Epoch: 11 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:22,247-Speed 3364.51 samples/sec Loss 3.4769 LearningRate 0.0176 Epoch: 11 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:25,266-Speed 3392.84 samples/sec Loss 3.3127 LearningRate 0.0176 Epoch: 11 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:28,286-Speed 3391.35 samples/sec Loss 3.3632 LearningRate 0.0176 Epoch: 11 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:31,300-Speed 3398.24 samples/sec Loss 3.3079 LearningRate 0.0175 Epoch: 11 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:34,321-Speed 3390.28 samples/sec Loss 3.4729 LearningRate 0.0175 Epoch: 11 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:37,344-Speed 3388.51 samples/sec Loss 3.3793 LearningRate 0.0175 Epoch: 11 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:40,362-Speed 3394.24 samples/sec Loss 3.4864 LearningRate 0.0175 Epoch: 11 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:43,378-Speed 3394.87 samples/sec Loss 3.4623 LearningRate 0.0175 Epoch: 11 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-27 08:17:46,400-Speed 3389.38 samples/sec Loss 3.4698 LearningRate 0.0175 Epoch: 11 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:17:49,408-Speed 3405.57 samples/sec Loss 3.4513 LearningRate 0.0175 Epoch: 11 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:52,430-Speed 3389.29 samples/sec Loss 3.3297 LearningRate 0.0175 Epoch: 11 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:55,449-Speed 3392.49 samples/sec Loss 3.4600 LearningRate 0.0175 Epoch: 11 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:17:58,466-Speed 3394.94 samples/sec Loss 3.5585 LearningRate 0.0175 Epoch: 11 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:01,480-Speed 3398.13 samples/sec Loss 3.4961 LearningRate 0.0175 Epoch: 11 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:04,496-Speed 3396.30 samples/sec Loss 3.3245 LearningRate 0.0175 Epoch: 11 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:07,519-Speed 3388.05 samples/sec Loss 3.4449 LearningRate 0.0175 Epoch: 11 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:10,545-Speed 3384.77 samples/sec Loss 3.4612 LearningRate 0.0175 Epoch: 11 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:13,579-Speed 3375.88 samples/sec Loss 3.4349 LearningRate 0.0174 Epoch: 11 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:16,597-Speed 3393.56 samples/sec Loss 3.3983 LearningRate 0.0174 Epoch: 11 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:18:19,617-Speed 3391.48 samples/sec Loss 3.4708 LearningRate 0.0174 Epoch: 11 Global Step: 66240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:22,636-Speed 3393.35 
samples/sec Loss 3.2947 LearningRate 0.0174 Epoch: 11 Global Step: 66250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:25,655-Speed 3392.31 samples/sec Loss 3.4058 LearningRate 0.0174 Epoch: 11 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:28,674-Speed 3392.77 samples/sec Loss 3.4960 LearningRate 0.0174 Epoch: 11 Global Step: 66270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:31,705-Speed 3379.18 samples/sec Loss 3.2654 LearningRate 0.0174 Epoch: 11 Global Step: 66280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:34,734-Speed 3381.24 samples/sec Loss 3.4727 LearningRate 0.0174 Epoch: 11 Global Step: 66290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:37,753-Speed 3393.24 samples/sec Loss 3.3767 LearningRate 0.0174 Epoch: 11 Global Step: 66300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:40,774-Speed 3390.02 samples/sec Loss 3.3324 LearningRate 0.0174 Epoch: 11 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:43,798-Speed 3386.49 samples/sec Loss 3.4343 LearningRate 0.0174 Epoch: 11 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:46,815-Speed 3395.37 samples/sec Loss 3.3663 LearningRate 0.0174 Epoch: 11 Global Step: 66330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:49,815-Speed 3413.82 samples/sec Loss 3.4367 LearningRate 0.0174 Epoch: 11 Global Step: 66340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:52,838-Speed 3388.25 samples/sec Loss 3.3043 LearningRate 0.0174 Epoch: 11 Global Step: 66350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:55,857-Speed 3393.07 samples/sec Loss 3.3487 LearningRate 0.0173 Epoch: 11 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:18:58,873-Speed 3395.20 samples/sec Loss 3.3750 LearningRate 0.0173 Epoch: 11 
Global Step: 66370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:01,894-Speed 3391.25 samples/sec Loss 3.4233 LearningRate 0.0173 Epoch: 11 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:04,914-Speed 3390.77 samples/sec Loss 3.4509 LearningRate 0.0173 Epoch: 11 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:07,935-Speed 3390.93 samples/sec Loss 3.3017 LearningRate 0.0173 Epoch: 11 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:10,963-Speed 3382.52 samples/sec Loss 3.3640 LearningRate 0.0173 Epoch: 11 Global Step: 66410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:13,984-Speed 3390.48 samples/sec Loss 3.4920 LearningRate 0.0173 Epoch: 11 Global Step: 66420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:17,000-Speed 3395.46 samples/sec Loss 3.3615 LearningRate 0.0173 Epoch: 11 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:19:20,017-Speed 3395.84 samples/sec Loss 3.3945 LearningRate 0.0173 Epoch: 11 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:23,035-Speed 3392.83 samples/sec Loss 3.5175 LearningRate 0.0173 Epoch: 11 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:26,054-Speed 3392.53 samples/sec Loss 3.4143 LearningRate 0.0173 Epoch: 11 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:29,072-Speed 3394.12 samples/sec Loss 3.4368 LearningRate 0.0173 Epoch: 11 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:32,089-Speed 3395.46 samples/sec Loss 3.3322 LearningRate 0.0173 Epoch: 11 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:35,118-Speed 3381.06 samples/sec Loss 3.3719 LearningRate 0.0172 Epoch: 11 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-27 08:19:38,129-Speed 3401.45 samples/sec Loss 3.4837 LearningRate 0.0172 Epoch: 11 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:41,147-Speed 3394.29 samples/sec Loss 3.5140 LearningRate 0.0172 Epoch: 11 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:44,158-Speed 3400.86 samples/sec Loss 3.4506 LearningRate 0.0172 Epoch: 11 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:47,234-Speed 3330.42 samples/sec Loss 3.3794 LearningRate 0.0172 Epoch: 11 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:50,245-Speed 3402.03 samples/sec Loss 3.3138 LearningRate 0.0172 Epoch: 11 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:53,257-Speed 3400.26 samples/sec Loss 3.4003 LearningRate 0.0172 Epoch: 11 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:56,272-Speed 3397.13 samples/sec Loss 3.2916 LearningRate 0.0172 Epoch: 11 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:19:59,289-Speed 3394.58 samples/sec Loss 3.3176 LearningRate 0.0172 Epoch: 11 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:02,312-Speed 3387.87 samples/sec Loss 3.3654 LearningRate 0.0172 Epoch: 11 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:05,328-Speed 3396.34 samples/sec Loss 3.3565 LearningRate 0.0172 Epoch: 11 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:08,346-Speed 3393.35 samples/sec Loss 3.4398 LearningRate 0.0172 Epoch: 11 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:11,362-Speed 3396.12 samples/sec Loss 3.4847 LearningRate 0.0172 Epoch: 11 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:14,444-Speed 3323.66 
samples/sec Loss 3.4349 LearningRate 0.0172 Epoch: 11 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:17,464-Speed 3391.41 samples/sec Loss 3.2691 LearningRate 0.0171 Epoch: 11 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:20,478-Speed 3397.89 samples/sec Loss 3.3479 LearningRate 0.0171 Epoch: 11 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:20:23,498-Speed 3391.42 samples/sec Loss 3.3211 LearningRate 0.0171 Epoch: 11 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:20:26,520-Speed 3390.02 samples/sec Loss 3.3832 LearningRate 0.0171 Epoch: 11 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:29,545-Speed 3385.46 samples/sec Loss 3.4093 LearningRate 0.0171 Epoch: 11 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:32,563-Speed 3394.39 samples/sec Loss 3.3747 LearningRate 0.0171 Epoch: 11 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:35,587-Speed 3386.46 samples/sec Loss 3.3384 LearningRate 0.0171 Epoch: 11 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:38,610-Speed 3388.24 samples/sec Loss 3.3380 LearningRate 0.0171 Epoch: 11 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:41,628-Speed 3393.99 samples/sec Loss 3.3985 LearningRate 0.0171 Epoch: 11 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:44,650-Speed 3389.49 samples/sec Loss 3.3848 LearningRate 0.0171 Epoch: 11 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:47,672-Speed 3389.23 samples/sec Loss 3.3498 LearningRate 0.0171 Epoch: 11 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:50,694-Speed 3389.27 samples/sec Loss 3.3804 LearningRate 0.0171 Epoch: 11 
Global Step: 66740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:53,714-Speed 3391.41 samples/sec Loss 3.5045 LearningRate 0.0171 Epoch: 11 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:20:56,731-Speed 3394.43 samples/sec Loss 3.3818 LearningRate 0.0171 Epoch: 11 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:20:59,771-Speed 3369.70 samples/sec Loss 3.4732 LearningRate 0.0170 Epoch: 11 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:21:02,802-Speed 3379.38 samples/sec Loss 3.4360 LearningRate 0.0170 Epoch: 11 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:21:05,823-Speed 3390.28 samples/sec Loss 3.3163 LearningRate 0.0170 Epoch: 11 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:21:08,833-Speed 3402.76 samples/sec Loss 3.4476 LearningRate 0.0170 Epoch: 11 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:11,859-Speed 3384.64 samples/sec Loss 3.4322 LearningRate 0.0170 Epoch: 11 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:14,880-Speed 3390.57 samples/sec Loss 3.3209 LearningRate 0.0170 Epoch: 11 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:17,906-Speed 3384.42 samples/sec Loss 3.4116 LearningRate 0.0170 Epoch: 11 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:20,924-Speed 3394.03 samples/sec Loss 3.4411 LearningRate 0.0170 Epoch: 11 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:23,946-Speed 3388.61 samples/sec Loss 3.4672 LearningRate 0.0170 Epoch: 11 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:26,983-Speed 3372.84 samples/sec Loss 3.3101 LearningRate 0.0170 Epoch: 11 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-04-27 08:21:30,004-Speed 3391.26 samples/sec Loss 3.3071 LearningRate 0.0170 Epoch: 11 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:33,033-Speed 3381.37 samples/sec Loss 3.3916 LearningRate 0.0170 Epoch: 11 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:36,054-Speed 3389.84 samples/sec Loss 3.3933 LearningRate 0.0170 Epoch: 11 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:39,081-Speed 3384.22 samples/sec Loss 3.3400 LearningRate 0.0170 Epoch: 11 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:21:42,082-Speed 3412.78 samples/sec Loss 3.3826 LearningRate 0.0169 Epoch: 11 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:45,102-Speed 3390.97 samples/sec Loss 3.3333 LearningRate 0.0169 Epoch: 11 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:48,134-Speed 3378.42 samples/sec Loss 3.3568 LearningRate 0.0169 Epoch: 11 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:51,169-Speed 3374.82 samples/sec Loss 3.3193 LearningRate 0.0169 Epoch: 11 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:54,192-Speed 3387.51 samples/sec Loss 3.4486 LearningRate 0.0169 Epoch: 11 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:21:57,212-Speed 3392.50 samples/sec Loss 3.3912 LearningRate 0.0169 Epoch: 11 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:00,233-Speed 3390.10 samples/sec Loss 3.3876 LearningRate 0.0169 Epoch: 11 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:03,257-Speed 3387.92 samples/sec Loss 3.4256 LearningRate 0.0169 Epoch: 11 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:06,280-Speed 
3387.68 samples/sec Loss 3.3882 LearningRate 0.0169 Epoch: 11 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:09,307-Speed 3383.12 samples/sec Loss 3.3222 LearningRate 0.0169 Epoch: 11 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:12,307-Speed 3414.46 samples/sec Loss 3.3743 LearningRate 0.0169 Epoch: 11 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:15,333-Speed 3384.44 samples/sec Loss 3.4086 LearningRate 0.0169 Epoch: 11 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:18,353-Speed 3391.97 samples/sec Loss 3.3278 LearningRate 0.0169 Epoch: 11 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:21,379-Speed 3385.04 samples/sec Loss 3.5178 LearningRate 0.0168 Epoch: 11 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:24,409-Speed 3379.67 samples/sec Loss 3.3449 LearningRate 0.0168 Epoch: 11 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:27,429-Speed 3392.48 samples/sec Loss 3.3462 LearningRate 0.0168 Epoch: 11 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:30,530-Speed 3303.16 samples/sec Loss 3.4362 LearningRate 0.0168 Epoch: 11 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:33,556-Speed 3384.10 samples/sec Loss 3.4962 LearningRate 0.0168 Epoch: 11 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:36,585-Speed 3382.07 samples/sec Loss 3.3801 LearningRate 0.0168 Epoch: 11 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:39,661-Speed 3329.10 samples/sec Loss 3.4329 LearningRate 0.0168 Epoch: 11 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:22:42,689-Speed 3383.08 samples/sec Loss 3.4756 LearningRate 0.0168 
Epoch: 11 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:45,712-Speed 3387.85 samples/sec Loss 3.3394 LearningRate 0.0168 Epoch: 11 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:48,740-Speed 3383.41 samples/sec Loss 3.3270 LearningRate 0.0168 Epoch: 11 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:51,761-Speed 3389.90 samples/sec Loss 3.3908 LearningRate 0.0168 Epoch: 11 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:54,790-Speed 3382.00 samples/sec Loss 3.2892 LearningRate 0.0168 Epoch: 11 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:22:57,830-Speed 3368.42 samples/sec Loss 3.3477 LearningRate 0.0168 Epoch: 11 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:00,854-Speed 3387.00 samples/sec Loss 3.3742 LearningRate 0.0168 Epoch: 11 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:03,886-Speed 3378.44 samples/sec Loss 3.3982 LearningRate 0.0167 Epoch: 11 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:06,906-Speed 3392.09 samples/sec Loss 3.3198 LearningRate 0.0167 Epoch: 11 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:09,934-Speed 3381.50 samples/sec Loss 3.3959 LearningRate 0.0167 Epoch: 11 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:12,961-Speed 3384.59 samples/sec Loss 3.4501 LearningRate 0.0167 Epoch: 11 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:23:15,981-Speed 3391.29 samples/sec Loss 3.2135 LearningRate 0.0167 Epoch: 11 Global Step: 67220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:23:19,009-Speed 3382.33 samples/sec Loss 3.3142 LearningRate 0.0167 Epoch: 11 Global Step: 67230 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-27 08:23:22,024-Speed 3397.54 samples/sec Loss 3.4198 LearningRate 0.0167 Epoch: 11 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:25,056-Speed 3378.18 samples/sec Loss 3.5283 LearningRate 0.0167 Epoch: 11 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:28,132-Speed 3329.13 samples/sec Loss 3.3344 LearningRate 0.0167 Epoch: 11 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:31,167-Speed 3375.41 samples/sec Loss 3.5074 LearningRate 0.0167 Epoch: 11 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:34,192-Speed 3385.90 samples/sec Loss 3.3564 LearningRate 0.0167 Epoch: 11 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:37,230-Speed 3371.57 samples/sec Loss 3.4072 LearningRate 0.0167 Epoch: 11 Global Step: 67290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:40,253-Speed 3387.66 samples/sec Loss 3.2399 LearningRate 0.0167 Epoch: 11 Global Step: 67300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:43,273-Speed 3391.28 samples/sec Loss 3.3224 LearningRate 0.0167 Epoch: 11 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:46,298-Speed 3386.85 samples/sec Loss 3.4250 LearningRate 0.0166 Epoch: 11 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:49,348-Speed 3357.69 samples/sec Loss 3.3594 LearningRate 0.0166 Epoch: 11 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:23:52,374-Speed 3385.02 samples/sec Loss 3.4164 LearningRate 0.0166 Epoch: 11 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:23:55,381-Speed 3406.04 samples/sec Loss 3.3787 LearningRate 0.0166 Epoch: 11 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
08:23:58,434-Speed 3355.52 samples/sec Loss 3.3751 LearningRate 0.0166 Epoch: 11 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:01,464-Speed 3380.29 samples/sec Loss 3.4173 LearningRate 0.0166 Epoch: 11 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:04,487-Speed 3387.29 samples/sec Loss 3.3658 LearningRate 0.0166 Epoch: 11 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:07,520-Speed 3377.57 samples/sec Loss 3.3979 LearningRate 0.0166 Epoch: 11 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:10,562-Speed 3367.06 samples/sec Loss 3.3110 LearningRate 0.0166 Epoch: 11 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:13,586-Speed 3386.93 samples/sec Loss 3.2086 LearningRate 0.0166 Epoch: 11 Global Step: 67410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:16,612-Speed 3384.70 samples/sec Loss 3.2803 LearningRate 0.0166 Epoch: 11 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:19,637-Speed 3386.17 samples/sec Loss 3.3638 LearningRate 0.0166 Epoch: 11 Global Step: 67430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:22,682-Speed 3363.62 samples/sec Loss 3.3456 LearningRate 0.0166 Epoch: 11 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:25,672-Speed 3425.21 samples/sec Loss 3.4739 LearningRate 0.0166 Epoch: 11 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:28,706-Speed 3376.24 samples/sec Loss 3.3060 LearningRate 0.0165 Epoch: 11 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:31,737-Speed 3379.27 samples/sec Loss 3.3969 LearningRate 0.0165 Epoch: 11 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:34,763-Speed 3384.65 samples/sec Loss 3.3815 
LearningRate 0.0165 Epoch: 11 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:37,784-Speed 3390.18 samples/sec Loss 3.3357 LearningRate 0.0165 Epoch: 11 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:40,811-Speed 3384.24 samples/sec Loss 3.2427 LearningRate 0.0165 Epoch: 11 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:43,831-Speed 3391.39 samples/sec Loss 3.3742 LearningRate 0.0165 Epoch: 11 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:46,883-Speed 3356.03 samples/sec Loss 3.2680 LearningRate 0.0165 Epoch: 11 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:49,904-Speed 3390.44 samples/sec Loss 3.3558 LearningRate 0.0165 Epoch: 11 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:52,929-Speed 3386.46 samples/sec Loss 3.4339 LearningRate 0.0165 Epoch: 11 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:24:55,951-Speed 3389.03 samples/sec Loss 3.3837 LearningRate 0.0165 Epoch: 11 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:24:58,977-Speed 3384.51 samples/sec Loss 3.3473 LearningRate 0.0165 Epoch: 11 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:02,016-Speed 3370.77 samples/sec Loss 3.3748 LearningRate 0.0165 Epoch: 11 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:05,040-Speed 3386.68 samples/sec Loss 3.3780 LearningRate 0.0165 Epoch: 11 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:08,063-Speed 3388.52 samples/sec Loss 3.3998 LearningRate 0.0165 Epoch: 11 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:11,088-Speed 3387.68 samples/sec Loss 3.2757 LearningRate 0.0164 Epoch: 11 Global Step: 67600 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:14,110-Speed 3388.54 samples/sec Loss 3.3297 LearningRate 0.0164 Epoch: 11 Global Step: 67610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:17,132-Speed 3389.25 samples/sec Loss 3.3020 LearningRate 0.0164 Epoch: 11 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:20,153-Speed 3390.98 samples/sec Loss 3.3579 LearningRate 0.0164 Epoch: 11 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:23,176-Speed 3387.38 samples/sec Loss 3.3758 LearningRate 0.0164 Epoch: 11 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:26,204-Speed 3382.51 samples/sec Loss 3.3781 LearningRate 0.0164 Epoch: 11 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:25:29,235-Speed 3379.72 samples/sec Loss 3.2800 LearningRate 0.0164 Epoch: 11 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:25:32,261-Speed 3384.70 samples/sec Loss 3.4153 LearningRate 0.0164 Epoch: 11 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:25:35,272-Speed 3401.34 samples/sec Loss 3.3082 LearningRate 0.0164 Epoch: 11 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:38,300-Speed 3382.41 samples/sec Loss 3.3089 LearningRate 0.0164 Epoch: 11 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:41,329-Speed 3381.60 samples/sec Loss 3.4552 LearningRate 0.0164 Epoch: 11 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:44,353-Speed 3387.26 samples/sec Loss 3.2924 LearningRate 0.0164 Epoch: 11 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:47,379-Speed 3385.24 samples/sec Loss 3.3534 LearningRate 0.0164 Epoch: 11 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-27 08:25:50,417-Speed 3371.41 samples/sec Loss 3.2419 LearningRate 0.0164 Epoch: 11 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:53,448-Speed 3378.53 samples/sec Loss 3.4036 LearningRate 0.0163 Epoch: 11 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:56,477-Speed 3381.00 samples/sec Loss 3.4112 LearningRate 0.0163 Epoch: 11 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:25:59,509-Speed 3378.66 samples/sec Loss 3.2303 LearningRate 0.0163 Epoch: 11 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:02,540-Speed 3380.02 samples/sec Loss 3.3828 LearningRate 0.0163 Epoch: 11 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:05,570-Speed 3379.58 samples/sec Loss 3.3256 LearningRate 0.0163 Epoch: 11 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:08,597-Speed 3384.37 samples/sec Loss 3.4438 LearningRate 0.0163 Epoch: 11 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:11,632-Speed 3373.92 samples/sec Loss 3.2518 LearningRate 0.0163 Epoch: 11 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:14,658-Speed 3384.61 samples/sec Loss 3.2360 LearningRate 0.0163 Epoch: 11 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:17,700-Speed 3367.80 samples/sec Loss 3.5287 LearningRate 0.0163 Epoch: 11 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:20,731-Speed 3378.89 samples/sec Loss 3.3873 LearningRate 0.0163 Epoch: 11 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:23,745-Speed 3398.65 samples/sec Loss 3.3616 LearningRate 0.0163 Epoch: 11 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:26,779-Speed 3375.51 samples/sec 
Loss 3.3628 LearningRate 0.0163 Epoch: 11 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:29,803-Speed 3387.70 samples/sec Loss 3.4400 LearningRate 0.0163 Epoch: 11 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:32,825-Speed 3389.87 samples/sec Loss 3.3060 LearningRate 0.0163 Epoch: 11 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:35,853-Speed 3381.43 samples/sec Loss 3.4042 LearningRate 0.0162 Epoch: 11 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:38,904-Speed 3357.01 samples/sec Loss 3.3699 LearningRate 0.0162 Epoch: 11 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:41,934-Speed 3380.63 samples/sec Loss 3.2219 LearningRate 0.0162 Epoch: 11 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:44,964-Speed 3380.25 samples/sec Loss 3.4291 LearningRate 0.0162 Epoch: 11 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:48,003-Speed 3370.12 samples/sec Loss 3.3905 LearningRate 0.0162 Epoch: 11 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:51,026-Speed 3388.86 samples/sec Loss 3.3550 LearningRate 0.0162 Epoch: 11 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:26:54,065-Speed 3369.88 samples/sec Loss 3.3510 LearningRate 0.0162 Epoch: 11 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:26:57,159-Speed 3310.84 samples/sec Loss 3.4221 LearningRate 0.0162 Epoch: 11 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:27:00,190-Speed 3379.46 samples/sec Loss 3.3226 LearningRate 0.0162 Epoch: 11 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:27:03,220-Speed 3379.96 samples/sec Loss 3.4101 LearningRate 0.0162 Epoch: 11 Global 
Step: 67970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:27:06,250-Speed 3379.77 samples/sec Loss 3.2079 LearningRate 0.0162 Epoch: 11 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:27:09,276-Speed 3384.75 samples/sec Loss 3.3844 LearningRate 0.0162 Epoch: 11 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:27:12,300-Speed 3387.81 samples/sec Loss 3.3353 LearningRate 0.0162 Epoch: 11 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:27:55,883-[lfw][68000]XNorm: 22.379005 Training: 2022-04-27 08:27:55,883-[lfw][68000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-27 08:27:55,884-[lfw][68000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:28:46,686-[cfp_fp][68000]XNorm: 21.270001 Training: 2022-04-27 08:28:46,686-[cfp_fp][68000]Accuracy-Flip: 0.97186+-0.00729 Training: 2022-04-27 08:28:46,687-[cfp_fp][68000]Accuracy-Highest: 0.97529 Training: 2022-04-27 08:29:30,153-[agedb_30][68000]XNorm: 22.876901 Training: 2022-04-27 08:29:30,154-[agedb_30][68000]Accuracy-Flip: 0.97950+-0.00587 Training: 2022-04-27 08:29:30,154-[agedb_30][68000]Accuracy-Highest: 0.97950 Training: 2022-04-27 08:29:33,171-Speed 72.69 samples/sec Loss 3.3702 LearningRate 0.0162 Epoch: 11 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:29:36,174-Speed 3410.33 samples/sec Loss 3.3172 LearningRate 0.0161 Epoch: 11 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:29:39,177-Speed 3410.97 samples/sec Loss 3.3371 LearningRate 0.0161 Epoch: 11 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:29:42,184-Speed 3406.40 samples/sec Loss 3.2081 LearningRate 0.0161 Epoch: 11 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:29:45,200-Speed 3395.61 samples/sec Loss 3.3177 LearningRate 0.0161 Epoch: 11 Global Step: 68050 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 08:29:48,214-Speed 3398.33 samples/sec Loss 3.3769 LearningRate 0.0161 Epoch: 11 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:29:51,230-Speed 3396.59 samples/sec Loss 3.2798 LearningRate 0.0161 Epoch: 11 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:29:54,245-Speed 3396.37 samples/sec Loss 3.2475 LearningRate 0.0161 Epoch: 11 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:29:57,266-Speed 3390.99 samples/sec Loss 3.3300 LearningRate 0.0161 Epoch: 11 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:30:00,273-Speed 3405.96 samples/sec Loss 3.1644 LearningRate 0.0161 Epoch: 11 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:03,295-Speed 3389.07 samples/sec Loss 3.3767 LearningRate 0.0161 Epoch: 11 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:06,318-Speed 3387.94 samples/sec Loss 3.2945 LearningRate 0.0161 Epoch: 11 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:09,360-Speed 3367.10 samples/sec Loss 3.2932 LearningRate 0.0161 Epoch: 11 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:12,392-Speed 3379.32 samples/sec Loss 3.3454 LearningRate 0.0161 Epoch: 11 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:15,424-Speed 3377.19 samples/sec Loss 3.3205 LearningRate 0.0161 Epoch: 11 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:18,445-Speed 3390.61 samples/sec Loss 3.3089 LearningRate 0.0161 Epoch: 11 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:21,470-Speed 3385.53 samples/sec Loss 3.1783 LearningRate 0.0160 Epoch: 11 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
08:30:24,492-Speed 3390.21 samples/sec Loss 3.3599 LearningRate 0.0160 Epoch: 11 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:27,511-Speed 3392.35 samples/sec Loss 3.3319 LearningRate 0.0160 Epoch: 11 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:30,533-Speed 3388.47 samples/sec Loss 3.4040 LearningRate 0.0160 Epoch: 11 Global Step: 68200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:30:33,546-Speed 3399.64 samples/sec Loss 3.4516 LearningRate 0.0160 Epoch: 11 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:36,573-Speed 3384.28 samples/sec Loss 3.3043 LearningRate 0.0160 Epoch: 11 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:39,674-Speed 3302.43 samples/sec Loss 3.3284 LearningRate 0.0160 Epoch: 11 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:53,066-Speed 764.72 samples/sec Loss 2.8906 LearningRate 0.0160 Epoch: 12 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:56,101-Speed 3375.33 samples/sec Loss 2.6391 LearningRate 0.0160 Epoch: 12 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:30:59,111-Speed 3402.64 samples/sec Loss 2.6966 LearningRate 0.0160 Epoch: 12 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:02,140-Speed 3381.84 samples/sec Loss 2.7196 LearningRate 0.0160 Epoch: 12 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:05,159-Speed 3393.04 samples/sec Loss 2.6933 LearningRate 0.0160 Epoch: 12 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:08,180-Speed 3390.51 samples/sec Loss 2.6670 LearningRate 0.0160 Epoch: 12 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:11,206-Speed 3384.27 samples/sec Loss 2.7023 
LearningRate 0.0160 Epoch: 12 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:14,224-Speed 3394.70 samples/sec Loss 2.7314 LearningRate 0.0159 Epoch: 12 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:17,233-Speed 3403.46 samples/sec Loss 2.7915 LearningRate 0.0159 Epoch: 12 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:20,244-Speed 3401.68 samples/sec Loss 2.6779 LearningRate 0.0159 Epoch: 12 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:23,263-Speed 3392.37 samples/sec Loss 2.7151 LearningRate 0.0159 Epoch: 12 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:26,316-Speed 3355.98 samples/sec Loss 2.6300 LearningRate 0.0159 Epoch: 12 Global Step: 68350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:29,333-Speed 3394.57 samples/sec Loss 2.7133 LearningRate 0.0159 Epoch: 12 Global Step: 68360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:31:32,334-Speed 3411.97 samples/sec Loss 2.8186 LearningRate 0.0159 Epoch: 12 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:35,384-Speed 3359.34 samples/sec Loss 2.7797 LearningRate 0.0159 Epoch: 12 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:38,400-Speed 3395.81 samples/sec Loss 2.8686 LearningRate 0.0159 Epoch: 12 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:41,403-Speed 3410.54 samples/sec Loss 2.7509 LearningRate 0.0159 Epoch: 12 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:44,413-Speed 3402.62 samples/sec Loss 2.8202 LearningRate 0.0159 Epoch: 12 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:47,475-Speed 3345.57 samples/sec Loss 2.7698 LearningRate 0.0159 Epoch: 12 Global Step: 68420 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:50,533-Speed 3349.39 samples/sec Loss 2.7210 LearningRate 0.0159 Epoch: 12 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:53,595-Speed 3344.04 samples/sec Loss 2.8631 LearningRate 0.0159 Epoch: 12 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:56,605-Speed 3403.39 samples/sec Loss 2.7102 LearningRate 0.0158 Epoch: 12 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:31:59,616-Speed 3401.85 samples/sec Loss 2.7892 LearningRate 0.0158 Epoch: 12 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:02,614-Speed 3416.61 samples/sec Loss 2.7480 LearningRate 0.0158 Epoch: 12 Global Step: 68470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:05,641-Speed 3383.32 samples/sec Loss 2.7210 LearningRate 0.0158 Epoch: 12 Global Step: 68480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:08,653-Speed 3400.47 samples/sec Loss 2.7783 LearningRate 0.0158 Epoch: 12 Global Step: 68490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:11,666-Speed 3399.67 samples/sec Loss 2.7672 LearningRate 0.0158 Epoch: 12 Global Step: 68500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:14,681-Speed 3398.12 samples/sec Loss 2.7803 LearningRate 0.0158 Epoch: 12 Global Step: 68510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:17,716-Speed 3374.04 samples/sec Loss 2.6687 LearningRate 0.0158 Epoch: 12 Global Step: 68520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:20,732-Speed 3395.93 samples/sec Loss 2.7307 LearningRate 0.0158 Epoch: 12 Global Step: 68530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:23,748-Speed 3395.58 samples/sec Loss 2.8233 LearningRate 0.0158 Epoch: 12 Global Step: 68540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 08:32:26,763-Speed 3398.36 samples/sec Loss 2.7349 LearningRate 0.0158 Epoch: 12 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:29,780-Speed 3394.39 samples/sec Loss 2.9215 LearningRate 0.0158 Epoch: 12 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:32,807-Speed 3383.89 samples/sec Loss 2.7536 LearningRate 0.0158 Epoch: 12 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:35,872-Speed 3341.51 samples/sec Loss 2.8362 LearningRate 0.0158 Epoch: 12 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:38,885-Speed 3399.76 samples/sec Loss 2.8543 LearningRate 0.0157 Epoch: 12 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:41,905-Speed 3391.03 samples/sec Loss 2.7979 LearningRate 0.0157 Epoch: 12 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:44,926-Speed 3390.93 samples/sec Loss 2.8277 LearningRate 0.0157 Epoch: 12 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:47,948-Speed 3388.68 samples/sec Loss 2.8039 LearningRate 0.0157 Epoch: 12 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:32:50,949-Speed 3413.47 samples/sec Loss 2.8073 LearningRate 0.0157 Epoch: 12 Global Step: 68630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:53,964-Speed 3397.05 samples/sec Loss 2.9534 LearningRate 0.0157 Epoch: 12 Global Step: 68640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:32:56,987-Speed 3388.67 samples/sec Loss 2.7494 LearningRate 0.0157 Epoch: 12 Global Step: 68650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:00,014-Speed 3383.33 samples/sec Loss 2.8443 LearningRate 0.0157 Epoch: 12 Global Step: 68660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:03,036-Speed 3389.36 samples/sec Loss 
2.9545 LearningRate 0.0157 Epoch: 12 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:06,062-Speed 3384.91 samples/sec Loss 2.8741 LearningRate 0.0157 Epoch: 12 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:09,080-Speed 3393.38 samples/sec Loss 2.8273 LearningRate 0.0157 Epoch: 12 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:12,095-Speed 3397.42 samples/sec Loss 2.9421 LearningRate 0.0157 Epoch: 12 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:15,111-Speed 3396.38 samples/sec Loss 2.7945 LearningRate 0.0157 Epoch: 12 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:18,125-Speed 3398.37 samples/sec Loss 2.8377 LearningRate 0.0157 Epoch: 12 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:33:21,137-Speed 3400.63 samples/sec Loss 2.7301 LearningRate 0.0157 Epoch: 12 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:24,171-Speed 3374.84 samples/sec Loss 2.8459 LearningRate 0.0156 Epoch: 12 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:27,207-Speed 3374.18 samples/sec Loss 2.8955 LearningRate 0.0156 Epoch: 12 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:30,247-Speed 3368.60 samples/sec Loss 2.8931 LearningRate 0.0156 Epoch: 12 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:33,263-Speed 3396.07 samples/sec Loss 2.8192 LearningRate 0.0156 Epoch: 12 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:36,294-Speed 3378.89 samples/sec Loss 2.9418 LearningRate 0.0156 Epoch: 12 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:39,313-Speed 3393.59 samples/sec Loss 2.8436 LearningRate 0.0156 Epoch: 12 Global Step: 68790 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:42,336-Speed 3388.05 samples/sec Loss 2.8552 LearningRate 0.0156 Epoch: 12 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:45,358-Speed 3389.11 samples/sec Loss 2.9611 LearningRate 0.0156 Epoch: 12 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:48,372-Speed 3398.38 samples/sec Loss 2.8706 LearningRate 0.0156 Epoch: 12 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:33:51,392-Speed 3392.12 samples/sec Loss 2.8263 LearningRate 0.0156 Epoch: 12 Global Step: 68830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:33:54,407-Speed 3396.88 samples/sec Loss 2.9706 LearningRate 0.0156 Epoch: 12 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:33:57,436-Speed 3380.96 samples/sec Loss 2.8946 LearningRate 0.0156 Epoch: 12 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:34:00,456-Speed 3391.44 samples/sec Loss 2.8579 LearningRate 0.0156 Epoch: 12 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:34:03,481-Speed 3386.28 samples/sec Loss 2.9203 LearningRate 0.0156 Epoch: 12 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:34:06,501-Speed 3391.21 samples/sec Loss 2.8691 LearningRate 0.0155 Epoch: 12 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:34:09,525-Speed 3387.43 samples/sec Loss 2.9902 LearningRate 0.0155 Epoch: 12 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:34:12,548-Speed 3387.54 samples/sec Loss 2.9430 LearningRate 0.0155 Epoch: 12 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:34:15,572-Speed 3387.49 samples/sec Loss 2.7942 LearningRate 0.0155 Epoch: 12 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 08:34:18,590-Speed 3393.71 samples/sec Loss 2.9153 LearningRate 0.0155 Epoch: 12 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:34:21,602-Speed 3400.70 samples/sec Loss 2.9602 LearningRate 0.0155 Epoch: 12 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:34:24,621-Speed 3392.73 samples/sec Loss 2.8985 LearningRate 0.0155 Epoch: 12 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:34:27,634-Speed 3399.25 samples/sec Loss 2.8272 LearningRate 0.0155 Epoch: 12 Global Step: 68950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:30,647-Speed 3398.98 samples/sec Loss 2.9287 LearningRate 0.0155 Epoch: 12 Global Step: 68960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:33,682-Speed 3375.54 samples/sec Loss 2.8855 LearningRate 0.0155 Epoch: 12 Global Step: 68970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:36,723-Speed 3368.05 samples/sec Loss 2.8697 LearningRate 0.0155 Epoch: 12 Global Step: 68980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:39,737-Speed 3397.88 samples/sec Loss 2.9272 LearningRate 0.0155 Epoch: 12 Global Step: 68990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:42,756-Speed 3393.23 samples/sec Loss 2.8038 LearningRate 0.0155 Epoch: 12 Global Step: 69000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:45,781-Speed 3385.15 samples/sec Loss 2.9353 LearningRate 0.0155 Epoch: 12 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:48,798-Speed 3395.13 samples/sec Loss 2.8408 LearningRate 0.0155 Epoch: 12 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:51,815-Speed 3395.01 samples/sec Loss 2.8736 LearningRate 0.0154 Epoch: 12 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:54,833-Speed 3394.11 
samples/sec Loss 2.8371 LearningRate 0.0154 Epoch: 12 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 08:34:57,855-Speed 3389.30 samples/sec Loss 2.9553 LearningRate 0.0154 Epoch: 12 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:00,881-Speed 3383.73 samples/sec Loss 2.9095 LearningRate 0.0154 Epoch: 12 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:03,929-Speed 3360.83 samples/sec Loss 2.8833 LearningRate 0.0154 Epoch: 12 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:06,954-Speed 3385.81 samples/sec Loss 3.0126 LearningRate 0.0154 Epoch: 12 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:09,980-Speed 3385.78 samples/sec Loss 3.0224 LearningRate 0.0154 Epoch: 12 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:12,999-Speed 3391.91 samples/sec Loss 2.9118 LearningRate 0.0154 Epoch: 12 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:16,031-Speed 3378.00 samples/sec Loss 2.9507 LearningRate 0.0154 Epoch: 12 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:19,054-Speed 3388.20 samples/sec Loss 2.9247 LearningRate 0.0154 Epoch: 12 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:22,081-Speed 3384.15 samples/sec Loss 3.0166 LearningRate 0.0154 Epoch: 12 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:25,101-Speed 3391.60 samples/sec Loss 2.9326 LearningRate 0.0154 Epoch: 12 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:28,184-Speed 3321.78 samples/sec Loss 2.8806 LearningRate 0.0154 Epoch: 12 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:35:31,185-Speed 3412.62 samples/sec Loss 2.8610 LearningRate 0.0154 Epoch: 12 
Global Step: 69160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:34,202-Speed 3395.58 samples/sec Loss 2.9163 LearningRate 0.0153 Epoch: 12 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:37,222-Speed 3391.52 samples/sec Loss 2.8866 LearningRate 0.0153 Epoch: 12 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:40,258-Speed 3373.19 samples/sec Loss 2.8600 LearningRate 0.0153 Epoch: 12 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:43,296-Speed 3372.44 samples/sec Loss 2.7646 LearningRate 0.0153 Epoch: 12 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:46,321-Speed 3385.59 samples/sec Loss 2.8803 LearningRate 0.0153 Epoch: 12 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:49,357-Speed 3373.27 samples/sec Loss 2.9149 LearningRate 0.0153 Epoch: 12 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:52,378-Speed 3390.13 samples/sec Loss 2.9208 LearningRate 0.0153 Epoch: 12 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:55,398-Speed 3391.62 samples/sec Loss 2.9092 LearningRate 0.0153 Epoch: 12 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:35:58,421-Speed 3387.80 samples/sec Loss 2.9775 LearningRate 0.0153 Epoch: 12 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:01,464-Speed 3366.28 samples/sec Loss 2.8944 LearningRate 0.0153 Epoch: 12 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:36:04,477-Speed 3399.38 samples/sec Loss 2.9035 LearningRate 0.0153 Epoch: 12 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:07,498-Speed 3391.05 samples/sec Loss 2.8886 LearningRate 0.0153 Epoch: 12 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-27 08:36:10,566-Speed 3338.29 samples/sec Loss 2.9042 LearningRate 0.0153 Epoch: 12 Global Step: 69290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:13,590-Speed 3387.28 samples/sec Loss 2.9637 LearningRate 0.0153 Epoch: 12 Global Step: 69300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:16,616-Speed 3384.30 samples/sec Loss 3.0239 LearningRate 0.0153 Epoch: 12 Global Step: 69310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:19,633-Speed 3395.20 samples/sec Loss 2.9796 LearningRate 0.0152 Epoch: 12 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:22,664-Speed 3379.28 samples/sec Loss 3.0048 LearningRate 0.0152 Epoch: 12 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:25,685-Speed 3390.51 samples/sec Loss 2.8677 LearningRate 0.0152 Epoch: 12 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:28,706-Speed 3390.58 samples/sec Loss 2.9528 LearningRate 0.0152 Epoch: 12 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:31,730-Speed 3387.08 samples/sec Loss 2.9660 LearningRate 0.0152 Epoch: 12 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:34,766-Speed 3372.61 samples/sec Loss 2.9115 LearningRate 0.0152 Epoch: 12 Global Step: 69370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:36:37,819-Speed 3355.79 samples/sec Loss 2.9767 LearningRate 0.0152 Epoch: 12 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:36:40,851-Speed 3377.57 samples/sec Loss 2.9475 LearningRate 0.0152 Epoch: 12 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:36:43,858-Speed 3406.16 samples/sec Loss 2.8955 LearningRate 0.0152 Epoch: 12 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:46,887-Speed 
3381.03 samples/sec Loss 2.9508 LearningRate 0.0152 Epoch: 12 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:49,948-Speed 3346.41 samples/sec Loss 3.0545 LearningRate 0.0152 Epoch: 12 Global Step: 69420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:52,967-Speed 3392.85 samples/sec Loss 2.9622 LearningRate 0.0152 Epoch: 12 Global Step: 69430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:55,986-Speed 3392.62 samples/sec Loss 2.9645 LearningRate 0.0152 Epoch: 12 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:36:59,007-Speed 3390.12 samples/sec Loss 2.9542 LearningRate 0.0152 Epoch: 12 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:02,037-Speed 3380.25 samples/sec Loss 2.9494 LearningRate 0.0151 Epoch: 12 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:05,087-Speed 3358.32 samples/sec Loss 2.9143 LearningRate 0.0151 Epoch: 12 Global Step: 69470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:08,121-Speed 3376.25 samples/sec Loss 2.9291 LearningRate 0.0151 Epoch: 12 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:11,148-Speed 3383.47 samples/sec Loss 2.9520 LearningRate 0.0151 Epoch: 12 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:14,181-Speed 3377.57 samples/sec Loss 2.8318 LearningRate 0.0151 Epoch: 12 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 08:37:17,191-Speed 3402.21 samples/sec Loss 2.9046 LearningRate 0.0151 Epoch: 12 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:20,210-Speed 3393.03 samples/sec Loss 2.9580 LearningRate 0.0151 Epoch: 12 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 08:37:23,238-Speed 3383.12 samples/sec Loss 2.8556 LearningRate 0.0151 
Epoch: 12 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:26,260-Speed 3388.93 samples/sec Loss 3.0425 LearningRate 0.0151 Epoch: 12 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:29,282-Speed 3389.33 samples/sec Loss 2.9451 LearningRate 0.0151 Epoch: 12 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:32,300-Speed 3393.15 samples/sec Loss 2.9904 LearningRate 0.0151 Epoch: 12 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:35,328-Speed 3382.94 samples/sec Loss 2.8999 LearningRate 0.0151 Epoch: 12 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:38,363-Speed 3374.73 samples/sec Loss 2.9934 LearningRate 0.0151 Epoch: 12 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:41,390-Speed 3383.07 samples/sec Loss 2.9356 LearningRate 0.0151 Epoch: 12 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:44,412-Speed 3389.10 samples/sec Loss 3.0342 LearningRate 0.0151 Epoch: 12 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:47,434-Speed 3390.32 samples/sec Loss 3.0176 LearningRate 0.0150 Epoch: 12 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:37:50,527-Speed 3311.57 samples/sec Loss 3.0811 LearningRate 0.0150 Epoch: 12 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:53,555-Speed 3382.58 samples/sec Loss 2.9070 LearningRate 0.0150 Epoch: 12 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:56,584-Speed 3381.16 samples/sec Loss 3.0134 LearningRate 0.0150 Epoch: 12 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:37:59,605-Speed 3390.37 samples/sec Loss 2.9500 LearningRate 0.0150 Epoch: 12 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:02,628-Speed 3387.65 samples/sec Loss 3.0333 LearningRate 0.0150 Epoch: 12 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:05,650-Speed 3390.96 samples/sec Loss 2.9589 LearningRate 0.0150 Epoch: 12 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:08,670-Speed 3390.99 samples/sec Loss 2.9688 LearningRate 0.0150 Epoch: 12 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:11,710-Speed 3369.06 samples/sec Loss 2.9899 LearningRate 0.0150 Epoch: 12 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:14,749-Speed 3370.34 samples/sec Loss 3.0149 LearningRate 0.0150 Epoch: 12 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:17,772-Speed 3389.16 samples/sec Loss 2.9683 LearningRate 0.0150 Epoch: 12 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:20,793-Speed 3390.29 samples/sec Loss 2.9777 LearningRate 0.0150 Epoch: 12 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:38:23,826-Speed 3376.87 samples/sec Loss 3.0046 LearningRate 0.0150 Epoch: 12 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:38:26,832-Speed 3406.70 samples/sec Loss 2.9200 LearningRate 0.0150 Epoch: 12 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:29,857-Speed 3386.78 samples/sec Loss 2.8952 LearningRate 0.0149 Epoch: 12 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:32,879-Speed 3388.43 samples/sec Loss 3.0575 LearningRate 0.0149 Epoch: 12 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:35,988-Speed 3295.26 samples/sec Loss 3.0682 LearningRate 0.0149 Epoch: 12 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:39,017-Speed 3380.67 samples/sec Loss 2.9610 LearningRate 0.0149 Epoch: 12 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:42,050-Speed 3377.37 samples/sec Loss 2.9542 LearningRate 0.0149 Epoch: 12 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:45,076-Speed 3385.50 samples/sec Loss 3.0253 LearningRate 0.0149 Epoch: 12 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:48,100-Speed 3386.66 samples/sec Loss 2.9055 LearningRate 0.0149 Epoch: 12 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:51,122-Speed 3388.94 samples/sec Loss 2.9553 LearningRate 0.0149 Epoch: 12 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:54,141-Speed 3392.86 samples/sec Loss 2.9653 LearningRate 0.0149 Epoch: 12 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:38:57,171-Speed 3380.48 samples/sec Loss 2.9707 LearningRate 0.0149 Epoch: 12 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:00,310-Speed 3262.83 samples/sec Loss 2.9749 LearningRate 0.0149 Epoch: 12 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:03,386-Speed 3329.87 samples/sec Loss 2.9620 LearningRate 0.0149 Epoch: 12 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:06,424-Speed 3371.03 samples/sec Loss 2.9800 LearningRate 0.0149 Epoch: 12 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:09,447-Speed 3388.37 samples/sec Loss 2.9425 LearningRate 0.0149 Epoch: 12 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:12,485-Speed 3371.36 samples/sec Loss 2.9277 LearningRate 0.0149 Epoch: 12 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:15,513-Speed 3382.66 samples/sec Loss 3.0090 LearningRate 0.0148 Epoch: 12 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:39:18,519-Speed 3406.95 samples/sec Loss 2.9337 LearningRate 0.0148 Epoch: 12 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:39:21,545-Speed 3384.72 samples/sec Loss 2.8932 LearningRate 0.0148 Epoch: 12 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 08:39:24,580-Speed 3375.41 samples/sec Loss 3.0361 LearningRate 0.0148 Epoch: 12 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:27,630-Speed 3358.47 samples/sec Loss 2.9571 LearningRate 0.0148 Epoch: 12 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:30,660-Speed 3379.24 samples/sec Loss 2.9061 LearningRate 0.0148 Epoch: 12 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:33,689-Speed 3381.53 samples/sec Loss 3.0292 LearningRate 0.0148 Epoch: 12 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:36,730-Speed 3369.04 samples/sec Loss 2.9082 LearningRate 0.0148 Epoch: 12 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:39,761-Speed 3379.01 samples/sec Loss 2.9864 LearningRate 0.0148 Epoch: 12 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:42,786-Speed 3386.14 samples/sec Loss 2.9400 LearningRate 0.0148 Epoch: 12 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:39:45,837-Speed 3356.61 samples/sec Loss 2.9113 LearningRate 0.0148 Epoch: 12 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:40:29,074-[lfw][70000]XNorm: 21.786232
Training: 2022-04-27 08:40:29,075-[lfw][70000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 08:40:29,076-[lfw][70000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:41:19,765-[cfp_fp][70000]XNorm: 20.163254
Training: 2022-04-27 08:41:19,766-[cfp_fp][70000]Accuracy-Flip: 0.97286+-0.01048
Training: 2022-04-27 08:41:19,766-[cfp_fp][70000]Accuracy-Highest: 0.97529
Training: 2022-04-27 08:42:03,105-[agedb_30][70000]XNorm: 22.169804
Training: 2022-04-27 08:42:03,106-[agedb_30][70000]Accuracy-Flip: 0.97650+-0.00713
Training: 2022-04-27 08:42:03,106-[agedb_30][70000]Accuracy-Highest: 0.97950
Training: 2022-04-27 08:42:06,117-Speed 73.00 samples/sec Loss 3.0755 LearningRate 0.0148 Epoch: 12 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:09,133-Speed 3396.29 samples/sec Loss 2.9348 LearningRate 0.0148 Epoch: 12 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:12,137-Speed 3409.69 samples/sec Loss 2.9863 LearningRate 0.0148 Epoch: 12 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:15,147-Speed 3402.84 samples/sec Loss 3.0182 LearningRate 0.0148 Epoch: 12 Global Step: 70040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:18,177-Speed 3379.83 samples/sec Loss 2.9671 LearningRate 0.0147 Epoch: 12 Global Step: 70050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:21,202-Speed 3386.31 samples/sec Loss 2.9982 LearningRate 0.0147 Epoch: 12 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 08:42:24,191-Speed 3427.08 samples/sec Loss 2.9643 LearningRate 0.0147 Epoch: 12 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:27,206-Speed 3396.29 samples/sec Loss 2.9391 LearningRate 0.0147 Epoch: 12 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:30,225-Speed 3392.68 samples/sec Loss 3.0053 LearningRate 0.0147 Epoch: 12 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:33,244-Speed 3392.85 samples/sec Loss 3.0922 LearningRate 0.0147 Epoch: 12 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:36,266-Speed 3389.86 samples/sec Loss 2.9610 LearningRate 0.0147 Epoch: 12 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:39,285-Speed 3393.27 samples/sec Loss 3.1107 LearningRate 0.0147 Epoch: 12 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:42,306-Speed 3389.59 samples/sec Loss 2.9213 LearningRate 0.0147 Epoch: 12 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-04-27 08:42:45,334-Speed 3383.09 samples/sec Loss 2.9504 LearningRate 0.0147 Epoch: 12 Global Step: 70140 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:42:48,397-Speed 3343.79 samples/sec Loss 2.9613 LearningRate 0.0147 Epoch: 12 Global Step: 70150 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:42:51,433-Speed 3373.72 samples/sec Loss 2.9641 LearningRate 0.0147 Epoch: 12 Global Step: 70160 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:42:54,464-Speed 3378.89 samples/sec Loss 2.9847 LearningRate 0.0147 Epoch: 12 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:42:57,494-Speed 3380.16 samples/sec Loss 3.0807 LearningRate 0.0147 Epoch: 12 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:00,527-Speed 3377.74 samples/sec Loss 3.0753 LearningRate 0.0147 Epoch: 12 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:03,597-Speed 3335.89 samples/sec Loss 3.0317 LearningRate 0.0146 Epoch: 12 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:06,619-Speed 3388.78 samples/sec Loss 2.9754 LearningRate 0.0146 Epoch: 12 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:09,645-Speed 3384.87 samples/sec Loss 2.9868 LearningRate 0.0146 Epoch: 12 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:12,667-Speed 3389.21 samples/sec Loss 2.9660 LearningRate 0.0146 Epoch: 12 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:15,687-Speed 3391.95 samples/sec Loss 3.0555 LearningRate 0.0146 Epoch: 12 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:18,708-Speed 3390.96 samples/sec Loss 2.9860 LearningRate 0.0146 Epoch: 12 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:21,727-Speed 3391.96 samples/sec Loss 2.9238 LearningRate 0.0146 Epoch: 12 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:24,753-Speed 3385.25 samples/sec Loss 3.0357 LearningRate 0.0146 Epoch: 12 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:43:27,779-Speed 3384.55 samples/sec Loss 2.9661 LearningRate 0.0146 Epoch: 12 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:43:30,776-Speed 3418.24 samples/sec Loss 2.9053 LearningRate 0.0146 Epoch: 12 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:33,791-Speed 3396.51 samples/sec Loss 2.9835 LearningRate 0.0146 Epoch: 12 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:36,816-Speed 3386.10 samples/sec Loss 2.9525 LearningRate 0.0146 Epoch: 12 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:39,828-Speed 3399.96 samples/sec Loss 3.0130 LearningRate 0.0146 Epoch: 12 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:42,847-Speed 3393.49 samples/sec Loss 2.9591 LearningRate 0.0146 Epoch: 12 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:45,861-Speed 3397.61 samples/sec Loss 3.1279 LearningRate 0.0146 Epoch: 12 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:48,871-Speed 3402.68 samples/sec Loss 3.0067 LearningRate 0.0145 Epoch: 12 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:51,882-Speed 3402.34 samples/sec Loss 3.0225 LearningRate 0.0145 Epoch: 12 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:54,890-Speed 3404.47 samples/sec Loss 2.9934 LearningRate 0.0145 Epoch: 12 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:43:57,901-Speed 3401.91 samples/sec Loss 2.8937 LearningRate 0.0145 Epoch: 12 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:00,912-Speed 3401.78 samples/sec Loss 2.9831 LearningRate 0.0145 Epoch: 12 Global Step: 70390 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:44:03,925-Speed 3399.42 samples/sec Loss 2.8897 LearningRate 0.0145 Epoch: 12 Global Step: 70400 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:44:06,914-Speed 3425.96 samples/sec Loss 2.9877 LearningRate 0.0145 Epoch: 12 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:09,937-Speed 3389.32 samples/sec Loss 2.8955 LearningRate 0.0145 Epoch: 12 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:12,952-Speed 3396.82 samples/sec Loss 2.8363 LearningRate 0.0145 Epoch: 12 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:15,981-Speed 3381.46 samples/sec Loss 3.0332 LearningRate 0.0145 Epoch: 12 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:18,989-Speed 3404.74 samples/sec Loss 2.9248 LearningRate 0.0145 Epoch: 12 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:21,999-Speed 3403.55 samples/sec Loss 3.0021 LearningRate 0.0145 Epoch: 12 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:25,008-Speed 3403.34 samples/sec Loss 2.9024 LearningRate 0.0145 Epoch: 12 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:28,053-Speed 3364.27 samples/sec Loss 2.9232 LearningRate 0.0145 Epoch: 12 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:31,063-Speed 3402.24 samples/sec Loss 2.8825 LearningRate 0.0145 Epoch: 12 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:44:34,058-Speed 3419.68 samples/sec Loss 3.1152 LearningRate 0.0144 Epoch: 12 Global Step: 70500 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:37,083-Speed 3385.41 samples/sec Loss 2.9028 LearningRate 0.0144 Epoch: 12 Global Step: 70510 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:40,102-Speed 3393.17 samples/sec Loss 3.0528 LearningRate 0.0144 Epoch: 12 Global Step: 70520 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:43,118-Speed 3396.04 samples/sec Loss 3.0642 LearningRate 0.0144 Epoch: 12 Global Step: 70530 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:46,140-Speed 3389.80 samples/sec Loss 3.0241 LearningRate 0.0144 Epoch: 12 Global Step: 70540 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:49,151-Speed 3401.22 samples/sec Loss 3.0396 LearningRate 0.0144 Epoch: 12 Global Step: 70550 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:52,163-Speed 3400.99 samples/sec Loss 2.9291 LearningRate 0.0144 Epoch: 12 Global Step: 70560 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:55,174-Speed 3401.60 samples/sec Loss 3.0364 LearningRate 0.0144 Epoch: 12 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:44:58,195-Speed 3390.03 samples/sec Loss 3.0446 LearningRate 0.0144 Epoch: 12 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:45:01,241-Speed 3362.89 samples/sec Loss 3.0092 LearningRate 0.0144 Epoch: 12 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-04-27 08:45:04,253-Speed 3399.86 samples/sec Loss 2.8565 LearningRate 0.0144 Epoch: 12 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:07,265-Speed 3400.49 samples/sec Loss 3.0092 LearningRate 0.0144 Epoch: 12 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:10,281-Speed 3396.72 samples/sec Loss 3.0704 LearningRate 0.0144 Epoch: 12 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:13,293-Speed 3400.09 samples/sec Loss 3.0433 LearningRate 0.0144 Epoch: 12 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:16,306-Speed 3399.79 samples/sec Loss 2.9770 LearningRate 0.0144 Epoch: 12 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:19,321-Speed 3396.75 samples/sec Loss 3.0028 LearningRate 0.0143 Epoch: 12 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:22,337-Speed 3396.15 samples/sec Loss 2.9616 LearningRate 0.0143 Epoch: 12 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:25,357-Speed 3391.14 samples/sec Loss 2.9238 LearningRate 0.0143 Epoch: 12 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:28,373-Speed 3396.82 samples/sec Loss 3.0049 LearningRate 0.0143 Epoch: 12 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:31,385-Speed 3400.62 samples/sec Loss 2.9895 LearningRate 0.0143 Epoch: 12 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:34,401-Speed 3395.76 samples/sec Loss 3.0348 LearningRate 0.0143 Epoch: 12 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:45:37,419-Speed 3393.08 samples/sec Loss 3.0291 LearningRate 0.0143 Epoch: 12 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:45:40,427-Speed 3405.97 samples/sec Loss 3.0131 LearningRate 0.0143 Epoch: 12 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:43,437-Speed 3402.57 samples/sec Loss 3.0264 LearningRate 0.0143 Epoch: 12 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:46,457-Speed 3392.00 samples/sec Loss 3.0129 LearningRate 0.0143 Epoch: 12 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:49,476-Speed 3392.63 samples/sec Loss 2.9210 LearningRate 0.0143 Epoch: 12 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:52,490-Speed 3397.62 samples/sec Loss 2.9508 LearningRate 0.0143 Epoch: 12 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:55,504-Speed 3398.48 samples/sec Loss 2.9477 LearningRate 0.0143 Epoch: 12 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:45:58,517-Speed 3398.67 samples/sec Loss 3.0852 LearningRate 0.0143 Epoch: 12 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:01,538-Speed 3390.59 samples/sec Loss 2.9993 LearningRate 0.0143 Epoch: 12 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:04,558-Speed 3390.99 samples/sec Loss 2.8712 LearningRate 0.0142 Epoch: 12 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:07,570-Speed 3401.25 samples/sec Loss 3.0042 LearningRate 0.0142 Epoch: 12 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:10,586-Speed 3395.69 samples/sec Loss 3.0788 LearningRate 0.0142 Epoch: 12 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:13,615-Speed 3381.92 samples/sec Loss 3.0204 LearningRate 0.0142 Epoch: 12 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:16,629-Speed 3399.21 samples/sec Loss 3.0087 LearningRate 0.0142 Epoch: 12 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:19,644-Speed 3396.18 samples/sec Loss 2.9713 LearningRate 0.0142 Epoch: 12 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:22,661-Speed 3395.10 samples/sec Loss 2.9859 LearningRate 0.0142 Epoch: 12 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:25,662-Speed 3412.92 samples/sec Loss 2.9094 LearningRate 0.0142 Epoch: 12 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:28,674-Speed 3400.22 samples/sec Loss 2.9526 LearningRate 0.0142 Epoch: 12 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:31,695-Speed 3390.93 samples/sec Loss 2.9163 LearningRate 0.0142 Epoch: 12 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:34,724-Speed 3381.71 samples/sec Loss 2.9847 LearningRate 0.0142 Epoch: 12 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:37,739-Speed 3396.88 samples/sec Loss 3.0371 LearningRate 0.0142 Epoch: 12 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:40,754-Speed 3397.12 samples/sec Loss 3.1249 LearningRate 0.0142 Epoch: 12 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:43,768-Speed 3397.90 samples/sec Loss 2.9246 LearningRate 0.0142 Epoch: 12 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:46,800-Speed 3378.42 samples/sec Loss 3.0665 LearningRate 0.0142 Epoch: 12 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:49,865-Speed 3342.39 samples/sec Loss 3.0100 LearningRate 0.0141 Epoch: 12 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:52,884-Speed 3392.36 samples/sec Loss 2.9569 LearningRate 0.0141 Epoch: 12 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:46:55,915-Speed 3379.16 samples/sec Loss 3.0189 LearningRate 0.0141 Epoch: 12 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:46:58,964-Speed 3359.42 samples/sec Loss 2.9410 LearningRate 0.0141 Epoch: 12 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:47:01,984-Speed 3391.29 samples/sec Loss 2.9373 LearningRate 0.0141 Epoch: 12 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:47:05,039-Speed 3352.33 samples/sec Loss 2.8762 LearningRate 0.0141 Epoch: 12 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:47:08,072-Speed 3378.03 samples/sec Loss 2.8992 LearningRate 0.0141 Epoch: 12 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:47:11,072-Speed 3413.46 samples/sec Loss 3.0703 LearningRate 0.0141 Epoch: 12 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:14,107-Speed 3374.82 samples/sec Loss 2.9355 LearningRate 0.0141 Epoch: 12 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:17,120-Speed 3399.21 samples/sec Loss 2.9525 LearningRate 0.0141 Epoch: 12 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:20,135-Speed 3397.79 samples/sec Loss 3.0576 LearningRate 0.0141 Epoch: 12 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:23,183-Speed 3359.44 samples/sec Loss 2.8769 LearningRate 0.0141 Epoch: 12 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:26,229-Speed 3363.13 samples/sec Loss 3.0718 LearningRate 0.0141 Epoch: 12 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:29,250-Speed 3391.12 samples/sec Loss 3.0119 LearningRate 0.0141 Epoch: 12 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:32,266-Speed 3396.18 samples/sec Loss 2.9719 LearningRate 0.0141 Epoch: 12 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:35,334-Speed 3338.15 samples/sec Loss 2.9936 LearningRate 0.0140 Epoch: 12 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:38,351-Speed 3395.43 samples/sec Loss 2.9588 LearningRate 0.0140 Epoch: 12 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:41,349-Speed 3415.80 samples/sec Loss 2.9380 LearningRate 0.0140 Epoch: 12 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:44,371-Speed 3388.82 samples/sec Loss 3.0038 LearningRate 0.0140 Epoch: 12 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:47,390-Speed 3393.35 samples/sec Loss 3.1440 LearningRate 0.0140 Epoch: 12 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:50,417-Speed 3382.96 samples/sec Loss 3.0325 LearningRate 0.0140 Epoch: 12 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:53,442-Speed 3386.09 samples/sec Loss 2.9148 LearningRate 0.0140 Epoch: 12 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:56,463-Speed 3390.41 samples/sec Loss 3.0601 LearningRate 0.0140 Epoch: 12 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:47:59,499-Speed 3374.15 samples/sec Loss 3.0690 LearningRate 0.0140 Epoch: 12 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:03,007-Speed 2919.70 samples/sec Loss 2.9328 LearningRate 0.0140 Epoch: 12 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:06,332-Speed 3080.24 samples/sec Loss 3.0142 LearningRate 0.0140 Epoch: 12 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:09,358-Speed 3384.54 samples/sec Loss 3.0577 LearningRate 0.0140 Epoch: 12 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:12,462-Speed 3299.62 samples/sec Loss 2.9218 LearningRate 0.0140 Epoch: 12 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:15,515-Speed 3355.56 samples/sec Loss 3.0894 LearningRate 0.0140 Epoch: 12 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:18,544-Speed 3380.59 samples/sec Loss 2.9275 LearningRate 0.0140 Epoch: 12 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:21,558-Speed 3398.76 samples/sec Loss 3.0397 LearningRate 0.0139 Epoch: 12 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:24,573-Speed 3396.78 samples/sec Loss 2.9891 LearningRate 0.0139 Epoch: 12 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:27,589-Speed 3396.50 samples/sec Loss 2.9284 LearningRate 0.0139 Epoch: 12 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:30,603-Speed 3398.47 samples/sec Loss 3.0608 LearningRate 0.0139 Epoch: 12 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:33,618-Speed 3396.65 samples/sec Loss 3.0124 LearningRate 0.0139 Epoch: 12 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:36,642-Speed 3387.41 samples/sec Loss 2.9867 LearningRate 0.0139 Epoch: 12 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:39,660-Speed 3394.15 samples/sec Loss 2.9471 LearningRate 0.0139 Epoch: 12 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:48:42,682-Speed 3389.01 samples/sec Loss 2.9521 LearningRate 0.0139 Epoch: 12 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:48:45,705-Speed 3387.51 samples/sec Loss 3.0029 LearningRate 0.0139 Epoch: 12 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:48:48,733-Speed 3382.22 samples/sec Loss 2.9968 LearningRate 0.0139 Epoch: 12 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:48:51,756-Speed 3388.67 samples/sec Loss 3.0603 LearningRate 0.0139 Epoch: 12 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:48:54,773-Speed 3395.52 samples/sec Loss 3.0013 LearningRate 0.0139 Epoch: 12 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:48:57,793-Speed 3391.41 samples/sec Loss 2.9905 LearningRate 0.0139 Epoch: 12 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:49:00,802-Speed 3403.98 samples/sec Loss 3.1291 LearningRate 0.0139 Epoch: 12 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:03,828-Speed 3384.46 samples/sec Loss 2.9547 LearningRate 0.0139 Epoch: 12 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:06,859-Speed 3378.78 samples/sec Loss 3.0144 LearningRate 0.0138 Epoch: 12 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:09,892-Speed 3377.24 samples/sec Loss 2.9717 LearningRate 0.0138 Epoch: 12 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:12,925-Speed 3377.19 samples/sec Loss 3.0302 LearningRate 0.0138 Epoch: 12 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:15,952-Speed 3383.58 samples/sec Loss 2.9250 LearningRate 0.0138 Epoch: 12 Global Step: 71430 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:18,968-Speed 3396.18 samples/sec Loss 2.9279 LearningRate 0.0138 Epoch: 12 Global Step: 71440 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:21,986-Speed 3394.61 samples/sec Loss 2.9565 LearningRate 0.0138 Epoch: 12 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:25,007-Speed 3390.42 samples/sec Loss 2.9326 LearningRate 0.0138 Epoch: 12 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:28,027-Speed 3391.24 samples/sec Loss 2.9207 LearningRate 0.0138 Epoch: 12 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:31,049-Speed 3388.76 samples/sec Loss 3.1577 LearningRate 0.0138 Epoch: 12 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:49:34,058-Speed 3403.62 samples/sec Loss 3.0114 LearningRate 0.0138 Epoch: 12 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:37,081-Speed 3388.28 samples/sec Loss 2.9297 LearningRate 0.0138 Epoch: 12 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:40,096-Speed 3398.07 samples/sec Loss 2.9366 LearningRate 0.0138 Epoch: 12 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:43,116-Speed 3391.93 samples/sec Loss 2.9441 LearningRate 0.0138 Epoch: 12 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:46,133-Speed 3395.23 samples/sec Loss 3.0375 LearningRate 0.0138 Epoch: 12 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:49,182-Speed 3358.42 samples/sec Loss 3.0414 LearningRate 0.0138 Epoch: 12 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:52,202-Speed 3392.08 samples/sec Loss 2.9293 LearningRate 0.0138 Epoch: 12 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:55,217-Speed 3396.58 samples/sec Loss 3.0096 LearningRate 0.0137 Epoch: 12 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:49:58,237-Speed 3391.71 samples/sec Loss 2.8658 LearningRate 0.0137 Epoch: 12 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:01,257-Speed 3391.59 samples/sec Loss 2.9103 LearningRate 0.0137 Epoch: 12 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:04,275-Speed 3394.00 samples/sec Loss 3.0797 LearningRate 0.0137 Epoch: 12 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:50:07,288-Speed 3398.82 samples/sec Loss 2.9102 LearningRate 0.0137 Epoch: 12 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:10,331-Speed 3366.06 samples/sec Loss 3.0505 LearningRate 0.0137 Epoch: 12 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:13,407-Speed 3329.47 samples/sec Loss 3.0175 LearningRate 0.0137 Epoch: 12 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:16,572-Speed 3236.97 samples/sec Loss 2.9774 LearningRate 0.0137 Epoch: 12 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:19,590-Speed 3393.29 samples/sec Loss 2.8842 LearningRate 0.0137 Epoch: 12 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:22,610-Speed 3392.30 samples/sec Loss 3.0111 LearningRate 0.0137 Epoch: 12 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:25,636-Speed 3383.97 samples/sec Loss 3.0372 LearningRate 0.0137 Epoch: 12 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:28,667-Speed 3379.96 samples/sec Loss 3.0502 LearningRate 0.0137 Epoch: 12 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:31,687-Speed 3390.50 samples/sec Loss 3.0522 LearningRate 0.0137 Epoch: 12 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:34,715-Speed 3382.40 samples/sec Loss 3.0511 LearningRate 0.0137 Epoch: 12 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:37,730-Speed 3396.89 samples/sec Loss 3.0050 LearningRate 0.0137 Epoch: 12 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:40,752-Speed 3389.90 samples/sec Loss 3.0415 LearningRate 0.0136 Epoch: 12 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:43,776-Speed 3387.73 samples/sec Loss 3.0627 LearningRate 0.0136 Epoch: 12 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:46,797-Speed 3390.07 samples/sec Loss 2.8825 LearningRate 0.0136 Epoch: 12 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:49,819-Speed 3389.45 samples/sec Loss 3.0109 LearningRate 0.0136 Epoch: 12 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:52,840-Speed 3389.63 samples/sec Loss 2.9249 LearningRate 0.0136 Epoch: 12 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:55,869-Speed 3381.34 samples/sec Loss 2.9515 LearningRate 0.0136 Epoch: 12 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:50:58,892-Speed 3388.63 samples/sec Loss 3.0029 LearningRate 0.0136 Epoch: 12 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:01,928-Speed 3373.53 samples/sec Loss 3.0484 LearningRate 0.0136 Epoch: 12 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:04,961-Speed 3376.56 samples/sec Loss 2.9873 LearningRate 0.0136 Epoch: 12 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:07,964-Speed 3411.31 samples/sec Loss 2.9760 LearningRate 0.0136 Epoch: 12 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:10,990-Speed 3385.15 samples/sec Loss 2.9444 LearningRate 0.0136 Epoch: 12 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:14,021-Speed 3378.54 samples/sec Loss 2.9574 LearningRate 0.0136 Epoch: 12 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:17,046-Speed 3385.96 samples/sec Loss 2.9334 LearningRate 0.0136 Epoch: 12 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:20,072-Speed 3385.34 samples/sec Loss 2.9169 LearningRate 0.0136 Epoch: 12 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:23,110-Speed 3371.29 samples/sec Loss 2.9639 LearningRate 0.0136 Epoch: 12 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:26,143-Speed 3376.38 samples/sec Loss 2.9595 LearningRate 0.0135 Epoch: 12 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:29,168-Speed 3386.20 samples/sec Loss 2.9943 LearningRate 0.0135 Epoch: 12 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:32,192-Speed 3387.56 samples/sec Loss 3.0210 LearningRate 0.0135 Epoch: 12 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:35,214-Speed 3389.48 samples/sec Loss 2.9866 LearningRate 0.0135 Epoch: 12 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:38,251-Speed 3372.27 samples/sec Loss 3.0346 LearningRate 0.0135 Epoch: 12 Global Step: 71900 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:51:41,255-Speed 3409.40 samples/sec Loss 3.0235 LearningRate 0.0135 Epoch: 12 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:44,277-Speed 3389.54 samples/sec Loss 3.0138 LearningRate 0.0135 Epoch: 12 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:47,300-Speed 3388.13 samples/sec Loss 2.8917 LearningRate 0.0135 Epoch: 12 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:50,321-Speed 3390.77 samples/sec Loss 3.0403 LearningRate 0.0135 Epoch: 12 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:53,361-Speed 3368.78 samples/sec Loss 2.8835 LearningRate 0.0135 Epoch: 12 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:56,385-Speed 3387.10 samples/sec Loss 2.9847 LearningRate 0.0135 Epoch: 12 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:51:59,407-Speed 3389.01 samples/sec Loss 2.9743 LearningRate 0.0135 Epoch: 12 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:52:02,430-Speed 3387.57 samples/sec Loss 2.9027 LearningRate 0.0135 Epoch: 12 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:52:05,454-Speed 3387.89 samples/sec Loss 2.9736 LearningRate 0.0135 Epoch: 12 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:52:08,481-Speed 3383.50 samples/sec Loss 2.9344 LearningRate 0.0135 Epoch: 12 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-27 08:52:51,790-[lfw][72000]XNorm: 22.158012
Training: 2022-04-27 08:52:51,791-[lfw][72000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-27 08:52:51,791-[lfw][72000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:53:42,115-[cfp_fp][72000]XNorm: 20.789949
Training: 2022-04-27 08:53:42,116-[cfp_fp][72000]Accuracy-Flip: 0.97557+-0.00875
Training: 2022-04-27 08:53:42,116-[cfp_fp][72000]Accuracy-Highest: 0.97557
Training: 2022-04-27 08:54:25,649-[agedb_30][72000]XNorm: 22.255885
Training: 2022-04-27 08:54:25,650-[agedb_30][72000]Accuracy-Flip: 0.97800+-0.00690
Training: 2022-04-27 08:54:25,650-[agedb_30][72000]Accuracy-Highest: 0.97950
Training: 2022-04-27 08:54:28,668-Speed 73.05 samples/sec Loss 3.0417 LearningRate 0.0135 Epoch: 12 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:54:31,677-Speed 3404.22 samples/sec Loss 3.0046 LearningRate 0.0134 Epoch: 12 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 08:54:34,689-Speed 3400.69 samples/sec Loss 2.9336 LearningRate 0.0134 Epoch: 12 Global Step: 72030 Fp16 Grad Scale: 131072 Required:
4 hours Training: 2022-04-27 08:54:37,690-Speed 3412.86 samples/sec Loss 2.9366 LearningRate 0.0134 Epoch: 12 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:40,711-Speed 3389.58 samples/sec Loss 2.9592 LearningRate 0.0134 Epoch: 12 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:43,745-Speed 3375.57 samples/sec Loss 2.9353 LearningRate 0.0134 Epoch: 12 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:46,757-Speed 3400.39 samples/sec Loss 2.9072 LearningRate 0.0134 Epoch: 12 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:49,772-Speed 3397.92 samples/sec Loss 2.8902 LearningRate 0.0134 Epoch: 12 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:52,795-Speed 3388.13 samples/sec Loss 2.9547 LearningRate 0.0134 Epoch: 12 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:55,815-Speed 3392.14 samples/sec Loss 2.9920 LearningRate 0.0134 Epoch: 12 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:54:58,839-Speed 3386.76 samples/sec Loss 2.8781 LearningRate 0.0134 Epoch: 12 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:01,862-Speed 3387.95 samples/sec Loss 2.8874 LearningRate 0.0134 Epoch: 12 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:04,892-Speed 3380.38 samples/sec Loss 2.9221 LearningRate 0.0134 Epoch: 12 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:07,906-Speed 3398.19 samples/sec Loss 3.0439 LearningRate 0.0134 Epoch: 12 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:10,908-Speed 3411.48 samples/sec Loss 2.9778 LearningRate 0.0134 Epoch: 12 Global Step: 72150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:13,930-Speed 3389.02 
samples/sec Loss 2.9649 LearningRate 0.0134 Epoch: 12 Global Step: 72160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:16,958-Speed 3383.29 samples/sec Loss 2.9213 LearningRate 0.0133 Epoch: 12 Global Step: 72170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:19,975-Speed 3394.57 samples/sec Loss 3.0227 LearningRate 0.0133 Epoch: 12 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:23,061-Speed 3319.41 samples/sec Loss 2.9675 LearningRate 0.0133 Epoch: 12 Global Step: 72190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:26,088-Speed 3383.87 samples/sec Loss 2.8613 LearningRate 0.0133 Epoch: 12 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:29,119-Speed 3379.42 samples/sec Loss 2.9441 LearningRate 0.0133 Epoch: 12 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:32,141-Speed 3388.84 samples/sec Loss 3.0157 LearningRate 0.0133 Epoch: 12 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:35,168-Speed 3383.40 samples/sec Loss 3.0637 LearningRate 0.0133 Epoch: 12 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:38,199-Speed 3379.50 samples/sec Loss 2.9006 LearningRate 0.0133 Epoch: 12 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:55:41,221-Speed 3389.32 samples/sec Loss 3.0498 LearningRate 0.0133 Epoch: 12 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:44,252-Speed 3378.90 samples/sec Loss 2.9198 LearningRate 0.0133 Epoch: 12 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:47,318-Speed 3340.66 samples/sec Loss 2.9474 LearningRate 0.0133 Epoch: 12 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:50,377-Speed 3349.81 samples/sec Loss 2.8934 LearningRate 0.0133 Epoch: 12 
Global Step: 72280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:53,404-Speed 3383.95 samples/sec Loss 2.9373 LearningRate 0.0133 Epoch: 12 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:56,428-Speed 3386.57 samples/sec Loss 2.8655 LearningRate 0.0133 Epoch: 12 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:55:59,452-Speed 3387.05 samples/sec Loss 2.9988 LearningRate 0.0133 Epoch: 12 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:02,476-Speed 3386.65 samples/sec Loss 3.0289 LearningRate 0.0133 Epoch: 12 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:05,503-Speed 3384.50 samples/sec Loss 2.9863 LearningRate 0.0132 Epoch: 12 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:08,525-Speed 3389.02 samples/sec Loss 2.9773 LearningRate 0.0132 Epoch: 12 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:11,551-Speed 3384.17 samples/sec Loss 2.9201 LearningRate 0.0132 Epoch: 12 Global Step: 72350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:56:14,583-Speed 3378.43 samples/sec Loss 2.9123 LearningRate 0.0132 Epoch: 12 Global Step: 72360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:56:17,587-Speed 3410.01 samples/sec Loss 2.9157 LearningRate 0.0132 Epoch: 12 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:20,608-Speed 3391.16 samples/sec Loss 2.9752 LearningRate 0.0132 Epoch: 12 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:23,677-Speed 3337.11 samples/sec Loss 2.8976 LearningRate 0.0132 Epoch: 12 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:26,726-Speed 3359.57 samples/sec Loss 2.9724 LearningRate 0.0132 Epoch: 12 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 08:56:29,751-Speed 3385.72 samples/sec Loss 2.9064 LearningRate 0.0132 Epoch: 12 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:32,771-Speed 3390.63 samples/sec Loss 2.9512 LearningRate 0.0132 Epoch: 12 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:35,792-Speed 3390.98 samples/sec Loss 2.8686 LearningRate 0.0132 Epoch: 12 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:38,814-Speed 3388.83 samples/sec Loss 2.8654 LearningRate 0.0132 Epoch: 12 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:41,838-Speed 3387.54 samples/sec Loss 2.9493 LearningRate 0.0132 Epoch: 12 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:44,858-Speed 3391.47 samples/sec Loss 2.9905 LearningRate 0.0132 Epoch: 12 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:47,876-Speed 3393.32 samples/sec Loss 2.8669 LearningRate 0.0132 Epoch: 12 Global Step: 72470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:56:50,897-Speed 3390.97 samples/sec Loss 2.9673 LearningRate 0.0132 Epoch: 12 Global Step: 72480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:56:53,900-Speed 3410.57 samples/sec Loss 2.8717 LearningRate 0.0131 Epoch: 12 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:56,917-Speed 3394.59 samples/sec Loss 2.9620 LearningRate 0.0131 Epoch: 12 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:56:59,958-Speed 3368.33 samples/sec Loss 2.9365 LearningRate 0.0131 Epoch: 12 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:02,991-Speed 3377.44 samples/sec Loss 2.9956 LearningRate 0.0131 Epoch: 12 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:06,012-Speed 3389.86 
samples/sec Loss 2.8465 LearningRate 0.0131 Epoch: 12 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:09,038-Speed 3384.83 samples/sec Loss 2.9552 LearningRate 0.0131 Epoch: 12 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:12,061-Speed 3388.30 samples/sec Loss 2.9269 LearningRate 0.0131 Epoch: 12 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:15,059-Speed 3416.02 samples/sec Loss 2.8960 LearningRate 0.0131 Epoch: 12 Global Step: 72560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:18,078-Speed 3392.36 samples/sec Loss 2.9222 LearningRate 0.0131 Epoch: 12 Global Step: 72570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:21,100-Speed 3390.17 samples/sec Loss 2.9589 LearningRate 0.0131 Epoch: 12 Global Step: 72580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:24,124-Speed 3387.10 samples/sec Loss 2.9127 LearningRate 0.0131 Epoch: 12 Global Step: 72590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:27,147-Speed 3388.36 samples/sec Loss 2.8460 LearningRate 0.0131 Epoch: 12 Global Step: 72600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:30,175-Speed 3381.60 samples/sec Loss 2.9347 LearningRate 0.0131 Epoch: 12 Global Step: 72610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:33,204-Speed 3381.91 samples/sec Loss 2.9632 LearningRate 0.0131 Epoch: 12 Global Step: 72620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:36,247-Speed 3364.89 samples/sec Loss 2.9449 LearningRate 0.0131 Epoch: 12 Global Step: 72630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:39,268-Speed 3391.54 samples/sec Loss 2.9280 LearningRate 0.0130 Epoch: 12 Global Step: 72640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:42,285-Speed 3395.31 samples/sec Loss 3.0044 LearningRate 0.0130 Epoch: 12 
Global Step: 72650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:57:45,321-Speed 3372.86 samples/sec Loss 2.8817 LearningRate 0.0130 Epoch: 12 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:48,339-Speed 3393.88 samples/sec Loss 2.8744 LearningRate 0.0130 Epoch: 12 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:51,358-Speed 3392.89 samples/sec Loss 2.8848 LearningRate 0.0130 Epoch: 12 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:54,377-Speed 3392.35 samples/sec Loss 2.9969 LearningRate 0.0130 Epoch: 12 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:57:57,409-Speed 3377.92 samples/sec Loss 3.0340 LearningRate 0.0130 Epoch: 12 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:00,439-Speed 3380.96 samples/sec Loss 2.9694 LearningRate 0.0130 Epoch: 12 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:03,465-Speed 3384.18 samples/sec Loss 2.9211 LearningRate 0.0130 Epoch: 12 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:06,492-Speed 3383.63 samples/sec Loss 2.8591 LearningRate 0.0130 Epoch: 12 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:09,520-Speed 3382.88 samples/sec Loss 2.9554 LearningRate 0.0130 Epoch: 12 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:12,596-Speed 3329.47 samples/sec Loss 2.9081 LearningRate 0.0130 Epoch: 12 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:15,623-Speed 3384.26 samples/sec Loss 3.0438 LearningRate 0.0130 Epoch: 12 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:58:18,627-Speed 3409.65 samples/sec Loss 2.8704 LearningRate 0.0130 Epoch: 12 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 08:58:21,648-Speed 3389.74 samples/sec Loss 2.8690 LearningRate 0.0130 Epoch: 12 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:24,706-Speed 3349.41 samples/sec Loss 2.9705 LearningRate 0.0130 Epoch: 12 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:27,739-Speed 3377.27 samples/sec Loss 2.8748 LearningRate 0.0129 Epoch: 12 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:30,763-Speed 3387.24 samples/sec Loss 2.9610 LearningRate 0.0129 Epoch: 12 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:33,785-Speed 3389.37 samples/sec Loss 2.9684 LearningRate 0.0129 Epoch: 12 Global Step: 72820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:36,817-Speed 3378.57 samples/sec Loss 2.9279 LearningRate 0.0129 Epoch: 12 Global Step: 72830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:39,839-Speed 3389.26 samples/sec Loss 2.9099 LearningRate 0.0129 Epoch: 12 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:42,861-Speed 3388.80 samples/sec Loss 2.8181 LearningRate 0.0129 Epoch: 12 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:45,881-Speed 3391.43 samples/sec Loss 2.8713 LearningRate 0.0129 Epoch: 12 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:48,905-Speed 3386.95 samples/sec Loss 2.8996 LearningRate 0.0129 Epoch: 12 Global Step: 72870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:58:51,932-Speed 3383.94 samples/sec Loss 2.9933 LearningRate 0.0129 Epoch: 12 Global Step: 72880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 08:58:54,940-Speed 3404.37 samples/sec Loss 2.9494 LearningRate 0.0129 Epoch: 12 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:58:58,036-Speed 3308.48 
samples/sec Loss 2.8497 LearningRate 0.0129 Epoch: 12 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:01,087-Speed 3357.90 samples/sec Loss 3.0402 LearningRate 0.0129 Epoch: 12 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:04,210-Speed 3279.63 samples/sec Loss 3.0286 LearningRate 0.0129 Epoch: 12 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:07,236-Speed 3384.86 samples/sec Loss 2.9189 LearningRate 0.0129 Epoch: 12 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:10,260-Speed 3386.53 samples/sec Loss 2.9943 LearningRate 0.0129 Epoch: 12 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:13,287-Speed 3384.04 samples/sec Loss 2.8651 LearningRate 0.0129 Epoch: 12 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:16,298-Speed 3401.33 samples/sec Loss 2.8662 LearningRate 0.0128 Epoch: 12 Global Step: 72960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:19,325-Speed 3383.42 samples/sec Loss 3.0109 LearningRate 0.0128 Epoch: 12 Global Step: 72970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:22,355-Speed 3380.39 samples/sec Loss 3.0068 LearningRate 0.0128 Epoch: 12 Global Step: 72980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:25,393-Speed 3372.31 samples/sec Loss 3.0098 LearningRate 0.0128 Epoch: 12 Global Step: 72990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:28,423-Speed 3380.98 samples/sec Loss 2.9869 LearningRate 0.0128 Epoch: 12 Global Step: 73000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:31,451-Speed 3382.83 samples/sec Loss 3.0532 LearningRate 0.0128 Epoch: 12 Global Step: 73010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:34,484-Speed 3376.69 samples/sec Loss 2.9699 LearningRate 0.0128 Epoch: 12 
Global Step: 73020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:37,507-Speed 3387.69 samples/sec Loss 2.9548 LearningRate 0.0128 Epoch: 12 Global Step: 73030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:40,533-Speed 3385.64 samples/sec Loss 2.9927 LearningRate 0.0128 Epoch: 12 Global Step: 73040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:43,560-Speed 3383.02 samples/sec Loss 2.8180 LearningRate 0.0128 Epoch: 12 Global Step: 73050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 08:59:46,609-Speed 3358.92 samples/sec Loss 2.9191 LearningRate 0.0128 Epoch: 12 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:49,707-Speed 3306.79 samples/sec Loss 3.0338 LearningRate 0.0128 Epoch: 12 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:52,789-Speed 3323.25 samples/sec Loss 2.9421 LearningRate 0.0128 Epoch: 12 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:55,811-Speed 3388.77 samples/sec Loss 2.9367 LearningRate 0.0128 Epoch: 12 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 08:59:58,836-Speed 3386.89 samples/sec Loss 2.8891 LearningRate 0.0128 Epoch: 12 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:01,858-Speed 3388.91 samples/sec Loss 2.8427 LearningRate 0.0128 Epoch: 12 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:04,920-Speed 3344.90 samples/sec Loss 3.0667 LearningRate 0.0127 Epoch: 12 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:07,950-Speed 3379.82 samples/sec Loss 2.8976 LearningRate 0.0127 Epoch: 12 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:10,984-Speed 3376.08 samples/sec Loss 2.9693 LearningRate 0.0127 Epoch: 12 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:00:14,018-Speed 3376.10 samples/sec Loss 2.9801 LearningRate 0.0127 Epoch: 12 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:17,100-Speed 3322.40 samples/sec Loss 2.9294 LearningRate 0.0127 Epoch: 12 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:20,132-Speed 3378.32 samples/sec Loss 2.8829 LearningRate 0.0127 Epoch: 12 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:23,157-Speed 3386.17 samples/sec Loss 2.9208 LearningRate 0.0127 Epoch: 12 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:26,177-Speed 3392.23 samples/sec Loss 2.7978 LearningRate 0.0127 Epoch: 12 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:29,197-Speed 3390.84 samples/sec Loss 2.9562 LearningRate 0.0127 Epoch: 12 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:32,220-Speed 3388.58 samples/sec Loss 2.9643 LearningRate 0.0127 Epoch: 12 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:35,249-Speed 3381.13 samples/sec Loss 2.8365 LearningRate 0.0127 Epoch: 12 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:38,326-Speed 3329.22 samples/sec Loss 2.8939 LearningRate 0.0127 Epoch: 12 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:41,354-Speed 3382.74 samples/sec Loss 2.8007 LearningRate 0.0127 Epoch: 12 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:44,375-Speed 3389.95 samples/sec Loss 2.8758 LearningRate 0.0127 Epoch: 12 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:47,440-Speed 3341.72 samples/sec Loss 2.9350 LearningRate 0.0127 Epoch: 12 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:00:50,490-Speed 3359.00 
samples/sec Loss 2.8800 LearningRate 0.0127 Epoch: 12 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:00:53,499-Speed 3403.39 samples/sec Loss 2.9448 LearningRate 0.0126 Epoch: 12 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:56,519-Speed 3391.01 samples/sec Loss 2.8822 LearningRate 0.0126 Epoch: 12 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:00:59,554-Speed 3375.36 samples/sec Loss 2.9363 LearningRate 0.0126 Epoch: 12 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:02,579-Speed 3386.16 samples/sec Loss 2.9328 LearningRate 0.0126 Epoch: 12 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:05,607-Speed 3382.97 samples/sec Loss 2.8604 LearningRate 0.0126 Epoch: 12 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:08,651-Speed 3364.36 samples/sec Loss 2.8660 LearningRate 0.0126 Epoch: 12 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:11,680-Speed 3380.99 samples/sec Loss 3.0384 LearningRate 0.0126 Epoch: 12 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:14,703-Speed 3388.40 samples/sec Loss 2.9014 LearningRate 0.0126 Epoch: 12 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:17,733-Speed 3380.35 samples/sec Loss 2.8864 LearningRate 0.0126 Epoch: 12 Global Step: 73360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:20,762-Speed 3382.02 samples/sec Loss 2.8674 LearningRate 0.0126 Epoch: 12 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:23,865-Speed 3300.80 samples/sec Loss 2.8950 LearningRate 0.0126 Epoch: 12 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:26,905-Speed 3368.40 samples/sec Loss 2.8909 LearningRate 0.0126 Epoch: 12 
Global Step: 73390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:29,932-Speed 3383.64 samples/sec Loss 2.8958 LearningRate 0.0126 Epoch: 12 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:32,959-Speed 3384.18 samples/sec Loss 2.8766 LearningRate 0.0126 Epoch: 12 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:36,008-Speed 3359.37 samples/sec Loss 2.9006 LearningRate 0.0126 Epoch: 12 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:39,038-Speed 3380.02 samples/sec Loss 2.8586 LearningRate 0.0126 Epoch: 12 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:42,068-Speed 3381.01 samples/sec Loss 2.9212 LearningRate 0.0125 Epoch: 12 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:45,094-Speed 3384.86 samples/sec Loss 2.8678 LearningRate 0.0125 Epoch: 12 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:48,123-Speed 3381.45 samples/sec Loss 2.8719 LearningRate 0.0125 Epoch: 12 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:51,183-Speed 3346.54 samples/sec Loss 2.8746 LearningRate 0.0125 Epoch: 12 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:01:54,215-Speed 3378.20 samples/sec Loss 2.7850 LearningRate 0.0125 Epoch: 12 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:01:57,225-Speed 3402.17 samples/sec Loss 2.8934 LearningRate 0.0125 Epoch: 12 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:00,262-Speed 3372.90 samples/sec Loss 2.9791 LearningRate 0.0125 Epoch: 12 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:03,287-Speed 3386.51 samples/sec Loss 2.7747 LearningRate 0.0125 Epoch: 12 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:02:06,331-Speed 3364.61 samples/sec Loss 2.9994 LearningRate 0.0125 Epoch: 12 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:09,380-Speed 3359.06 samples/sec Loss 2.8507 LearningRate 0.0125 Epoch: 12 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:12,413-Speed 3376.61 samples/sec Loss 2.8774 LearningRate 0.0125 Epoch: 12 Global Step: 73540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:15,423-Speed 3402.80 samples/sec Loss 2.9192 LearningRate 0.0125 Epoch: 12 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:18,452-Speed 3382.25 samples/sec Loss 2.7602 LearningRate 0.0125 Epoch: 12 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:21,488-Speed 3373.37 samples/sec Loss 2.8998 LearningRate 0.0125 Epoch: 12 Global Step: 73570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:24,517-Speed 3380.85 samples/sec Loss 2.8270 LearningRate 0.0125 Epoch: 12 Global Step: 73580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:27,548-Speed 3379.50 samples/sec Loss 2.9970 LearningRate 0.0125 Epoch: 12 Global Step: 73590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:30,575-Speed 3383.30 samples/sec Loss 2.7373 LearningRate 0.0124 Epoch: 12 Global Step: 73600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:33,613-Speed 3371.51 samples/sec Loss 2.8123 LearningRate 0.0124 Epoch: 12 Global Step: 73610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:36,645-Speed 3378.16 samples/sec Loss 2.8558 LearningRate 0.0124 Epoch: 12 Global Step: 73620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:39,673-Speed 3383.03 samples/sec Loss 2.7808 LearningRate 0.0124 Epoch: 12 Global Step: 73630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:42,699-Speed 3384.73 
samples/sec Loss 2.8472 LearningRate 0.0124 Epoch: 12 Global Step: 73640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:02:45,728-Speed 3381.22 samples/sec Loss 2.8752 LearningRate 0.0124 Epoch: 12 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:48,752-Speed 3387.83 samples/sec Loss 2.7922 LearningRate 0.0124 Epoch: 12 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:51,786-Speed 3375.70 samples/sec Loss 2.8463 LearningRate 0.0124 Epoch: 12 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:54,810-Speed 3386.87 samples/sec Loss 2.9139 LearningRate 0.0124 Epoch: 12 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:02:57,879-Speed 3337.21 samples/sec Loss 2.7771 LearningRate 0.0124 Epoch: 12 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:00,930-Speed 3357.96 samples/sec Loss 2.7920 LearningRate 0.0124 Epoch: 12 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:03,966-Speed 3373.98 samples/sec Loss 2.7743 LearningRate 0.0124 Epoch: 12 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:06,993-Speed 3383.34 samples/sec Loss 2.9315 LearningRate 0.0124 Epoch: 12 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:10,022-Speed 3382.02 samples/sec Loss 2.8681 LearningRate 0.0124 Epoch: 12 Global Step: 73730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:13,050-Speed 3381.80 samples/sec Loss 2.9373 LearningRate 0.0124 Epoch: 12 Global Step: 73740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:16,079-Speed 3382.43 samples/sec Loss 2.8042 LearningRate 0.0124 Epoch: 12 Global Step: 73750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:19,111-Speed 3378.64 samples/sec Loss 2.8214 LearningRate 0.0123 Epoch: 12 
Global Step: 73760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:22,138-Speed 3383.21 samples/sec Loss 2.8484 LearningRate 0.0123 Epoch: 12 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:25,168-Speed 3380.02 samples/sec Loss 2.9432 LearningRate 0.0123 Epoch: 12 Global Step: 73780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:28,199-Speed 3379.83 samples/sec Loss 2.8418 LearningRate 0.0123 Epoch: 12 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:31,228-Speed 3381.45 samples/sec Loss 2.7803 LearningRate 0.0123 Epoch: 12 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:34,256-Speed 3382.88 samples/sec Loss 2.8536 LearningRate 0.0123 Epoch: 12 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:37,285-Speed 3381.08 samples/sec Loss 2.8954 LearningRate 0.0123 Epoch: 12 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:40,315-Speed 3381.04 samples/sec Loss 2.9475 LearningRate 0.0123 Epoch: 12 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:43,344-Speed 3380.35 samples/sec Loss 2.7333 LearningRate 0.0123 Epoch: 12 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:46,427-Speed 3322.08 samples/sec Loss 2.8988 LearningRate 0.0123 Epoch: 12 Global Step: 73850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:03:49,446-Speed 3393.58 samples/sec Loss 2.8565 LearningRate 0.0123 Epoch: 12 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:52,495-Speed 3359.45 samples/sec Loss 2.8528 LearningRate 0.0123 Epoch: 12 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:03:55,529-Speed 3376.16 samples/sec Loss 2.9475 LearningRate 0.0123 Epoch: 12 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:03:58,557-Speed 3382.77 samples/sec Loss 2.7411 LearningRate 0.0123 Epoch: 12 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:01,585-Speed 3383.04 samples/sec Loss 2.8188 LearningRate 0.0123 Epoch: 12 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:04,744-Speed 3242.47 samples/sec Loss 2.7688 LearningRate 0.0123 Epoch: 12 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:18,626-Speed 737.69 samples/sec Loss 2.5737 LearningRate 0.0122 Epoch: 13 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:21,650-Speed 3388.20 samples/sec Loss 2.2306 LearningRate 0.0122 Epoch: 13 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:24,695-Speed 3363.59 samples/sec Loss 2.2511 LearningRate 0.0122 Epoch: 13 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:27,752-Speed 3350.14 samples/sec Loss 2.2807 LearningRate 0.0122 Epoch: 13 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:30,771-Speed 3393.55 samples/sec Loss 2.3043 LearningRate 0.0122 Epoch: 13 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:04:33,791-Speed 3390.57 samples/sec Loss 2.3033 LearningRate 0.0122 Epoch: 13 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:04:36,808-Speed 3395.78 samples/sec Loss 2.2224 LearningRate 0.0122 Epoch: 13 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:39,835-Speed 3383.43 samples/sec Loss 2.2244 LearningRate 0.0122 Epoch: 13 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:04:42,856-Speed 3391.00 samples/sec Loss 2.3194 LearningRate 0.0122 Epoch: 13 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
09:05:26,291-[lfw][74000]XNorm: 22.218615 Training: 2022-04-27 09:05:26,292-[lfw][74000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-27 09:05:26,292-[lfw][74000]Accuracy-Highest: 0.99817 Training: 2022-04-27 09:06:17,167-[cfp_fp][74000]XNorm: 20.875479 Training: 2022-04-27 09:06:17,168-[cfp_fp][74000]Accuracy-Flip: 0.97429+-0.00769 Training: 2022-04-27 09:06:17,168-[cfp_fp][74000]Accuracy-Highest: 0.97557 Training: 2022-04-27 09:07:00,602-[agedb_30][74000]XNorm: 22.516372 Training: 2022-04-27 09:07:00,603-[agedb_30][74000]Accuracy-Flip: 0.98100+-0.00700 Training: 2022-04-27 09:07:00,603-[agedb_30][74000]Accuracy-Highest: 0.98100 Training: 2022-04-27 09:07:03,644-Speed 72.73 samples/sec Loss 2.2675 LearningRate 0.0122 Epoch: 13 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:06,651-Speed 3405.72 samples/sec Loss 2.3848 LearningRate 0.0122 Epoch: 13 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:09,658-Speed 3405.68 samples/sec Loss 2.3096 LearningRate 0.0122 Epoch: 13 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:12,671-Speed 3399.38 samples/sec Loss 2.3462 LearningRate 0.0122 Epoch: 13 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:15,692-Speed 3390.88 samples/sec Loss 2.2541 LearningRate 0.0122 Epoch: 13 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:18,703-Speed 3401.28 samples/sec Loss 2.3619 LearningRate 0.0122 Epoch: 13 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:21,728-Speed 3386.14 samples/sec Loss 2.2892 LearningRate 0.0122 Epoch: 13 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:24,734-Speed 3407.48 samples/sec Loss 2.2911 LearningRate 0.0122 Epoch: 13 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:27,756-Speed 3389.09 samples/sec 
Loss 2.2837 LearningRate 0.0121 Epoch: 13 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:30,774-Speed 3393.43 samples/sec Loss 2.3875 LearningRate 0.0121 Epoch: 13 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:33,788-Speed 3398.39 samples/sec Loss 2.2875 LearningRate 0.0121 Epoch: 13 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:36,937-Speed 3252.96 samples/sec Loss 2.3015 LearningRate 0.0121 Epoch: 13 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:39,953-Speed 3395.24 samples/sec Loss 2.3470 LearningRate 0.0121 Epoch: 13 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:42,969-Speed 3396.26 samples/sec Loss 2.2526 LearningRate 0.0121 Epoch: 13 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:45,996-Speed 3383.78 samples/sec Loss 2.3743 LearningRate 0.0121 Epoch: 13 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:49,032-Speed 3374.39 samples/sec Loss 2.3086 LearningRate 0.0121 Epoch: 13 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:52,057-Speed 3385.91 samples/sec Loss 2.3707 LearningRate 0.0121 Epoch: 13 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:07:55,072-Speed 3396.93 samples/sec Loss 2.3725 LearningRate 0.0121 Epoch: 13 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:07:58,088-Speed 3396.06 samples/sec Loss 2.3318 LearningRate 0.0121 Epoch: 13 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:08:01,084-Speed 3417.88 samples/sec Loss 2.3869 LearningRate 0.0121 Epoch: 13 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:04,106-Speed 3389.80 samples/sec Loss 2.2941 LearningRate 0.0121 Epoch: 13 Global 
Step: 74210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:07,127-Speed 3390.34 samples/sec Loss 2.3784 LearningRate 0.0121 Epoch: 13 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:10,143-Speed 3396.11 samples/sec Loss 2.2992 LearningRate 0.0121 Epoch: 13 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:13,160-Speed 3394.67 samples/sec Loss 2.3051 LearningRate 0.0121 Epoch: 13 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:16,194-Speed 3375.83 samples/sec Loss 2.3933 LearningRate 0.0120 Epoch: 13 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:19,222-Speed 3382.95 samples/sec Loss 2.3296 LearningRate 0.0120 Epoch: 13 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:22,238-Speed 3396.55 samples/sec Loss 2.3804 LearningRate 0.0120 Epoch: 13 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:25,252-Speed 3397.38 samples/sec Loss 2.3862 LearningRate 0.0120 Epoch: 13 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:28,267-Speed 3397.68 samples/sec Loss 2.3515 LearningRate 0.0120 Epoch: 13 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:31,265-Speed 3416.73 samples/sec Loss 2.4436 LearningRate 0.0120 Epoch: 13 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:34,282-Speed 3394.41 samples/sec Loss 2.4174 LearningRate 0.0120 Epoch: 13 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:37,323-Speed 3367.88 samples/sec Loss 2.4148 LearningRate 0.0120 Epoch: 13 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:40,340-Speed 3394.56 samples/sec Loss 2.3879 LearningRate 0.0120 Epoch: 13 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 09:08:43,355-Speed 3398.59 samples/sec Loss 2.3777 LearningRate 0.0120 Epoch: 13 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:46,369-Speed 3397.95 samples/sec Loss 2.4062 LearningRate 0.0120 Epoch: 13 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:49,387-Speed 3393.13 samples/sec Loss 2.3218 LearningRate 0.0120 Epoch: 13 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:52,408-Speed 3390.82 samples/sec Loss 2.3809 LearningRate 0.0120 Epoch: 13 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:55,436-Speed 3382.05 samples/sec Loss 2.3621 LearningRate 0.0120 Epoch: 13 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:08:58,465-Speed 3382.05 samples/sec Loss 2.4255 LearningRate 0.0120 Epoch: 13 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:09:01,488-Speed 3387.89 samples/sec Loss 2.3319 LearningRate 0.0120 Epoch: 13 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:09:04,503-Speed 3396.57 samples/sec Loss 2.4230 LearningRate 0.0119 Epoch: 13 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:09:07,525-Speed 3390.32 samples/sec Loss 2.4330 LearningRate 0.0119 Epoch: 13 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:09:10,542-Speed 3394.05 samples/sec Loss 2.3540 LearningRate 0.0119 Epoch: 13 Global Step: 74430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:09:13,560-Speed 3394.61 samples/sec Loss 2.4290 LearningRate 0.0119 Epoch: 13 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:09:16,561-Speed 3413.09 samples/sec Loss 2.3743 LearningRate 0.0119 Epoch: 13 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:09:19,578-Speed 3394.73 
samples/sec Loss 2.4023 LearningRate 0.0119 Epoch: 13 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:09:22,596-Speed 3393.55 samples/sec Loss 2.3750 LearningRate 0.0119 Epoch: 13 Global Step: 74470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:25,670-Speed 3332.66 samples/sec Loss 2.3885 LearningRate 0.0119 Epoch: 13 Global Step: 74480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:28,746-Speed 3329.41 samples/sec Loss 2.5126 LearningRate 0.0119 Epoch: 13 Global Step: 74490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:31,761-Speed 3396.32 samples/sec Loss 2.3732 LearningRate 0.0119 Epoch: 13 Global Step: 74500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:34,794-Speed 3376.88 samples/sec Loss 2.3727 LearningRate 0.0119 Epoch: 13 Global Step: 74510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:37,813-Speed 3393.25 samples/sec Loss 2.3249 LearningRate 0.0119 Epoch: 13 Global Step: 74520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:40,837-Speed 3386.98 samples/sec Loss 2.4289 LearningRate 0.0119 Epoch: 13 Global Step: 74530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:43,854-Speed 3395.84 samples/sec Loss 2.4537 LearningRate 0.0119 Epoch: 13 Global Step: 74540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:46,878-Speed 3386.85 samples/sec Loss 2.3363 LearningRate 0.0119 Epoch: 13 Global Step: 74550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:49,896-Speed 3393.80 samples/sec Loss 2.4068 LearningRate 0.0119 Epoch: 13 Global Step: 74560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:09:52,920-Speed 3386.27 samples/sec Loss 2.3877 LearningRate 0.0119 Epoch: 13 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:09:55,938-Speed 3393.73 samples/sec Loss 2.4066 LearningRate 0.0118 Epoch: 13 
Global Step: 74580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:09:58,959-Speed 3390.46 samples/sec Loss 2.4550 LearningRate 0.0118 Epoch: 13 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:01,986-Speed 3384.14 samples/sec Loss 2.4509 LearningRate 0.0118 Epoch: 13 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:05,003-Speed 3395.14 samples/sec Loss 2.4504 LearningRate 0.0118 Epoch: 13 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:08,017-Speed 3398.15 samples/sec Loss 2.3937 LearningRate 0.0118 Epoch: 13 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:11,036-Speed 3392.97 samples/sec Loss 2.4541 LearningRate 0.0118 Epoch: 13 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:14,058-Speed 3388.38 samples/sec Loss 2.4327 LearningRate 0.0118 Epoch: 13 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:17,086-Speed 3382.49 samples/sec Loss 2.4355 LearningRate 0.0118 Epoch: 13 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:20,105-Speed 3392.36 samples/sec Loss 2.4371 LearningRate 0.0118 Epoch: 13 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:23,128-Speed 3388.15 samples/sec Loss 2.3990 LearningRate 0.0118 Epoch: 13 Global Step: 74670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:10:26,131-Speed 3411.76 samples/sec Loss 2.3167 LearningRate 0.0118 Epoch: 13 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:29,152-Speed 3389.77 samples/sec Loss 2.4250 LearningRate 0.0118 Epoch: 13 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:32,167-Speed 3397.98 samples/sec Loss 2.3634 LearningRate 0.0118 Epoch: 13 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:10:35,201-Speed 3376.56 samples/sec Loss 2.4990 LearningRate 0.0118 Epoch: 13 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:38,243-Speed 3366.51 samples/sec Loss 2.4382 LearningRate 0.0118 Epoch: 13 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:41,260-Speed 3394.48 samples/sec Loss 2.4370 LearningRate 0.0118 Epoch: 13 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:44,284-Speed 3387.06 samples/sec Loss 2.4515 LearningRate 0.0117 Epoch: 13 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:47,300-Speed 3396.55 samples/sec Loss 2.4608 LearningRate 0.0117 Epoch: 13 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:50,318-Speed 3393.23 samples/sec Loss 2.5139 LearningRate 0.0117 Epoch: 13 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:53,342-Speed 3387.55 samples/sec Loss 2.4445 LearningRate 0.0117 Epoch: 13 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:56,349-Speed 3407.34 samples/sec Loss 2.4881 LearningRate 0.0117 Epoch: 13 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:10:59,388-Speed 3370.46 samples/sec Loss 2.4894 LearningRate 0.0117 Epoch: 13 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:02,414-Speed 3385.32 samples/sec Loss 2.3523 LearningRate 0.0117 Epoch: 13 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:05,437-Speed 3388.21 samples/sec Loss 2.3954 LearningRate 0.0117 Epoch: 13 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:08,458-Speed 3389.79 samples/sec Loss 2.3770 LearningRate 0.0117 Epoch: 13 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:11,492-Speed 3375.91 
samples/sec Loss 2.5121 LearningRate 0.0117 Epoch: 13 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:14,516-Speed 3387.93 samples/sec Loss 2.3935 LearningRate 0.0117 Epoch: 13 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:17,537-Speed 3390.41 samples/sec Loss 2.5553 LearningRate 0.0117 Epoch: 13 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:20,561-Speed 3386.41 samples/sec Loss 2.3168 LearningRate 0.0117 Epoch: 13 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:23,586-Speed 3385.67 samples/sec Loss 2.4057 LearningRate 0.0117 Epoch: 13 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:26,595-Speed 3404.11 samples/sec Loss 2.4432 LearningRate 0.0117 Epoch: 13 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:29,616-Speed 3390.26 samples/sec Loss 2.5158 LearningRate 0.0117 Epoch: 13 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:32,651-Speed 3375.60 samples/sec Loss 2.4981 LearningRate 0.0117 Epoch: 13 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:35,683-Speed 3378.05 samples/sec Loss 2.3952 LearningRate 0.0116 Epoch: 13 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:38,701-Speed 3393.35 samples/sec Loss 2.5538 LearningRate 0.0116 Epoch: 13 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:41,722-Speed 3389.80 samples/sec Loss 2.4045 LearningRate 0.0116 Epoch: 13 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:44,743-Speed 3391.02 samples/sec Loss 2.4736 LearningRate 0.0116 Epoch: 13 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:47,776-Speed 3376.95 samples/sec Loss 2.5266 LearningRate 0.0116 Epoch: 13 
Global Step: 74950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:50,825-Speed 3359.36 samples/sec Loss 2.4229 LearningRate 0.0116 Epoch: 13 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:53,847-Speed 3390.55 samples/sec Loss 2.4122 LearningRate 0.0116 Epoch: 13 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:11:56,872-Speed 3386.06 samples/sec Loss 2.4712 LearningRate 0.0116 Epoch: 13 Global Step: 74980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:11:59,893-Speed 3390.06 samples/sec Loss 2.4944 LearningRate 0.0116 Epoch: 13 Global Step: 74990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:12:02,919-Speed 3385.30 samples/sec Loss 2.4428 LearningRate 0.0116 Epoch: 13 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:12:05,939-Speed 3391.10 samples/sec Loss 2.4833 LearningRate 0.0116 Epoch: 13 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:12:08,938-Speed 3414.96 samples/sec Loss 2.3871 LearningRate 0.0116 Epoch: 13 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:12:11,946-Speed 3405.84 samples/sec Loss 2.4166 LearningRate 0.0116 Epoch: 13 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:14,971-Speed 3385.06 samples/sec Loss 2.4905 LearningRate 0.0116 Epoch: 13 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:17,993-Speed 3389.66 samples/sec Loss 2.4312 LearningRate 0.0116 Epoch: 13 Global Step: 75050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:21,011-Speed 3393.34 samples/sec Loss 2.4670 LearningRate 0.0116 Epoch: 13 Global Step: 75060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:24,030-Speed 3393.64 samples/sec Loss 2.6106 LearningRate 0.0116 Epoch: 13 Global Step: 75070 Fp16 Grad Scale: 32768 Required: 
4 hours Training: 2022-04-27 09:12:27,049-Speed 3392.15 samples/sec Loss 2.4800 LearningRate 0.0115 Epoch: 13 Global Step: 75080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:30,074-Speed 3385.49 samples/sec Loss 2.4003 LearningRate 0.0115 Epoch: 13 Global Step: 75090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:33,100-Speed 3385.21 samples/sec Loss 2.4378 LearningRate 0.0115 Epoch: 13 Global Step: 75100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:36,124-Speed 3387.01 samples/sec Loss 2.4825 LearningRate 0.0115 Epoch: 13 Global Step: 75110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:39,155-Speed 3378.84 samples/sec Loss 2.5329 LearningRate 0.0115 Epoch: 13 Global Step: 75120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:42,172-Speed 3395.28 samples/sec Loss 2.3318 LearningRate 0.0115 Epoch: 13 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:12:45,189-Speed 3394.50 samples/sec Loss 2.4932 LearningRate 0.0115 Epoch: 13 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:12:48,190-Speed 3413.45 samples/sec Loss 2.5572 LearningRate 0.0115 Epoch: 13 Global Step: 75150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:51,208-Speed 3393.93 samples/sec Loss 2.4391 LearningRate 0.0115 Epoch: 13 Global Step: 75160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:54,229-Speed 3390.75 samples/sec Loss 2.5146 LearningRate 0.0115 Epoch: 13 Global Step: 75170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:12:57,254-Speed 3385.81 samples/sec Loss 2.5640 LearningRate 0.0115 Epoch: 13 Global Step: 75180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:00,306-Speed 3355.12 samples/sec Loss 2.5221 LearningRate 0.0115 Epoch: 13 Global Step: 75190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:03,329-Speed 3388.62 
samples/sec Loss 2.4516 LearningRate 0.0115 Epoch: 13 Global Step: 75200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:06,352-Speed 3387.88 samples/sec Loss 2.5264 LearningRate 0.0115 Epoch: 13 Global Step: 75210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:09,374-Speed 3389.34 samples/sec Loss 2.3430 LearningRate 0.0115 Epoch: 13 Global Step: 75220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:12,400-Speed 3384.11 samples/sec Loss 2.4496 LearningRate 0.0115 Epoch: 13 Global Step: 75230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:15,420-Speed 3391.75 samples/sec Loss 2.5229 LearningRate 0.0114 Epoch: 13 Global Step: 75240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:18,446-Speed 3385.11 samples/sec Loss 2.5636 LearningRate 0.0114 Epoch: 13 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:21,469-Speed 3388.76 samples/sec Loss 2.5147 LearningRate 0.0114 Epoch: 13 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:24,492-Speed 3388.08 samples/sec Loss 2.5084 LearningRate 0.0114 Epoch: 13 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:27,516-Speed 3387.43 samples/sec Loss 2.4506 LearningRate 0.0114 Epoch: 13 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:30,540-Speed 3386.89 samples/sec Loss 2.4400 LearningRate 0.0114 Epoch: 13 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:33,561-Speed 3390.10 samples/sec Loss 2.4713 LearningRate 0.0114 Epoch: 13 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:36,584-Speed 3387.89 samples/sec Loss 2.5145 LearningRate 0.0114 Epoch: 13 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:39,617-Speed 3376.53 samples/sec Loss 2.5194 LearningRate 0.0114 Epoch: 13 
Global Step: 75320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:42,747-Speed 3272.43 samples/sec Loss 2.4713 LearningRate 0.0114 Epoch: 13 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:45,788-Speed 3368.60 samples/sec Loss 2.5856 LearningRate 0.0114 Epoch: 13 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:13:48,811-Speed 3388.74 samples/sec Loss 2.5200 LearningRate 0.0114 Epoch: 13 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:13:51,835-Speed 3386.25 samples/sec Loss 2.3983 LearningRate 0.0114 Epoch: 13 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:13:54,826-Speed 3425.16 samples/sec Loss 2.5248 LearningRate 0.0114 Epoch: 13 Global Step: 75370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:13:57,847-Speed 3390.20 samples/sec Loss 2.3963 LearningRate 0.0114 Epoch: 13 Global Step: 75380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:00,871-Speed 3386.27 samples/sec Loss 2.5011 LearningRate 0.0114 Epoch: 13 Global Step: 75390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:03,897-Speed 3385.04 samples/sec Loss 2.5229 LearningRate 0.0114 Epoch: 13 Global Step: 75400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:06,922-Speed 3385.94 samples/sec Loss 2.4343 LearningRate 0.0113 Epoch: 13 Global Step: 75410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:09,998-Speed 3330.40 samples/sec Loss 2.5299 LearningRate 0.0113 Epoch: 13 Global Step: 75420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:13,039-Speed 3368.09 samples/sec Loss 2.4728 LearningRate 0.0113 Epoch: 13 Global Step: 75430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:16,073-Speed 3375.29 samples/sec Loss 2.4913 LearningRate 0.0113 Epoch: 13 Global Step: 75440 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-27 09:14:19,095-Speed 3389.37 samples/sec Loss 2.4395 LearningRate 0.0113 Epoch: 13 Global Step: 75450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:22,122-Speed 3383.82 samples/sec Loss 2.5433 LearningRate 0.0113 Epoch: 13 Global Step: 75460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:14:25,150-Speed 3382.85 samples/sec Loss 2.5456 LearningRate 0.0113 Epoch: 13 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:28,176-Speed 3384.89 samples/sec Loss 2.5378 LearningRate 0.0113 Epoch: 13 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:31,197-Speed 3389.44 samples/sec Loss 2.4089 LearningRate 0.0113 Epoch: 13 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:34,229-Speed 3377.96 samples/sec Loss 2.5118 LearningRate 0.0113 Epoch: 13 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:37,253-Speed 3388.06 samples/sec Loss 2.5640 LearningRate 0.0113 Epoch: 13 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:40,274-Speed 3390.48 samples/sec Loss 2.5216 LearningRate 0.0113 Epoch: 13 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:43,298-Speed 3387.44 samples/sec Loss 2.6048 LearningRate 0.0113 Epoch: 13 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:46,326-Speed 3381.74 samples/sec Loss 2.4682 LearningRate 0.0113 Epoch: 13 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:49,365-Speed 3370.96 samples/sec Loss 2.5552 LearningRate 0.0113 Epoch: 13 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:52,399-Speed 3375.83 samples/sec Loss 2.4319 LearningRate 0.0113 Epoch: 13 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:55,405-Speed 3408.69 
samples/sec Loss 2.4857 LearningRate 0.0113 Epoch: 13 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:14:58,436-Speed 3379.27 samples/sec Loss 2.5148 LearningRate 0.0112 Epoch: 13 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:01,460-Speed 3386.23 samples/sec Loss 2.5783 LearningRate 0.0112 Epoch: 13 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:04,486-Speed 3384.74 samples/sec Loss 2.5401 LearningRate 0.0112 Epoch: 13 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:07,507-Speed 3391.67 samples/sec Loss 2.5878 LearningRate 0.0112 Epoch: 13 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:10,554-Speed 3360.78 samples/sec Loss 2.5117 LearningRate 0.0112 Epoch: 13 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:13,576-Speed 3389.00 samples/sec Loss 2.4223 LearningRate 0.0112 Epoch: 13 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:16,602-Speed 3385.00 samples/sec Loss 2.4793 LearningRate 0.0112 Epoch: 13 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:19,630-Speed 3382.86 samples/sec Loss 2.5403 LearningRate 0.0112 Epoch: 13 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:22,664-Speed 3375.60 samples/sec Loss 2.5220 LearningRate 0.0112 Epoch: 13 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:25,753-Speed 3315.87 samples/sec Loss 2.4904 LearningRate 0.0112 Epoch: 13 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:15:28,762-Speed 3404.71 samples/sec Loss 2.4221 LearningRate 0.0112 Epoch: 13 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:31,784-Speed 3388.63 samples/sec Loss 2.4194 LearningRate 0.0112 Epoch: 13 
Global Step: 75690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:34,808-Speed 3387.21 samples/sec Loss 2.6107 LearningRate 0.0112 Epoch: 13 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:37,828-Speed 3392.22 samples/sec Loss 2.5043 LearningRate 0.0112 Epoch: 13 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:40,852-Speed 3386.59 samples/sec Loss 2.5626 LearningRate 0.0112 Epoch: 13 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:43,876-Speed 3386.87 samples/sec Loss 2.4940 LearningRate 0.0112 Epoch: 13 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:46,896-Speed 3391.51 samples/sec Loss 2.4832 LearningRate 0.0112 Epoch: 13 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:49,932-Speed 3373.42 samples/sec Loss 2.5242 LearningRate 0.0111 Epoch: 13 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:52,952-Speed 3391.84 samples/sec Loss 2.5616 LearningRate 0.0111 Epoch: 13 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:55,988-Speed 3373.59 samples/sec Loss 2.5794 LearningRate 0.0111 Epoch: 13 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:15:59,016-Speed 3383.64 samples/sec Loss 2.6332 LearningRate 0.0111 Epoch: 13 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:16:02,023-Speed 3405.31 samples/sec Loss 2.5984 LearningRate 0.0111 Epoch: 13 Global Step: 75790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:05,045-Speed 3389.40 samples/sec Loss 2.4985 LearningRate 0.0111 Epoch: 13 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:08,073-Speed 3382.59 samples/sec Loss 2.5487 LearningRate 0.0111 Epoch: 13 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:16:11,112-Speed 3370.37 samples/sec Loss 2.6098 LearningRate 0.0111 Epoch: 13 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:14,140-Speed 3382.50 samples/sec Loss 2.6042 LearningRate 0.0111 Epoch: 13 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:17,166-Speed 3384.40 samples/sec Loss 2.5321 LearningRate 0.0111 Epoch: 13 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:20,213-Speed 3362.29 samples/sec Loss 2.5620 LearningRate 0.0111 Epoch: 13 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:23,233-Speed 3391.11 samples/sec Loss 2.5256 LearningRate 0.0111 Epoch: 13 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:26,261-Speed 3383.02 samples/sec Loss 2.4587 LearningRate 0.0111 Epoch: 13 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:29,284-Speed 3388.03 samples/sec Loss 2.5102 LearningRate 0.0111 Epoch: 13 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:32,307-Speed 3387.79 samples/sec Loss 2.4867 LearningRate 0.0111 Epoch: 13 Global Step: 75890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:16:35,324-Speed 3395.27 samples/sec Loss 2.5573 LearningRate 0.0111 Epoch: 13 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:38,366-Speed 3367.26 samples/sec Loss 2.5537 LearningRate 0.0111 Epoch: 13 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:41,386-Speed 3391.17 samples/sec Loss 2.6193 LearningRate 0.0110 Epoch: 13 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:44,413-Speed 3383.55 samples/sec Loss 2.6272 LearningRate 0.0110 Epoch: 13 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:47,436-Speed 3387.82 
samples/sec Loss 2.4678 LearningRate 0.0110 Epoch: 13 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:50,468-Speed 3378.15 samples/sec Loss 2.3707 LearningRate 0.0110 Epoch: 13 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:53,509-Speed 3368.87 samples/sec Loss 2.5463 LearningRate 0.0110 Epoch: 13 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:56,539-Speed 3380.31 samples/sec Loss 2.4780 LearningRate 0.0110 Epoch: 13 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:16:59,564-Speed 3385.56 samples/sec Loss 2.4456 LearningRate 0.0110 Epoch: 13 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:17:02,594-Speed 3380.28 samples/sec Loss 2.3603 LearningRate 0.0110 Epoch: 13 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:17:05,607-Speed 3399.48 samples/sec Loss 2.5215 LearningRate 0.0110 Epoch: 13 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:17:49,111-[lfw][76000]XNorm: 20.858766 Training: 2022-04-27 09:17:49,111-[lfw][76000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-27 09:17:49,112-[lfw][76000]Accuracy-Highest: 0.99817 Training: 2022-04-27 09:18:39,696-[cfp_fp][76000]XNorm: 19.454355 Training: 2022-04-27 09:18:39,697-[cfp_fp][76000]Accuracy-Flip: 0.97743+-0.00654 Training: 2022-04-27 09:18:39,697-[cfp_fp][76000]Accuracy-Highest: 0.97743 Training: 2022-04-27 09:19:23,208-[agedb_30][76000]XNorm: 21.044185 Training: 2022-04-27 09:19:23,209-[agedb_30][76000]Accuracy-Flip: 0.98100+-0.00688 Training: 2022-04-27 09:19:23,209-[agedb_30][76000]Accuracy-Highest: 0.98100 Training: 2022-04-27 09:19:26,237-Speed 72.82 samples/sec Loss 2.5592 LearningRate 0.0110 Epoch: 13 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:29,248-Speed 3401.10 samples/sec Loss 2.4819 LearningRate 
0.0110 Epoch: 13 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:32,256-Speed 3405.42 samples/sec Loss 2.5111 LearningRate 0.0110 Epoch: 13 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:35,267-Speed 3401.85 samples/sec Loss 2.5450 LearningRate 0.0110 Epoch: 13 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:38,277-Speed 3402.48 samples/sec Loss 2.5311 LearningRate 0.0110 Epoch: 13 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:41,303-Speed 3384.08 samples/sec Loss 2.3555 LearningRate 0.0110 Epoch: 13 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:19:44,299-Speed 3418.84 samples/sec Loss 2.4443 LearningRate 0.0110 Epoch: 13 Global Step: 76070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:19:47,322-Speed 3388.44 samples/sec Loss 2.3866 LearningRate 0.0110 Epoch: 13 Global Step: 76080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:19:50,344-Speed 3389.05 samples/sec Loss 2.5319 LearningRate 0.0109 Epoch: 13 Global Step: 76090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:19:53,361-Speed 3395.72 samples/sec Loss 2.5646 LearningRate 0.0109 Epoch: 13 Global Step: 76100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:19:56,381-Speed 3391.08 samples/sec Loss 2.5499 LearningRate 0.0109 Epoch: 13 Global Step: 76110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:19:59,407-Speed 3384.31 samples/sec Loss 2.6061 LearningRate 0.0109 Epoch: 13 Global Step: 76120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:20:02,429-Speed 3389.03 samples/sec Loss 2.5249 LearningRate 0.0109 Epoch: 13 Global Step: 76130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:20:05,449-Speed 3392.17 samples/sec Loss 2.6559 LearningRate 0.0109 Epoch: 13 Global Step: 76140 Fp16 Grad Scale: 
32768 Required: 4 hours Training: 2022-04-27 09:20:08,469-Speed 3391.10 samples/sec Loss 2.4444 LearningRate 0.0109 Epoch: 13 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:20:11,511-Speed 3367.70 samples/sec Loss 2.5493 LearningRate 0.0109 Epoch: 13 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:20:14,561-Speed 3358.42 samples/sec Loss 2.4853 LearningRate 0.0109 Epoch: 13 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:17,597-Speed 3373.13 samples/sec Loss 2.5593 LearningRate 0.0109 Epoch: 13 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:20,637-Speed 3369.70 samples/sec Loss 2.5386 LearningRate 0.0109 Epoch: 13 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:23,662-Speed 3385.21 samples/sec Loss 2.4525 LearningRate 0.0109 Epoch: 13 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:26,691-Speed 3382.03 samples/sec Loss 2.5234 LearningRate 0.0109 Epoch: 13 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:29,707-Speed 3396.09 samples/sec Loss 2.5757 LearningRate 0.0109 Epoch: 13 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:32,722-Speed 3396.90 samples/sec Loss 2.5156 LearningRate 0.0109 Epoch: 13 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:35,740-Speed 3393.37 samples/sec Loss 2.4442 LearningRate 0.0109 Epoch: 13 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:38,775-Speed 3375.52 samples/sec Loss 2.5249 LearningRate 0.0109 Epoch: 13 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:41,794-Speed 3392.55 samples/sec Loss 2.4971 LearningRate 0.0109 Epoch: 13 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
09:20:44,813-Speed 3392.47 samples/sec Loss 2.5443 LearningRate 0.0108 Epoch: 13 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:20:47,808-Speed 3419.41 samples/sec Loss 2.5261 LearningRate 0.0108 Epoch: 13 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:50,827-Speed 3392.43 samples/sec Loss 2.5952 LearningRate 0.0108 Epoch: 13 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:53,848-Speed 3390.47 samples/sec Loss 2.5643 LearningRate 0.0108 Epoch: 13 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:56,861-Speed 3399.98 samples/sec Loss 2.5903 LearningRate 0.0108 Epoch: 13 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:20:59,879-Speed 3393.92 samples/sec Loss 2.5520 LearningRate 0.0108 Epoch: 13 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:02,903-Speed 3386.08 samples/sec Loss 2.5386 LearningRate 0.0108 Epoch: 13 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:05,916-Speed 3399.95 samples/sec Loss 2.5556 LearningRate 0.0108 Epoch: 13 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:08,928-Speed 3401.30 samples/sec Loss 2.4987 LearningRate 0.0108 Epoch: 13 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:11,941-Speed 3398.97 samples/sec Loss 2.5869 LearningRate 0.0108 Epoch: 13 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:14,953-Speed 3399.93 samples/sec Loss 2.5645 LearningRate 0.0108 Epoch: 13 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:18,230-Speed 3126.27 samples/sec Loss 2.4947 LearningRate 0.0108 Epoch: 13 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:21,306-Speed 3330.01 samples/sec Loss 2.4578 
LearningRate 0.0108 Epoch: 13 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:24,334-Speed 3381.72 samples/sec Loss 2.4658 LearningRate 0.0108 Epoch: 13 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:27,360-Speed 3385.49 samples/sec Loss 2.4972 LearningRate 0.0108 Epoch: 13 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:30,410-Speed 3357.60 samples/sec Loss 2.5032 LearningRate 0.0108 Epoch: 13 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:33,424-Speed 3398.06 samples/sec Loss 2.6158 LearningRate 0.0108 Epoch: 13 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:36,447-Speed 3388.44 samples/sec Loss 2.5160 LearningRate 0.0107 Epoch: 13 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:39,459-Speed 3400.54 samples/sec Loss 2.6182 LearningRate 0.0107 Epoch: 13 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:42,474-Speed 3397.74 samples/sec Loss 2.4419 LearningRate 0.0107 Epoch: 13 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:45,482-Speed 3404.46 samples/sec Loss 2.5038 LearningRate 0.0107 Epoch: 13 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:48,494-Speed 3400.74 samples/sec Loss 2.4255 LearningRate 0.0107 Epoch: 13 Global Step: 76480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:21:51,490-Speed 3418.68 samples/sec Loss 2.6022 LearningRate 0.0107 Epoch: 13 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:54,495-Speed 3408.14 samples/sec Loss 2.5416 LearningRate 0.0107 Epoch: 13 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:21:57,513-Speed 3394.49 samples/sec Loss 2.5339 LearningRate 0.0107 Epoch: 13 Global Step: 76510 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:00,518-Speed 3408.04 samples/sec Loss 2.4447 LearningRate 0.0107 Epoch: 13 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:03,539-Speed 3390.35 samples/sec Loss 2.5541 LearningRate 0.0107 Epoch: 13 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:06,551-Speed 3400.15 samples/sec Loss 2.6210 LearningRate 0.0107 Epoch: 13 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:09,569-Speed 3394.73 samples/sec Loss 2.4532 LearningRate 0.0107 Epoch: 13 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:12,584-Speed 3397.12 samples/sec Loss 2.5190 LearningRate 0.0107 Epoch: 13 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:15,594-Speed 3401.83 samples/sec Loss 2.5072 LearningRate 0.0107 Epoch: 13 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:18,612-Speed 3394.50 samples/sec Loss 2.4817 LearningRate 0.0107 Epoch: 13 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:21,623-Speed 3401.56 samples/sec Loss 2.5470 LearningRate 0.0107 Epoch: 13 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:22:24,650-Speed 3382.86 samples/sec Loss 2.5031 LearningRate 0.0107 Epoch: 13 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:27,663-Speed 3399.85 samples/sec Loss 2.5615 LearningRate 0.0106 Epoch: 13 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:30,683-Speed 3391.77 samples/sec Loss 2.5177 LearningRate 0.0106 Epoch: 13 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:33,700-Speed 3395.28 samples/sec Loss 2.5341 LearningRate 0.0106 Epoch: 13 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 09:22:36,725-Speed 3385.96 samples/sec Loss 2.6267 LearningRate 0.0106 Epoch: 13 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:39,757-Speed 3377.98 samples/sec Loss 2.4804 LearningRate 0.0106 Epoch: 13 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:42,786-Speed 3381.24 samples/sec Loss 2.5836 LearningRate 0.0106 Epoch: 13 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:45,803-Speed 3394.75 samples/sec Loss 2.6087 LearningRate 0.0106 Epoch: 13 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:48,825-Speed 3388.90 samples/sec Loss 2.5054 LearningRate 0.0106 Epoch: 13 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:51,846-Speed 3391.21 samples/sec Loss 2.6444 LearningRate 0.0106 Epoch: 13 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:22:54,862-Speed 3395.25 samples/sec Loss 2.5543 LearningRate 0.0106 Epoch: 13 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:22:57,861-Speed 3416.32 samples/sec Loss 2.4423 LearningRate 0.0106 Epoch: 13 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:00,901-Speed 3369.23 samples/sec Loss 2.4899 LearningRate 0.0106 Epoch: 13 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:03,921-Speed 3390.88 samples/sec Loss 2.6166 LearningRate 0.0106 Epoch: 13 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:06,941-Speed 3391.81 samples/sec Loss 2.5000 LearningRate 0.0106 Epoch: 13 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:09,958-Speed 3394.71 samples/sec Loss 2.5264 LearningRate 0.0106 Epoch: 13 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:12,978-Speed 3392.09 samples/sec Loss 
2.4733 LearningRate 0.0106 Epoch: 13 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:16,004-Speed 3383.82 samples/sec Loss 2.4864 LearningRate 0.0106 Epoch: 13 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:19,026-Speed 3389.28 samples/sec Loss 2.5989 LearningRate 0.0106 Epoch: 13 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:22,051-Speed 3386.75 samples/sec Loss 2.4592 LearningRate 0.0105 Epoch: 13 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:25,079-Speed 3381.91 samples/sec Loss 2.5299 LearningRate 0.0105 Epoch: 13 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:28,081-Speed 3412.52 samples/sec Loss 2.4891 LearningRate 0.0105 Epoch: 13 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:31,102-Speed 3390.60 samples/sec Loss 2.5020 LearningRate 0.0105 Epoch: 13 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:34,120-Speed 3393.86 samples/sec Loss 2.6191 LearningRate 0.0105 Epoch: 13 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:37,139-Speed 3392.14 samples/sec Loss 2.5419 LearningRate 0.0105 Epoch: 13 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:40,158-Speed 3392.51 samples/sec Loss 2.4875 LearningRate 0.0105 Epoch: 13 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:23:43,162-Speed 3409.74 samples/sec Loss 2.4513 LearningRate 0.0105 Epoch: 13 Global Step: 76860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:23:46,201-Speed 3370.34 samples/sec Loss 2.4681 LearningRate 0.0105 Epoch: 13 Global Step: 76870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:23:49,227-Speed 3384.62 samples/sec Loss 2.4528 LearningRate 0.0105 Epoch: 13 Global Step: 76880 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:23:52,252-Speed 3385.65 samples/sec Loss 2.4999 LearningRate 0.0105 Epoch: 13 Global Step: 76890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:23:55,268-Speed 3396.83 samples/sec Loss 2.4642 LearningRate 0.0105 Epoch: 13 Global Step: 76900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:23:58,283-Speed 3396.56 samples/sec Loss 2.4649 LearningRate 0.0105 Epoch: 13 Global Step: 76910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:01,300-Speed 3395.40 samples/sec Loss 2.5258 LearningRate 0.0105 Epoch: 13 Global Step: 76920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:04,320-Speed 3391.60 samples/sec Loss 2.4820 LearningRate 0.0105 Epoch: 13 Global Step: 76930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:07,336-Speed 3395.05 samples/sec Loss 2.5036 LearningRate 0.0105 Epoch: 13 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:10,355-Speed 3393.09 samples/sec Loss 2.5041 LearningRate 0.0105 Epoch: 13 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:13,371-Speed 3395.83 samples/sec Loss 2.5142 LearningRate 0.0104 Epoch: 13 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:24:16,400-Speed 3381.02 samples/sec Loss 2.4668 LearningRate 0.0104 Epoch: 13 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:24:19,414-Speed 3399.08 samples/sec Loss 2.5322 LearningRate 0.0104 Epoch: 13 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:24:22,432-Speed 3393.98 samples/sec Loss 2.4525 LearningRate 0.0104 Epoch: 13 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:24:25,451-Speed 3393.07 samples/sec Loss 2.3872 LearningRate 0.0104 Epoch: 13 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 09:24:28,472-Speed 3397.90 samples/sec Loss 2.5065 LearningRate 0.0104 Epoch: 13 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:24:31,471-Speed 3414.56 samples/sec Loss 2.6494 LearningRate 0.0104 Epoch: 13 Global Step: 77020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:34,493-Speed 3389.00 samples/sec Loss 2.5562 LearningRate 0.0104 Epoch: 13 Global Step: 77030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:37,515-Speed 3389.33 samples/sec Loss 2.6125 LearningRate 0.0104 Epoch: 13 Global Step: 77040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:40,535-Speed 3391.97 samples/sec Loss 2.5615 LearningRate 0.0104 Epoch: 13 Global Step: 77050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:43,553-Speed 3393.83 samples/sec Loss 2.4890 LearningRate 0.0104 Epoch: 13 Global Step: 77060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:46,570-Speed 3394.75 samples/sec Loss 2.4876 LearningRate 0.0104 Epoch: 13 Global Step: 77070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:49,591-Speed 3391.27 samples/sec Loss 2.6119 LearningRate 0.0104 Epoch: 13 Global Step: 77080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:52,611-Speed 3391.64 samples/sec Loss 2.4858 LearningRate 0.0104 Epoch: 13 Global Step: 77090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:55,631-Speed 3390.86 samples/sec Loss 2.6040 LearningRate 0.0104 Epoch: 13 Global Step: 77100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:24:58,663-Speed 3378.38 samples/sec Loss 2.5236 LearningRate 0.0104 Epoch: 13 Global Step: 77110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:01,685-Speed 3389.03 samples/sec Loss 2.4984 LearningRate 0.0104 Epoch: 13 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:04,714-Speed 3380.83 samples/sec Loss 
2.5972 LearningRate 0.0104 Epoch: 13 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:07,735-Speed 3390.64 samples/sec Loss 2.5000 LearningRate 0.0103 Epoch: 13 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:10,757-Speed 3389.83 samples/sec Loss 2.4500 LearningRate 0.0103 Epoch: 13 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:13,792-Speed 3374.80 samples/sec Loss 2.4612 LearningRate 0.0103 Epoch: 13 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:16,822-Speed 3380.82 samples/sec Loss 2.5323 LearningRate 0.0103 Epoch: 13 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:19,846-Speed 3386.48 samples/sec Loss 2.5507 LearningRate 0.0103 Epoch: 13 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:22,864-Speed 3393.51 samples/sec Loss 2.4870 LearningRate 0.0103 Epoch: 13 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:25,882-Speed 3394.35 samples/sec Loss 2.5978 LearningRate 0.0103 Epoch: 13 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:28,904-Speed 3389.16 samples/sec Loss 2.4869 LearningRate 0.0103 Epoch: 13 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:31,921-Speed 3395.24 samples/sec Loss 2.4450 LearningRate 0.0103 Epoch: 13 Global Step: 77220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:25:34,938-Speed 3394.37 samples/sec Loss 2.5075 LearningRate 0.0103 Epoch: 13 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:25:37,946-Speed 3404.72 samples/sec Loss 2.5160 LearningRate 0.0103 Epoch: 13 Global Step: 77240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:40,965-Speed 3392.91 samples/sec Loss 2.4945 LearningRate 0.0103 Epoch: 13 Global Step: 
77250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:43,982-Speed 3395.50 samples/sec Loss 2.4369 LearningRate 0.0103 Epoch: 13 Global Step: 77260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:47,013-Speed 3379.06 samples/sec Loss 2.4955 LearningRate 0.0103 Epoch: 13 Global Step: 77270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:50,035-Speed 3388.79 samples/sec Loss 2.5404 LearningRate 0.0103 Epoch: 13 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:53,074-Speed 3370.26 samples/sec Loss 2.5035 LearningRate 0.0103 Epoch: 13 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:56,091-Speed 3394.78 samples/sec Loss 2.5036 LearningRate 0.0103 Epoch: 13 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:25:59,118-Speed 3384.34 samples/sec Loss 2.5253 LearningRate 0.0103 Epoch: 13 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:02,150-Speed 3377.62 samples/sec Loss 2.5421 LearningRate 0.0102 Epoch: 13 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:05,172-Speed 3389.45 samples/sec Loss 2.5745 LearningRate 0.0102 Epoch: 13 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:08,191-Speed 3392.77 samples/sec Loss 2.3931 LearningRate 0.0102 Epoch: 13 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:11,215-Speed 3387.86 samples/sec Loss 2.4834 LearningRate 0.0102 Epoch: 13 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:14,240-Speed 3385.42 samples/sec Loss 2.5764 LearningRate 0.0102 Epoch: 13 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:17,273-Speed 3377.03 samples/sec Loss 2.5277 LearningRate 0.0102 Epoch: 13 Global Step: 77370 Fp16 Grad Scale: 32768 Required: 4 hours 
Training: 2022-04-27 09:26:20,294-Speed 3389.98 samples/sec Loss 2.4239 LearningRate 0.0102 Epoch: 13 Global Step: 77380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:23,320-Speed 3384.55 samples/sec Loss 2.5277 LearningRate 0.0102 Epoch: 13 Global Step: 77390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:26,344-Speed 3387.47 samples/sec Loss 2.4818 LearningRate 0.0102 Epoch: 13 Global Step: 77400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:29,362-Speed 3393.22 samples/sec Loss 2.4825 LearningRate 0.0102 Epoch: 13 Global Step: 77410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:32,385-Speed 3389.27 samples/sec Loss 2.5048 LearningRate 0.0102 Epoch: 13 Global Step: 77420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:35,408-Speed 3387.86 samples/sec Loss 2.4625 LearningRate 0.0102 Epoch: 13 Global Step: 77430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:38,429-Speed 3390.88 samples/sec Loss 2.5170 LearningRate 0.0102 Epoch: 13 Global Step: 77440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:41,451-Speed 3389.56 samples/sec Loss 2.4921 LearningRate 0.0102 Epoch: 13 Global Step: 77450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:26:44,473-Speed 3389.07 samples/sec Loss 2.5219 LearningRate 0.0102 Epoch: 13 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:47,523-Speed 3357.53 samples/sec Loss 2.5573 LearningRate 0.0102 Epoch: 13 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:50,545-Speed 3389.50 samples/sec Loss 2.6183 LearningRate 0.0102 Epoch: 13 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:53,575-Speed 3380.06 samples/sec Loss 2.4852 LearningRate 0.0101 Epoch: 13 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:56,598-Speed 3388.83 
samples/sec Loss 2.5829 LearningRate 0.0101 Epoch: 13 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:26:59,638-Speed 3369.05 samples/sec Loss 2.4771 LearningRate 0.0101 Epoch: 13 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:02,676-Speed 3371.96 samples/sec Loss 2.4952 LearningRate 0.0101 Epoch: 13 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:05,710-Speed 3375.54 samples/sec Loss 2.5325 LearningRate 0.0101 Epoch: 13 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:08,737-Speed 3384.05 samples/sec Loss 2.4255 LearningRate 0.0101 Epoch: 13 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:11,767-Speed 3379.90 samples/sec Loss 2.5015 LearningRate 0.0101 Epoch: 13 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:14,791-Speed 3386.94 samples/sec Loss 2.6187 LearningRate 0.0101 Epoch: 13 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:27:17,818-Speed 3384.10 samples/sec Loss 2.4621 LearningRate 0.0101 Epoch: 13 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:27:20,849-Speed 3378.78 samples/sec Loss 2.4999 LearningRate 0.0101 Epoch: 13 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:27:23,865-Speed 3396.40 samples/sec Loss 2.5553 LearningRate 0.0101 Epoch: 13 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:26,928-Speed 3343.82 samples/sec Loss 2.6183 LearningRate 0.0101 Epoch: 13 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:29,950-Speed 3389.41 samples/sec Loss 2.6086 LearningRate 0.0101 Epoch: 13 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:32,975-Speed 3386.19 samples/sec Loss 2.5613 LearningRate 0.0101 Epoch: 
13 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:35,998-Speed 3388.36 samples/sec Loss 2.5984 LearningRate 0.0101 Epoch: 13 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:39,023-Speed 3385.26 samples/sec Loss 2.4268 LearningRate 0.0101 Epoch: 13 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:42,050-Speed 3383.52 samples/sec Loss 2.5257 LearningRate 0.0101 Epoch: 13 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:45,075-Speed 3385.96 samples/sec Loss 2.6097 LearningRate 0.0101 Epoch: 13 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:48,098-Speed 3387.86 samples/sec Loss 2.5861 LearningRate 0.0100 Epoch: 13 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:51,132-Speed 3376.19 samples/sec Loss 2.5107 LearningRate 0.0100 Epoch: 13 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:54,142-Speed 3403.15 samples/sec Loss 2.5126 LearningRate 0.0100 Epoch: 13 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:27:57,165-Speed 3388.50 samples/sec Loss 2.5060 LearningRate 0.0100 Epoch: 13 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:00,190-Speed 3385.67 samples/sec Loss 2.5380 LearningRate 0.0100 Epoch: 13 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:03,213-Speed 3388.23 samples/sec Loss 2.4633 LearningRate 0.0100 Epoch: 13 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:06,239-Speed 3384.44 samples/sec Loss 2.5090 LearningRate 0.0100 Epoch: 13 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:09,265-Speed 3384.60 samples/sec Loss 2.5498 LearningRate 0.0100 Epoch: 13 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-27 09:28:12,292-Speed 3384.07 samples/sec Loss 2.5463 LearningRate 0.0100 Epoch: 13 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:15,320-Speed 3382.24 samples/sec Loss 2.4815 LearningRate 0.0100 Epoch: 13 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:18,346-Speed 3384.90 samples/sec Loss 2.5746 LearningRate 0.0100 Epoch: 13 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:21,370-Speed 3387.08 samples/sec Loss 2.4957 LearningRate 0.0100 Epoch: 13 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:24,379-Speed 3404.08 samples/sec Loss 2.3712 LearningRate 0.0100 Epoch: 13 Global Step: 77790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:27,403-Speed 3387.20 samples/sec Loss 2.5207 LearningRate 0.0100 Epoch: 13 Global Step: 77800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:30,425-Speed 3388.94 samples/sec Loss 2.4576 LearningRate 0.0100 Epoch: 13 Global Step: 77810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:33,451-Speed 3384.89 samples/sec Loss 2.5073 LearningRate 0.0100 Epoch: 13 Global Step: 77820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:36,486-Speed 3374.98 samples/sec Loss 2.5970 LearningRate 0.0100 Epoch: 13 Global Step: 77830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:39,506-Speed 3391.52 samples/sec Loss 2.5251 LearningRate 0.0100 Epoch: 13 Global Step: 77840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:42,536-Speed 3380.08 samples/sec Loss 2.5129 LearningRate 0.0099 Epoch: 13 Global Step: 77850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:45,565-Speed 3381.38 samples/sec Loss 2.5000 LearningRate 0.0099 Epoch: 13 Global Step: 77860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:48,589-Speed 3388.04 
samples/sec Loss 2.4224 LearningRate 0.0099 Epoch: 13 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:51,619-Speed 3380.32 samples/sec Loss 2.5094 LearningRate 0.0099 Epoch: 13 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:28:54,636-Speed 3394.51 samples/sec Loss 2.5566 LearningRate 0.0099 Epoch: 13 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:28:57,655-Speed 3392.67 samples/sec Loss 2.4916 LearningRate 0.0099 Epoch: 13 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:00,678-Speed 3388.53 samples/sec Loss 2.4763 LearningRate 0.0099 Epoch: 13 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:03,705-Speed 3383.26 samples/sec Loss 2.5360 LearningRate 0.0099 Epoch: 13 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:06,732-Speed 3384.14 samples/sec Loss 2.4288 LearningRate 0.0099 Epoch: 13 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:09,758-Speed 3384.18 samples/sec Loss 2.4396 LearningRate 0.0099 Epoch: 13 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:12,783-Speed 3386.18 samples/sec Loss 2.6653 LearningRate 0.0099 Epoch: 13 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:15,818-Speed 3374.57 samples/sec Loss 2.6831 LearningRate 0.0099 Epoch: 13 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:18,845-Speed 3384.19 samples/sec Loss 2.5136 LearningRate 0.0099 Epoch: 13 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:21,878-Speed 3377.51 samples/sec Loss 2.4685 LearningRate 0.0099 Epoch: 13 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:29:24,908-Speed 3380.11 samples/sec Loss 2.4816 LearningRate 0.0099 Epoch: 13 
Global Step: 77990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:29:27,923-Speed 3396.25 samples/sec Loss 2.5324 LearningRate 0.0099 Epoch: 13 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:30:11,379-[lfw][78000]XNorm: 22.533113 Training: 2022-04-27 09:30:11,380-[lfw][78000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-27 09:30:11,380-[lfw][78000]Accuracy-Highest: 0.99817 Training: 2022-04-27 09:31:01,690-[cfp_fp][78000]XNorm: 20.898850 Training: 2022-04-27 09:31:01,690-[cfp_fp][78000]Accuracy-Flip: 0.97714+-0.00616 Training: 2022-04-27 09:31:01,691-[cfp_fp][78000]Accuracy-Highest: 0.97743 Training: 2022-04-27 09:31:45,393-[agedb_30][78000]XNorm: 22.560032 Training: 2022-04-27 09:31:45,393-[agedb_30][78000]Accuracy-Flip: 0.98100+-0.00731 Training: 2022-04-27 09:31:45,394-[agedb_30][78000]Accuracy-Highest: 0.98100 Training: 2022-04-27 09:31:48,396-Speed 72.90 samples/sec Loss 2.5301 LearningRate 0.0099 Epoch: 13 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:31:51,401-Speed 3408.95 samples/sec Loss 2.4558 LearningRate 0.0099 Epoch: 13 Global Step: 78020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:31:54,398-Speed 3416.81 samples/sec Loss 2.5126 LearningRate 0.0098 Epoch: 13 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:31:57,399-Speed 3414.18 samples/sec Loss 2.3468 LearningRate 0.0098 Epoch: 13 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:00,402-Speed 3410.58 samples/sec Loss 2.5474 LearningRate 0.0098 Epoch: 13 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:03,405-Speed 3410.22 samples/sec Loss 2.5329 LearningRate 0.0098 Epoch: 13 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:06,409-Speed 3410.12 samples/sec Loss 2.4708 LearningRate 0.0098 Epoch: 13 Global Step: 78070 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 09:32:09,419-Speed 3403.15 samples/sec Loss 2.5197 LearningRate 0.0098 Epoch: 13 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:12,431-Speed 3400.38 samples/sec Loss 2.6007 LearningRate 0.0098 Epoch: 13 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:15,441-Speed 3404.03 samples/sec Loss 2.4447 LearningRate 0.0098 Epoch: 13 Global Step: 78100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:32:18,468-Speed 3382.55 samples/sec Loss 2.4797 LearningRate 0.0098 Epoch: 13 Global Step: 78110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:32:21,480-Speed 3400.76 samples/sec Loss 2.4897 LearningRate 0.0098 Epoch: 13 Global Step: 78120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:32:24,482-Speed 3412.25 samples/sec Loss 2.5003 LearningRate 0.0098 Epoch: 13 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:27,495-Speed 3399.06 samples/sec Loss 2.5064 LearningRate 0.0098 Epoch: 13 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:30,509-Speed 3398.26 samples/sec Loss 2.4305 LearningRate 0.0098 Epoch: 13 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:33,525-Speed 3396.44 samples/sec Loss 2.4584 LearningRate 0.0098 Epoch: 13 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:36,541-Speed 3396.00 samples/sec Loss 2.4605 LearningRate 0.0098 Epoch: 13 Global Step: 78170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:39,562-Speed 3390.37 samples/sec Loss 2.5795 LearningRate 0.0098 Epoch: 13 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:42,606-Speed 3364.61 samples/sec Loss 2.4921 LearningRate 0.0098 Epoch: 13 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
09:32:45,628-Speed 3389.75 samples/sec Loss 2.5340 LearningRate 0.0098 Epoch: 13 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:48,654-Speed 3384.10 samples/sec Loss 2.4632 LearningRate 0.0098 Epoch: 13 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:51,673-Speed 3392.99 samples/sec Loss 2.5062 LearningRate 0.0097 Epoch: 13 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:54,679-Speed 3406.75 samples/sec Loss 2.3637 LearningRate 0.0097 Epoch: 13 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:32:57,705-Speed 3384.91 samples/sec Loss 2.3453 LearningRate 0.0097 Epoch: 13 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:00,728-Speed 3388.49 samples/sec Loss 2.4546 LearningRate 0.0097 Epoch: 13 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:03,777-Speed 3360.00 samples/sec Loss 2.4324 LearningRate 0.0097 Epoch: 13 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:06,797-Speed 3391.15 samples/sec Loss 2.4616 LearningRate 0.0097 Epoch: 13 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:09,828-Speed 3379.04 samples/sec Loss 2.4883 LearningRate 0.0097 Epoch: 13 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:12,845-Speed 3395.26 samples/sec Loss 2.4049 LearningRate 0.0097 Epoch: 13 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:15,843-Speed 3416.68 samples/sec Loss 2.4311 LearningRate 0.0097 Epoch: 13 Global Step: 78300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:18,874-Speed 3378.45 samples/sec Loss 2.5332 LearningRate 0.0097 Epoch: 13 Global Step: 78310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:21,886-Speed 3400.13 samples/sec Loss 2.5283 
LearningRate 0.0097 Epoch: 13 Global Step: 78320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:24,902-Speed 3396.73 samples/sec Loss 2.5119 LearningRate 0.0097 Epoch: 13 Global Step: 78330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:27,935-Speed 3376.64 samples/sec Loss 2.4874 LearningRate 0.0097 Epoch: 13 Global Step: 78340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:30,947-Speed 3400.62 samples/sec Loss 2.5417 LearningRate 0.0097 Epoch: 13 Global Step: 78350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:33,957-Speed 3402.73 samples/sec Loss 2.5069 LearningRate 0.0097 Epoch: 13 Global Step: 78360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:36,970-Speed 3399.59 samples/sec Loss 2.4379 LearningRate 0.0097 Epoch: 13 Global Step: 78370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:39,980-Speed 3402.78 samples/sec Loss 2.5746 LearningRate 0.0097 Epoch: 13 Global Step: 78380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:42,994-Speed 3398.56 samples/sec Loss 2.4387 LearningRate 0.0097 Epoch: 13 Global Step: 78390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:33:46,093-Speed 3305.53 samples/sec Loss 2.5095 LearningRate 0.0096 Epoch: 13 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:49,105-Speed 3399.99 samples/sec Loss 2.4623 LearningRate 0.0096 Epoch: 13 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:52,133-Speed 3382.41 samples/sec Loss 2.4583 LearningRate 0.0096 Epoch: 13 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:55,144-Speed 3401.65 samples/sec Loss 2.4643 LearningRate 0.0096 Epoch: 13 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:33:58,151-Speed 3406.95 samples/sec Loss 2.5224 LearningRate 0.0096 Epoch: 13 Global Step: 78440 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:01,158-Speed 3405.83 samples/sec Loss 2.4166 LearningRate 0.0096 Epoch: 13 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:04,148-Speed 3425.79 samples/sec Loss 2.5451 LearningRate 0.0096 Epoch: 13 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:07,153-Speed 3408.50 samples/sec Loss 2.5509 LearningRate 0.0096 Epoch: 13 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:10,160-Speed 3406.48 samples/sec Loss 2.4809 LearningRate 0.0096 Epoch: 13 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:13,167-Speed 3405.72 samples/sec Loss 2.4782 LearningRate 0.0096 Epoch: 13 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:16,172-Speed 3408.42 samples/sec Loss 2.2830 LearningRate 0.0096 Epoch: 13 Global Step: 78500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:19,182-Speed 3403.03 samples/sec Loss 2.4631 LearningRate 0.0096 Epoch: 13 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:22,187-Speed 3407.51 samples/sec Loss 2.5124 LearningRate 0.0096 Epoch: 13 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:25,195-Speed 3406.14 samples/sec Loss 2.4724 LearningRate 0.0096 Epoch: 13 Global Step: 78530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:28,200-Speed 3407.83 samples/sec Loss 2.4704 LearningRate 0.0096 Epoch: 13 Global Step: 78540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:31,211-Speed 3402.49 samples/sec Loss 2.3979 LearningRate 0.0096 Epoch: 13 Global Step: 78550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 09:34:34,235-Speed 3386.47 samples/sec Loss 2.5431 LearningRate 0.0096 Epoch: 13 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 09:34:37,246-Speed 3401.71 samples/sec Loss 2.3483 LearningRate 0.0096 Epoch: 13 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:40,251-Speed 3408.88 samples/sec Loss 2.4588 LearningRate 0.0095 Epoch: 13 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:43,266-Speed 3396.70 samples/sec Loss 2.4925 LearningRate 0.0095 Epoch: 13 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:46,277-Speed 3401.86 samples/sec Loss 2.4754 LearningRate 0.0095 Epoch: 13 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:49,288-Speed 3400.75 samples/sec Loss 2.4784 LearningRate 0.0095 Epoch: 13 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:52,295-Speed 3407.01 samples/sec Loss 2.5894 LearningRate 0.0095 Epoch: 13 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:55,301-Speed 3406.90 samples/sec Loss 2.3509 LearningRate 0.0095 Epoch: 13 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:34:58,316-Speed 3397.33 samples/sec Loss 2.5824 LearningRate 0.0095 Epoch: 13 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:01,331-Speed 3397.11 samples/sec Loss 2.4659 LearningRate 0.0095 Epoch: 13 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:04,343-Speed 3401.13 samples/sec Loss 2.5222 LearningRate 0.0095 Epoch: 13 Global Step: 78660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:35:07,332-Speed 3425.90 samples/sec Loss 2.5925 LearningRate 0.0095 Epoch: 13 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:10,349-Speed 3395.58 samples/sec Loss 2.4070 LearningRate 0.0095 Epoch: 13 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:13,356-Speed 3405.46 samples/sec Loss 
2.4530 LearningRate 0.0095 Epoch: 13 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:16,374-Speed 3394.10 samples/sec Loss 2.4421 LearningRate 0.0095 Epoch: 13 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:19,386-Speed 3400.32 samples/sec Loss 2.4324 LearningRate 0.0095 Epoch: 13 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:22,404-Speed 3394.30 samples/sec Loss 2.4620 LearningRate 0.0095 Epoch: 13 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:25,415-Speed 3402.05 samples/sec Loss 2.5016 LearningRate 0.0095 Epoch: 13 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:28,430-Speed 3397.17 samples/sec Loss 2.5608 LearningRate 0.0095 Epoch: 13 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:31,446-Speed 3396.89 samples/sec Loss 2.4176 LearningRate 0.0095 Epoch: 13 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:34,477-Speed 3378.87 samples/sec Loss 2.5457 LearningRate 0.0095 Epoch: 13 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:37,487-Speed 3402.32 samples/sec Loss 2.3442 LearningRate 0.0094 Epoch: 13 Global Step: 78770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:35:40,480-Speed 3421.69 samples/sec Loss 2.5066 LearningRate 0.0094 Epoch: 13 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:43,493-Speed 3400.15 samples/sec Loss 2.5276 LearningRate 0.0094 Epoch: 13 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:46,515-Speed 3390.32 samples/sec Loss 2.5978 LearningRate 0.0094 Epoch: 13 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:49,530-Speed 3397.51 samples/sec Loss 2.3397 LearningRate 0.0094 Epoch: 13 Global Step: 
78810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:52,543-Speed 3399.35 samples/sec Loss 2.4566 LearningRate 0.0094 Epoch: 13 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:55,554-Speed 3401.46 samples/sec Loss 2.4104 LearningRate 0.0094 Epoch: 13 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:35:58,563-Speed 3404.27 samples/sec Loss 2.4643 LearningRate 0.0094 Epoch: 13 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:01,581-Speed 3393.12 samples/sec Loss 2.4006 LearningRate 0.0094 Epoch: 13 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:04,598-Speed 3394.59 samples/sec Loss 2.3691 LearningRate 0.0094 Epoch: 13 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:07,612-Speed 3398.11 samples/sec Loss 2.4419 LearningRate 0.0094 Epoch: 13 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:10,612-Speed 3414.83 samples/sec Loss 2.5202 LearningRate 0.0094 Epoch: 13 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:13,623-Speed 3401.06 samples/sec Loss 2.4789 LearningRate 0.0094 Epoch: 13 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:16,634-Speed 3402.42 samples/sec Loss 2.5910 LearningRate 0.0094 Epoch: 13 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:19,647-Speed 3399.84 samples/sec Loss 2.4574 LearningRate 0.0094 Epoch: 13 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:22,657-Speed 3402.67 samples/sec Loss 2.3921 LearningRate 0.0094 Epoch: 13 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:25,669-Speed 3399.97 samples/sec Loss 2.4315 LearningRate 0.0094 Epoch: 13 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 09:36:28,688-Speed 3392.68 samples/sec Loss 2.3757 LearningRate 0.0094 Epoch: 13 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:31,702-Speed 3398.57 samples/sec Loss 2.3655 LearningRate 0.0093 Epoch: 13 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:34,728-Speed 3383.83 samples/sec Loss 2.4031 LearningRate 0.0093 Epoch: 13 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:37,795-Speed 3340.40 samples/sec Loss 2.4326 LearningRate 0.0093 Epoch: 13 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:40,791-Speed 3418.69 samples/sec Loss 2.4526 LearningRate 0.0093 Epoch: 13 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:43,800-Speed 3404.05 samples/sec Loss 2.4119 LearningRate 0.0093 Epoch: 13 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:46,817-Speed 3394.61 samples/sec Loss 2.3664 LearningRate 0.0093 Epoch: 13 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:49,832-Speed 3397.60 samples/sec Loss 2.4493 LearningRate 0.0093 Epoch: 13 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:52,845-Speed 3398.97 samples/sec Loss 2.5594 LearningRate 0.0093 Epoch: 13 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:55,857-Speed 3400.19 samples/sec Loss 2.3652 LearningRate 0.0093 Epoch: 13 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:36:58,876-Speed 3392.68 samples/sec Loss 2.4188 LearningRate 0.0093 Epoch: 13 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:01,892-Speed 3396.95 samples/sec Loss 2.5237 LearningRate 0.0093 Epoch: 13 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:04,915-Speed 3387.63 
samples/sec Loss 2.4065 LearningRate 0.0093 Epoch: 13 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:07,930-Speed 3397.41 samples/sec Loss 2.4454 LearningRate 0.0093 Epoch: 13 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:10,930-Speed 3413.91 samples/sec Loss 2.4293 LearningRate 0.0093 Epoch: 13 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:13,942-Speed 3400.52 samples/sec Loss 2.5147 LearningRate 0.0093 Epoch: 13 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:16,957-Speed 3397.85 samples/sec Loss 2.4526 LearningRate 0.0093 Epoch: 13 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:19,973-Speed 3396.48 samples/sec Loss 2.4799 LearningRate 0.0093 Epoch: 13 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:22,990-Speed 3395.71 samples/sec Loss 2.5020 LearningRate 0.0093 Epoch: 13 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:26,015-Speed 3385.28 samples/sec Loss 2.5551 LearningRate 0.0093 Epoch: 13 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:29,034-Speed 3392.50 samples/sec Loss 2.3178 LearningRate 0.0092 Epoch: 13 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:32,047-Speed 3400.68 samples/sec Loss 2.4113 LearningRate 0.0092 Epoch: 13 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:35,061-Speed 3397.72 samples/sec Loss 2.5158 LearningRate 0.0092 Epoch: 13 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:38,079-Speed 3394.06 samples/sec Loss 2.4260 LearningRate 0.0092 Epoch: 13 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:41,089-Speed 3402.33 samples/sec Loss 2.5049 LearningRate 0.0092 Epoch: 13 
Global Step: 79180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:44,113-Speed 3387.10 samples/sec Loss 2.5310 LearningRate 0.0092 Epoch: 13 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:47,136-Speed 3388.13 samples/sec Loss 2.3514 LearningRate 0.0092 Epoch: 13 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:50,164-Speed 3382.79 samples/sec Loss 2.4694 LearningRate 0.0092 Epoch: 13 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:53,182-Speed 3393.25 samples/sec Loss 2.3709 LearningRate 0.0092 Epoch: 13 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:56,204-Speed 3389.52 samples/sec Loss 2.4413 LearningRate 0.0092 Epoch: 13 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:37:59,223-Speed 3392.60 samples/sec Loss 2.4715 LearningRate 0.0092 Epoch: 13 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:02,243-Speed 3391.84 samples/sec Loss 2.4688 LearningRate 0.0092 Epoch: 13 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:05,261-Speed 3394.23 samples/sec Loss 2.4669 LearningRate 0.0092 Epoch: 13 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:08,279-Speed 3393.42 samples/sec Loss 2.4341 LearningRate 0.0092 Epoch: 13 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:11,304-Speed 3385.42 samples/sec Loss 2.6048 LearningRate 0.0092 Epoch: 13 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:38:14,320-Speed 3396.40 samples/sec Loss 2.4978 LearningRate 0.0092 Epoch: 13 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:17,347-Speed 3384.05 samples/sec Loss 2.4320 LearningRate 0.0092 Epoch: 13 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:38:20,370-Speed 3387.53 samples/sec Loss 2.4441 LearningRate 0.0092 Epoch: 13 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:23,391-Speed 3390.72 samples/sec Loss 2.3809 LearningRate 0.0092 Epoch: 13 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:26,421-Speed 3380.40 samples/sec Loss 2.4553 LearningRate 0.0091 Epoch: 13 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:29,438-Speed 3395.57 samples/sec Loss 2.4345 LearningRate 0.0091 Epoch: 13 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:32,454-Speed 3395.84 samples/sec Loss 2.4923 LearningRate 0.0091 Epoch: 13 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:35,468-Speed 3397.35 samples/sec Loss 2.4303 LearningRate 0.0091 Epoch: 13 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:38,521-Speed 3354.93 samples/sec Loss 2.3641 LearningRate 0.0091 Epoch: 13 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:41,535-Speed 3398.57 samples/sec Loss 2.4349 LearningRate 0.0091 Epoch: 13 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:44,541-Speed 3406.98 samples/sec Loss 2.3865 LearningRate 0.0091 Epoch: 13 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:47,563-Speed 3389.92 samples/sec Loss 2.4964 LearningRate 0.0091 Epoch: 13 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:50,582-Speed 3392.36 samples/sec Loss 2.3520 LearningRate 0.0091 Epoch: 13 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:53,604-Speed 3388.86 samples/sec Loss 2.4587 LearningRate 0.0091 Epoch: 13 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:56,622-Speed 3394.54 
samples/sec Loss 2.4949 LearningRate 0.0091 Epoch: 13 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:38:59,644-Speed 3389.46 samples/sec Loss 2.4636 LearningRate 0.0091 Epoch: 13 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:02,679-Speed 3374.52 samples/sec Loss 2.3760 LearningRate 0.0091 Epoch: 13 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:05,695-Speed 3395.05 samples/sec Loss 2.3919 LearningRate 0.0091 Epoch: 13 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:08,715-Speed 3393.14 samples/sec Loss 2.4845 LearningRate 0.0091 Epoch: 13 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:11,751-Speed 3374.29 samples/sec Loss 2.3599 LearningRate 0.0091 Epoch: 13 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:14,788-Speed 3371.84 samples/sec Loss 2.3498 LearningRate 0.0091 Epoch: 13 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:39:17,789-Speed 3412.45 samples/sec Loss 2.3930 LearningRate 0.0091 Epoch: 13 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:20,818-Speed 3382.32 samples/sec Loss 2.5285 LearningRate 0.0090 Epoch: 13 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:23,856-Speed 3371.96 samples/sec Loss 2.4718 LearningRate 0.0090 Epoch: 13 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:26,926-Speed 3336.62 samples/sec Loss 2.3940 LearningRate 0.0090 Epoch: 13 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:29,954-Speed 3382.78 samples/sec Loss 2.4401 LearningRate 0.0090 Epoch: 13 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:32,970-Speed 3395.57 samples/sec Loss 2.2820 LearningRate 0.0090 Epoch: 13 
Global Step: 79550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:35,991-Speed 3389.90 samples/sec Loss 2.3888 LearningRate 0.0090 Epoch: 13 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:39,020-Speed 3382.47 samples/sec Loss 2.3352 LearningRate 0.0090 Epoch: 13 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:42,040-Speed 3391.07 samples/sec Loss 2.4811 LearningRate 0.0090 Epoch: 13 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:45,104-Speed 3343.50 samples/sec Loss 2.4439 LearningRate 0.0090 Epoch: 13 Global Step: 79590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:39:48,187-Speed 3321.60 samples/sec Loss 2.4263 LearningRate 0.0090 Epoch: 13 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:01,603-Speed 763.34 samples/sec Loss 2.0940 LearningRate 0.0090 Epoch: 14 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:04,621-Speed 3393.39 samples/sec Loss 1.9365 LearningRate 0.0090 Epoch: 14 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:07,654-Speed 3377.79 samples/sec Loss 1.9344 LearningRate 0.0090 Epoch: 14 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:10,677-Speed 3387.99 samples/sec Loss 1.8368 LearningRate 0.0090 Epoch: 14 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:13,720-Speed 3365.88 samples/sec Loss 1.7792 LearningRate 0.0090 Epoch: 14 Global Step: 79650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:16,809-Speed 3316.09 samples/sec Loss 1.8465 LearningRate 0.0090 Epoch: 14 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:19,831-Speed 3388.84 samples/sec Loss 1.8926 LearningRate 0.0090 Epoch: 14 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 09:40:22,855-Speed 3387.12 samples/sec Loss 1.7856 LearningRate 0.0090 Epoch: 14 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:25,882-Speed 3383.46 samples/sec Loss 1.8748 LearningRate 0.0090 Epoch: 14 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:28,906-Speed 3386.91 samples/sec Loss 1.8453 LearningRate 0.0089 Epoch: 14 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 09:40:31,915-Speed 3404.15 samples/sec Loss 1.7992 LearningRate 0.0089 Epoch: 14 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 09:40:34,951-Speed 3373.75 samples/sec Loss 1.9051 LearningRate 0.0089 Epoch: 14 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:37,995-Speed 3365.11 samples/sec Loss 1.8749 LearningRate 0.0089 Epoch: 14 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:41,041-Speed 3362.67 samples/sec Loss 1.8135 LearningRate 0.0089 Epoch: 14 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:44,069-Speed 3382.46 samples/sec Loss 1.9596 LearningRate 0.0089 Epoch: 14 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:47,094-Speed 3385.97 samples/sec Loss 1.7432 LearningRate 0.0089 Epoch: 14 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:50,134-Speed 3369.24 samples/sec Loss 1.8393 LearningRate 0.0089 Epoch: 14 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:53,264-Speed 3272.79 samples/sec Loss 1.8289 LearningRate 0.0089 Epoch: 14 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:56,301-Speed 3372.64 samples/sec Loss 1.9121 LearningRate 0.0089 Epoch: 14 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:40:59,334-Speed 3377.00 
samples/sec Loss 1.9114 LearningRate 0.0089 Epoch: 14 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:02,370-Speed 3372.70 samples/sec Loss 1.9952 LearningRate 0.0089 Epoch: 14 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:41:05,392-Speed 3389.62 samples/sec Loss 1.8481 LearningRate 0.0089 Epoch: 14 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:08,427-Speed 3374.96 samples/sec Loss 1.8922 LearningRate 0.0089 Epoch: 14 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:11,470-Speed 3366.54 samples/sec Loss 1.8985 LearningRate 0.0089 Epoch: 14 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:14,512-Speed 3366.44 samples/sec Loss 1.8984 LearningRate 0.0089 Epoch: 14 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:17,544-Speed 3378.33 samples/sec Loss 1.7908 LearningRate 0.0089 Epoch: 14 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:20,601-Speed 3350.44 samples/sec Loss 1.8521 LearningRate 0.0089 Epoch: 14 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:23,635-Speed 3375.50 samples/sec Loss 1.9535 LearningRate 0.0089 Epoch: 14 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:26,669-Speed 3376.19 samples/sec Loss 1.9467 LearningRate 0.0088 Epoch: 14 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:29,700-Speed 3378.84 samples/sec Loss 1.9674 LearningRate 0.0088 Epoch: 14 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:32,731-Speed 3379.72 samples/sec Loss 1.9422 LearningRate 0.0088 Epoch: 14 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:35,767-Speed 3373.46 samples/sec Loss 1.8881 LearningRate 0.0088 Epoch: 14 
Global Step: 79920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:38,795-Speed 3383.70 samples/sec Loss 1.9220 LearningRate 0.0088 Epoch: 14 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:41,842-Speed 3361.82 samples/sec Loss 1.8215 LearningRate 0.0088 Epoch: 14 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:44,885-Speed 3365.74 samples/sec Loss 2.0358 LearningRate 0.0088 Epoch: 14 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:47,906-Speed 3390.23 samples/sec Loss 1.9711 LearningRate 0.0088 Epoch: 14 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:50,917-Speed 3400.60 samples/sec Loss 2.0016 LearningRate 0.0088 Epoch: 14 Global Step: 79970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:53,935-Speed 3395.16 samples/sec Loss 1.9994 LearningRate 0.0088 Epoch: 14 Global Step: 79980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:56,949-Speed 3398.44 samples/sec Loss 1.9045 LearningRate 0.0088 Epoch: 14 Global Step: 79990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:41:59,962-Speed 3398.73 samples/sec Loss 1.8309 LearningRate 0.0088 Epoch: 14 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:42:43,356-[lfw][80000]XNorm: 22.465401 Training: 2022-04-27 09:42:43,356-[lfw][80000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-04-27 09:42:43,357-[lfw][80000]Accuracy-Highest: 0.99817 Training: 2022-04-27 09:43:33,766-[cfp_fp][80000]XNorm: 21.072669 Training: 2022-04-27 09:43:33,766-[cfp_fp][80000]Accuracy-Flip: 0.97643+-0.00572 Training: 2022-04-27 09:43:33,767-[cfp_fp][80000]Accuracy-Highest: 0.97743 Training: 2022-04-27 09:44:17,117-[agedb_30][80000]XNorm: 22.450177 Training: 2022-04-27 09:44:17,118-[agedb_30][80000]Accuracy-Flip: 0.97883+-0.00738 Training: 2022-04-27 
09:44:17,119-[agedb_30][80000]Accuracy-Highest: 0.98100 Training: 2022-04-27 09:44:20,123-Speed 73.06 samples/sec Loss 1.9434 LearningRate 0.0088 Epoch: 14 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:23,114-Speed 3424.98 samples/sec Loss 1.9406 LearningRate 0.0088 Epoch: 14 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:44:26,106-Speed 3422.23 samples/sec Loss 1.9628 LearningRate 0.0088 Epoch: 14 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:44:29,085-Speed 3439.41 samples/sec Loss 1.8724 LearningRate 0.0088 Epoch: 14 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:32,083-Speed 3416.13 samples/sec Loss 1.9819 LearningRate 0.0088 Epoch: 14 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:35,084-Speed 3412.21 samples/sec Loss 1.9977 LearningRate 0.0088 Epoch: 14 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:38,180-Speed 3308.49 samples/sec Loss 1.8972 LearningRate 0.0088 Epoch: 14 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:41,239-Speed 3348.12 samples/sec Loss 1.9475 LearningRate 0.0088 Epoch: 14 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:44,241-Speed 3412.20 samples/sec Loss 1.9179 LearningRate 0.0087 Epoch: 14 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:47,239-Speed 3416.91 samples/sec Loss 1.9597 LearningRate 0.0087 Epoch: 14 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:50,238-Speed 3415.07 samples/sec Loss 1.8815 LearningRate 0.0087 Epoch: 14 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:44:53,235-Speed 3417.66 samples/sec Loss 1.9165 LearningRate 0.0087 Epoch: 14 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 09:44:56,241-Speed 3407.20 samples/sec Loss 1.9339 LearningRate 0.0087 Epoch: 14 Global Step: 80130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:44:59,282-Speed 3367.65 samples/sec Loss 1.9232 LearningRate 0.0087 Epoch: 14 Global Step: 80140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:02,339-Speed 3351.25 samples/sec Loss 1.9110 LearningRate 0.0087 Epoch: 14 Global Step: 80150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:05,357-Speed 3393.49 samples/sec Loss 1.8592 LearningRate 0.0087 Epoch: 14 Global Step: 80160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:08,364-Speed 3405.49 samples/sec Loss 2.0290 LearningRate 0.0087 Epoch: 14 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:11,371-Speed 3406.12 samples/sec Loss 1.9256 LearningRate 0.0087 Epoch: 14 Global Step: 80180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:14,389-Speed 3393.98 samples/sec Loss 2.0819 LearningRate 0.0087 Epoch: 14 Global Step: 80190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:17,393-Speed 3409.94 samples/sec Loss 1.9724 LearningRate 0.0087 Epoch: 14 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:20,398-Speed 3408.53 samples/sec Loss 1.9971 LearningRate 0.0087 Epoch: 14 Global Step: 80210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:23,403-Speed 3408.90 samples/sec Loss 1.9486 LearningRate 0.0087 Epoch: 14 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:45:26,409-Speed 3407.39 samples/sec Loss 1.9061 LearningRate 0.0087 Epoch: 14 Global Step: 80230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:29,454-Speed 3363.53 samples/sec Loss 1.9547 LearningRate 0.0087 Epoch: 14 Global Step: 80240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:32,459-Speed 3408.68 
samples/sec Loss 2.0480 LearningRate 0.0087 Epoch: 14 Global Step: 80250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:35,484-Speed 3384.73 samples/sec Loss 1.9744 LearningRate 0.0087 Epoch: 14 Global Step: 80260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:38,486-Speed 3412.13 samples/sec Loss 1.8846 LearningRate 0.0087 Epoch: 14 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:41,489-Speed 3410.85 samples/sec Loss 1.9745 LearningRate 0.0086 Epoch: 14 Global Step: 80280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:44,492-Speed 3410.81 samples/sec Loss 1.9972 LearningRate 0.0086 Epoch: 14 Global Step: 80290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:47,501-Speed 3404.05 samples/sec Loss 1.8467 LearningRate 0.0086 Epoch: 14 Global Step: 80300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:50,509-Speed 3405.61 samples/sec Loss 1.9484 LearningRate 0.0086 Epoch: 14 Global Step: 80310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:53,518-Speed 3404.04 samples/sec Loss 1.9422 LearningRate 0.0086 Epoch: 14 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:45:56,523-Speed 3407.88 samples/sec Loss 1.9639 LearningRate 0.0086 Epoch: 14 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:45:59,521-Speed 3416.35 samples/sec Loss 1.9617 LearningRate 0.0086 Epoch: 14 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:02,527-Speed 3407.91 samples/sec Loss 1.9612 LearningRate 0.0086 Epoch: 14 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:05,537-Speed 3401.84 samples/sec Loss 2.0858 LearningRate 0.0086 Epoch: 14 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:08,545-Speed 3405.06 samples/sec Loss 2.0233 LearningRate 0.0086 Epoch: 14 
Global Step: 80370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:11,554-Speed 3404.21 samples/sec Loss 1.9843 LearningRate 0.0086 Epoch: 14 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:14,601-Speed 3362.38 samples/sec Loss 1.8959 LearningRate 0.0086 Epoch: 14 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:17,607-Speed 3407.51 samples/sec Loss 2.0126 LearningRate 0.0086 Epoch: 14 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:20,609-Speed 3411.66 samples/sec Loss 2.0605 LearningRate 0.0086 Epoch: 14 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:23,619-Speed 3402.30 samples/sec Loss 1.9224 LearningRate 0.0086 Epoch: 14 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:26,622-Speed 3410.33 samples/sec Loss 1.8699 LearningRate 0.0086 Epoch: 14 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:29,608-Speed 3430.07 samples/sec Loss 1.9585 LearningRate 0.0086 Epoch: 14 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:32,613-Speed 3409.10 samples/sec Loss 2.0607 LearningRate 0.0086 Epoch: 14 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:46:35,675-Speed 3345.22 samples/sec Loss 2.0040 LearningRate 0.0086 Epoch: 14 Global Step: 80460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:38,755-Speed 3324.60 samples/sec Loss 1.9448 LearningRate 0.0085 Epoch: 14 Global Step: 80470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:41,763-Speed 3405.74 samples/sec Loss 1.9477 LearningRate 0.0085 Epoch: 14 Global Step: 80480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:44,769-Speed 3407.59 samples/sec Loss 1.9702 LearningRate 0.0085 Epoch: 14 Global Step: 80490 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 09:46:47,776-Speed 3405.59 samples/sec Loss 2.0023 LearningRate 0.0085 Epoch: 14 Global Step: 80500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:50,780-Speed 3409.23 samples/sec Loss 1.9975 LearningRate 0.0085 Epoch: 14 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:53,791-Speed 3402.72 samples/sec Loss 2.0581 LearningRate 0.0085 Epoch: 14 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:56,796-Speed 3408.12 samples/sec Loss 2.0030 LearningRate 0.0085 Epoch: 14 Global Step: 80530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:46:59,803-Speed 3405.42 samples/sec Loss 1.9996 LearningRate 0.0085 Epoch: 14 Global Step: 80540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:47:02,845-Speed 3368.03 samples/sec Loss 2.0257 LearningRate 0.0085 Epoch: 14 Global Step: 80550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:47:05,852-Speed 3407.39 samples/sec Loss 2.0274 LearningRate 0.0085 Epoch: 14 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:08,861-Speed 3403.20 samples/sec Loss 2.0649 LearningRate 0.0085 Epoch: 14 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:11,869-Speed 3404.98 samples/sec Loss 1.9576 LearningRate 0.0085 Epoch: 14 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:14,877-Speed 3404.67 samples/sec Loss 1.9408 LearningRate 0.0085 Epoch: 14 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:17,889-Speed 3401.40 samples/sec Loss 1.8832 LearningRate 0.0085 Epoch: 14 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:20,898-Speed 3403.59 samples/sec Loss 2.0886 LearningRate 0.0085 Epoch: 14 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:23,914-Speed 3396.86 
samples/sec Loss 2.0263 LearningRate 0.0085 Epoch: 14 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:26,920-Speed 3407.19 samples/sec Loss 2.0058 LearningRate 0.0085 Epoch: 14 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:29,933-Speed 3399.47 samples/sec Loss 1.9363 LearningRate 0.0085 Epoch: 14 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:32,943-Speed 3402.72 samples/sec Loss 2.0095 LearningRate 0.0085 Epoch: 14 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:35,936-Speed 3421.80 samples/sec Loss 1.9484 LearningRate 0.0085 Epoch: 14 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:38,942-Speed 3407.16 samples/sec Loss 1.9753 LearningRate 0.0084 Epoch: 14 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:41,950-Speed 3405.37 samples/sec Loss 2.0445 LearningRate 0.0084 Epoch: 14 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:44,965-Speed 3396.23 samples/sec Loss 1.9796 LearningRate 0.0084 Epoch: 14 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:47:47,974-Speed 3404.44 samples/sec Loss 2.0021 LearningRate 0.0084 Epoch: 14 Global Step: 80700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:47:51,025-Speed 3356.80 samples/sec Loss 2.0371 LearningRate 0.0084 Epoch: 14 Global Step: 80710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:47:54,041-Speed 3396.70 samples/sec Loss 1.9981 LearningRate 0.0084 Epoch: 14 Global Step: 80720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:47:57,055-Speed 3398.19 samples/sec Loss 2.0026 LearningRate 0.0084 Epoch: 14 Global Step: 80730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:00,068-Speed 3399.21 samples/sec Loss 2.0531 LearningRate 0.0084 Epoch: 14 
Global Step: 80740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:03,085-Speed 3395.15 samples/sec Loss 2.0460 LearningRate 0.0084 Epoch: 14 Global Step: 80750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:06,098-Speed 3399.73 samples/sec Loss 1.9859 LearningRate 0.0084 Epoch: 14 Global Step: 80760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:09,111-Speed 3399.23 samples/sec Loss 2.0514 LearningRate 0.0084 Epoch: 14 Global Step: 80770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:12,127-Speed 3396.06 samples/sec Loss 2.0269 LearningRate 0.0084 Epoch: 14 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:15,167-Speed 3369.12 samples/sec Loss 2.0321 LearningRate 0.0084 Epoch: 14 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:48:18,181-Speed 3398.23 samples/sec Loss 2.0487 LearningRate 0.0084 Epoch: 14 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:21,196-Speed 3396.32 samples/sec Loss 2.0214 LearningRate 0.0084 Epoch: 14 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:24,214-Speed 3394.69 samples/sec Loss 1.9795 LearningRate 0.0084 Epoch: 14 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:27,228-Speed 3398.40 samples/sec Loss 2.0728 LearningRate 0.0084 Epoch: 14 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:30,246-Speed 3393.86 samples/sec Loss 2.0451 LearningRate 0.0084 Epoch: 14 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:33,257-Speed 3400.95 samples/sec Loss 2.0664 LearningRate 0.0084 Epoch: 14 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:36,282-Speed 3386.62 samples/sec Loss 1.9635 LearningRate 0.0083 Epoch: 14 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 09:48:39,305-Speed 3388.30 samples/sec Loss 1.9544 LearningRate 0.0083 Epoch: 14 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:42,325-Speed 3391.02 samples/sec Loss 1.9681 LearningRate 0.0083 Epoch: 14 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:45,336-Speed 3402.18 samples/sec Loss 1.9991 LearningRate 0.0083 Epoch: 14 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:48,331-Speed 3419.62 samples/sec Loss 2.0415 LearningRate 0.0083 Epoch: 14 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:51,342-Speed 3402.02 samples/sec Loss 2.0413 LearningRate 0.0083 Epoch: 14 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:54,357-Speed 3396.91 samples/sec Loss 2.0509 LearningRate 0.0083 Epoch: 14 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:48:57,372-Speed 3397.56 samples/sec Loss 1.9788 LearningRate 0.0083 Epoch: 14 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:00,397-Speed 3385.74 samples/sec Loss 2.0778 LearningRate 0.0083 Epoch: 14 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:03,419-Speed 3389.29 samples/sec Loss 2.0419 LearningRate 0.0083 Epoch: 14 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:06,437-Speed 3393.26 samples/sec Loss 2.0297 LearningRate 0.0083 Epoch: 14 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:09,449-Speed 3400.74 samples/sec Loss 2.1074 LearningRate 0.0083 Epoch: 14 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:12,462-Speed 3399.69 samples/sec Loss 1.9975 LearningRate 0.0083 Epoch: 14 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:15,478-Speed 3395.52 
samples/sec Loss 1.9996 LearningRate 0.0083 Epoch: 14 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:18,511-Speed 3377.73 samples/sec Loss 2.1104 LearningRate 0.0083 Epoch: 14 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:49:21,562-Speed 3357.02 samples/sec Loss 2.0211 LearningRate 0.0083 Epoch: 14 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:24,602-Speed 3369.08 samples/sec Loss 2.0827 LearningRate 0.0083 Epoch: 14 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:27,624-Speed 3388.47 samples/sec Loss 2.0261 LearningRate 0.0083 Epoch: 14 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:30,647-Speed 3388.97 samples/sec Loss 2.0559 LearningRate 0.0083 Epoch: 14 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:33,667-Speed 3391.45 samples/sec Loss 2.0984 LearningRate 0.0083 Epoch: 14 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:36,681-Speed 3397.85 samples/sec Loss 1.9321 LearningRate 0.0082 Epoch: 14 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:39,702-Speed 3390.08 samples/sec Loss 2.0212 LearningRate 0.0082 Epoch: 14 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:42,717-Speed 3397.14 samples/sec Loss 2.0092 LearningRate 0.0082 Epoch: 14 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:45,757-Speed 3370.85 samples/sec Loss 2.0621 LearningRate 0.0082 Epoch: 14 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:48,782-Speed 3385.82 samples/sec Loss 2.0555 LearningRate 0.0082 Epoch: 14 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:51,800-Speed 3394.02 samples/sec Loss 1.9974 LearningRate 0.0082 Epoch: 14 
Global Step: 81110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:49:54,791-Speed 3424.73 samples/sec Loss 2.1081 LearningRate 0.0082 Epoch: 14 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:49:57,830-Speed 3370.05 samples/sec Loss 2.0113 LearningRate 0.0082 Epoch: 14 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:00,975-Speed 3256.66 samples/sec Loss 2.0831 LearningRate 0.0082 Epoch: 14 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:03,992-Speed 3394.57 samples/sec Loss 2.0291 LearningRate 0.0082 Epoch: 14 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:07,009-Speed 3394.73 samples/sec Loss 2.0354 LearningRate 0.0082 Epoch: 14 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:10,044-Speed 3375.44 samples/sec Loss 1.9418 LearningRate 0.0082 Epoch: 14 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:13,124-Speed 3325.74 samples/sec Loss 2.0596 LearningRate 0.0082 Epoch: 14 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:16,158-Speed 3375.59 samples/sec Loss 1.9799 LearningRate 0.0082 Epoch: 14 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:19,154-Speed 3419.23 samples/sec Loss 2.0498 LearningRate 0.0082 Epoch: 14 Global Step: 81200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:22,187-Speed 3377.11 samples/sec Loss 2.0089 LearningRate 0.0082 Epoch: 14 Global Step: 81210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:25,205-Speed 3393.34 samples/sec Loss 1.9744 LearningRate 0.0082 Epoch: 14 Global Step: 81220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:28,220-Speed 3396.34 samples/sec Loss 2.0097 LearningRate 0.0082 Epoch: 14 Global Step: 81230 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 09:50:31,243-Speed 3388.53 samples/sec Loss 2.0442 LearningRate 0.0082 Epoch: 14 Global Step: 81240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:34,267-Speed 3386.64 samples/sec Loss 2.0648 LearningRate 0.0082 Epoch: 14 Global Step: 81250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:37,294-Speed 3384.67 samples/sec Loss 2.0398 LearningRate 0.0081 Epoch: 14 Global Step: 81260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:40,312-Speed 3393.88 samples/sec Loss 1.9887 LearningRate 0.0081 Epoch: 14 Global Step: 81270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:43,324-Speed 3399.82 samples/sec Loss 2.0714 LearningRate 0.0081 Epoch: 14 Global Step: 81280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:46,341-Speed 3395.18 samples/sec Loss 2.0519 LearningRate 0.0081 Epoch: 14 Global Step: 81290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:50:49,360-Speed 3393.15 samples/sec Loss 2.0305 LearningRate 0.0081 Epoch: 14 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:52,376-Speed 3396.11 samples/sec Loss 2.1088 LearningRate 0.0081 Epoch: 14 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:55,392-Speed 3395.56 samples/sec Loss 2.0256 LearningRate 0.0081 Epoch: 14 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:50:58,412-Speed 3393.14 samples/sec Loss 2.0314 LearningRate 0.0081 Epoch: 14 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:01,437-Speed 3385.26 samples/sec Loss 1.9573 LearningRate 0.0081 Epoch: 14 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:04,461-Speed 3387.29 samples/sec Loss 2.0663 LearningRate 0.0081 Epoch: 14 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:07,480-Speed 3393.19 
samples/sec Loss 2.0217 LearningRate 0.0081 Epoch: 14 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:10,497-Speed 3394.64 samples/sec Loss 1.9618 LearningRate 0.0081 Epoch: 14 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:13,513-Speed 3396.39 samples/sec Loss 1.9674 LearningRate 0.0081 Epoch: 14 Global Step: 81380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:16,537-Speed 3386.18 samples/sec Loss 1.9882 LearningRate 0.0081 Epoch: 14 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:19,539-Speed 3412.32 samples/sec Loss 2.0479 LearningRate 0.0081 Epoch: 14 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:22,559-Speed 3390.87 samples/sec Loss 1.9294 LearningRate 0.0081 Epoch: 14 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:25,583-Speed 3387.54 samples/sec Loss 2.0601 LearningRate 0.0081 Epoch: 14 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:28,599-Speed 3396.02 samples/sec Loss 2.0736 LearningRate 0.0081 Epoch: 14 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:31,617-Speed 3393.69 samples/sec Loss 2.1134 LearningRate 0.0081 Epoch: 14 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:34,686-Speed 3337.73 samples/sec Loss 2.0202 LearningRate 0.0081 Epoch: 14 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:37,757-Speed 3335.51 samples/sec Loss 2.0923 LearningRate 0.0080 Epoch: 14 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:40,790-Speed 3376.61 samples/sec Loss 2.0016 LearningRate 0.0080 Epoch: 14 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:51:43,796-Speed 3407.60 samples/sec Loss 1.9745 LearningRate 0.0080 Epoch: 14 
Global Step: 81480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:51:46,818-Speed 3388.34 samples/sec Loss 2.0729 LearningRate 0.0080 Epoch: 14 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:51:49,838-Speed 3391.75 samples/sec Loss 2.1272 LearningRate 0.0080 Epoch: 14 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:51:52,861-Speed 3388.11 samples/sec Loss 2.0275 LearningRate 0.0080 Epoch: 14 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:51:55,880-Speed 3393.17 samples/sec Loss 2.0114 LearningRate 0.0080 Epoch: 14 Global Step: 81520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:51:58,897-Speed 3394.45 samples/sec Loss 2.1202 LearningRate 0.0080 Epoch: 14 Global Step: 81530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:52:01,917-Speed 3392.05 samples/sec Loss 2.0211 LearningRate 0.0080 Epoch: 14 Global Step: 81540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:52:04,943-Speed 3385.21 samples/sec Loss 2.0078 LearningRate 0.0080 Epoch: 14 Global Step: 81550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:52:07,958-Speed 3396.09 samples/sec Loss 2.0008 LearningRate 0.0080 Epoch: 14 Global Step: 81560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:52:10,982-Speed 3386.92 samples/sec Loss 2.1159 LearningRate 0.0080 Epoch: 14 Global Step: 81570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:52:14,010-Speed 3382.59 samples/sec Loss 2.0313 LearningRate 0.0080 Epoch: 14 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:17,030-Speed 3391.31 samples/sec Loss 2.0106 LearningRate 0.0080 Epoch: 14 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:20,047-Speed 3394.99 samples/sec Loss 2.0723 LearningRate 0.0080 Epoch: 14 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 09:52:23,075-Speed 3383.37 samples/sec Loss 1.9886 LearningRate 0.0080 Epoch: 14 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:26,108-Speed 3377.48 samples/sec Loss 2.1065 LearningRate 0.0080 Epoch: 14 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:29,199-Speed 3313.34 samples/sec Loss 2.1287 LearningRate 0.0080 Epoch: 14 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:32,226-Speed 3383.75 samples/sec Loss 2.0120 LearningRate 0.0080 Epoch: 14 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:35,248-Speed 3389.30 samples/sec Loss 2.0448 LearningRate 0.0080 Epoch: 14 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:38,271-Speed 3388.30 samples/sec Loss 2.1682 LearningRate 0.0079 Epoch: 14 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:41,297-Speed 3384.85 samples/sec Loss 2.1319 LearningRate 0.0079 Epoch: 14 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:44,326-Speed 3381.50 samples/sec Loss 2.0317 LearningRate 0.0079 Epoch: 14 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:52:47,334-Speed 3404.30 samples/sec Loss 1.9973 LearningRate 0.0079 Epoch: 14 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:50,359-Speed 3386.06 samples/sec Loss 1.9908 LearningRate 0.0079 Epoch: 14 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:53,384-Speed 3385.35 samples/sec Loss 1.9451 LearningRate 0.0079 Epoch: 14 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:56,407-Speed 3389.03 samples/sec Loss 2.2059 LearningRate 0.0079 Epoch: 14 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:52:59,449-Speed 3367.31 
samples/sec Loss 2.0288 LearningRate 0.0079 Epoch: 14 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:02,476-Speed 3383.23 samples/sec Loss 2.0126 LearningRate 0.0079 Epoch: 14 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:05,508-Speed 3378.23 samples/sec Loss 2.0264 LearningRate 0.0079 Epoch: 14 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:08,528-Speed 3391.38 samples/sec Loss 1.9809 LearningRate 0.0079 Epoch: 14 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:11,563-Speed 3374.61 samples/sec Loss 1.9873 LearningRate 0.0079 Epoch: 14 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:14,585-Speed 3389.80 samples/sec Loss 2.0560 LearningRate 0.0079 Epoch: 14 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:17,583-Speed 3415.43 samples/sec Loss 2.0790 LearningRate 0.0079 Epoch: 14 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:20,604-Speed 3390.95 samples/sec Loss 2.1427 LearningRate 0.0079 Epoch: 14 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:23,629-Speed 3385.76 samples/sec Loss 2.0379 LearningRate 0.0079 Epoch: 14 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:26,662-Speed 3376.86 samples/sec Loss 2.1545 LearningRate 0.0079 Epoch: 14 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:29,682-Speed 3392.14 samples/sec Loss 2.1276 LearningRate 0.0079 Epoch: 14 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:32,707-Speed 3386.07 samples/sec Loss 2.0160 LearningRate 0.0079 Epoch: 14 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:35,732-Speed 3385.41 samples/sec Loss 2.0876 LearningRate 0.0079 Epoch: 14 
Global Step: 81850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:38,755-Speed 3387.90 samples/sec Loss 2.0890 LearningRate 0.0078 Epoch: 14 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:41,781-Speed 3384.97 samples/sec Loss 1.8965 LearningRate 0.0078 Epoch: 14 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:44,802-Speed 3390.75 samples/sec Loss 2.0784 LearningRate 0.0078 Epoch: 14 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:47,866-Speed 3342.10 samples/sec Loss 2.1006 LearningRate 0.0078 Epoch: 14 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:50,964-Speed 3306.45 samples/sec Loss 2.1823 LearningRate 0.0078 Epoch: 14 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:53,991-Speed 3383.79 samples/sec Loss 2.0159 LearningRate 0.0078 Epoch: 14 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:53:57,020-Speed 3381.11 samples/sec Loss 2.1118 LearningRate 0.0078 Epoch: 14 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:00,045-Speed 3386.06 samples/sec Loss 2.0813 LearningRate 0.0078 Epoch: 14 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:03,072-Speed 3383.97 samples/sec Loss 2.0732 LearningRate 0.0078 Epoch: 14 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:06,095-Speed 3388.17 samples/sec Loss 2.0992 LearningRate 0.0078 Epoch: 14 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:09,119-Speed 3386.89 samples/sec Loss 2.0959 LearningRate 0.0078 Epoch: 14 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:12,140-Speed 3390.30 samples/sec Loss 2.1176 LearningRate 0.0078 Epoch: 14 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 09:54:15,163-Speed 3387.97 samples/sec Loss 2.0167 LearningRate 0.0078 Epoch: 14 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:54:18,198-Speed 3375.86 samples/sec Loss 2.1512 LearningRate 0.0078 Epoch: 14 Global Step: 81990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:54:21,235-Speed 3371.54 samples/sec Loss 2.0376 LearningRate 0.0078 Epoch: 14 Global Step: 82000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:55:04,551-[lfw][82000]XNorm: 22.244488 Training: 2022-04-27 09:55:04,551-[lfw][82000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-27 09:55:04,552-[lfw][82000]Accuracy-Highest: 0.99817 Training: 2022-04-27 09:55:54,863-[cfp_fp][82000]XNorm: 20.730621 Training: 2022-04-27 09:55:54,864-[cfp_fp][82000]Accuracy-Flip: 0.97829+-0.00666 Training: 2022-04-27 09:55:54,864-[cfp_fp][82000]Accuracy-Highest: 0.97829 Training: 2022-04-27 09:56:38,090-[agedb_30][82000]XNorm: 22.237732 Training: 2022-04-27 09:56:38,091-[agedb_30][82000]Accuracy-Flip: 0.98133+-0.00763 Training: 2022-04-27 09:56:38,091-[agedb_30][82000]Accuracy-Highest: 0.98133 Training: 2022-04-27 09:56:41,153-Speed 73.19 samples/sec Loss 2.1107 LearningRate 0.0078 Epoch: 14 Global Step: 82010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:56:44,150-Speed 3416.58 samples/sec Loss 2.2154 LearningRate 0.0078 Epoch: 14 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:56:47,155-Speed 3408.95 samples/sec Loss 2.1194 LearningRate 0.0078 Epoch: 14 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:56:50,163-Speed 3404.99 samples/sec Loss 2.1513 LearningRate 0.0078 Epoch: 14 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:56:53,180-Speed 3394.50 samples/sec Loss 2.1046 LearningRate 0.0078 Epoch: 14 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
09:56:56,194-Speed 3398.20 samples/sec Loss 2.0890 LearningRate 0.0078 Epoch: 14 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:56:59,210-Speed 3396.60 samples/sec Loss 2.0329 LearningRate 0.0077 Epoch: 14 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:02,227-Speed 3395.57 samples/sec Loss 2.0552 LearningRate 0.0077 Epoch: 14 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:05,247-Speed 3391.60 samples/sec Loss 1.9665 LearningRate 0.0077 Epoch: 14 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:08,270-Speed 3388.93 samples/sec Loss 1.9738 LearningRate 0.0077 Epoch: 14 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:11,286-Speed 3395.25 samples/sec Loss 2.1444 LearningRate 0.0077 Epoch: 14 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:14,292-Speed 3407.82 samples/sec Loss 2.0347 LearningRate 0.0077 Epoch: 14 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:17,326-Speed 3375.88 samples/sec Loss 2.0381 LearningRate 0.0077 Epoch: 14 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:20,350-Speed 3386.50 samples/sec Loss 2.0813 LearningRate 0.0077 Epoch: 14 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:23,386-Speed 3374.25 samples/sec Loss 2.0484 LearningRate 0.0077 Epoch: 14 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:26,494-Speed 3294.86 samples/sec Loss 2.1317 LearningRate 0.0077 Epoch: 14 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:29,538-Speed 3364.51 samples/sec Loss 2.0473 LearningRate 0.0077 Epoch: 14 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:32,555-Speed 3395.41 samples/sec Loss 2.1067 
LearningRate 0.0077 Epoch: 14 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:35,573-Speed 3393.42 samples/sec Loss 2.1454 LearningRate 0.0077 Epoch: 14 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:38,591-Speed 3394.83 samples/sec Loss 2.0121 LearningRate 0.0077 Epoch: 14 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:41,607-Speed 3395.68 samples/sec Loss 1.9980 LearningRate 0.0077 Epoch: 14 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:44,622-Speed 3397.08 samples/sec Loss 2.0370 LearningRate 0.0077 Epoch: 14 Global Step: 82220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 09:57:47,627-Speed 3407.93 samples/sec Loss 2.0039 LearningRate 0.0077 Epoch: 14 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:50,650-Speed 3388.25 samples/sec Loss 2.0030 LearningRate 0.0077 Epoch: 14 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:53,663-Speed 3399.12 samples/sec Loss 2.1445 LearningRate 0.0077 Epoch: 14 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:56,677-Speed 3398.27 samples/sec Loss 2.0683 LearningRate 0.0077 Epoch: 14 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:57:59,691-Speed 3398.86 samples/sec Loss 1.9977 LearningRate 0.0076 Epoch: 14 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:02,704-Speed 3399.22 samples/sec Loss 2.0523 LearningRate 0.0076 Epoch: 14 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:05,718-Speed 3398.13 samples/sec Loss 2.1469 LearningRate 0.0076 Epoch: 14 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:08,734-Speed 3396.82 samples/sec Loss 2.1065 LearningRate 0.0076 Epoch: 14 Global Step: 82300 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:11,822-Speed 3316.27 samples/sec Loss 2.0248 LearningRate 0.0076 Epoch: 14 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:14,837-Speed 3397.09 samples/sec Loss 2.0081 LearningRate 0.0076 Epoch: 14 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:17,854-Speed 3394.73 samples/sec Loss 2.1211 LearningRate 0.0076 Epoch: 14 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:20,872-Speed 3394.00 samples/sec Loss 2.1035 LearningRate 0.0076 Epoch: 14 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:23,911-Speed 3370.49 samples/sec Loss 2.0842 LearningRate 0.0076 Epoch: 14 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:26,947-Speed 3373.44 samples/sec Loss 2.0375 LearningRate 0.0076 Epoch: 14 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:29,962-Speed 3396.71 samples/sec Loss 2.1022 LearningRate 0.0076 Epoch: 14 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:32,983-Speed 3390.91 samples/sec Loss 2.0354 LearningRate 0.0076 Epoch: 14 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:36,000-Speed 3394.50 samples/sec Loss 2.0142 LearningRate 0.0076 Epoch: 14 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:39,155-Speed 3248.54 samples/sec Loss 2.0559 LearningRate 0.0076 Epoch: 14 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:42,168-Speed 3399.37 samples/sec Loss 2.0107 LearningRate 0.0076 Epoch: 14 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:45,189-Speed 3389.93 samples/sec Loss 2.0802 LearningRate 0.0076 Epoch: 14 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 09:58:48,184-Speed 3419.48 samples/sec Loss 2.0913 LearningRate 0.0076 Epoch: 14 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:51,203-Speed 3393.25 samples/sec Loss 2.0358 LearningRate 0.0076 Epoch: 14 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:54,214-Speed 3401.89 samples/sec Loss 2.0028 LearningRate 0.0076 Epoch: 14 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:58:57,224-Speed 3401.88 samples/sec Loss 2.0618 LearningRate 0.0076 Epoch: 14 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:00,241-Speed 3395.62 samples/sec Loss 1.9988 LearningRate 0.0076 Epoch: 14 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:03,308-Speed 3338.85 samples/sec Loss 2.0497 LearningRate 0.0075 Epoch: 14 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:06,322-Speed 3398.86 samples/sec Loss 2.0692 LearningRate 0.0075 Epoch: 14 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:09,338-Speed 3395.57 samples/sec Loss 2.0533 LearningRate 0.0075 Epoch: 14 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:12,337-Speed 3415.08 samples/sec Loss 2.0643 LearningRate 0.0075 Epoch: 14 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:15,348-Speed 3402.35 samples/sec Loss 2.1251 LearningRate 0.0075 Epoch: 14 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:18,365-Speed 3394.24 samples/sec Loss 2.0592 LearningRate 0.0075 Epoch: 14 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:21,376-Speed 3401.88 samples/sec Loss 2.1173 LearningRate 0.0075 Epoch: 14 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:24,400-Speed 3387.92 samples/sec Loss 
2.1399 LearningRate 0.0075 Epoch: 14 Global Step: 82550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:27,486-Speed 3318.87 samples/sec Loss 2.0228 LearningRate 0.0075 Epoch: 14 Global Step: 82560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:30,526-Speed 3368.89 samples/sec Loss 2.1470 LearningRate 0.0075 Epoch: 14 Global Step: 82570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:33,549-Speed 3388.41 samples/sec Loss 1.9672 LearningRate 0.0075 Epoch: 14 Global Step: 82580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:36,575-Speed 3384.19 samples/sec Loss 2.1125 LearningRate 0.0075 Epoch: 14 Global Step: 82590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:39,595-Speed 3391.66 samples/sec Loss 2.0976 LearningRate 0.0075 Epoch: 14 Global Step: 82600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 09:59:42,616-Speed 3390.06 samples/sec Loss 2.1201 LearningRate 0.0075 Epoch: 14 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:45,649-Speed 3377.80 samples/sec Loss 1.9676 LearningRate 0.0075 Epoch: 14 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:48,767-Speed 3285.06 samples/sec Loss 2.0827 LearningRate 0.0075 Epoch: 14 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:51,823-Speed 3351.32 samples/sec Loss 2.0599 LearningRate 0.0075 Epoch: 14 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:54,833-Speed 3402.22 samples/sec Loss 2.0349 LearningRate 0.0075 Epoch: 14 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 09:59:57,865-Speed 3378.66 samples/sec Loss 2.0157 LearningRate 0.0075 Epoch: 14 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:00,895-Speed 3380.34 samples/sec Loss 2.0048 LearningRate 0.0075 Epoch: 14 Global Step: 82670 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:03,908-Speed 3399.07 samples/sec Loss 2.0330 LearningRate 0.0075 Epoch: 14 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:06,923-Speed 3397.32 samples/sec Loss 2.0666 LearningRate 0.0074 Epoch: 14 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:09,942-Speed 3392.16 samples/sec Loss 2.0337 LearningRate 0.0074 Epoch: 14 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:12,944-Speed 3411.67 samples/sec Loss 2.0150 LearningRate 0.0074 Epoch: 14 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:15,961-Speed 3396.73 samples/sec Loss 2.0902 LearningRate 0.0074 Epoch: 14 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:18,980-Speed 3392.32 samples/sec Loss 2.1154 LearningRate 0.0074 Epoch: 14 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:21,995-Speed 3397.82 samples/sec Loss 2.0868 LearningRate 0.0074 Epoch: 14 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:25,006-Speed 3400.75 samples/sec Loss 2.1052 LearningRate 0.0074 Epoch: 14 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:28,025-Speed 3392.91 samples/sec Loss 2.0103 LearningRate 0.0074 Epoch: 14 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:31,037-Speed 3400.17 samples/sec Loss 2.1133 LearningRate 0.0074 Epoch: 14 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:34,075-Speed 3371.57 samples/sec Loss 2.1116 LearningRate 0.0074 Epoch: 14 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:37,222-Speed 3254.67 samples/sec Loss 2.0808 LearningRate 0.0074 Epoch: 14 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 10:00:40,248-Speed 3384.25 samples/sec Loss 1.9710 LearningRate 0.0074 Epoch: 14 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:43,249-Speed 3414.15 samples/sec Loss 2.0869 LearningRate 0.0074 Epoch: 14 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:46,262-Speed 3398.95 samples/sec Loss 2.0619 LearningRate 0.0074 Epoch: 14 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:49,288-Speed 3384.29 samples/sec Loss 2.2171 LearningRate 0.0074 Epoch: 14 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:52,303-Speed 3397.33 samples/sec Loss 2.0677 LearningRate 0.0074 Epoch: 14 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:55,317-Speed 3398.41 samples/sec Loss 2.0313 LearningRate 0.0074 Epoch: 14 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:00:58,335-Speed 3394.22 samples/sec Loss 1.9973 LearningRate 0.0074 Epoch: 14 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:01,355-Speed 3391.96 samples/sec Loss 2.0491 LearningRate 0.0074 Epoch: 14 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:04,365-Speed 3402.48 samples/sec Loss 1.9843 LearningRate 0.0074 Epoch: 14 Global Step: 82880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:07,381-Speed 3396.13 samples/sec Loss 1.9537 LearningRate 0.0073 Epoch: 14 Global Step: 82890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:10,407-Speed 3385.33 samples/sec Loss 1.9919 LearningRate 0.0073 Epoch: 14 Global Step: 82900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:13,423-Speed 3395.95 samples/sec Loss 2.1106 LearningRate 0.0073 Epoch: 14 Global Step: 82910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:16,447-Speed 3386.92 samples/sec Loss 
2.0916 LearningRate 0.0073 Epoch: 14 Global Step: 82920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:19,464-Speed 3395.40 samples/sec Loss 2.0907 LearningRate 0.0073 Epoch: 14 Global Step: 82930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:22,493-Speed 3380.90 samples/sec Loss 2.0571 LearningRate 0.0073 Epoch: 14 Global Step: 82940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:25,515-Speed 3389.29 samples/sec Loss 2.1380 LearningRate 0.0073 Epoch: 14 Global Step: 82950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:28,539-Speed 3387.20 samples/sec Loss 2.1601 LearningRate 0.0073 Epoch: 14 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:31,561-Speed 3389.41 samples/sec Loss 1.9876 LearningRate 0.0073 Epoch: 14 Global Step: 82970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:01:34,575-Speed 3398.17 samples/sec Loss 2.0549 LearningRate 0.0073 Epoch: 14 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:37,597-Speed 3389.28 samples/sec Loss 2.0882 LearningRate 0.0073 Epoch: 14 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:40,618-Speed 3390.78 samples/sec Loss 2.0051 LearningRate 0.0073 Epoch: 14 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:43,631-Speed 3399.80 samples/sec Loss 1.9408 LearningRate 0.0073 Epoch: 14 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:46,661-Speed 3380.10 samples/sec Loss 2.0294 LearningRate 0.0073 Epoch: 14 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:49,703-Speed 3367.08 samples/sec Loss 2.0160 LearningRate 0.0073 Epoch: 14 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:52,743-Speed 3368.63 samples/sec Loss 2.1284 LearningRate 0.0073 Epoch: 14 Global Step: 83040 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:55,761-Speed 3393.71 samples/sec Loss 2.0959 LearningRate 0.0073 Epoch: 14 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:01:58,778-Speed 3395.29 samples/sec Loss 1.9940 LearningRate 0.0073 Epoch: 14 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:01,795-Speed 3395.06 samples/sec Loss 2.0488 LearningRate 0.0073 Epoch: 14 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:04,811-Speed 3395.61 samples/sec Loss 2.1249 LearningRate 0.0073 Epoch: 14 Global Step: 83080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:02:07,809-Speed 3416.42 samples/sec Loss 2.0179 LearningRate 0.0073 Epoch: 14 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:10,842-Speed 3377.58 samples/sec Loss 2.0474 LearningRate 0.0072 Epoch: 14 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:13,864-Speed 3389.08 samples/sec Loss 2.0380 LearningRate 0.0072 Epoch: 14 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:16,879-Speed 3397.07 samples/sec Loss 2.0234 LearningRate 0.0072 Epoch: 14 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:19,914-Speed 3374.01 samples/sec Loss 2.0945 LearningRate 0.0072 Epoch: 14 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:22,938-Speed 3387.63 samples/sec Loss 1.9932 LearningRate 0.0072 Epoch: 14 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:25,955-Speed 3394.98 samples/sec Loss 2.0546 LearningRate 0.0072 Epoch: 14 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:28,976-Speed 3390.29 samples/sec Loss 2.0848 LearningRate 0.0072 Epoch: 14 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 10:02:32,016-Speed 3369.56 samples/sec Loss 2.1123 LearningRate 0.0072 Epoch: 14 Global Step: 83170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:35,092-Speed 3329.44 samples/sec Loss 1.9734 LearningRate 0.0072 Epoch: 14 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:38,098-Speed 3407.59 samples/sec Loss 2.0538 LearningRate 0.0072 Epoch: 14 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:41,119-Speed 3390.18 samples/sec Loss 2.0333 LearningRate 0.0072 Epoch: 14 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:44,136-Speed 3395.13 samples/sec Loss 2.0029 LearningRate 0.0072 Epoch: 14 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:47,154-Speed 3393.04 samples/sec Loss 2.1004 LearningRate 0.0072 Epoch: 14 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:50,170-Speed 3395.82 samples/sec Loss 2.0437 LearningRate 0.0072 Epoch: 14 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:53,195-Speed 3386.52 samples/sec Loss 2.0492 LearningRate 0.0072 Epoch: 14 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:56,216-Speed 3391.22 samples/sec Loss 2.0758 LearningRate 0.0072 Epoch: 14 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:02:59,238-Speed 3388.56 samples/sec Loss 2.0279 LearningRate 0.0072 Epoch: 14 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:02,261-Speed 3389.53 samples/sec Loss 2.0325 LearningRate 0.0072 Epoch: 14 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:05,303-Speed 3366.33 samples/sec Loss 1.9805 LearningRate 0.0072 Epoch: 14 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:08,324-Speed 3390.60 samples/sec Loss 
2.0574 LearningRate 0.0072 Epoch: 14 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:03:11,334-Speed 3402.89 samples/sec Loss 1.9659 LearningRate 0.0072 Epoch: 14 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:14,355-Speed 3391.16 samples/sec Loss 2.1033 LearningRate 0.0072 Epoch: 14 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:17,371-Speed 3394.93 samples/sec Loss 2.0342 LearningRate 0.0071 Epoch: 14 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:20,395-Speed 3387.21 samples/sec Loss 2.0741 LearningRate 0.0071 Epoch: 14 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:23,413-Speed 3393.76 samples/sec Loss 2.0274 LearningRate 0.0071 Epoch: 14 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:26,431-Speed 3394.74 samples/sec Loss 2.0441 LearningRate 0.0071 Epoch: 14 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:29,450-Speed 3392.18 samples/sec Loss 1.9549 LearningRate 0.0071 Epoch: 14 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:32,471-Speed 3390.26 samples/sec Loss 1.9649 LearningRate 0.0071 Epoch: 14 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:35,493-Speed 3389.22 samples/sec Loss 1.9921 LearningRate 0.0071 Epoch: 14 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:38,514-Speed 3390.14 samples/sec Loss 2.1035 LearningRate 0.0071 Epoch: 14 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:41,522-Speed 3405.22 samples/sec Loss 2.0112 LearningRate 0.0071 Epoch: 14 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:44,546-Speed 3387.34 samples/sec Loss 2.0890 LearningRate 0.0071 Epoch: 14 Global Step: 
83410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:47,572-Speed 3385.15 samples/sec Loss 1.9989 LearningRate 0.0071 Epoch: 14 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:50,597-Speed 3386.04 samples/sec Loss 2.0952 LearningRate 0.0071 Epoch: 14 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:53,624-Speed 3383.97 samples/sec Loss 2.0178 LearningRate 0.0071 Epoch: 14 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:56,639-Speed 3396.70 samples/sec Loss 1.9723 LearningRate 0.0071 Epoch: 14 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:03:59,661-Speed 3388.76 samples/sec Loss 2.1685 LearningRate 0.0071 Epoch: 14 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:02,687-Speed 3385.31 samples/sec Loss 2.0916 LearningRate 0.0071 Epoch: 14 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:05,705-Speed 3393.89 samples/sec Loss 1.8465 LearningRate 0.0071 Epoch: 14 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:08,726-Speed 3390.87 samples/sec Loss 2.0235 LearningRate 0.0071 Epoch: 14 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:11,749-Speed 3388.34 samples/sec Loss 1.9931 LearningRate 0.0071 Epoch: 14 Global Step: 83500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:04:14,780-Speed 3378.85 samples/sec Loss 2.0391 LearningRate 0.0071 Epoch: 14 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:17,805-Speed 3386.93 samples/sec Loss 2.0262 LearningRate 0.0071 Epoch: 14 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:20,823-Speed 3393.25 samples/sec Loss 2.0548 LearningRate 0.0070 Epoch: 14 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 10:04:23,849-Speed 3384.47 samples/sec Loss 2.1167 LearningRate 0.0070 Epoch: 14 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:26,868-Speed 3393.35 samples/sec Loss 2.1259 LearningRate 0.0070 Epoch: 14 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:04:29,855-Speed 3429.37 samples/sec Loss 2.0646 LearningRate 0.0070 Epoch: 14 Global Step: 83560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:32,877-Speed 3389.12 samples/sec Loss 2.0495 LearningRate 0.0070 Epoch: 14 Global Step: 83570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:35,895-Speed 3392.88 samples/sec Loss 2.0185 LearningRate 0.0070 Epoch: 14 Global Step: 83580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:38,918-Speed 3388.12 samples/sec Loss 1.9319 LearningRate 0.0070 Epoch: 14 Global Step: 83590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:41,938-Speed 3392.44 samples/sec Loss 2.0178 LearningRate 0.0070 Epoch: 14 Global Step: 83600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:44,976-Speed 3371.73 samples/sec Loss 2.0278 LearningRate 0.0070 Epoch: 14 Global Step: 83610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:47,998-Speed 3389.06 samples/sec Loss 2.0247 LearningRate 0.0070 Epoch: 14 Global Step: 83620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:51,015-Speed 3394.78 samples/sec Loss 1.9790 LearningRate 0.0070 Epoch: 14 Global Step: 83630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:54,032-Speed 3395.07 samples/sec Loss 2.0595 LearningRate 0.0070 Epoch: 14 Global Step: 83640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:04:57,052-Speed 3390.91 samples/sec Loss 1.9851 LearningRate 0.0070 Epoch: 14 Global Step: 83650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 10:05:00,084-Speed 3378.71 
samples/sec Loss 2.0458 LearningRate 0.0070 Epoch: 14 Global Step: 83660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:03,108-Speed 3386.19 samples/sec Loss 1.9517 LearningRate 0.0070 Epoch: 14 Global Step: 83670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:06,140-Speed 3378.28 samples/sec Loss 2.1095 LearningRate 0.0070 Epoch: 14 Global Step: 83680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:09,164-Speed 3386.86 samples/sec Loss 2.0706 LearningRate 0.0070 Epoch: 14 Global Step: 83690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:12,192-Speed 3382.79 samples/sec Loss 1.9789 LearningRate 0.0070 Epoch: 14 Global Step: 83700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:15,221-Speed 3382.97 samples/sec Loss 2.0359 LearningRate 0.0070 Epoch: 14 Global Step: 83710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:18,245-Speed 3387.18 samples/sec Loss 2.0494 LearningRate 0.0070 Epoch: 14 Global Step: 83720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:21,264-Speed 3392.40 samples/sec Loss 2.0508 LearningRate 0.0070 Epoch: 14 Global Step: 83730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:24,332-Speed 3338.47 samples/sec Loss 2.0059 LearningRate 0.0070 Epoch: 14 Global Step: 83740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:27,393-Speed 3345.30 samples/sec Loss 1.9988 LearningRate 0.0069 Epoch: 14 Global Step: 83750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:05:30,415-Speed 3389.51 samples/sec Loss 2.0398 LearningRate 0.0069 Epoch: 14 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:33,441-Speed 3384.90 samples/sec Loss 2.0386 LearningRate 0.0069 Epoch: 14 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:36,470-Speed 3381.36 samples/sec Loss 1.8627 LearningRate 0.0069 Epoch: 14 
Global Step: 83780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:39,491-Speed 3391.24 samples/sec Loss 1.9550 LearningRate 0.0069 Epoch: 14 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:42,508-Speed 3393.87 samples/sec Loss 2.0473 LearningRate 0.0069 Epoch: 14 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:45,531-Speed 3388.55 samples/sec Loss 2.0397 LearningRate 0.0069 Epoch: 14 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:48,555-Speed 3387.77 samples/sec Loss 2.0202 LearningRate 0.0069 Epoch: 14 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:51,580-Speed 3385.38 samples/sec Loss 2.0774 LearningRate 0.0069 Epoch: 14 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:54,607-Speed 3383.66 samples/sec Loss 1.9777 LearningRate 0.0069 Epoch: 14 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:05:57,630-Speed 3387.46 samples/sec Loss 1.9720 LearningRate 0.0069 Epoch: 14 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:06:00,652-Speed 3389.96 samples/sec Loss 1.9967 LearningRate 0.0069 Epoch: 14 Global Step: 83860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:06:03,655-Speed 3409.79 samples/sec Loss 2.0524 LearningRate 0.0069 Epoch: 14 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:06:06,679-Speed 3387.46 samples/sec Loss 2.0426 LearningRate 0.0069 Epoch: 14 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:06:09,716-Speed 3373.00 samples/sec Loss 2.0091 LearningRate 0.0069 Epoch: 14 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:06:12,789-Speed 3332.72 samples/sec Loss 1.9870 LearningRate 0.0069 Epoch: 14 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 10:06:15,820-Speed 3379.69 samples/sec Loss 2.0935 LearningRate 0.0069 Epoch: 14 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:06:18,829-Speed 3403.85 samples/sec Loss 2.1489 LearningRate 0.0069 Epoch: 14 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:21,845-Speed 3395.40 samples/sec Loss 2.0953 LearningRate 0.0069 Epoch: 14 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:24,886-Speed 3368.35 samples/sec Loss 2.0319 LearningRate 0.0069 Epoch: 14 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:27,915-Speed 3381.36 samples/sec Loss 2.0194 LearningRate 0.0069 Epoch: 14 Global Step: 83950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:31,049-Speed 3267.91 samples/sec Loss 2.0613 LearningRate 0.0068 Epoch: 14 Global Step: 83960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:34,118-Speed 3337.96 samples/sec Loss 1.9598 LearningRate 0.0068 Epoch: 14 Global Step: 83970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:37,218-Speed 3303.99 samples/sec Loss 1.9135 LearningRate 0.0068 Epoch: 14 Global Step: 83980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:40,245-Speed 3384.34 samples/sec Loss 2.1090 LearningRate 0.0068 Epoch: 14 Global Step: 83990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:06:43,273-Speed 3382.38 samples/sec Loss 1.9963 LearningRate 0.0068 Epoch: 14 Global Step: 84000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:07:26,457-[lfw][84000]XNorm: 21.186704 Training: 2022-04-27 10:07:26,457-[lfw][84000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-27 10:07:26,458-[lfw][84000]Accuracy-Highest: 0.99817 Training: 2022-04-27 10:08:16,704-[cfp_fp][84000]XNorm: 20.323462 Training: 2022-04-27 10:08:16,704-[cfp_fp][84000]Accuracy-Flip: 0.97843+-0.00698 Training: 
2022-04-27 10:08:16,705-[cfp_fp][84000]Accuracy-Highest: 0.97843 Training: 2022-04-27 10:09:00,318-[agedb_30][84000]XNorm: 21.553200 Training: 2022-04-27 10:09:00,318-[agedb_30][84000]Accuracy-Flip: 0.98050+-0.00606 Training: 2022-04-27 10:09:00,319-[agedb_30][84000]Accuracy-Highest: 0.98133 Training: 2022-04-27 10:09:03,338-Speed 73.11 samples/sec Loss 1.9241 LearningRate 0.0068 Epoch: 14 Global Step: 84010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:09:06,342-Speed 3408.98 samples/sec Loss 1.9987 LearningRate 0.0068 Epoch: 14 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:09,360-Speed 3393.38 samples/sec Loss 1.9864 LearningRate 0.0068 Epoch: 14 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:12,370-Speed 3403.64 samples/sec Loss 2.1260 LearningRate 0.0068 Epoch: 14 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:15,381-Speed 3401.28 samples/sec Loss 2.0263 LearningRate 0.0068 Epoch: 14 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:18,389-Speed 3404.38 samples/sec Loss 2.0006 LearningRate 0.0068 Epoch: 14 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:21,404-Speed 3397.62 samples/sec Loss 1.9460 LearningRate 0.0068 Epoch: 14 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:24,427-Speed 3387.94 samples/sec Loss 2.0955 LearningRate 0.0068 Epoch: 14 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:27,455-Speed 3383.40 samples/sec Loss 2.0299 LearningRate 0.0068 Epoch: 14 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:30,469-Speed 3398.30 samples/sec Loss 1.9939 LearningRate 0.0068 Epoch: 14 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:33,490-Speed 3390.38 samples/sec Loss 1.9697 LearningRate 
0.0068 Epoch: 14 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:36,520-Speed 3380.27 samples/sec Loss 1.9109 LearningRate 0.0068 Epoch: 14 Global Step: 84120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:09:39,527-Speed 3405.85 samples/sec Loss 2.0871 LearningRate 0.0068 Epoch: 14 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:42,546-Speed 3393.73 samples/sec Loss 2.0341 LearningRate 0.0068 Epoch: 14 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:45,593-Speed 3361.83 samples/sec Loss 2.0093 LearningRate 0.0068 Epoch: 14 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:48,677-Speed 3321.19 samples/sec Loss 2.0739 LearningRate 0.0068 Epoch: 14 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:51,772-Speed 3308.27 samples/sec Loss 2.0806 LearningRate 0.0068 Epoch: 14 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:54,796-Speed 3387.41 samples/sec Loss 1.9476 LearningRate 0.0067 Epoch: 14 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:09:57,825-Speed 3382.25 samples/sec Loss 1.9572 LearningRate 0.0067 Epoch: 14 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:00,852-Speed 3383.16 samples/sec Loss 2.0525 LearningRate 0.0067 Epoch: 14 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:03,887-Speed 3375.19 samples/sec Loss 1.9387 LearningRate 0.0067 Epoch: 14 Global Step: 84210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:06,914-Speed 3384.67 samples/sec Loss 1.9795 LearningRate 0.0067 Epoch: 14 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:09,921-Speed 3405.71 samples/sec Loss 2.0780 LearningRate 0.0067 Epoch: 14 Global Step: 84230 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-27 10:10:12,981-Speed 3347.02 samples/sec Loss 2.0118 LearningRate 0.0067 Epoch: 14 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:16,026-Speed 3364.39 samples/sec Loss 2.0538 LearningRate 0.0067 Epoch: 14 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:19,049-Speed 3387.66 samples/sec Loss 1.9968 LearningRate 0.0067 Epoch: 14 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:22,070-Speed 3391.27 samples/sec Loss 2.0221 LearningRate 0.0067 Epoch: 14 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:25,089-Speed 3392.25 samples/sec Loss 1.9502 LearningRate 0.0067 Epoch: 14 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:28,122-Speed 3377.41 samples/sec Loss 2.0475 LearningRate 0.0067 Epoch: 14 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:31,143-Speed 3390.16 samples/sec Loss 1.9867 LearningRate 0.0067 Epoch: 14 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:34,158-Speed 3396.42 samples/sec Loss 2.0420 LearningRate 0.0067 Epoch: 14 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:37,181-Speed 3388.73 samples/sec Loss 2.0096 LearningRate 0.0067 Epoch: 14 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:40,208-Speed 3383.97 samples/sec Loss 1.9767 LearningRate 0.0067 Epoch: 14 Global Step: 84330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:10:43,213-Speed 3408.57 samples/sec Loss 2.0215 LearningRate 0.0067 Epoch: 14 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:46,224-Speed 3400.56 samples/sec Loss 1.9859 LearningRate 0.0067 Epoch: 14 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
10:10:49,242-Speed 3394.68 samples/sec Loss 1.9729 LearningRate 0.0067 Epoch: 14 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:52,254-Speed 3399.97 samples/sec Loss 1.9250 LearningRate 0.0067 Epoch: 14 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:55,265-Speed 3401.59 samples/sec Loss 1.9899 LearningRate 0.0067 Epoch: 14 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:10:58,287-Speed 3389.86 samples/sec Loss 2.0699 LearningRate 0.0067 Epoch: 14 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:01,318-Speed 3378.65 samples/sec Loss 1.9505 LearningRate 0.0066 Epoch: 14 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:04,330-Speed 3400.87 samples/sec Loss 2.1290 LearningRate 0.0066 Epoch: 14 Global Step: 84410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:07,350-Speed 3391.58 samples/sec Loss 2.0123 LearningRate 0.0066 Epoch: 14 Global Step: 84420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:10,358-Speed 3405.38 samples/sec Loss 1.9350 LearningRate 0.0066 Epoch: 14 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:13,373-Speed 3396.79 samples/sec Loss 2.0459 LearningRate 0.0066 Epoch: 14 Global Step: 84440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:11:16,362-Speed 3426.92 samples/sec Loss 2.0413 LearningRate 0.0066 Epoch: 14 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:19,371-Speed 3403.62 samples/sec Loss 2.0634 LearningRate 0.0066 Epoch: 14 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:22,384-Speed 3399.29 samples/sec Loss 2.0125 LearningRate 0.0066 Epoch: 14 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:25,407-Speed 3388.96 samples/sec Loss 2.0256 
LearningRate 0.0066 Epoch: 14 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:28,422-Speed 3397.20 samples/sec Loss 1.9854 LearningRate 0.0066 Epoch: 14 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:31,434-Speed 3400.06 samples/sec Loss 1.9691 LearningRate 0.0066 Epoch: 14 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:34,444-Speed 3402.01 samples/sec Loss 1.9805 LearningRate 0.0066 Epoch: 14 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:37,452-Speed 3405.57 samples/sec Loss 2.0963 LearningRate 0.0066 Epoch: 14 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:40,467-Speed 3397.17 samples/sec Loss 1.9659 LearningRate 0.0066 Epoch: 14 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:43,476-Speed 3404.50 samples/sec Loss 2.0057 LearningRate 0.0066 Epoch: 14 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:46,490-Speed 3398.48 samples/sec Loss 1.9065 LearningRate 0.0066 Epoch: 14 Global Step: 84550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:11:49,482-Speed 3422.62 samples/sec Loss 2.0647 LearningRate 0.0066 Epoch: 14 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:52,507-Speed 3385.44 samples/sec Loss 1.9857 LearningRate 0.0066 Epoch: 14 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:55,534-Speed 3384.96 samples/sec Loss 1.9850 LearningRate 0.0066 Epoch: 14 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:11:58,542-Speed 3404.94 samples/sec Loss 1.9638 LearningRate 0.0066 Epoch: 14 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:01,559-Speed 3394.60 samples/sec Loss 1.9735 LearningRate 0.0066 Epoch: 14 Global Step: 84600 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:04,570-Speed 3401.13 samples/sec Loss 2.0173 LearningRate 0.0066 Epoch: 14 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:07,582-Speed 3401.07 samples/sec Loss 1.9669 LearningRate 0.0065 Epoch: 14 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:10,600-Speed 3393.24 samples/sec Loss 2.0311 LearningRate 0.0065 Epoch: 14 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:13,629-Speed 3381.97 samples/sec Loss 1.9994 LearningRate 0.0065 Epoch: 14 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:16,644-Speed 3396.82 samples/sec Loss 2.0273 LearningRate 0.0065 Epoch: 14 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:19,637-Speed 3422.26 samples/sec Loss 1.9700 LearningRate 0.0065 Epoch: 14 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:22,650-Speed 3399.46 samples/sec Loss 2.0217 LearningRate 0.0065 Epoch: 14 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:25,663-Speed 3399.19 samples/sec Loss 1.9693 LearningRate 0.0065 Epoch: 14 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:28,678-Speed 3396.89 samples/sec Loss 1.8946 LearningRate 0.0065 Epoch: 14 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:31,699-Speed 3390.98 samples/sec Loss 1.9994 LearningRate 0.0065 Epoch: 14 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:34,712-Speed 3399.04 samples/sec Loss 1.9270 LearningRate 0.0065 Epoch: 14 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:37,735-Speed 3388.76 samples/sec Loss 1.9178 LearningRate 0.0065 Epoch: 14 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 10:12:40,749-Speed 3398.59 samples/sec Loss 1.9102 LearningRate 0.0065 Epoch: 14 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:43,759-Speed 3402.26 samples/sec Loss 1.9468 LearningRate 0.0065 Epoch: 14 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:46,772-Speed 3399.36 samples/sec Loss 2.0810 LearningRate 0.0065 Epoch: 14 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:49,788-Speed 3395.90 samples/sec Loss 1.9649 LearningRate 0.0065 Epoch: 14 Global Step: 84760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:12:52,783-Speed 3419.48 samples/sec Loss 1.9801 LearningRate 0.0065 Epoch: 14 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:55,799-Speed 3396.93 samples/sec Loss 2.0313 LearningRate 0.0065 Epoch: 14 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:12:58,816-Speed 3394.54 samples/sec Loss 1.9944 LearningRate 0.0065 Epoch: 14 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:01,830-Speed 3397.99 samples/sec Loss 1.9610 LearningRate 0.0065 Epoch: 14 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:04,836-Speed 3407.60 samples/sec Loss 1.9884 LearningRate 0.0065 Epoch: 14 Global Step: 84810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:07,851-Speed 3397.16 samples/sec Loss 1.9061 LearningRate 0.0065 Epoch: 14 Global Step: 84820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:10,867-Speed 3396.67 samples/sec Loss 1.9747 LearningRate 0.0065 Epoch: 14 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:13,877-Speed 3402.73 samples/sec Loss 2.0359 LearningRate 0.0064 Epoch: 14 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:16,890-Speed 3399.07 samples/sec Loss 
2.0096 LearningRate 0.0064 Epoch: 14 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:19,907-Speed 3395.42 samples/sec Loss 2.0456 LearningRate 0.0064 Epoch: 14 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:22,920-Speed 3398.66 samples/sec Loss 2.0463 LearningRate 0.0064 Epoch: 14 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:25,934-Speed 3399.32 samples/sec Loss 1.9388 LearningRate 0.0064 Epoch: 14 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:28,955-Speed 3389.58 samples/sec Loss 1.9988 LearningRate 0.0064 Epoch: 14 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:31,975-Speed 3392.28 samples/sec Loss 1.9368 LearningRate 0.0064 Epoch: 14 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:13:34,991-Speed 3395.76 samples/sec Loss 2.0065 LearningRate 0.0064 Epoch: 14 Global Step: 84910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:38,003-Speed 3401.05 samples/sec Loss 2.0315 LearningRate 0.0064 Epoch: 14 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:41,016-Speed 3399.01 samples/sec Loss 2.0641 LearningRate 0.0064 Epoch: 14 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:44,025-Speed 3403.80 samples/sec Loss 2.0283 LearningRate 0.0064 Epoch: 14 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:47,052-Speed 3384.16 samples/sec Loss 1.9925 LearningRate 0.0064 Epoch: 14 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:50,069-Speed 3394.36 samples/sec Loss 2.0263 LearningRate 0.0064 Epoch: 14 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:53,085-Speed 3395.90 samples/sec Loss 2.0125 LearningRate 0.0064 Epoch: 14 Global Step: 84970 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:56,094-Speed 3404.74 samples/sec Loss 2.0517 LearningRate 0.0064 Epoch: 14 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:13:59,104-Speed 3402.89 samples/sec Loss 1.9334 LearningRate 0.0064 Epoch: 14 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:02,116-Speed 3400.09 samples/sec Loss 1.9329 LearningRate 0.0064 Epoch: 14 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:05,149-Speed 3377.11 samples/sec Loss 1.9364 LearningRate 0.0064 Epoch: 14 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:14:08,150-Speed 3413.22 samples/sec Loss 2.0254 LearningRate 0.0064 Epoch: 14 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:11,166-Speed 3395.94 samples/sec Loss 1.9222 LearningRate 0.0064 Epoch: 14 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:14,179-Speed 3400.15 samples/sec Loss 1.9659 LearningRate 0.0064 Epoch: 14 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:17,195-Speed 3395.05 samples/sec Loss 1.9735 LearningRate 0.0064 Epoch: 14 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:20,211-Speed 3396.29 samples/sec Loss 1.9769 LearningRate 0.0064 Epoch: 14 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:23,221-Speed 3402.60 samples/sec Loss 1.9997 LearningRate 0.0063 Epoch: 14 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:26,234-Speed 3400.51 samples/sec Loss 1.9807 LearningRate 0.0063 Epoch: 14 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:14:29,248-Speed 3397.89 samples/sec Loss 1.9763 LearningRate 0.0063 Epoch: 14 Global Step: 85090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-27 10:14:32,260-Speed 3400.31 samples/sec Loss 2.0154 LearningRate 0.0063 Epoch: 14 Global Step: 85100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:35,276-Speed 3396.04 samples/sec Loss 1.9170 LearningRate 0.0063 Epoch: 14 Global Step: 85110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:38,288-Speed 3400.52 samples/sec Loss 1.8738 LearningRate 0.0063 Epoch: 14 Global Step: 85120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:41,301-Speed 3398.86 samples/sec Loss 1.8638 LearningRate 0.0063 Epoch: 14 Global Step: 85130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:44,315-Speed 3398.47 samples/sec Loss 2.0134 LearningRate 0.0063 Epoch: 14 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:47,332-Speed 3395.70 samples/sec Loss 1.9286 LearningRate 0.0063 Epoch: 14 Global Step: 85150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:50,352-Speed 3391.06 samples/sec Loss 1.9891 LearningRate 0.0063 Epoch: 14 Global Step: 85160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:53,371-Speed 3392.74 samples/sec Loss 1.9681 LearningRate 0.0063 Epoch: 14 Global Step: 85170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:56,384-Speed 3399.82 samples/sec Loss 1.9172 LearningRate 0.0063 Epoch: 14 Global Step: 85180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:14:59,399-Speed 3397.02 samples/sec Loss 1.8849 LearningRate 0.0063 Epoch: 14 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:02,423-Speed 3386.26 samples/sec Loss 2.0098 LearningRate 0.0063 Epoch: 14 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:05,442-Speed 3393.39 samples/sec Loss 1.8999 LearningRate 0.0063 Epoch: 14 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:08,462-Speed 3391.14 samples/sec Loss 
1.9369 LearningRate 0.0063 Epoch: 14 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:11,479-Speed 3394.45 samples/sec Loss 2.0581 LearningRate 0.0063 Epoch: 14 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:14,493-Speed 3398.51 samples/sec Loss 2.0384 LearningRate 0.0063 Epoch: 14 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:17,507-Speed 3398.08 samples/sec Loss 1.8645 LearningRate 0.0063 Epoch: 14 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:20,523-Speed 3396.57 samples/sec Loss 1.9839 LearningRate 0.0063 Epoch: 14 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:23,558-Speed 3375.17 samples/sec Loss 1.9740 LearningRate 0.0063 Epoch: 14 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:26,754-Speed 3204.79 samples/sec Loss 1.9393 LearningRate 0.0063 Epoch: 14 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:29,745-Speed 3423.77 samples/sec Loss 1.9709 LearningRate 0.0063 Epoch: 14 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:46,321-Speed 617.83 samples/sec Loss 1.4859 LearningRate 0.0062 Epoch: 15 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:49,336-Speed 3397.95 samples/sec Loss 1.3967 LearningRate 0.0062 Epoch: 15 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:52,391-Speed 3351.84 samples/sec Loss 1.4759 LearningRate 0.0062 Epoch: 15 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:55,418-Speed 3384.45 samples/sec Loss 1.4253 LearningRate 0.0062 Epoch: 15 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:15:58,437-Speed 3392.24 samples/sec Loss 1.3946 LearningRate 0.0062 Epoch: 15 Global Step: 85340 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:01,450-Speed 3400.54 samples/sec Loss 1.4735 LearningRate 0.0062 Epoch: 15 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:04,481-Speed 3378.73 samples/sec Loss 1.4373 LearningRate 0.0062 Epoch: 15 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:07,516-Speed 3375.01 samples/sec Loss 1.3340 LearningRate 0.0062 Epoch: 15 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:10,532-Speed 3396.25 samples/sec Loss 1.4235 LearningRate 0.0062 Epoch: 15 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:13,639-Speed 3296.54 samples/sec Loss 1.4158 LearningRate 0.0062 Epoch: 15 Global Step: 85390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:16:16,659-Speed 3391.06 samples/sec Loss 1.3531 LearningRate 0.0062 Epoch: 15 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:19,685-Speed 3385.17 samples/sec Loss 1.5333 LearningRate 0.0062 Epoch: 15 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:22,714-Speed 3381.31 samples/sec Loss 1.4386 LearningRate 0.0062 Epoch: 15 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:25,733-Speed 3392.51 samples/sec Loss 1.5236 LearningRate 0.0062 Epoch: 15 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:28,802-Speed 3337.40 samples/sec Loss 1.4402 LearningRate 0.0062 Epoch: 15 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:31,818-Speed 3396.09 samples/sec Loss 1.3959 LearningRate 0.0062 Epoch: 15 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:34,844-Speed 3384.67 samples/sec Loss 1.5307 LearningRate 0.0062 Epoch: 15 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 10:16:37,916-Speed 3334.67 samples/sec Loss 1.4679 LearningRate 0.0062 Epoch: 15 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:41,085-Speed 3231.59 samples/sec Loss 1.5547 LearningRate 0.0062 Epoch: 15 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:44,098-Speed 3399.84 samples/sec Loss 1.5495 LearningRate 0.0062 Epoch: 15 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:47,111-Speed 3400.11 samples/sec Loss 1.4535 LearningRate 0.0062 Epoch: 15 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:50,133-Speed 3389.39 samples/sec Loss 1.4218 LearningRate 0.0062 Epoch: 15 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:53,156-Speed 3387.05 samples/sec Loss 1.4565 LearningRate 0.0061 Epoch: 15 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:56,177-Speed 3391.23 samples/sec Loss 1.5860 LearningRate 0.0061 Epoch: 15 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:16:59,209-Speed 3377.13 samples/sec Loss 1.3869 LearningRate 0.0061 Epoch: 15 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:02,291-Speed 3323.60 samples/sec Loss 1.4996 LearningRate 0.0061 Epoch: 15 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:05,355-Speed 3342.49 samples/sec Loss 1.4511 LearningRate 0.0061 Epoch: 15 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:08,379-Speed 3387.44 samples/sec Loss 1.3752 LearningRate 0.0061 Epoch: 15 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:11,401-Speed 3389.03 samples/sec Loss 1.4982 LearningRate 0.0061 Epoch: 15 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:14,432-Speed 3380.00 samples/sec Loss 
1.4488 LearningRate 0.0061 Epoch: 15 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:17,443-Speed 3401.14 samples/sec Loss 1.4307 LearningRate 0.0061 Epoch: 15 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:20,465-Speed 3389.37 samples/sec Loss 1.4141 LearningRate 0.0061 Epoch: 15 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:23,488-Speed 3388.42 samples/sec Loss 1.5647 LearningRate 0.0061 Epoch: 15 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:26,515-Speed 3383.47 samples/sec Loss 1.3850 LearningRate 0.0061 Epoch: 15 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:29,542-Speed 3383.62 samples/sec Loss 1.4294 LearningRate 0.0061 Epoch: 15 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:32,568-Speed 3385.57 samples/sec Loss 1.4824 LearningRate 0.0061 Epoch: 15 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:35,593-Speed 3386.05 samples/sec Loss 1.5053 LearningRate 0.0061 Epoch: 15 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:38,618-Speed 3385.54 samples/sec Loss 1.5893 LearningRate 0.0061 Epoch: 15 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:41,654-Speed 3373.58 samples/sec Loss 1.4146 LearningRate 0.0061 Epoch: 15 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:44,676-Speed 3389.44 samples/sec Loss 1.4678 LearningRate 0.0061 Epoch: 15 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:17:47,701-Speed 3385.93 samples/sec Loss 1.4813 LearningRate 0.0061 Epoch: 15 Global Step: 85700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:17:50,692-Speed 3424.33 samples/sec Loss 1.4777 LearningRate 0.0061 Epoch: 15 Global Step: 
85710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:17:53,724-Speed 3377.99 samples/sec Loss 1.5258 LearningRate 0.0061 Epoch: 15 Global Step: 85720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:17:56,750-Speed 3384.63 samples/sec Loss 1.5785 LearningRate 0.0061 Epoch: 15 Global Step: 85730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:17:59,777-Speed 3383.57 samples/sec Loss 1.5311 LearningRate 0.0061 Epoch: 15 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:02,797-Speed 3391.78 samples/sec Loss 1.3717 LearningRate 0.0060 Epoch: 15 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:05,827-Speed 3380.52 samples/sec Loss 1.3866 LearningRate 0.0060 Epoch: 15 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:08,847-Speed 3392.01 samples/sec Loss 1.4479 LearningRate 0.0060 Epoch: 15 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:11,869-Speed 3389.29 samples/sec Loss 1.4519 LearningRate 0.0060 Epoch: 15 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:14,896-Speed 3382.84 samples/sec Loss 1.5298 LearningRate 0.0060 Epoch: 15 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:17,922-Speed 3384.97 samples/sec Loss 1.4214 LearningRate 0.0060 Epoch: 15 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:18:20,950-Speed 3382.84 samples/sec Loss 1.5460 LearningRate 0.0060 Epoch: 15 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:23,971-Speed 3390.46 samples/sec Loss 1.4316 LearningRate 0.0060 Epoch: 15 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:26,989-Speed 3394.40 samples/sec Loss 1.4934 LearningRate 0.0060 Epoch: 15 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 10:18:30,016-Speed 3383.56 samples/sec Loss 1.4944 LearningRate 0.0060 Epoch: 15 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:33,044-Speed 3382.78 samples/sec Loss 1.5029 LearningRate 0.0060 Epoch: 15 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:36,087-Speed 3365.21 samples/sec Loss 1.4857 LearningRate 0.0060 Epoch: 15 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:39,140-Speed 3355.54 samples/sec Loss 1.4712 LearningRate 0.0060 Epoch: 15 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:42,163-Speed 3387.60 samples/sec Loss 1.5386 LearningRate 0.0060 Epoch: 15 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:45,185-Speed 3389.61 samples/sec Loss 1.4654 LearningRate 0.0060 Epoch: 15 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:48,210-Speed 3385.99 samples/sec Loss 1.4517 LearningRate 0.0060 Epoch: 15 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:51,243-Speed 3376.83 samples/sec Loss 1.5516 LearningRate 0.0060 Epoch: 15 Global Step: 85910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:18:54,250-Speed 3406.12 samples/sec Loss 1.5173 LearningRate 0.0060 Epoch: 15 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:18:57,270-Speed 3391.47 samples/sec Loss 1.5602 LearningRate 0.0060 Epoch: 15 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:19:00,296-Speed 3384.55 samples/sec Loss 1.5593 LearningRate 0.0060 Epoch: 15 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:19:03,322-Speed 3385.50 samples/sec Loss 1.5660 LearningRate 0.0060 Epoch: 15 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:19:06,345-Speed 3387.96 
samples/sec Loss 1.6122 LearningRate 0.0060 Epoch: 15 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:19:09,369-Speed 3387.08 samples/sec Loss 1.4995 LearningRate 0.0060 Epoch: 15 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:19:12,396-Speed 3382.79 samples/sec Loss 1.4977 LearningRate 0.0060 Epoch: 15 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:19:15,424-Speed 3383.34 samples/sec Loss 1.5299 LearningRate 0.0059 Epoch: 15 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:19:18,459-Speed 3373.99 samples/sec Loss 1.5473 LearningRate 0.0059 Epoch: 15 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:20:01,719-[lfw][86000]XNorm: 21.542649
Training: 2022-04-27 10:20:01,720-[lfw][86000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-27 10:20:01,720-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:20:51,877-[cfp_fp][86000]XNorm: 20.821875
Training: 2022-04-27 10:20:51,878-[cfp_fp][86000]Accuracy-Flip: 0.97843+-0.00671
Training: 2022-04-27 10:20:51,878-[cfp_fp][86000]Accuracy-Highest: 0.97843
Training: 2022-04-27 10:21:35,061-[agedb_30][86000]XNorm: 21.790368
Training: 2022-04-27 10:21:35,062-[agedb_30][86000]Accuracy-Flip: 0.98117+-0.00667
Training: 2022-04-27 10:21:35,062-[agedb_30][86000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:21:38,102-Speed 73.33 samples/sec Loss 1.4744 LearningRate 0.0059 Epoch: 15 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:41,095-Speed 3422.58 samples/sec Loss 1.5950 LearningRate 0.0059 Epoch: 15 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:44,107-Speed 3401.05 samples/sec Loss 1.4545 LearningRate 0.0059 Epoch: 15 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:47,120-Speed 3399.73 samples/sec Loss 1.5157 LearningRate 0.0059 Epoch: 15 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:50,139-Speed 3391.83 samples/sec Loss 1.4122 LearningRate 0.0059 Epoch: 15 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:53,161-Speed 3388.89 samples/sec Loss 1.4399 LearningRate 0.0059 Epoch: 15 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:56,178-Speed 3395.31 samples/sec Loss 1.4436 LearningRate 0.0059 Epoch: 15 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:21:59,212-Speed 3376.16 samples/sec Loss 1.5344 LearningRate 0.0059 Epoch: 15 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:02,226-Speed 3398.35 samples/sec Loss 1.4951 LearningRate 0.0059 Epoch: 15 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:05,249-Speed 3387.92 samples/sec Loss 1.5301 LearningRate 0.0059 Epoch: 15 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:08,274-Speed 3385.88 samples/sec Loss 1.5261 LearningRate 0.0059 Epoch: 15 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:11,325-Speed 3357.32 samples/sec Loss 1.5562 LearningRate 0.0059 Epoch: 15 Global Step: 86120 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:22:14,355-Speed 3381.13 samples/sec Loss 1.5574 LearningRate 0.0059 Epoch: 15 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:17,449-Speed 3310.16 samples/sec Loss 1.4477 LearningRate 0.0059 Epoch: 15 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:20,476-Speed 3383.53 samples/sec Loss 1.5243 LearningRate 0.0059 Epoch: 15 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:23,500-Speed 3387.04 samples/sec Loss 1.5315 LearningRate 0.0059 Epoch: 15 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:26,525-Speed 3385.67 samples/sec Loss 1.5136 LearningRate 0.0059 Epoch: 15 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:29,552-Speed 3384.21 samples/sec Loss 1.5110 LearningRate 0.0059 Epoch: 15 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:32,578-Speed 3384.06 samples/sec Loss 1.5144 LearningRate 0.0059 Epoch: 15 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:35,606-Speed 3383.00 samples/sec Loss 1.5902 LearningRate 0.0059 Epoch: 15 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:38,634-Speed 3382.63 samples/sec Loss 1.5956 LearningRate 0.0059 Epoch: 15 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:41,660-Speed 3385.12 samples/sec Loss 1.5530 LearningRate 0.0058 Epoch: 15 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:44,679-Speed 3393.18 samples/sec Loss 1.4270 LearningRate 0.0058 Epoch: 15 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:47,702-Speed 3387.54 samples/sec Loss 1.5653 LearningRate 0.0058 Epoch: 15 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:50,743-Speed 3368.86 samples/sec Loss 1.4809 LearningRate 0.0058 Epoch: 15 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:53,769-Speed 3384.71 samples/sec Loss 1.5979 LearningRate 0.0058 Epoch: 15 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:56,796-Speed 3383.09 samples/sec Loss 1.5790 LearningRate 0.0058 Epoch: 15 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:22:59,831-Speed 3375.09 samples/sec Loss 1.5410 LearningRate 0.0058 Epoch: 15 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:03,018-Speed 3213.96 samples/sec Loss 1.5083 LearningRate 0.0058 Epoch: 15 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:06,098-Speed 3325.17 samples/sec Loss 1.5463 LearningRate 0.0058 Epoch: 15 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:09,124-Speed 3384.95 samples/sec Loss 1.5634 LearningRate 0.0058 Epoch: 15 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:12,151-Speed 3383.65 samples/sec Loss 1.5763 LearningRate 0.0058 Epoch: 15 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:15,161-Speed 3402.96 samples/sec Loss 1.5594 LearningRate 0.0058 Epoch: 15 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:18,183-Speed 3389.36 samples/sec Loss 1.6538 LearningRate 0.0058 Epoch: 15 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:21,204-Speed 3390.02 samples/sec Loss 1.5677 LearningRate 0.0058 Epoch: 15 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:24,228-Speed 3387.47 samples/sec Loss 1.5334 LearningRate 0.0058 Epoch: 15 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:27,349-Speed 3280.94 samples/sec Loss 1.5534 LearningRate 0.0058 Epoch: 15 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:30,516-Speed 3233.97 samples/sec Loss 1.6135 LearningRate 0.0058 Epoch: 15 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:33,540-Speed 3388.21 samples/sec Loss 1.5656 LearningRate 0.0058 Epoch: 15 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:36,562-Speed 3388.78 samples/sec Loss 1.4985 LearningRate 0.0058 Epoch: 15 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:39,581-Speed 3392.33 samples/sec Loss 1.5046 LearningRate 0.0058 Epoch: 15 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:23:42,581-Speed 3415.18 samples/sec Loss 1.5446 LearningRate 0.0058 Epoch: 15 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:23:45,604-Speed 3387.53 samples/sec Loss 1.5667 LearningRate 0.0058 Epoch: 15 Global Step: 86430 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:23:48,629-Speed 3386.27 samples/sec Loss 1.4914 LearningRate 0.0058 Epoch: 15 Global Step: 86440 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:23:51,657-Speed 3381.75 samples/sec Loss 1.5742 LearningRate 0.0058 Epoch: 15 Global Step: 86450 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:23:54,681-Speed 3387.78 samples/sec Loss 1.4021 LearningRate 0.0057 Epoch: 15 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:23:57,744-Speed 3343.65 samples/sec Loss 1.5065 LearningRate 0.0057 Epoch: 15 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:24:00,788-Speed 3365.33 samples/sec Loss 1.6607 LearningRate 0.0057 Epoch: 15 Global Step: 86480 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:24:03,820-Speed 3377.56 samples/sec Loss 1.5547 LearningRate 0.0057 Epoch: 15 Global Step: 86490 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:24:06,839-Speed 3392.39 samples/sec Loss 1.5914 LearningRate 0.0057 Epoch: 15 Global Step: 86500 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:24:09,859-Speed 3391.88 samples/sec Loss 1.6592 LearningRate 0.0057 Epoch: 15 Global Step: 86510 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:24:12,882-Speed 3388.57 samples/sec Loss 1.5679 LearningRate 0.0057 Epoch: 15 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:15,916-Speed 3375.67 samples/sec Loss 1.6273 LearningRate 0.0057 Epoch: 15 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:18,931-Speed 3397.41 samples/sec Loss 1.4877 LearningRate 0.0057 Epoch: 15 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:21,955-Speed 3386.66 samples/sec Loss 1.5322 LearningRate 0.0057 Epoch: 15 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:24,990-Speed 3375.35 samples/sec Loss 1.5513 LearningRate 0.0057 Epoch: 15 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:28,006-Speed 3395.37 samples/sec Loss 1.6790 LearningRate 0.0057 Epoch: 15 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:31,022-Speed 3396.53 samples/sec Loss 1.5215 LearningRate 0.0057 Epoch: 15 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:34,041-Speed 3392.89 samples/sec Loss 1.5916 LearningRate 0.0057 Epoch: 15 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:37,060-Speed 3392.93 samples/sec Loss 1.6378 LearningRate 0.0057 Epoch: 15 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:40,076-Speed 3395.07 samples/sec Loss 1.6171 LearningRate 0.0057 Epoch: 15 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:43,096-Speed 3391.39 samples/sec Loss 1.5732 LearningRate 0.0057 Epoch: 15 Global Step: 86620 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:24:46,100-Speed 3410.38 samples/sec Loss 1.5993 LearningRate 0.0057 Epoch: 15 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:49,126-Speed 3384.60 samples/sec Loss 1.4955 LearningRate 0.0057 Epoch: 15 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:52,153-Speed 3384.31 samples/sec Loss 1.5942 LearningRate 0.0057 Epoch: 15 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:55,167-Speed 3397.76 samples/sec Loss 1.5584 LearningRate 0.0057 Epoch: 15 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:24:58,185-Speed 3393.38 samples/sec Loss 1.5902 LearningRate 0.0057 Epoch: 15 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:01,210-Speed 3386.03 samples/sec Loss 1.4975 LearningRate 0.0057 Epoch: 15 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:04,265-Speed 3353.17 samples/sec Loss 1.5660 LearningRate 0.0056 Epoch: 15 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:07,284-Speed 3391.78 samples/sec Loss 1.6133 LearningRate 0.0056 Epoch: 15 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:10,316-Speed 3378.99 samples/sec Loss 1.6046 LearningRate 0.0056 Epoch: 15 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:13,380-Speed 3342.02 samples/sec Loss 1.5492 LearningRate 0.0056 Epoch: 15 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:16,401-Speed 3391.05 samples/sec Loss 1.5923 LearningRate 0.0056 Epoch: 15 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:19,425-Speed 3387.17 samples/sec Loss 1.5883 LearningRate 0.0056 Epoch: 15 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:22,453-Speed 3382.75 samples/sec Loss 1.5417 LearningRate 0.0056 Epoch: 15 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:25,597-Speed 3257.20 samples/sec Loss 1.5727 LearningRate 0.0056 Epoch: 15 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:28,614-Speed 3394.59 samples/sec Loss 1.5216 LearningRate 0.0056 Epoch: 15 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:31,638-Speed 3388.11 samples/sec Loss 1.4687 LearningRate 0.0056 Epoch: 15 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:34,654-Speed 3395.08 samples/sec Loss 1.5749 LearningRate 0.0056 Epoch: 15 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:37,681-Speed 3383.41 samples/sec Loss 1.5914 LearningRate 0.0056 Epoch: 15 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:40,703-Speed 3389.18 samples/sec Loss 1.5311 LearningRate 0.0056 Epoch: 15 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:43,732-Speed 3382.34 samples/sec Loss 1.5769 LearningRate 0.0056 Epoch: 15 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:46,757-Speed 3386.03 samples/sec Loss 1.6556 LearningRate 0.0056 Epoch: 15 Global Step: 86830 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:25:49,814-Speed 3349.91 samples/sec Loss 1.6062 LearningRate 0.0056 Epoch: 15 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:52,977-Speed 3238.17 samples/sec Loss 1.4731 LearningRate 0.0056 Epoch: 15 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:56,014-Speed 3372.68 samples/sec Loss 1.6328 LearningRate 0.0056 Epoch: 15 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:25:59,040-Speed 3385.09 samples/sec Loss 1.6599 LearningRate 0.0056 Epoch: 15 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:02,062-Speed 3388.61 samples/sec Loss 1.5928 LearningRate 0.0056 Epoch: 15 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:05,097-Speed 3375.14 samples/sec Loss 1.6765 LearningRate 0.0056 Epoch: 15 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:08,127-Speed 3379.94 samples/sec Loss 1.5602 LearningRate 0.0056 Epoch: 15 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:11,147-Speed 3391.49 samples/sec Loss 1.5812 LearningRate 0.0056 Epoch: 15 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:14,169-Speed 3390.26 samples/sec Loss 1.6327 LearningRate 0.0056 Epoch: 15 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:17,191-Speed 3389.20 samples/sec Loss 1.6155 LearningRate 0.0055 Epoch: 15 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:20,201-Speed 3403.71 samples/sec Loss 1.5218 LearningRate 0.0055 Epoch: 15 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:23,235-Speed 3375.22 samples/sec Loss 1.6235 LearningRate 0.0055 Epoch: 15 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:26,263-Speed 3383.15 samples/sec Loss 1.5942 LearningRate 0.0055 Epoch: 15 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:29,290-Speed 3383.35 samples/sec Loss 1.5518 LearningRate 0.0055 Epoch: 15 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:32,318-Speed 3382.47 samples/sec Loss 1.5455 LearningRate 0.0055 Epoch: 15 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:35,347-Speed 3381.41 samples/sec Loss 1.6409 LearningRate 0.0055 Epoch: 15 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:38,378-Speed 3378.94 samples/sec Loss 1.5652 LearningRate 0.0055 Epoch: 15 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:41,402-Speed 3387.35 samples/sec Loss 1.5713 LearningRate 0.0055 Epoch: 15 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:44,424-Speed 3389.08 samples/sec Loss 1.6285 LearningRate 0.0055 Epoch: 15 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:47,453-Speed 3381.56 samples/sec Loss 1.5093 LearningRate 0.0055 Epoch: 15 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:50,481-Speed 3381.96 samples/sec Loss 1.5724 LearningRate 0.0055 Epoch: 15 Global Step: 87040 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:26:53,491-Speed 3403.85 samples/sec Loss 1.5701 LearningRate 0.0055 Epoch: 15 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:56,540-Speed 3359.18 samples/sec Loss 1.6433 LearningRate 0.0055 Epoch: 15 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:26:59,574-Speed 3375.10 samples/sec Loss 1.5937 LearningRate 0.0055 Epoch: 15 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:02,618-Speed 3364.91 samples/sec Loss 1.5353 LearningRate 0.0055 Epoch: 15 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:05,671-Speed 3355.10 samples/sec Loss 1.6973 LearningRate 0.0055 Epoch: 15 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:08,695-Speed 3387.04 samples/sec Loss 1.6117 LearningRate 0.0055 Epoch: 15 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:11,792-Speed 3307.56 samples/sec Loss 1.6089 LearningRate 0.0055 Epoch: 15 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:14,818-Speed 3384.24 samples/sec Loss 1.5426 LearningRate 0.0055 Epoch: 15 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:17,862-Speed 3365.22 samples/sec Loss 1.5963 LearningRate 0.0055 Epoch: 15 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:20,891-Speed 3381.31 samples/sec Loss 1.6475 LearningRate 0.0055 Epoch: 15 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:23,899-Speed 3405.30 samples/sec Loss 1.6090 LearningRate 0.0055 Epoch: 15 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:26,932-Speed 3376.53 samples/sec Loss 1.6238 LearningRate 0.0055 Epoch: 15 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:29,966-Speed 3376.31 samples/sec Loss 1.5499 LearningRate 0.0055 Epoch: 15 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:33,009-Speed 3365.59 samples/sec Loss 1.5713 LearningRate 0.0054 Epoch: 15 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:36,035-Speed 3385.46 samples/sec Loss 1.6522 LearningRate 0.0054 Epoch: 15 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:39,091-Speed 3351.04 samples/sec Loss 1.6803 LearningRate 0.0054 Epoch: 15 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:43,150-Speed 2523.21 samples/sec Loss 1.6224 LearningRate 0.0054 Epoch: 15 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:46,192-Speed 3367.07 samples/sec Loss 1.6390 LearningRate 0.0054 Epoch: 15 Global Step: 87220 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:49,281-Speed 3315.75 samples/sec Loss 1.5524 LearningRate 0.0054 Epoch: 15 Global Step: 87230 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:52,343-Speed 3345.07 samples/sec Loss 1.5492 LearningRate 0.0054 Epoch: 15 Global Step: 87240 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:27:55,380-Speed 3372.10 samples/sec Loss 1.6383 LearningRate 0.0054 Epoch: 15 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:27:58,403-Speed 3389.26 samples/sec Loss 1.5171 LearningRate 0.0054 Epoch: 15 Global Step: 87260 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:01,430-Speed 3383.34 samples/sec Loss 1.6026 LearningRate 0.0054 Epoch: 15 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:04,455-Speed 3385.99 samples/sec Loss 1.6173 LearningRate 0.0054 Epoch: 15 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:07,486-Speed 3379.65 samples/sec Loss 1.6114 LearningRate 0.0054 Epoch: 15 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:10,515-Speed 3381.17 samples/sec Loss 1.6423 LearningRate 0.0054 Epoch: 15 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:13,542-Speed 3383.05 samples/sec Loss 1.6154 LearningRate 0.0054 Epoch: 15 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:16,574-Speed 3378.08 samples/sec Loss 1.5884 LearningRate 0.0054 Epoch: 15 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:19,597-Speed 3387.97 samples/sec Loss 1.6416 LearningRate 0.0054 Epoch: 15 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:22,625-Speed 3382.79 samples/sec Loss 1.5158 LearningRate 0.0054 Epoch: 15 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:25,658-Speed 3376.83 samples/sec Loss 1.5815 LearningRate 0.0054 Epoch: 15 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:28,683-Speed 3386.93 samples/sec Loss 1.5446 LearningRate 0.0054 Epoch: 15 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:28:31,690-Speed 3405.48 samples/sec Loss 1.5387 LearningRate 0.0054 Epoch: 15 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:34,724-Speed 3376.25 samples/sec Loss 1.6602 LearningRate 0.0054 Epoch: 15 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:37,757-Speed 3376.85 samples/sec Loss 1.4831 LearningRate 0.0054 Epoch: 15 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:40,788-Speed 3378.53 samples/sec Loss 1.5671 LearningRate 0.0054 Epoch: 15 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:43,816-Speed 3382.72 samples/sec Loss 1.6067 LearningRate 0.0054 Epoch: 15 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:46,847-Speed 3379.58 samples/sec Loss 1.5513 LearningRate 0.0053 Epoch: 15 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:49,875-Speed 3382.00 samples/sec Loss 1.7207 LearningRate 0.0053 Epoch: 15 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:52,910-Speed 3374.85 samples/sec Loss 1.6437 LearningRate 0.0053 Epoch: 15 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:55,940-Speed 3380.71 samples/sec Loss 1.5862 LearningRate 0.0053 Epoch: 15 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:28:58,994-Speed 3353.80 samples/sec Loss 1.5504 LearningRate 0.0053 Epoch: 15 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:02,097-Speed 3301.33 samples/sec Loss 1.6292 LearningRate 0.0053 Epoch: 15 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:05,108-Speed 3401.41 samples/sec Loss 1.4840 LearningRate 0.0053 Epoch: 15 Global Step: 87480 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:08,134-Speed 3384.38 samples/sec Loss 1.5584 LearningRate 0.0053 Epoch: 15 Global Step: 87490 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:11,161-Speed 3383.71 samples/sec Loss 1.6630 LearningRate 0.0053 Epoch: 15 Global Step: 87500 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:14,193-Speed 3378.04 samples/sec Loss 1.7001 LearningRate 0.0053 Epoch: 15 Global Step: 87510 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:17,227-Speed 3376.17 samples/sec Loss 1.5827 LearningRate 0.0053 Epoch: 15 Global Step: 87520 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:20,259-Speed 3377.79 samples/sec Loss 1.5978 LearningRate 0.0053 Epoch: 15 Global Step: 87530 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:23,289-Speed 3380.86 samples/sec Loss 1.5291 LearningRate 0.0053 Epoch: 15 Global Step: 87540 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:26,321-Speed 3377.49 samples/sec Loss 1.5276 LearningRate 0.0053 Epoch: 15 Global Step: 87550 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:29,355-Speed 3376.88 samples/sec Loss 1.5937 LearningRate 0.0053 Epoch: 15 Global Step: 87560 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:32,379-Speed 3386.04 samples/sec Loss 1.5851 LearningRate 0.0053 Epoch: 15 Global Step: 87570 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:29:35,405-Speed 3384.76 samples/sec Loss 1.5532 LearningRate 0.0053 Epoch: 15 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:38,429-Speed 3386.88 samples/sec Loss 1.6109 LearningRate 0.0053 Epoch: 15 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:41,454-Speed 3386.37 samples/sec Loss 1.6297 LearningRate 0.0053 Epoch: 15 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:44,485-Speed 3379.02 samples/sec Loss 1.6277 LearningRate 0.0053 Epoch: 15 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:47,536-Speed 3357.17 samples/sec Loss 1.5422 LearningRate 0.0053 Epoch: 15 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:50,617-Speed 3324.29 samples/sec Loss 1.5849 LearningRate 0.0053 Epoch: 15 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:53,793-Speed 3225.28 samples/sec Loss 1.5303 LearningRate 0.0053 Epoch: 15 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:56,820-Speed 3383.49 samples/sec Loss 1.5705 LearningRate 0.0053 Epoch: 15 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:29:59,860-Speed 3369.82 samples/sec Loss 1.6224 LearningRate 0.0053 Epoch: 15 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:02,899-Speed 3369.98 samples/sec Loss 1.7015 LearningRate 0.0052 Epoch: 15 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:05,931-Speed 3377.80 samples/sec Loss 1.5707 LearningRate 0.0052 Epoch: 15 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:30:08,935-Speed 3409.26 samples/sec Loss 1.5940 LearningRate 0.0052 Epoch: 15 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:11,977-Speed 3367.20 samples/sec Loss 1.6113 LearningRate 0.0052 Epoch: 15 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:15,011-Speed 3376.47 samples/sec Loss 1.4478 LearningRate 0.0052 Epoch: 15 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:18,078-Speed 3339.30 samples/sec Loss 1.5984 LearningRate 0.0052 Epoch: 15 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:21,102-Speed 3386.78 samples/sec Loss 1.6082 LearningRate 0.0052 Epoch: 15 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:24,129-Speed 3384.17 samples/sec Loss 1.6324 LearningRate 0.0052 Epoch: 15 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:30:27,140-Speed 3401.86 samples/sec Loss 1.6577 LearningRate 0.0052 Epoch: 15 Global Step: 87750 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:30,164-Speed 3386.39 samples/sec Loss 1.5609 LearningRate 0.0052 Epoch: 15 Global Step: 87760 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:33,188-Speed 3387.94 samples/sec Loss 1.6255 LearningRate 0.0052 Epoch: 15 Global Step: 87770 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:36,239-Speed 3356.90 samples/sec Loss 1.5500 LearningRate 0.0052 Epoch: 15 Global Step: 87780 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:39,259-Speed 3391.03 samples/sec Loss 1.5829 LearningRate 0.0052 Epoch: 15 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:42,287-Speed 3382.66 samples/sec Loss 1.5983 LearningRate 0.0052 Epoch: 15 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:45,311-Speed 3387.26 samples/sec Loss 1.6840 LearningRate 0.0052 Epoch: 15 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:48,336-Speed 3386.57 samples/sec Loss 1.6602 LearningRate 0.0052 Epoch: 15 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:51,371-Speed 3373.86 samples/sec Loss 1.6534 LearningRate 0.0052 Epoch: 15 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:54,413-Speed 3367.76 samples/sec Loss 1.6560 LearningRate 0.0052 Epoch: 15 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:30:57,443-Speed 3380.30 samples/sec Loss 1.6310 LearningRate 0.0052 Epoch: 15 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:00,469-Speed 3384.64 samples/sec Loss 1.5851 LearningRate 0.0052 Epoch: 15 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:03,499-Speed 3380.35 samples/sec Loss 1.6391 LearningRate 0.0052 Epoch: 15 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:06,537-Speed 3370.88 samples/sec Loss 1.4551 LearningRate 0.0052 Epoch: 15 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:09,568-Speed 3380.21 samples/sec Loss 1.6156 LearningRate 0.0052 Epoch: 15 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:12,591-Speed 3388.37 samples/sec Loss 1.6567 LearningRate 0.0052 Epoch: 15 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:15,612-Speed 3389.84 samples/sec Loss 1.6608 LearningRate 0.0052 Epoch: 15 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:18,645-Speed 3376.53 samples/sec Loss 1.5519 LearningRate 0.0051 Epoch: 15 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:21,668-Speed 3388.49 samples/sec Loss 1.5856 LearningRate 0.0051 Epoch: 15 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:24,708-Speed 3368.66 samples/sec Loss 1.5691 LearningRate 0.0051 Epoch: 15 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:27,752-Speed 3365.04 samples/sec Loss 1.5105 LearningRate 0.0051 Epoch: 15 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:30,784-Speed 3378.09 samples/sec Loss 1.5307 LearningRate 0.0051 Epoch: 15 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:33,852-Speed 3338.79 samples/sec Loss 1.5671 LearningRate 0.0051 Epoch: 15 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:36,888-Speed 3373.80 samples/sec Loss 1.5978 LearningRate 0.0051 Epoch: 15 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:39,913-Speed 3386.00 samples/sec Loss 1.6198 LearningRate 0.0051 Epoch: 15 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:31:42,941-Speed 3382.51 samples/sec Loss 1.6323 LearningRate 0.0051 Epoch: 15 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:32:26,129-[lfw][88000]XNorm: 21.941218
Training: 2022-04-27 10:32:26,129-[lfw][88000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 10:32:26,130-[lfw][88000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:33:16,244-[cfp_fp][88000]XNorm: 21.148112
Training: 2022-04-27 10:33:16,245-[cfp_fp][88000]Accuracy-Flip: 0.98257+-0.00530
Training: 2022-04-27 10:33:16,245-[cfp_fp][88000]Accuracy-Highest: 0.98257
Training: 2022-04-27 10:33:59,544-[agedb_30][88000]XNorm: 22.234802
Training: 2022-04-27 10:33:59,545-[agedb_30][88000]Accuracy-Flip: 0.97917+-0.00807
Training: 2022-04-27 10:33:59,545-[agedb_30][88000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:34:02,563-Speed 73.34 samples/sec Loss 1.4934 LearningRate 0.0051 Epoch: 15 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:05,567-Speed 3408.71 samples/sec Loss 1.5938 LearningRate 0.0051 Epoch: 15 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:08,583-Speed 3396.41 samples/sec Loss 1.4990 LearningRate 0.0051 Epoch: 15 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:11,592-Speed 3403.63 samples/sec Loss 1.5828 LearningRate 0.0051 Epoch: 15 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:14,617-Speed 3385.89 samples/sec Loss 1.6293 LearningRate 0.0051 Epoch: 15 Global Step: 88050 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-27 10:34:17,614-Speed 3418.02 samples/sec Loss 1.6837 LearningRate 0.0051 Epoch: 15 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:20,635-Speed 3390.04 samples/sec Loss 1.5367 LearningRate 0.0051 Epoch: 15 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:23,651-Speed 3396.53 samples/sec Loss 1.6277 LearningRate 0.0051 Epoch: 15 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:26,665-Speed 3398.00 samples/sec Loss 1.5439 LearningRate 0.0051 Epoch: 15 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:29,684-Speed 3392.72 samples/sec Loss 1.7030 LearningRate 0.0051 Epoch: 15 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:32,698-Speed 3397.90 samples/sec Loss 1.5379 LearningRate 0.0051 Epoch: 15 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:35,735-Speed 3372.69 samples/sec Loss 1.6235 LearningRate 0.0051 Epoch: 15 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:38,750-Speed 3397.18 samples/sec Loss 1.5665 LearningRate 0.0051 Epoch: 15 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:41,764-Speed 3398.01 samples/sec Loss 1.5753 LearningRate 0.0051 Epoch: 15 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:44,779-Speed 3396.73 samples/sec Loss 1.5127 LearningRate 0.0051 Epoch: 15 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:47,782-Speed 3411.07 samples/sec Loss 1.4788 LearningRate 0.0051 Epoch: 15 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:50,797-Speed 3396.94 samples/sec Loss 1.6584 LearningRate 0.0050 Epoch: 15 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:53,812-Speed 3397.33 samples/sec Loss 1.5218 LearningRate 0.0050 Epoch: 15 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:56,829-Speed 3395.35 samples/sec Loss 1.6262 LearningRate 0.0050 Epoch: 15 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:34:59,843-Speed 3398.02 samples/sec Loss 1.5104 LearningRate 0.0050 Epoch: 15 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:35:02,868-Speed 3386.36 samples/sec Loss 1.6174 LearningRate 0.0050 Epoch: 15 Global Step: 88210 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:05,886-Speed 3392.80 samples/sec Loss 1.6389 LearningRate 0.0050 Epoch: 15 Global Step: 88220 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:08,911-Speed 3386.22 samples/sec Loss 1.6163 LearningRate 0.0050 Epoch: 15 Global Step: 88230 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:11,936-Speed 3386.62 samples/sec Loss 1.7125 LearningRate 0.0050 Epoch: 15 Global Step: 88240 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:14,949-Speed 3398.62 samples/sec Loss 1.5865 LearningRate 0.0050 Epoch: 15 Global Step: 88250 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:17,974-Speed 3386.21 samples/sec Loss 1.6252 LearningRate 0.0050 Epoch: 15 Global Step: 88260 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:20,992-Speed 3393.93 samples/sec Loss 1.6026 LearningRate 0.0050 Epoch: 15 Global Step: 88270 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:24,016-Speed 3387.11 samples/sec Loss 1.4805 LearningRate 0.0050 Epoch: 15 Global Step: 88280 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:27,028-Speed 3400.57 samples/sec Loss 1.5606 LearningRate 0.0050 Epoch: 15 Global Step: 88290 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:30,038-Speed 3402.00 samples/sec Loss 1.6676 LearningRate 0.0050 Epoch: 15 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-04-27 10:35:33,053-Speed 3397.70 samples/sec Loss 1.6206 LearningRate 0.0050 Epoch: 15 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:35:36,083-Speed 3380.84 samples/sec Loss 1.5974 LearningRate 0.0050 Epoch: 15 Global Step: 88320 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:35:39,096-Speed 3398.34 samples/sec Loss 1.6643 LearningRate 0.0050 Epoch: 15 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-27 10:35:42,122-Speed 3385.41 samples/sec Loss 
1.4757 LearningRate 0.0050 Epoch: 15 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:35:45,143-Speed 3390.57 samples/sec Loss 1.5958 LearningRate 0.0050 Epoch: 15 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:35:48,166-Speed 3387.61 samples/sec Loss 1.5383 LearningRate 0.0050 Epoch: 15 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:35:51,180-Speed 3398.44 samples/sec Loss 1.6458 LearningRate 0.0050 Epoch: 15 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:35:54,194-Speed 3398.58 samples/sec Loss 1.5559 LearningRate 0.0050 Epoch: 15 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:35:57,206-Speed 3400.16 samples/sec Loss 1.6235 LearningRate 0.0050 Epoch: 15 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:00,222-Speed 3396.45 samples/sec Loss 1.5190 LearningRate 0.0050 Epoch: 15 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:03,218-Speed 3418.92 samples/sec Loss 1.6227 LearningRate 0.0050 Epoch: 15 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:06,233-Speed 3397.38 samples/sec Loss 1.6180 LearningRate 0.0049 Epoch: 15 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:09,252-Speed 3391.82 samples/sec Loss 1.6148 LearningRate 0.0049 Epoch: 15 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:12,284-Speed 3380.06 samples/sec Loss 1.4630 LearningRate 0.0049 Epoch: 15 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:15,323-Speed 3370.73 samples/sec Loss 1.5377 LearningRate 0.0049 Epoch: 15 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:18,337-Speed 3397.85 samples/sec Loss 1.6281 LearningRate 0.0049 Epoch: 15 Global Step: 88460 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:21,347-Speed 3402.83 samples/sec Loss 1.6743 LearningRate 0.0049 Epoch: 15 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:24,359-Speed 3400.25 samples/sec Loss 1.6182 LearningRate 0.0049 Epoch: 15 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:27,473-Speed 3289.11 samples/sec Loss 1.6027 LearningRate 0.0049 Epoch: 15 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:30,493-Speed 3391.87 samples/sec Loss 1.5852 LearningRate 0.0049 Epoch: 15 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:33,489-Speed 3417.71 samples/sec Loss 1.6310 LearningRate 0.0049 Epoch: 15 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:36,511-Speed 3389.81 samples/sec Loss 1.5899 LearningRate 0.0049 Epoch: 15 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:39,540-Speed 3381.49 samples/sec Loss 1.6472 LearningRate 0.0049 Epoch: 15 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:42,587-Speed 3362.06 samples/sec Loss 1.5710 LearningRate 0.0049 Epoch: 15 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:45,601-Speed 3398.68 samples/sec Loss 1.5701 LearningRate 0.0049 Epoch: 15 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:48,617-Speed 3395.94 samples/sec Loss 1.6149 LearningRate 0.0049 Epoch: 15 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:51,646-Speed 3380.75 samples/sec Loss 1.5975 LearningRate 0.0049 Epoch: 15 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:36:54,659-Speed 3399.39 samples/sec Loss 1.5936 LearningRate 0.0049 Epoch: 15 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 10:36:57,671-Speed 3400.33 samples/sec Loss 1.5810 LearningRate 0.0049 Epoch: 15 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:00,753-Speed 3323.54 samples/sec Loss 1.5369 LearningRate 0.0049 Epoch: 15 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:03,765-Speed 3400.59 samples/sec Loss 1.7694 LearningRate 0.0049 Epoch: 15 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:06,789-Speed 3386.31 samples/sec Loss 1.6209 LearningRate 0.0049 Epoch: 15 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:09,822-Speed 3377.91 samples/sec Loss 1.5191 LearningRate 0.0049 Epoch: 15 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:12,835-Speed 3399.58 samples/sec Loss 1.5781 LearningRate 0.0049 Epoch: 15 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:15,848-Speed 3399.30 samples/sec Loss 1.5871 LearningRate 0.0049 Epoch: 15 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:18,865-Speed 3394.66 samples/sec Loss 1.5709 LearningRate 0.0049 Epoch: 15 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:21,879-Speed 3398.58 samples/sec Loss 1.5176 LearningRate 0.0049 Epoch: 15 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:24,992-Speed 3289.44 samples/sec Loss 1.6016 LearningRate 0.0048 Epoch: 15 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:28,057-Speed 3342.25 samples/sec Loss 1.5309 LearningRate 0.0048 Epoch: 15 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:31,149-Speed 3312.67 samples/sec Loss 1.5851 LearningRate 0.0048 Epoch: 15 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:34,146-Speed 3416.61 samples/sec Loss 
1.5036 LearningRate 0.0048 Epoch: 15 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:37,180-Speed 3377.24 samples/sec Loss 1.5949 LearningRate 0.0048 Epoch: 15 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:40,199-Speed 3393.15 samples/sec Loss 1.5681 LearningRate 0.0048 Epoch: 15 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:43,215-Speed 3396.38 samples/sec Loss 1.6351 LearningRate 0.0048 Epoch: 15 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:46,230-Speed 3396.36 samples/sec Loss 1.6005 LearningRate 0.0048 Epoch: 15 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:49,378-Speed 3253.70 samples/sec Loss 1.6195 LearningRate 0.0048 Epoch: 15 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:52,493-Speed 3288.07 samples/sec Loss 1.6260 LearningRate 0.0048 Epoch: 15 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:55,538-Speed 3363.94 samples/sec Loss 1.5569 LearningRate 0.0048 Epoch: 15 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:37:58,578-Speed 3369.03 samples/sec Loss 1.6519 LearningRate 0.0048 Epoch: 15 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:01,599-Speed 3390.31 samples/sec Loss 1.5504 LearningRate 0.0048 Epoch: 15 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:04,606-Speed 3405.54 samples/sec Loss 1.5817 LearningRate 0.0048 Epoch: 15 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:07,637-Speed 3379.98 samples/sec Loss 1.6296 LearningRate 0.0048 Epoch: 15 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:10,666-Speed 3381.52 samples/sec Loss 1.6996 LearningRate 0.0048 Epoch: 15 Global Step: 88830 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:13,688-Speed 3389.68 samples/sec Loss 1.5785 LearningRate 0.0048 Epoch: 15 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:16,758-Speed 3335.59 samples/sec Loss 1.5560 LearningRate 0.0048 Epoch: 15 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:19,778-Speed 3391.43 samples/sec Loss 1.6168 LearningRate 0.0048 Epoch: 15 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:22,796-Speed 3394.49 samples/sec Loss 1.6047 LearningRate 0.0048 Epoch: 15 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:25,794-Speed 3416.19 samples/sec Loss 1.6301 LearningRate 0.0048 Epoch: 15 Global Step: 88880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:28,811-Speed 3393.94 samples/sec Loss 1.5607 LearningRate 0.0048 Epoch: 15 Global Step: 88890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:31,829-Speed 3395.31 samples/sec Loss 1.5583 LearningRate 0.0048 Epoch: 15 Global Step: 88900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:34,879-Speed 3357.80 samples/sec Loss 1.6130 LearningRate 0.0048 Epoch: 15 Global Step: 88910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:37,901-Speed 3389.52 samples/sec Loss 1.5157 LearningRate 0.0048 Epoch: 15 Global Step: 88920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:40,922-Speed 3389.59 samples/sec Loss 1.5620 LearningRate 0.0048 Epoch: 15 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:43,943-Speed 3390.33 samples/sec Loss 1.6359 LearningRate 0.0047 Epoch: 15 Global Step: 88940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:46,974-Speed 3379.63 samples/sec Loss 1.6093 LearningRate 0.0047 Epoch: 15 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-27 10:38:50,011-Speed 3372.15 samples/sec Loss 1.6498 LearningRate 0.0047 Epoch: 15 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:53,036-Speed 3385.98 samples/sec Loss 1.5594 LearningRate 0.0047 Epoch: 15 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 10:38:56,052-Speed 3396.51 samples/sec Loss 1.5632 LearningRate 0.0047 Epoch: 15 Global Step: 88980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:38:59,068-Speed 3395.99 samples/sec Loss 1.5704 LearningRate 0.0047 Epoch: 15 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:02,179-Speed 3291.83 samples/sec Loss 1.6079 LearningRate 0.0047 Epoch: 15 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:05,203-Speed 3387.51 samples/sec Loss 1.5333 LearningRate 0.0047 Epoch: 15 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:08,231-Speed 3382.54 samples/sec Loss 1.6557 LearningRate 0.0047 Epoch: 15 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:11,248-Speed 3394.20 samples/sec Loss 1.6109 LearningRate 0.0047 Epoch: 15 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:14,324-Speed 3330.26 samples/sec Loss 1.4790 LearningRate 0.0047 Epoch: 15 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:17,346-Speed 3389.32 samples/sec Loss 1.5912 LearningRate 0.0047 Epoch: 15 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:20,374-Speed 3381.76 samples/sec Loss 1.6479 LearningRate 0.0047 Epoch: 15 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:23,481-Speed 3296.89 samples/sec Loss 1.4959 LearningRate 0.0047 Epoch: 15 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:26,531-Speed 3358.71 samples/sec Loss 
1.6438 LearningRate 0.0047 Epoch: 15 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:29,553-Speed 3389.64 samples/sec Loss 1.6775 LearningRate 0.0047 Epoch: 15 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:32,569-Speed 3395.52 samples/sec Loss 1.5353 LearningRate 0.0047 Epoch: 15 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:35,588-Speed 3392.33 samples/sec Loss 1.5693 LearningRate 0.0047 Epoch: 15 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:38,609-Speed 3390.63 samples/sec Loss 1.5834 LearningRate 0.0047 Epoch: 15 Global Step: 89120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:41,632-Speed 3388.23 samples/sec Loss 1.5653 LearningRate 0.0047 Epoch: 15 Global Step: 89130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:44,655-Speed 3387.91 samples/sec Loss 1.6013 LearningRate 0.0047 Epoch: 15 Global Step: 89140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:47,678-Speed 3388.38 samples/sec Loss 1.5908 LearningRate 0.0047 Epoch: 15 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:50,710-Speed 3377.41 samples/sec Loss 1.5830 LearningRate 0.0047 Epoch: 15 Global Step: 89160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:53,863-Speed 3249.15 samples/sec Loss 1.6577 LearningRate 0.0047 Epoch: 15 Global Step: 89170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:39:56,903-Speed 3369.53 samples/sec Loss 1.4824 LearningRate 0.0047 Epoch: 15 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:39:59,967-Speed 3342.94 samples/sec Loss 1.6118 LearningRate 0.0047 Epoch: 15 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 10:40:02,969-Speed 3411.46 samples/sec Loss 1.5786 LearningRate 0.0046 Epoch: 15 Global Step: 
89200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:05,988-Speed 3393.10 samples/sec Loss 1.5660 LearningRate 0.0046 Epoch: 15 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:09,005-Speed 3394.29 samples/sec Loss 1.5625 LearningRate 0.0046 Epoch: 15 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:12,026-Speed 3389.86 samples/sec Loss 1.6323 LearningRate 0.0046 Epoch: 15 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:15,063-Speed 3372.70 samples/sec Loss 1.5471 LearningRate 0.0046 Epoch: 15 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:18,094-Speed 3380.52 samples/sec Loss 1.5823 LearningRate 0.0046 Epoch: 15 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:21,108-Speed 3397.27 samples/sec Loss 1.7151 LearningRate 0.0046 Epoch: 15 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:24,133-Speed 3386.73 samples/sec Loss 1.5951 LearningRate 0.0046 Epoch: 15 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:27,149-Speed 3396.13 samples/sec Loss 1.5857 LearningRate 0.0046 Epoch: 15 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:30,165-Speed 3395.53 samples/sec Loss 1.6314 LearningRate 0.0046 Epoch: 15 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:33,167-Speed 3411.96 samples/sec Loss 1.5156 LearningRate 0.0046 Epoch: 15 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:36,197-Speed 3379.86 samples/sec Loss 1.5014 LearningRate 0.0046 Epoch: 15 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:39,219-Speed 3389.11 samples/sec Loss 1.5376 LearningRate 0.0046 Epoch: 15 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 10:40:42,240-Speed 3391.26 samples/sec Loss 1.6009 LearningRate 0.0046 Epoch: 15 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:45,256-Speed 3395.85 samples/sec Loss 1.5456 LearningRate 0.0046 Epoch: 15 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:48,271-Speed 3397.75 samples/sec Loss 1.5160 LearningRate 0.0046 Epoch: 15 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:51,289-Speed 3393.41 samples/sec Loss 1.5434 LearningRate 0.0046 Epoch: 15 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:54,312-Speed 3388.89 samples/sec Loss 1.5408 LearningRate 0.0046 Epoch: 15 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:40:57,337-Speed 3385.11 samples/sec Loss 1.6257 LearningRate 0.0046 Epoch: 15 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:00,383-Speed 3363.15 samples/sec Loss 1.5900 LearningRate 0.0046 Epoch: 15 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:03,599-Speed 3184.88 samples/sec Loss 1.5349 LearningRate 0.0046 Epoch: 15 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:06,619-Speed 3391.26 samples/sec Loss 1.6348 LearningRate 0.0046 Epoch: 15 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:09,644-Speed 3385.24 samples/sec Loss 1.5433 LearningRate 0.0046 Epoch: 15 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:12,675-Speed 3379.15 samples/sec Loss 1.5869 LearningRate 0.0046 Epoch: 15 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:15,699-Speed 3387.54 samples/sec Loss 1.6618 LearningRate 0.0046 Epoch: 15 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:18,717-Speed 3393.63 
samples/sec Loss 1.5495 LearningRate 0.0046 Epoch: 15 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 10:41:21,739-Speed 3389.97 samples/sec Loss 1.5060 LearningRate 0.0046 Epoch: 15 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:24,778-Speed 3369.62 samples/sec Loss 1.5827 LearningRate 0.0045 Epoch: 15 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:27,799-Speed 3391.23 samples/sec Loss 1.6015 LearningRate 0.0045 Epoch: 15 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:30,818-Speed 3391.73 samples/sec Loss 1.5580 LearningRate 0.0045 Epoch: 15 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:33,844-Speed 3385.47 samples/sec Loss 1.6199 LearningRate 0.0045 Epoch: 15 Global Step: 89500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:41:36,849-Speed 3407.77 samples/sec Loss 1.5462 LearningRate 0.0045 Epoch: 15 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:39,869-Speed 3392.07 samples/sec Loss 1.6482 LearningRate 0.0045 Epoch: 15 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:42,897-Speed 3382.52 samples/sec Loss 1.5367 LearningRate 0.0045 Epoch: 15 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:45,929-Speed 3377.78 samples/sec Loss 1.6248 LearningRate 0.0045 Epoch: 15 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:49,011-Speed 3323.49 samples/sec Loss 1.5882 LearningRate 0.0045 Epoch: 15 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:41:52,018-Speed 3406.81 samples/sec Loss 1.5237 LearningRate 0.0045 Epoch: 15 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:41:55,039-Speed 3389.69 samples/sec Loss 1.4964 LearningRate 0.0045 Epoch: 15 
Global Step: 89570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:41:58,086-Speed 3362.31 samples/sec Loss 1.6260 LearningRate 0.0045 Epoch: 15 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:01,120-Speed 3375.07 samples/sec Loss 1.5017 LearningRate 0.0045 Epoch: 15 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:04,159-Speed 3371.00 samples/sec Loss 1.5960 LearningRate 0.0045 Epoch: 15 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:07,188-Speed 3381.47 samples/sec Loss 1.6329 LearningRate 0.0045 Epoch: 15 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:10,212-Speed 3387.49 samples/sec Loss 1.5154 LearningRate 0.0045 Epoch: 15 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:13,234-Speed 3388.70 samples/sec Loss 1.5075 LearningRate 0.0045 Epoch: 15 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:16,253-Speed 3392.59 samples/sec Loss 1.5227 LearningRate 0.0045 Epoch: 15 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:19,276-Speed 3388.46 samples/sec Loss 1.6283 LearningRate 0.0045 Epoch: 15 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:42:22,297-Speed 3389.92 samples/sec Loss 1.5515 LearningRate 0.0045 Epoch: 15 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:25,319-Speed 3389.84 samples/sec Loss 1.5765 LearningRate 0.0045 Epoch: 15 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:28,347-Speed 3382.44 samples/sec Loss 1.5833 LearningRate 0.0045 Epoch: 15 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:31,373-Speed 3385.06 samples/sec Loss 1.5694 LearningRate 0.0045 Epoch: 15 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 10:42:34,391-Speed 3393.29 samples/sec Loss 1.5323 LearningRate 0.0045 Epoch: 15 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:37,413-Speed 3389.10 samples/sec Loss 1.4869 LearningRate 0.0045 Epoch: 15 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:40,474-Speed 3346.73 samples/sec Loss 1.5506 LearningRate 0.0045 Epoch: 15 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:43,501-Speed 3383.93 samples/sec Loss 1.5518 LearningRate 0.0045 Epoch: 15 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:46,523-Speed 3389.05 samples/sec Loss 1.6140 LearningRate 0.0044 Epoch: 15 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:49,543-Speed 3390.88 samples/sec Loss 1.4716 LearningRate 0.0044 Epoch: 15 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:52,565-Speed 3389.84 samples/sec Loss 1.6395 LearningRate 0.0044 Epoch: 15 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:42:55,568-Speed 3410.02 samples/sec Loss 1.6047 LearningRate 0.0044 Epoch: 15 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:42:58,606-Speed 3372.30 samples/sec Loss 1.4694 LearningRate 0.0044 Epoch: 15 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:01,633-Speed 3383.47 samples/sec Loss 1.5588 LearningRate 0.0044 Epoch: 15 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:04,698-Speed 3341.75 samples/sec Loss 1.5536 LearningRate 0.0044 Epoch: 15 Global Step: 89800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:07,725-Speed 3383.77 samples/sec Loss 1.5227 LearningRate 0.0044 Epoch: 15 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:10,783-Speed 3349.10 
samples/sec Loss 1.5074 LearningRate 0.0044 Epoch: 15 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:13,810-Speed 3384.17 samples/sec Loss 1.4583 LearningRate 0.0044 Epoch: 15 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:16,840-Speed 3379.46 samples/sec Loss 1.5786 LearningRate 0.0044 Epoch: 15 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:19,863-Speed 3388.76 samples/sec Loss 1.5845 LearningRate 0.0044 Epoch: 15 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:22,885-Speed 3389.33 samples/sec Loss 1.5939 LearningRate 0.0044 Epoch: 15 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:25,896-Speed 3401.34 samples/sec Loss 1.5513 LearningRate 0.0044 Epoch: 15 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:28,948-Speed 3355.79 samples/sec Loss 1.5559 LearningRate 0.0044 Epoch: 15 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:31,975-Speed 3384.25 samples/sec Loss 1.5765 LearningRate 0.0044 Epoch: 15 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:35,009-Speed 3376.43 samples/sec Loss 1.4988 LearningRate 0.0044 Epoch: 15 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:43:38,013-Speed 3409.31 samples/sec Loss 1.5299 LearningRate 0.0044 Epoch: 15 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:41,036-Speed 3387.51 samples/sec Loss 1.5585 LearningRate 0.0044 Epoch: 15 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:44,061-Speed 3386.10 samples/sec Loss 1.6002 LearningRate 0.0044 Epoch: 15 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:47,089-Speed 3382.32 samples/sec Loss 1.5384 LearningRate 0.0044 Epoch: 15 
Global Step: 89940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:50,118-Speed 3381.93 samples/sec Loss 1.5846 LearningRate 0.0044 Epoch: 15 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:53,151-Speed 3376.48 samples/sec Loss 1.6068 LearningRate 0.0044 Epoch: 15 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:56,175-Speed 3386.77 samples/sec Loss 1.5463 LearningRate 0.0044 Epoch: 15 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:43:59,197-Speed 3389.69 samples/sec Loss 1.4849 LearningRate 0.0044 Epoch: 15 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:44:02,220-Speed 3388.17 samples/sec Loss 1.5819 LearningRate 0.0044 Epoch: 15 Global Step: 89990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:44:05,248-Speed 3382.71 samples/sec Loss 1.4688 LearningRate 0.0044 Epoch: 15 Global Step: 90000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:44:48,746-[lfw][90000]XNorm: 21.617160 Training: 2022-04-27 10:44:48,747-[lfw][90000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-04-27 10:44:48,747-[lfw][90000]Accuracy-Highest: 0.99817 Training: 2022-04-27 10:45:39,218-[cfp_fp][90000]XNorm: 21.238212 Training: 2022-04-27 10:45:39,219-[cfp_fp][90000]Accuracy-Flip: 0.98029+-0.00753 Training: 2022-04-27 10:45:39,219-[cfp_fp][90000]Accuracy-Highest: 0.98257 Training: 2022-04-27 10:46:22,799-[agedb_30][90000]XNorm: 22.096798 Training: 2022-04-27 10:46:22,800-[agedb_30][90000]Accuracy-Flip: 0.98017+-0.00751 Training: 2022-04-27 10:46:22,801-[agedb_30][90000]Accuracy-Highest: 0.98133 Training: 2022-04-27 10:46:25,814-Speed 72.85 samples/sec Loss 1.6027 LearningRate 0.0043 Epoch: 15 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:46:28,817-Speed 3410.53 samples/sec Loss 1.6096 LearningRate 0.0043 Epoch: 15 Global Step: 90020 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-27 10:46:31,828-Speed 3401.96 samples/sec Loss 1.5861 LearningRate 0.0043 Epoch: 15 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:46:34,840-Speed 3400.36 samples/sec Loss 1.4919 LearningRate 0.0043 Epoch: 15 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:46:37,849-Speed 3402.80 samples/sec Loss 1.5338 LearningRate 0.0043 Epoch: 15 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:40,865-Speed 3396.85 samples/sec Loss 1.5723 LearningRate 0.0043 Epoch: 15 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:43,881-Speed 3395.97 samples/sec Loss 1.6864 LearningRate 0.0043 Epoch: 15 Global Step: 90070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:46,902-Speed 3389.95 samples/sec Loss 1.5314 LearningRate 0.0043 Epoch: 15 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:49,927-Speed 3386.19 samples/sec Loss 1.6422 LearningRate 0.0043 Epoch: 15 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:52,945-Speed 3393.80 samples/sec Loss 1.5587 LearningRate 0.0043 Epoch: 15 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:55,961-Speed 3396.57 samples/sec Loss 1.6120 LearningRate 0.0043 Epoch: 15 Global Step: 90110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:46:58,977-Speed 3396.25 samples/sec Loss 1.4965 LearningRate 0.0043 Epoch: 15 Global Step: 90120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:47:02,015-Speed 3370.74 samples/sec Loss 1.5256 LearningRate 0.0043 Epoch: 15 Global Step: 90130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:47:05,039-Speed 3387.62 samples/sec Loss 1.4954 LearningRate 0.0043 Epoch: 15 Global Step: 90140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
10:47:08,063-Speed 3387.06 samples/sec Loss 1.6125 LearningRate 0.0043 Epoch: 15 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:11,087-Speed 3386.52 samples/sec Loss 1.4547 LearningRate 0.0043 Epoch: 15 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:14,116-Speed 3381.56 samples/sec Loss 1.6163 LearningRate 0.0043 Epoch: 15 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:17,144-Speed 3382.56 samples/sec Loss 1.5189 LearningRate 0.0043 Epoch: 15 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:20,172-Speed 3382.53 samples/sec Loss 1.4897 LearningRate 0.0043 Epoch: 15 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:23,206-Speed 3376.31 samples/sec Loss 1.4590 LearningRate 0.0043 Epoch: 15 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:26,350-Speed 3257.51 samples/sec Loss 1.5721 LearningRate 0.0043 Epoch: 15 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:29,481-Speed 3270.90 samples/sec Loss 1.5345 LearningRate 0.0043 Epoch: 15 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:32,505-Speed 3387.18 samples/sec Loss 1.5185 LearningRate 0.0043 Epoch: 15 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:35,528-Speed 3388.65 samples/sec Loss 1.5751 LearningRate 0.0043 Epoch: 15 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:38,553-Speed 3385.50 samples/sec Loss 1.5056 LearningRate 0.0043 Epoch: 15 Global Step: 90250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:47:41,555-Speed 3412.04 samples/sec Loss 1.4816 LearningRate 0.0043 Epoch: 15 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:44,572-Speed 3394.30 samples/sec Loss 1.5779 
LearningRate 0.0043 Epoch: 15 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:47,602-Speed 3381.38 samples/sec Loss 1.6437 LearningRate 0.0042 Epoch: 15 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:50,618-Speed 3395.90 samples/sec Loss 1.5296 LearningRate 0.0042 Epoch: 15 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:53,638-Speed 3391.91 samples/sec Loss 1.5329 LearningRate 0.0042 Epoch: 15 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:56,655-Speed 3394.28 samples/sec Loss 1.6609 LearningRate 0.0042 Epoch: 15 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:47:59,679-Speed 3387.27 samples/sec Loss 1.4752 LearningRate 0.0042 Epoch: 15 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:02,695-Speed 3395.48 samples/sec Loss 1.6082 LearningRate 0.0042 Epoch: 15 Global Step: 90330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:05,709-Speed 3398.22 samples/sec Loss 1.6235 LearningRate 0.0042 Epoch: 15 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:08,728-Speed 3393.13 samples/sec Loss 1.5991 LearningRate 0.0042 Epoch: 15 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:11,808-Speed 3324.68 samples/sec Loss 1.5314 LearningRate 0.0042 Epoch: 15 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:14,940-Speed 3271.43 samples/sec Loss 1.5396 LearningRate 0.0042 Epoch: 15 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:17,948-Speed 3404.93 samples/sec Loss 1.5293 LearningRate 0.0042 Epoch: 15 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:20,957-Speed 3403.51 samples/sec Loss 1.4683 LearningRate 0.0042 Epoch: 15 Global Step: 90390 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:23,968-Speed 3402.58 samples/sec Loss 1.5035 LearningRate 0.0042 Epoch: 15 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:26,983-Speed 3396.86 samples/sec Loss 1.5586 LearningRate 0.0042 Epoch: 15 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:29,995-Speed 3399.83 samples/sec Loss 1.5668 LearningRate 0.0042 Epoch: 15 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:33,018-Speed 3388.46 samples/sec Loss 1.6052 LearningRate 0.0042 Epoch: 15 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:36,041-Speed 3387.72 samples/sec Loss 1.5386 LearningRate 0.0042 Epoch: 15 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:39,184-Speed 3258.98 samples/sec Loss 1.5378 LearningRate 0.0042 Epoch: 15 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:42,217-Speed 3377.26 samples/sec Loss 1.5975 LearningRate 0.0042 Epoch: 15 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:45,224-Speed 3406.47 samples/sec Loss 1.5723 LearningRate 0.0042 Epoch: 15 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:48,233-Speed 3403.79 samples/sec Loss 1.5486 LearningRate 0.0042 Epoch: 15 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:51,243-Speed 3402.42 samples/sec Loss 1.4823 LearningRate 0.0042 Epoch: 15 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:54,261-Speed 3394.10 samples/sec Loss 1.5988 LearningRate 0.0042 Epoch: 15 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:48:57,277-Speed 3396.42 samples/sec Loss 1.5390 LearningRate 0.0042 Epoch: 15 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 10:49:00,291-Speed 3397.70 samples/sec Loss 1.6033 LearningRate 0.0042 Epoch: 15 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:03,302-Speed 3402.26 samples/sec Loss 1.6021 LearningRate 0.0042 Epoch: 15 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:06,339-Speed 3372.69 samples/sec Loss 1.5335 LearningRate 0.0042 Epoch: 15 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:09,353-Speed 3397.86 samples/sec Loss 1.5270 LearningRate 0.0042 Epoch: 15 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:12,373-Speed 3391.14 samples/sec Loss 1.5674 LearningRate 0.0041 Epoch: 15 Global Step: 90560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:49:15,370-Speed 3417.47 samples/sec Loss 1.5404 LearningRate 0.0041 Epoch: 15 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:18,378-Speed 3405.57 samples/sec Loss 1.5688 LearningRate 0.0041 Epoch: 15 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:21,387-Speed 3404.45 samples/sec Loss 1.5961 LearningRate 0.0041 Epoch: 15 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:24,413-Speed 3384.99 samples/sec Loss 1.6255 LearningRate 0.0041 Epoch: 15 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:27,445-Speed 3377.14 samples/sec Loss 1.6120 LearningRate 0.0041 Epoch: 15 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:30,777-Speed 3073.52 samples/sec Loss 1.4914 LearningRate 0.0041 Epoch: 15 Global Step: 90620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:33,793-Speed 3396.90 samples/sec Loss 1.4572 LearningRate 0.0041 Epoch: 15 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:36,816-Speed 3387.45 samples/sec Loss 
1.5048 LearningRate 0.0041 Epoch: 15 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:39,831-Speed 3397.38 samples/sec Loss 1.5210 LearningRate 0.0041 Epoch: 15 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:42,853-Speed 3389.23 samples/sec Loss 1.5270 LearningRate 0.0041 Epoch: 15 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:45,865-Speed 3401.29 samples/sec Loss 1.5486 LearningRate 0.0041 Epoch: 15 Global Step: 90670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:49:48,892-Speed 3383.58 samples/sec Loss 1.5824 LearningRate 0.0041 Epoch: 15 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:51,910-Speed 3393.91 samples/sec Loss 1.5434 LearningRate 0.0041 Epoch: 15 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:54,936-Speed 3384.43 samples/sec Loss 1.5871 LearningRate 0.0041 Epoch: 15 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:49:57,965-Speed 3381.61 samples/sec Loss 1.4611 LearningRate 0.0041 Epoch: 15 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:00,981-Speed 3395.68 samples/sec Loss 1.4632 LearningRate 0.0041 Epoch: 15 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:03,992-Speed 3402.46 samples/sec Loss 1.6067 LearningRate 0.0041 Epoch: 15 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:07,007-Speed 3397.42 samples/sec Loss 1.5176 LearningRate 0.0041 Epoch: 15 Global Step: 90740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:10,069-Speed 3345.06 samples/sec Loss 1.5961 LearningRate 0.0041 Epoch: 15 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:13,084-Speed 3396.87 samples/sec Loss 1.5335 LearningRate 0.0041 Epoch: 15 Global Step: 
90760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:16,095-Speed 3401.92 samples/sec Loss 1.6518 LearningRate 0.0041 Epoch: 15 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:19,111-Speed 3395.71 samples/sec Loss 1.5320 LearningRate 0.0041 Epoch: 15 Global Step: 90780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:50:22,109-Speed 3417.62 samples/sec Loss 1.4586 LearningRate 0.0041 Epoch: 15 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:25,125-Speed 3395.04 samples/sec Loss 1.6333 LearningRate 0.0041 Epoch: 15 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:28,136-Speed 3401.44 samples/sec Loss 1.5230 LearningRate 0.0041 Epoch: 15 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:31,155-Speed 3393.11 samples/sec Loss 1.5588 LearningRate 0.0041 Epoch: 15 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:34,169-Speed 3398.27 samples/sec Loss 1.5486 LearningRate 0.0041 Epoch: 15 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:37,184-Speed 3397.83 samples/sec Loss 1.4725 LearningRate 0.0040 Epoch: 15 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:40,221-Speed 3371.57 samples/sec Loss 1.5950 LearningRate 0.0040 Epoch: 15 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:43,246-Speed 3386.26 samples/sec Loss 1.6629 LearningRate 0.0040 Epoch: 15 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:50:46,243-Speed 3417.31 samples/sec Loss 1.5737 LearningRate 0.0040 Epoch: 15 Global Step: 90870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:50:49,306-Speed 3343.44 samples/sec Loss 1.4966 LearningRate 0.0040 Epoch: 15 Global Step: 90880 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-27 10:50:52,323-Speed 3395.29 samples/sec Loss 1.6125 LearningRate 0.0040 Epoch: 15 Global Step: 90890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:50:55,340-Speed 3394.66 samples/sec Loss 1.5931 LearningRate 0.0040 Epoch: 15 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:50:58,360-Speed 3392.18 samples/sec Loss 1.4636 LearningRate 0.0040 Epoch: 15 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:01,401-Speed 3367.54 samples/sec Loss 1.5840 LearningRate 0.0040 Epoch: 15 Global Step: 90920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:04,442-Speed 3369.27 samples/sec Loss 1.5816 LearningRate 0.0040 Epoch: 15 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:07,455-Speed 3399.14 samples/sec Loss 1.5637 LearningRate 0.0040 Epoch: 15 Global Step: 90940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:10,468-Speed 3398.55 samples/sec Loss 1.5739 LearningRate 0.0040 Epoch: 15 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:13,484-Speed 3396.39 samples/sec Loss 1.5486 LearningRate 0.0040 Epoch: 15 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 10:51:16,584-Speed 3304.17 samples/sec Loss 1.5311 LearningRate 0.0040 Epoch: 15 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:30,847-Speed 717.98 samples/sec Loss 1.4254 LearningRate 0.0040 Epoch: 16 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:33,866-Speed 3393.20 samples/sec Loss 1.1047 LearningRate 0.0040 Epoch: 16 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:36,886-Speed 3391.27 samples/sec Loss 1.1057 LearningRate 0.0040 Epoch: 16 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:39,920-Speed 3376.34 
samples/sec Loss 1.1322 LearningRate 0.0040 Epoch: 16 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:42,944-Speed 3388.00 samples/sec Loss 1.1464 LearningRate 0.0040 Epoch: 16 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:45,955-Speed 3400.84 samples/sec Loss 1.0460 LearningRate 0.0040 Epoch: 16 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:48,986-Speed 3379.70 samples/sec Loss 1.1435 LearningRate 0.0040 Epoch: 16 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:52,056-Speed 3336.31 samples/sec Loss 1.0344 LearningRate 0.0040 Epoch: 16 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:55,079-Speed 3387.95 samples/sec Loss 1.1144 LearningRate 0.0040 Epoch: 16 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:51:58,082-Speed 3410.58 samples/sec Loss 1.0752 LearningRate 0.0040 Epoch: 16 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:01,124-Speed 3367.71 samples/sec Loss 1.1338 LearningRate 0.0040 Epoch: 16 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:04,142-Speed 3393.66 samples/sec Loss 1.0081 LearningRate 0.0040 Epoch: 16 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:07,168-Speed 3384.85 samples/sec Loss 1.1636 LearningRate 0.0040 Epoch: 16 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:10,188-Speed 3390.66 samples/sec Loss 1.0964 LearningRate 0.0040 Epoch: 16 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:13,207-Speed 3393.08 samples/sec Loss 1.0482 LearningRate 0.0039 Epoch: 16 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:16,235-Speed 3383.14 samples/sec Loss 1.1196 LearningRate 0.0039 Epoch: 16 
Global Step: 91130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:19,247-Speed 3400.34 samples/sec Loss 1.0240 LearningRate 0.0039 Epoch: 16 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:22,279-Speed 3378.50 samples/sec Loss 1.0611 LearningRate 0.0039 Epoch: 16 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:25,299-Speed 3391.11 samples/sec Loss 1.0039 LearningRate 0.0039 Epoch: 16 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:28,324-Speed 3386.05 samples/sec Loss 1.1084 LearningRate 0.0039 Epoch: 16 Global Step: 91170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:52:31,340-Speed 3395.28 samples/sec Loss 1.1061 LearningRate 0.0039 Epoch: 16 Global Step: 91180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:52:34,375-Speed 3374.91 samples/sec Loss 1.0780 LearningRate 0.0039 Epoch: 16 Global Step: 91190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:52:37,396-Speed 3390.50 samples/sec Loss 1.0369 LearningRate 0.0039 Epoch: 16 Global Step: 91200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:52:40,400-Speed 3409.93 samples/sec Loss 1.1529 LearningRate 0.0039 Epoch: 16 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:43,430-Speed 3379.84 samples/sec Loss 1.0677 LearningRate 0.0039 Epoch: 16 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:46,452-Speed 3390.01 samples/sec Loss 1.0558 LearningRate 0.0039 Epoch: 16 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:49,477-Speed 3385.17 samples/sec Loss 1.1891 LearningRate 0.0039 Epoch: 16 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:52,503-Speed 3385.65 samples/sec Loss 1.1470 LearningRate 0.0039 Epoch: 16 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 
2 hours Training: 2022-04-27 10:52:55,541-Speed 3370.78 samples/sec Loss 1.0848 LearningRate 0.0039 Epoch: 16 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:52:58,572-Speed 3379.51 samples/sec Loss 1.0658 LearningRate 0.0039 Epoch: 16 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:01,601-Speed 3380.29 samples/sec Loss 1.0849 LearningRate 0.0039 Epoch: 16 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:04,632-Speed 3379.68 samples/sec Loss 1.0561 LearningRate 0.0039 Epoch: 16 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:07,662-Speed 3380.44 samples/sec Loss 1.0914 LearningRate 0.0039 Epoch: 16 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:10,717-Speed 3353.45 samples/sec Loss 1.0596 LearningRate 0.0039 Epoch: 16 Global Step: 91310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:53:13,722-Speed 3407.62 samples/sec Loss 1.2266 LearningRate 0.0039 Epoch: 16 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:16,762-Speed 3369.88 samples/sec Loss 1.0718 LearningRate 0.0039 Epoch: 16 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:19,790-Speed 3381.88 samples/sec Loss 1.0615 LearningRate 0.0039 Epoch: 16 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:22,916-Speed 3277.13 samples/sec Loss 1.0614 LearningRate 0.0039 Epoch: 16 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:26,052-Speed 3265.22 samples/sec Loss 1.1071 LearningRate 0.0039 Epoch: 16 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:29,223-Speed 3230.30 samples/sec Loss 1.0848 LearningRate 0.0039 Epoch: 16 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:32,251-Speed 
3383.09 samples/sec Loss 1.0742 LearningRate 0.0039 Epoch: 16 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:35,278-Speed 3383.68 samples/sec Loss 1.0702 LearningRate 0.0039 Epoch: 16 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:38,300-Speed 3389.74 samples/sec Loss 1.0992 LearningRate 0.0039 Epoch: 16 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:41,330-Speed 3380.43 samples/sec Loss 1.1247 LearningRate 0.0038 Epoch: 16 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:44,356-Speed 3384.71 samples/sec Loss 1.1434 LearningRate 0.0038 Epoch: 16 Global Step: 91420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:53:47,375-Speed 3392.51 samples/sec Loss 1.1657 LearningRate 0.0038 Epoch: 16 Global Step: 91430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:50,420-Speed 3363.78 samples/sec Loss 1.0922 LearningRate 0.0038 Epoch: 16 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:53,489-Speed 3339.55 samples/sec Loss 1.0924 LearningRate 0.0038 Epoch: 16 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:56,511-Speed 3389.74 samples/sec Loss 1.0602 LearningRate 0.0038 Epoch: 16 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:53:59,533-Speed 3388.60 samples/sec Loss 1.0817 LearningRate 0.0038 Epoch: 16 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:02,577-Speed 3365.08 samples/sec Loss 1.1260 LearningRate 0.0038 Epoch: 16 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:05,647-Speed 3336.15 samples/sec Loss 1.0707 LearningRate 0.0038 Epoch: 16 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:08,666-Speed 3391.82 samples/sec Loss 1.1637 LearningRate 0.0038 
Epoch: 16 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:11,697-Speed 3379.49 samples/sec Loss 1.1043 LearningRate 0.0038 Epoch: 16 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:14,732-Speed 3375.12 samples/sec Loss 1.0589 LearningRate 0.0038 Epoch: 16 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:17,768-Speed 3374.35 samples/sec Loss 1.1358 LearningRate 0.0038 Epoch: 16 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:20,791-Speed 3387.78 samples/sec Loss 1.1601 LearningRate 0.0038 Epoch: 16 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:23,818-Speed 3383.42 samples/sec Loss 1.1392 LearningRate 0.0038 Epoch: 16 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:26,840-Speed 3389.22 samples/sec Loss 1.0752 LearningRate 0.0038 Epoch: 16 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:29,862-Speed 3389.80 samples/sec Loss 1.0184 LearningRate 0.0038 Epoch: 16 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:32,884-Speed 3388.35 samples/sec Loss 1.1514 LearningRate 0.0038 Epoch: 16 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:35,909-Speed 3386.57 samples/sec Loss 1.1135 LearningRate 0.0038 Epoch: 16 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:38,966-Speed 3349.79 samples/sec Loss 1.1416 LearningRate 0.0038 Epoch: 16 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:41,983-Speed 3395.75 samples/sec Loss 1.1679 LearningRate 0.0038 Epoch: 16 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:45,003-Speed 3390.88 samples/sec Loss 1.1198 LearningRate 0.0038 Epoch: 16 Global Step: 91620 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-27 10:54:48,028-Speed 3386.46 samples/sec Loss 1.0263 LearningRate 0.0038 Epoch: 16 Global Step: 91630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:54:51,034-Speed 3407.47 samples/sec Loss 1.1589 LearningRate 0.0038 Epoch: 16 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:54,056-Speed 3389.55 samples/sec Loss 1.2029 LearningRate 0.0038 Epoch: 16 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:54:57,079-Speed 3387.56 samples/sec Loss 1.0880 LearningRate 0.0038 Epoch: 16 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:00,107-Speed 3382.70 samples/sec Loss 1.1423 LearningRate 0.0038 Epoch: 16 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:03,135-Speed 3382.55 samples/sec Loss 1.0955 LearningRate 0.0038 Epoch: 16 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:06,156-Speed 3390.00 samples/sec Loss 1.1831 LearningRate 0.0038 Epoch: 16 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:09,182-Speed 3384.67 samples/sec Loss 1.1776 LearningRate 0.0037 Epoch: 16 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:12,206-Speed 3388.05 samples/sec Loss 1.0931 LearningRate 0.0037 Epoch: 16 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:15,230-Speed 3386.63 samples/sec Loss 1.1108 LearningRate 0.0037 Epoch: 16 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:18,253-Speed 3388.19 samples/sec Loss 1.1379 LearningRate 0.0037 Epoch: 16 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:21,278-Speed 3385.74 samples/sec Loss 1.0672 LearningRate 0.0037 Epoch: 16 Global Step: 91740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 
10:55:24,289-Speed 3402.03 samples/sec Loss 1.1473 LearningRate 0.0037 Epoch: 16 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:27,320-Speed 3378.90 samples/sec Loss 1.1107 LearningRate 0.0037 Epoch: 16 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:30,356-Speed 3373.58 samples/sec Loss 1.0758 LearningRate 0.0037 Epoch: 16 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:33,388-Speed 3378.29 samples/sec Loss 1.0879 LearningRate 0.0037 Epoch: 16 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:36,424-Speed 3373.00 samples/sec Loss 1.1325 LearningRate 0.0037 Epoch: 16 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:39,460-Speed 3373.75 samples/sec Loss 1.1329 LearningRate 0.0037 Epoch: 16 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:42,488-Speed 3383.09 samples/sec Loss 1.1909 LearningRate 0.0037 Epoch: 16 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:45,508-Speed 3391.85 samples/sec Loss 1.1345 LearningRate 0.0037 Epoch: 16 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:48,534-Speed 3384.28 samples/sec Loss 1.1129 LearningRate 0.0037 Epoch: 16 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:51,613-Speed 3327.25 samples/sec Loss 1.1250 LearningRate 0.0037 Epoch: 16 Global Step: 91840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:55:54,653-Speed 3369.03 samples/sec Loss 1.0962 LearningRate 0.0037 Epoch: 16 Global Step: 91850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 10:55:57,669-Speed 3396.42 samples/sec Loss 1.1588 LearningRate 0.0037 Epoch: 16 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:00,699-Speed 3379.66 samples/sec Loss 1.1393 
LearningRate 0.0037 Epoch: 16 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:03,725-Speed 3384.15 samples/sec Loss 1.0714 LearningRate 0.0037 Epoch: 16 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:06,748-Speed 3388.81 samples/sec Loss 1.0963 LearningRate 0.0037 Epoch: 16 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:09,771-Speed 3388.38 samples/sec Loss 1.1166 LearningRate 0.0037 Epoch: 16 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:12,800-Speed 3381.12 samples/sec Loss 1.1368 LearningRate 0.0037 Epoch: 16 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:15,824-Speed 3387.33 samples/sec Loss 1.1970 LearningRate 0.0037 Epoch: 16 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:18,850-Speed 3384.79 samples/sec Loss 1.0742 LearningRate 0.0037 Epoch: 16 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:21,877-Speed 3383.45 samples/sec Loss 1.1281 LearningRate 0.0037 Epoch: 16 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:24,899-Speed 3389.29 samples/sec Loss 1.0851 LearningRate 0.0037 Epoch: 16 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:27,909-Speed 3402.51 samples/sec Loss 1.1561 LearningRate 0.0037 Epoch: 16 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:30,933-Speed 3387.61 samples/sec Loss 1.1022 LearningRate 0.0037 Epoch: 16 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:34,006-Speed 3332.76 samples/sec Loss 1.1016 LearningRate 0.0037 Epoch: 16 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:37,061-Speed 3352.81 samples/sec Loss 1.1326 LearningRate 0.0037 Epoch: 16 Global Step: 91990 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:56:40,094-Speed 3377.51 samples/sec Loss 1.1607 LearningRate 0.0036 Epoch: 16 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:57:23,411-[lfw][92000]XNorm: 22.736918 Training: 2022-04-27 10:57:23,411-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-27 10:57:23,412-[lfw][92000]Accuracy-Highest: 0.99817 Training: 2022-04-27 10:58:13,799-[cfp_fp][92000]XNorm: 22.136400 Training: 2022-04-27 10:58:13,800-[cfp_fp][92000]Accuracy-Flip: 0.98229+-0.00633 Training: 2022-04-27 10:58:13,800-[cfp_fp][92000]Accuracy-Highest: 0.98257 Training: 2022-04-27 10:58:57,140-[agedb_30][92000]XNorm: 22.949669 Training: 2022-04-27 10:58:57,140-[agedb_30][92000]Accuracy-Flip: 0.98033+-0.00809 Training: 2022-04-27 10:58:57,141-[agedb_30][92000]Accuracy-Highest: 0.98133 Training: 2022-04-27 10:59:00,158-Speed 73.11 samples/sec Loss 1.1203 LearningRate 0.0036 Epoch: 16 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:03,166-Speed 3404.86 samples/sec Loss 1.1125 LearningRate 0.0036 Epoch: 16 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:06,169-Speed 3410.56 samples/sec Loss 1.1243 LearningRate 0.0036 Epoch: 16 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:09,179-Speed 3402.85 samples/sec Loss 1.1897 LearningRate 0.0036 Epoch: 16 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:12,187-Speed 3404.49 samples/sec Loss 1.1279 LearningRate 0.0036 Epoch: 16 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:15,174-Speed 3429.64 samples/sec Loss 1.1026 LearningRate 0.0036 Epoch: 16 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:18,184-Speed 3401.92 samples/sec Loss 1.1377 LearningRate 0.0036 Epoch: 16 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-27 10:59:21,199-Speed 3397.77 samples/sec Loss 1.0748 LearningRate 0.0036 Epoch: 16 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:24,215-Speed 3396.10 samples/sec Loss 1.0761 LearningRate 0.0036 Epoch: 16 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:27,267-Speed 3355.98 samples/sec Loss 1.1425 LearningRate 0.0036 Epoch: 16 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:30,309-Speed 3366.63 samples/sec Loss 1.1362 LearningRate 0.0036 Epoch: 16 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:33,338-Speed 3381.38 samples/sec Loss 1.1583 LearningRate 0.0036 Epoch: 16 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:36,358-Speed 3391.92 samples/sec Loss 1.2198 LearningRate 0.0036 Epoch: 16 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:39,378-Speed 3391.51 samples/sec Loss 1.1577 LearningRate 0.0036 Epoch: 16 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:42,398-Speed 3391.25 samples/sec Loss 1.1897 LearningRate 0.0036 Epoch: 16 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:45,409-Speed 3402.01 samples/sec Loss 1.1276 LearningRate 0.0036 Epoch: 16 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:48,440-Speed 3379.10 samples/sec Loss 1.1908 LearningRate 0.0036 Epoch: 16 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:51,471-Speed 3379.17 samples/sec Loss 1.0984 LearningRate 0.0036 Epoch: 16 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:54,497-Speed 3385.19 samples/sec Loss 1.1584 LearningRate 0.0036 Epoch: 16 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 10:59:57,522-Speed 3385.51 
samples/sec Loss 1.0561 LearningRate 0.0036 Epoch: 16 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:00,593-Speed 3334.76 samples/sec Loss 1.1178 LearningRate 0.0036 Epoch: 16 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:03,623-Speed 3380.79 samples/sec Loss 1.1952 LearningRate 0.0036 Epoch: 16 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:06,688-Speed 3342.30 samples/sec Loss 1.0512 LearningRate 0.0036 Epoch: 16 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:09,711-Speed 3388.51 samples/sec Loss 1.1371 LearningRate 0.0036 Epoch: 16 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:12,744-Speed 3376.37 samples/sec Loss 1.1759 LearningRate 0.0036 Epoch: 16 Global Step: 92250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:15,773-Speed 3381.65 samples/sec Loss 1.1096 LearningRate 0.0036 Epoch: 16 Global Step: 92260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:00:18,789-Speed 3395.85 samples/sec Loss 1.1165 LearningRate 0.0036 Epoch: 16 Global Step: 92270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:00:21,806-Speed 3394.86 samples/sec Loss 1.0967 LearningRate 0.0036 Epoch: 16 Global Step: 92280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:00:24,811-Speed 3408.70 samples/sec Loss 1.1042 LearningRate 0.0036 Epoch: 16 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:27,876-Speed 3342.20 samples/sec Loss 1.1077 LearningRate 0.0035 Epoch: 16 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:31,031-Speed 3246.13 samples/sec Loss 1.1988 LearningRate 0.0035 Epoch: 16 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:34,137-Speed 3297.43 samples/sec Loss 1.1256 LearningRate 0.0035 Epoch: 
16 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:37,155-Speed 3393.36 samples/sec Loss 1.2016 LearningRate 0.0035 Epoch: 16 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:40,217-Speed 3345.37 samples/sec Loss 1.0581 LearningRate 0.0035 Epoch: 16 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:43,253-Speed 3373.84 samples/sec Loss 1.0629 LearningRate 0.0035 Epoch: 16 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:46,296-Speed 3365.63 samples/sec Loss 1.1597 LearningRate 0.0035 Epoch: 16 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:49,306-Speed 3402.84 samples/sec Loss 1.1920 LearningRate 0.0035 Epoch: 16 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:52,313-Speed 3406.59 samples/sec Loss 1.1635 LearningRate 0.0035 Epoch: 16 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:55,304-Speed 3424.88 samples/sec Loss 1.2393 LearningRate 0.0035 Epoch: 16 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:00:58,327-Speed 3388.03 samples/sec Loss 1.1245 LearningRate 0.0035 Epoch: 16 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:01,333-Speed 3406.33 samples/sec Loss 1.1639 LearningRate 0.0035 Epoch: 16 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:04,348-Speed 3397.93 samples/sec Loss 1.2112 LearningRate 0.0035 Epoch: 16 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:07,359-Speed 3401.89 samples/sec Loss 1.1122 LearningRate 0.0035 Epoch: 16 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:10,373-Speed 3399.03 samples/sec Loss 1.1592 LearningRate 0.0035 Epoch: 16 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 
2 hours Training: 2022-04-27 11:01:13,383-Speed 3402.77 samples/sec Loss 1.1922 LearningRate 0.0035 Epoch: 16 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:16,395-Speed 3400.37 samples/sec Loss 1.1998 LearningRate 0.0035 Epoch: 16 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:19,402-Speed 3405.97 samples/sec Loss 1.1460 LearningRate 0.0035 Epoch: 16 Global Step: 92470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:22,409-Speed 3406.38 samples/sec Loss 1.1524 LearningRate 0.0035 Epoch: 16 Global Step: 92480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:25,420-Speed 3401.35 samples/sec Loss 1.1905 LearningRate 0.0035 Epoch: 16 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:28,430-Speed 3402.40 samples/sec Loss 1.1831 LearningRate 0.0035 Epoch: 16 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:31,436-Speed 3408.25 samples/sec Loss 1.1015 LearningRate 0.0035 Epoch: 16 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:34,442-Speed 3406.53 samples/sec Loss 1.1083 LearningRate 0.0035 Epoch: 16 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:37,449-Speed 3406.96 samples/sec Loss 1.0978 LearningRate 0.0035 Epoch: 16 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:01:40,457-Speed 3405.33 samples/sec Loss 1.2127 LearningRate 0.0035 Epoch: 16 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:43,474-Speed 3394.68 samples/sec Loss 1.2073 LearningRate 0.0035 Epoch: 16 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:46,496-Speed 3389.18 samples/sec Loss 1.0667 LearningRate 0.0035 Epoch: 16 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:49,506-Speed 3402.23 
samples/sec Loss 1.2215 LearningRate 0.0035 Epoch: 16 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:52,530-Speed 3387.51 samples/sec Loss 1.1864 LearningRate 0.0035 Epoch: 16 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:55,541-Speed 3401.55 samples/sec Loss 1.1413 LearningRate 0.0035 Epoch: 16 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:01:58,552-Speed 3401.42 samples/sec Loss 1.1275 LearningRate 0.0034 Epoch: 16 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:01,562-Speed 3402.60 samples/sec Loss 1.1353 LearningRate 0.0034 Epoch: 16 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:04,717-Speed 3247.05 samples/sec Loss 1.1314 LearningRate 0.0034 Epoch: 16 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:07,750-Speed 3376.22 samples/sec Loss 1.1972 LearningRate 0.0034 Epoch: 16 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:10,776-Speed 3385.50 samples/sec Loss 1.1046 LearningRate 0.0034 Epoch: 16 Global Step: 92640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:02:13,773-Speed 3417.37 samples/sec Loss 1.1859 LearningRate 0.0034 Epoch: 16 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:16,929-Speed 3245.32 samples/sec Loss 1.1623 LearningRate 0.0034 Epoch: 16 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:20,005-Speed 3329.65 samples/sec Loss 1.1639 LearningRate 0.0034 Epoch: 16 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:23,022-Speed 3395.34 samples/sec Loss 1.0923 LearningRate 0.0034 Epoch: 16 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:26,060-Speed 3371.75 samples/sec Loss 1.1775 LearningRate 0.0034 Epoch: 16 
Global Step: 92690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:29,069-Speed 3402.91 samples/sec Loss 1.1520 LearningRate 0.0034 Epoch: 16 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:32,079-Speed 3402.66 samples/sec Loss 1.2058 LearningRate 0.0034 Epoch: 16 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:35,095-Speed 3396.69 samples/sec Loss 1.2262 LearningRate 0.0034 Epoch: 16 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:38,130-Speed 3374.18 samples/sec Loss 1.1903 LearningRate 0.0034 Epoch: 16 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:41,198-Speed 3339.35 samples/sec Loss 1.1672 LearningRate 0.0034 Epoch: 16 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:44,212-Speed 3398.30 samples/sec Loss 1.0412 LearningRate 0.0034 Epoch: 16 Global Step: 92750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:02:47,212-Speed 3413.86 samples/sec Loss 1.1082 LearningRate 0.0034 Epoch: 16 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:50,234-Speed 3388.81 samples/sec Loss 1.1578 LearningRate 0.0034 Epoch: 16 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:53,247-Speed 3400.09 samples/sec Loss 1.1064 LearningRate 0.0034 Epoch: 16 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:02:56,235-Speed 3427.04 samples/sec Loss 1.2022 LearningRate 0.0034 Epoch: 16 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:02:59,269-Speed 3376.08 samples/sec Loss 1.2267 LearningRate 0.0034 Epoch: 16 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:02,304-Speed 3374.88 samples/sec Loss 1.1595 LearningRate 0.0034 Epoch: 16 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 11:03:05,320-Speed 3396.64 samples/sec Loss 1.2650 LearningRate 0.0034 Epoch: 16 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:08,331-Speed 3401.50 samples/sec Loss 1.1962 LearningRate 0.0034 Epoch: 16 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:11,348-Speed 3394.95 samples/sec Loss 1.1154 LearningRate 0.0034 Epoch: 16 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:14,392-Speed 3364.58 samples/sec Loss 1.1360 LearningRate 0.0034 Epoch: 16 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:17,417-Speed 3385.97 samples/sec Loss 1.1970 LearningRate 0.0034 Epoch: 16 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:20,434-Speed 3394.66 samples/sec Loss 1.1981 LearningRate 0.0034 Epoch: 16 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:23,560-Speed 3276.32 samples/sec Loss 1.1619 LearningRate 0.0034 Epoch: 16 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:26,649-Speed 3315.57 samples/sec Loss 1.1496 LearningRate 0.0034 Epoch: 16 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:03:29,666-Speed 3396.98 samples/sec Loss 1.1351 LearningRate 0.0034 Epoch: 16 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:03:32,663-Speed 3417.48 samples/sec Loss 1.1591 LearningRate 0.0033 Epoch: 16 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:35,678-Speed 3397.69 samples/sec Loss 1.1254 LearningRate 0.0033 Epoch: 16 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:38,693-Speed 3396.94 samples/sec Loss 1.1391 LearningRate 0.0033 Epoch: 16 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:41,706-Speed 3399.46 
samples/sec Loss 1.1728 LearningRate 0.0033 Epoch: 16 Global Step: 92940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:44,722-Speed 3395.62 samples/sec Loss 1.1919 LearningRate 0.0033 Epoch: 16 Global Step: 92950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:47,737-Speed 3396.89 samples/sec Loss 1.1281 LearningRate 0.0033 Epoch: 16 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:50,761-Speed 3387.28 samples/sec Loss 1.2219 LearningRate 0.0033 Epoch: 16 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:53,810-Speed 3359.04 samples/sec Loss 1.2001 LearningRate 0.0033 Epoch: 16 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:56,842-Speed 3378.18 samples/sec Loss 1.1672 LearningRate 0.0033 Epoch: 16 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:03:59,860-Speed 3394.48 samples/sec Loss 1.1979 LearningRate 0.0033 Epoch: 16 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:04:02,875-Speed 3397.61 samples/sec Loss 1.1582 LearningRate 0.0033 Epoch: 16 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:05,891-Speed 3395.00 samples/sec Loss 1.1120 LearningRate 0.0033 Epoch: 16 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:08,911-Speed 3391.83 samples/sec Loss 1.2155 LearningRate 0.0033 Epoch: 16 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:11,929-Speed 3393.30 samples/sec Loss 1.1922 LearningRate 0.0033 Epoch: 16 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:14,950-Speed 3390.93 samples/sec Loss 1.3117 LearningRate 0.0033 Epoch: 16 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:18,013-Speed 3343.52 samples/sec Loss 1.1946 LearningRate 0.0033 Epoch: 16 
Global Step: 93060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:21,029-Speed 3396.35 samples/sec Loss 1.1727 LearningRate 0.0033 Epoch: 16 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:24,053-Speed 3387.27 samples/sec Loss 1.2025 LearningRate 0.0033 Epoch: 16 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:27,070-Speed 3395.18 samples/sec Loss 1.1528 LearningRate 0.0033 Epoch: 16 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:30,092-Speed 3388.41 samples/sec Loss 1.1964 LearningRate 0.0033 Epoch: 16 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:33,110-Speed 3393.97 samples/sec Loss 1.1587 LearningRate 0.0033 Epoch: 16 Global Step: 93110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:04:36,111-Speed 3412.77 samples/sec Loss 1.1114 LearningRate 0.0033 Epoch: 16 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:39,142-Speed 3379.83 samples/sec Loss 1.1946 LearningRate 0.0033 Epoch: 16 Global Step: 93130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:04:42,204-Speed 3344.80 samples/sec Loss 1.2389 LearningRate 0.0033 Epoch: 16 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:04:45,224-Speed 3390.77 samples/sec Loss 1.2368 LearningRate 0.0033 Epoch: 16 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:04:48,248-Speed 3387.12 samples/sec Loss 1.1057 LearningRate 0.0033 Epoch: 16 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:04:51,264-Speed 3396.57 samples/sec Loss 1.1612 LearningRate 0.0033 Epoch: 16 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:04:54,282-Speed 3393.68 samples/sec Loss 1.1740 LearningRate 0.0033 Epoch: 16 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 11:04:57,300-Speed 3393.80 samples/sec Loss 1.1890 LearningRate 0.0033 Epoch: 16 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:05:00,319-Speed 3392.89 samples/sec Loss 1.2285 LearningRate 0.0033 Epoch: 16 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:05:03,338-Speed 3393.21 samples/sec Loss 1.1814 LearningRate 0.0033 Epoch: 16 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:05:06,353-Speed 3397.02 samples/sec Loss 1.2443 LearningRate 0.0032 Epoch: 16 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:05:09,369-Speed 3395.76 samples/sec Loss 1.1362 LearningRate 0.0032 Epoch: 16 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:05:12,390-Speed 3391.01 samples/sec Loss 1.1410 LearningRate 0.0032 Epoch: 16 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:15,405-Speed 3396.54 samples/sec Loss 1.2042 LearningRate 0.0032 Epoch: 16 Global Step: 93250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:18,427-Speed 3389.14 samples/sec Loss 1.2221 LearningRate 0.0032 Epoch: 16 Global Step: 93260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:21,447-Speed 3391.96 samples/sec Loss 1.2105 LearningRate 0.0032 Epoch: 16 Global Step: 93270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:24,462-Speed 3396.80 samples/sec Loss 1.1058 LearningRate 0.0032 Epoch: 16 Global Step: 93280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:27,483-Speed 3390.21 samples/sec Loss 1.1311 LearningRate 0.0032 Epoch: 16 Global Step: 93290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:30,514-Speed 3379.78 samples/sec Loss 1.2003 LearningRate 0.0032 Epoch: 16 Global Step: 93300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:33,534-Speed 3391.48 
samples/sec Loss 1.1816 LearningRate 0.0032 Epoch: 16 Global Step: 93310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:36,555-Speed 3390.46 samples/sec Loss 1.1126 LearningRate 0.0032 Epoch: 16 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:39,572-Speed 3395.11 samples/sec Loss 1.1254 LearningRate 0.0032 Epoch: 16 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:42,587-Speed 3396.88 samples/sec Loss 1.1456 LearningRate 0.0032 Epoch: 16 Global Step: 93340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:05:45,592-Speed 3408.19 samples/sec Loss 1.1376 LearningRate 0.0032 Epoch: 16 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:48,608-Speed 3396.62 samples/sec Loss 1.1631 LearningRate 0.0032 Epoch: 16 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:51,637-Speed 3381.17 samples/sec Loss 1.1307 LearningRate 0.0032 Epoch: 16 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:54,665-Speed 3382.70 samples/sec Loss 1.2190 LearningRate 0.0032 Epoch: 16 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:05:57,681-Speed 3396.17 samples/sec Loss 1.1387 LearningRate 0.0032 Epoch: 16 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:00,713-Speed 3378.16 samples/sec Loss 1.1531 LearningRate 0.0032 Epoch: 16 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:03,735-Speed 3389.21 samples/sec Loss 1.1166 LearningRate 0.0032 Epoch: 16 Global Step: 93410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:06,752-Speed 3395.19 samples/sec Loss 1.1848 LearningRate 0.0032 Epoch: 16 Global Step: 93420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:09,780-Speed 3382.35 samples/sec Loss 1.1387 LearningRate 0.0032 Epoch: 16 
Global Step: 93430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:12,801-Speed 3390.40 samples/sec Loss 1.1328 LearningRate 0.0032 Epoch: 16 Global Step: 93440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:15,809-Speed 3404.33 samples/sec Loss 1.1708 LearningRate 0.0032 Epoch: 16 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:18,832-Speed 3388.05 samples/sec Loss 1.2067 LearningRate 0.0032 Epoch: 16 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:21,861-Speed 3381.60 samples/sec Loss 1.1958 LearningRate 0.0032 Epoch: 16 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:24,988-Speed 3276.14 samples/sec Loss 1.2336 LearningRate 0.0032 Epoch: 16 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:28,009-Speed 3390.74 samples/sec Loss 1.0817 LearningRate 0.0032 Epoch: 16 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:31,033-Speed 3386.41 samples/sec Loss 1.1647 LearningRate 0.0032 Epoch: 16 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:34,062-Speed 3381.74 samples/sec Loss 1.0974 LearningRate 0.0032 Epoch: 16 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:37,087-Speed 3385.95 samples/sec Loss 1.1858 LearningRate 0.0032 Epoch: 16 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:40,109-Speed 3388.44 samples/sec Loss 1.2567 LearningRate 0.0032 Epoch: 16 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:43,133-Speed 3387.53 samples/sec Loss 1.1120 LearningRate 0.0031 Epoch: 16 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:46,156-Speed 3387.95 samples/sec Loss 1.1018 LearningRate 0.0031 Epoch: 16 Global Step: 93550 Fp16 Grad Scale: 131072 Required: 2 
hours Training: 2022-04-27 11:06:49,168-Speed 3400.77 samples/sec Loss 1.2222 LearningRate 0.0031 Epoch: 16 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:52,191-Speed 3388.33 samples/sec Loss 1.1836 LearningRate 0.0031 Epoch: 16 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:55,210-Speed 3393.23 samples/sec Loss 1.0979 LearningRate 0.0031 Epoch: 16 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:06:58,259-Speed 3359.24 samples/sec Loss 1.0484 LearningRate 0.0031 Epoch: 16 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:07:01,282-Speed 3388.23 samples/sec Loss 1.1506 LearningRate 0.0031 Epoch: 16 Global Step: 93600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:04,434-Speed 3250.21 samples/sec Loss 1.1678 LearningRate 0.0031 Epoch: 16 Global Step: 93610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:07,462-Speed 3382.88 samples/sec Loss 1.1775 LearningRate 0.0031 Epoch: 16 Global Step: 93620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:10,489-Speed 3382.80 samples/sec Loss 1.2001 LearningRate 0.0031 Epoch: 16 Global Step: 93630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:13,582-Speed 3311.69 samples/sec Loss 1.1493 LearningRate 0.0031 Epoch: 16 Global Step: 93640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:16,603-Speed 3391.24 samples/sec Loss 1.1140 LearningRate 0.0031 Epoch: 16 Global Step: 93650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:19,626-Speed 3388.19 samples/sec Loss 1.1720 LearningRate 0.0031 Epoch: 16 Global Step: 93660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:22,645-Speed 3394.87 samples/sec Loss 1.1196 LearningRate 0.0031 Epoch: 16 Global Step: 93670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:25,731-Speed 3318.38 
samples/sec Loss 1.1856 LearningRate 0.0031 Epoch: 16 Global Step: 93680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:28,765-Speed 3375.91 samples/sec Loss 1.2144 LearningRate 0.0031 Epoch: 16 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:31,801-Speed 3373.81 samples/sec Loss 1.1932 LearningRate 0.0031 Epoch: 16 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:07:34,939-Speed 3263.73 samples/sec Loss 1.1954 LearningRate 0.0031 Epoch: 16 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:07:37,957-Speed 3394.18 samples/sec Loss 1.1537 LearningRate 0.0031 Epoch: 16 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:40,983-Speed 3385.00 samples/sec Loss 1.1744 LearningRate 0.0031 Epoch: 16 Global Step: 93730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:44,010-Speed 3383.69 samples/sec Loss 1.1755 LearningRate 0.0031 Epoch: 16 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:47,036-Speed 3384.08 samples/sec Loss 1.1780 LearningRate 0.0031 Epoch: 16 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:50,088-Speed 3357.03 samples/sec Loss 1.2176 LearningRate 0.0031 Epoch: 16 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:53,199-Speed 3291.86 samples/sec Loss 1.1355 LearningRate 0.0031 Epoch: 16 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:56,217-Speed 3394.19 samples/sec Loss 1.1370 LearningRate 0.0031 Epoch: 16 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:07:59,245-Speed 3382.19 samples/sec Loss 1.2009 LearningRate 0.0031 Epoch: 16 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:02,288-Speed 3365.86 samples/sec Loss 1.1789 LearningRate 0.0031 Epoch: 16 
Global Step: 93800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:05,316-Speed 3382.32 samples/sec Loss 1.1049 LearningRate 0.0031 Epoch: 16 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:08,337-Speed 3390.81 samples/sec Loss 1.2026 LearningRate 0.0031 Epoch: 16 Global Step: 93820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:11,359-Speed 3389.50 samples/sec Loss 1.2202 LearningRate 0.0031 Epoch: 16 Global Step: 93830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:14,386-Speed 3383.24 samples/sec Loss 1.1852 LearningRate 0.0031 Epoch: 16 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:17,413-Speed 3383.40 samples/sec Loss 1.1864 LearningRate 0.0031 Epoch: 16 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:20,433-Speed 3391.26 samples/sec Loss 1.1210 LearningRate 0.0030 Epoch: 16 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:23,456-Speed 3388.95 samples/sec Loss 1.1676 LearningRate 0.0030 Epoch: 16 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:26,485-Speed 3381.20 samples/sec Loss 1.1928 LearningRate 0.0030 Epoch: 16 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:29,514-Speed 3381.43 samples/sec Loss 1.2186 LearningRate 0.0030 Epoch: 16 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:32,551-Speed 3372.92 samples/sec Loss 1.1794 LearningRate 0.0030 Epoch: 16 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:35,672-Speed 3281.70 samples/sec Loss 1.2145 LearningRate 0.0030 Epoch: 16 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:38,728-Speed 3351.30 samples/sec Loss 1.2014 LearningRate 0.0030 Epoch: 16 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 11:08:41,754-Speed 3384.33 samples/sec Loss 1.1532 LearningRate 0.0030 Epoch: 16 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:44,780-Speed 3385.80 samples/sec Loss 1.1907 LearningRate 0.0030 Epoch: 16 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:47,803-Speed 3387.50 samples/sec Loss 1.2013 LearningRate 0.0030 Epoch: 16 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:08:50,822-Speed 3393.23 samples/sec Loss 1.2338 LearningRate 0.0030 Epoch: 16 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:53,899-Speed 3327.81 samples/sec Loss 1.1824 LearningRate 0.0030 Epoch: 16 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:56,928-Speed 3381.43 samples/sec Loss 1.1851 LearningRate 0.0030 Epoch: 16 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:08:59,987-Speed 3349.11 samples/sec Loss 1.2136 LearningRate 0.0030 Epoch: 16 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:09:03,012-Speed 3386.07 samples/sec Loss 1.1268 LearningRate 0.0030 Epoch: 16 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:09:46,430-[lfw][94000]XNorm: 21.774760 Training: 2022-04-27 11:09:46,431-[lfw][94000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-27 11:09:46,431-[lfw][94000]Accuracy-Highest: 0.99817 Training: 2022-04-27 11:10:36,982-[cfp_fp][94000]XNorm: 21.304880 Training: 2022-04-27 11:10:36,983-[cfp_fp][94000]Accuracy-Flip: 0.98386+-0.00670 Training: 2022-04-27 11:10:36,983-[cfp_fp][94000]Accuracy-Highest: 0.98386 Training: 2022-04-27 11:11:20,791-[agedb_30][94000]XNorm: 22.298897 Training: 2022-04-27 11:11:20,791-[agedb_30][94000]Accuracy-Flip: 0.98133+-0.00849 Training: 2022-04-27 11:11:20,792-[agedb_30][94000]Accuracy-Highest: 0.98133 Training: 2022-04-27 
11:11:23,815-Speed 72.73 samples/sec Loss 1.1243 LearningRate 0.0030 Epoch: 16 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:26,829-Speed 3397.59 samples/sec Loss 1.1874 LearningRate 0.0030 Epoch: 16 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:29,840-Speed 3402.55 samples/sec Loss 1.1513 LearningRate 0.0030 Epoch: 16 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:32,848-Speed 3405.28 samples/sec Loss 1.0855 LearningRate 0.0030 Epoch: 16 Global Step: 94040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:35,871-Speed 3387.65 samples/sec Loss 1.1185 LearningRate 0.0030 Epoch: 16 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:38,885-Speed 3398.85 samples/sec Loss 1.2069 LearningRate 0.0030 Epoch: 16 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:11:41,883-Speed 3416.47 samples/sec Loss 1.0700 LearningRate 0.0030 Epoch: 16 Global Step: 94070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:44,907-Speed 3386.91 samples/sec Loss 1.0960 LearningRate 0.0030 Epoch: 16 Global Step: 94080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:47,925-Speed 3393.45 samples/sec Loss 1.1690 LearningRate 0.0030 Epoch: 16 Global Step: 94090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:50,943-Speed 3394.16 samples/sec Loss 1.2167 LearningRate 0.0030 Epoch: 16 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:54,010-Speed 3340.00 samples/sec Loss 1.0889 LearningRate 0.0030 Epoch: 16 Global Step: 94110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:11:57,022-Speed 3400.32 samples/sec Loss 1.1402 LearningRate 0.0030 Epoch: 16 Global Step: 94120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:12:00,035-Speed 3398.71 samples/sec Loss 1.1065 
LearningRate 0.0030 Epoch: 16 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:03,051-Speed 3397.12 samples/sec Loss 1.1517 LearningRate 0.0030 Epoch: 16 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:06,077-Speed 3384.42 samples/sec Loss 1.1602 LearningRate 0.0030 Epoch: 16 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:09,091-Speed 3398.59 samples/sec Loss 1.1298 LearningRate 0.0030 Epoch: 16 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:12,109-Speed 3393.39 samples/sec Loss 1.0889 LearningRate 0.0030 Epoch: 16 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:15,142-Speed 3377.63 samples/sec Loss 1.1267 LearningRate 0.0030 Epoch: 16 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:18,172-Speed 3380.34 samples/sec Loss 1.2176 LearningRate 0.0029 Epoch: 16 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:21,212-Speed 3368.77 samples/sec Loss 1.1535 LearningRate 0.0029 Epoch: 16 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:24,274-Speed 3345.11 samples/sec Loss 1.0837 LearningRate 0.0029 Epoch: 16 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:27,295-Speed 3390.16 samples/sec Loss 1.2037 LearningRate 0.0029 Epoch: 16 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:30,314-Speed 3392.84 samples/sec Loss 1.2052 LearningRate 0.0029 Epoch: 16 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:33,330-Speed 3396.16 samples/sec Loss 1.1710 LearningRate 0.0029 Epoch: 16 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:12:36,341-Speed 3401.80 samples/sec Loss 1.2269 LearningRate 0.0029 Epoch: 16 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:39,363-Speed 3388.83 samples/sec Loss 1.1556 LearningRate 0.0029 Epoch: 16 Global Step: 94260 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:42,383-Speed 3391.67 samples/sec Loss 1.1632 LearningRate 0.0029 Epoch: 16 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:45,404-Speed 3390.03 samples/sec Loss 1.1930 LearningRate 0.0029 Epoch: 16 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:48,429-Speed 3386.11 samples/sec Loss 1.1865 LearningRate 0.0029 Epoch: 16 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:51,454-Speed 3386.38 samples/sec Loss 1.1033 LearningRate 0.0029 Epoch: 16 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:54,484-Speed 3380.57 samples/sec Loss 1.2663 LearningRate 0.0029 Epoch: 16 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:12:57,506-Speed 3389.13 samples/sec Loss 1.2414 LearningRate 0.0029 Epoch: 16 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:00,529-Speed 3387.11 samples/sec Loss 1.1621 LearningRate 0.0029 Epoch: 16 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:03,554-Speed 3386.72 samples/sec Loss 1.1576 LearningRate 0.0029 Epoch: 16 Global Step: 94340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:06,575-Speed 3389.39 samples/sec Loss 1.1477 LearningRate 0.0029 Epoch: 16 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:09,608-Speed 3377.37 samples/sec Loss 1.1269 LearningRate 0.0029 Epoch: 16 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:12,631-Speed 3388.35 samples/sec Loss 1.2605 LearningRate 0.0029 Epoch: 16 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:15,650-Speed 3393.46 samples/sec Loss 1.2093 LearningRate 0.0029 Epoch: 16 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:18,671-Speed 3389.45 samples/sec Loss 1.1738 LearningRate 0.0029 Epoch: 16 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:21,692-Speed 3390.33 samples/sec Loss 1.2203 LearningRate 0.0029 Epoch: 16 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:24,724-Speed 3378.41 samples/sec Loss 1.2269 LearningRate 0.0029 Epoch: 16 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:27,742-Speed 3393.33 samples/sec Loss 1.1232 LearningRate 0.0029 Epoch: 16 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:30,760-Speed 3393.67 samples/sec Loss 1.1818 LearningRate 0.0029 Epoch: 16 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:33,778-Speed 3393.79 samples/sec Loss 1.2082 LearningRate 0.0029 Epoch: 16 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:13:36,806-Speed 3383.31 samples/sec Loss 1.2430 LearningRate 0.0029 Epoch: 16 Global Step: 94450 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:13:39,787-Speed 3435.70 samples/sec Loss 1.1780 LearningRate 0.0029 Epoch: 16 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:42,804-Speed 3394.82 samples/sec Loss 1.1930 LearningRate 0.0029 Epoch: 16 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:45,832-Speed 3382.65 samples/sec Loss 1.2915 LearningRate 0.0029 Epoch: 16 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:48,856-Speed 3387.58 samples/sec Loss 1.2803 LearningRate 0.0029 Epoch: 16 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:51,933-Speed 3328.86 samples/sec Loss 1.0839 LearningRate 0.0029 Epoch: 16 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:54,948-Speed 3396.55 samples/sec Loss 1.1607 LearningRate 0.0029 Epoch: 16 Global Step: 94510 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:13:57,969-Speed 3390.04 samples/sec Loss 1.1311 LearningRate 0.0029 Epoch: 16 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:00,990-Speed 3390.57 samples/sec Loss 1.1640 LearningRate 0.0028 Epoch: 16 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:04,008-Speed 3394.20 samples/sec Loss 1.1677 LearningRate 0.0028 Epoch: 16 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:07,029-Speed 3390.63 samples/sec Loss 1.1921 LearningRate 0.0028 Epoch: 16 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:10,049-Speed 3391.29 samples/sec Loss 1.2988 LearningRate 0.0028 Epoch: 16 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:13,191-Speed 3260.10 samples/sec Loss 1.1410 LearningRate 0.0028 Epoch: 16 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:16,208-Speed 3394.33 samples/sec Loss 1.1311 LearningRate 0.0028 Epoch: 16 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:19,223-Speed 3396.65 samples/sec Loss 1.1712 LearningRate 0.0028 Epoch: 16 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:22,239-Speed 3396.12 samples/sec Loss 1.1729 LearningRate 0.0028 Epoch: 16 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:25,257-Speed 3394.07 samples/sec Loss 1.2323 LearningRate 0.0028 Epoch: 16 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:28,286-Speed 3381.34 samples/sec Loss 1.1112 LearningRate 0.0028 Epoch: 16 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:31,309-Speed 3388.63 samples/sec Loss 1.1430 LearningRate 0.0028 Epoch: 16 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:14:34,316-Speed 3406.01 samples/sec Loss 1.0799 LearningRate 0.0028 Epoch: 16 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:37,359-Speed 3366.29 samples/sec Loss 1.1497 LearningRate 0.0028 Epoch: 16 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:40,484-Speed 3277.20 samples/sec Loss 1.1541 LearningRate 0.0028 Epoch: 16 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:43,504-Speed 3391.06 samples/sec Loss 1.1613 LearningRate 0.0028 Epoch: 16 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:46,538-Speed 3376.66 samples/sec Loss 1.2218 LearningRate 0.0028 Epoch: 16 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:49,561-Speed 3388.01 samples/sec Loss 1.1757 LearningRate 0.0028 Epoch: 16 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:52,589-Speed 3382.65 samples/sec Loss 1.1310 LearningRate 0.0028 Epoch: 16 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:55,611-Speed 3388.77 samples/sec Loss 1.1157 LearningRate 0.0028 Epoch: 16 Global Step: 94710 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:14:58,630-Speed 3392.63 samples/sec Loss 1.1763 LearningRate 0.0028 Epoch: 16 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:01,656-Speed 3384.72 samples/sec Loss 1.1600 LearningRate 0.0028 Epoch: 16 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:04,749-Speed 3311.87 samples/sec Loss 1.1840 LearningRate 0.0028 Epoch: 16 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:07,776-Speed 3384.03 samples/sec Loss 1.1353 LearningRate 0.0028 Epoch: 16 Global Step: 94750 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:10,860-Speed 3320.31 samples/sec Loss 1.1508 LearningRate 0.0028 Epoch: 16 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:13,967-Speed 3296.45 samples/sec Loss 1.1987 LearningRate 0.0028 Epoch: 16 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:16,999-Speed 3377.89 samples/sec Loss 1.1397 LearningRate 0.0028 Epoch: 16 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:20,027-Speed 3383.41 samples/sec Loss 1.1703 LearningRate 0.0028 Epoch: 16 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:23,053-Speed 3384.78 samples/sec Loss 1.1701 LearningRate 0.0028 Epoch: 16 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:26,075-Speed 3388.87 samples/sec Loss 1.1397 LearningRate 0.0028 Epoch: 16 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:29,106-Speed 3379.87 samples/sec Loss 1.1616 LearningRate 0.0028 Epoch: 16 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:32,127-Speed 3389.98 samples/sec Loss 1.0947 LearningRate 0.0028 Epoch: 16 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:35,151-Speed 3387.55 samples/sec Loss 1.1761 LearningRate 0.0028 Epoch: 16 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:38,176-Speed 3385.00 samples/sec Loss 1.1706 LearningRate 0.0028 Epoch: 16 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:41,203-Speed 3384.21 samples/sec Loss 1.1937 LearningRate 0.0028 Epoch: 16 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:15:44,231-Speed 3382.09 samples/sec Loss 1.0785 LearningRate 0.0027 Epoch: 16 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:47,252-Speed 3391.02 samples/sec Loss 1.2193 LearningRate 0.0027 Epoch: 16 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:50,314-Speed 3344.72 samples/sec Loss 1.1965 LearningRate 0.0027 Epoch: 16 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:53,383-Speed 3338.11 samples/sec Loss 1.1593 LearningRate 0.0027 Epoch: 16 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:56,408-Speed 3385.68 samples/sec Loss 1.1505 LearningRate 0.0027 Epoch: 16 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:15:59,444-Speed 3373.40 samples/sec Loss 1.1225 LearningRate 0.0027 Epoch: 16 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:02,466-Speed 3389.66 samples/sec Loss 1.1081 LearningRate 0.0027 Epoch: 16 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:05,501-Speed 3374.11 samples/sec Loss 1.1669 LearningRate 0.0027 Epoch: 16 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:08,547-Speed 3362.43 samples/sec Loss 1.1618 LearningRate 0.0027 Epoch: 16 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:11,576-Speed 3381.66 samples/sec Loss 1.1357 LearningRate 0.0027 Epoch: 16 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:14,609-Speed 3376.95 samples/sec Loss 1.1324 LearningRate 0.0027 Epoch: 16 Global Step: 94970 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:16:17,617-Speed 3405.13 samples/sec Loss 1.1562 LearningRate 0.0027 Epoch: 16 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:20,639-Speed 3388.96 samples/sec Loss 1.1572 LearningRate 0.0027 Epoch: 16 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:23,665-Speed 3385.84 samples/sec Loss 1.1280 LearningRate 0.0027 Epoch: 16 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:26,696-Speed 3378.80 samples/sec Loss 1.1663 LearningRate 0.0027 Epoch: 16 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:29,719-Speed 3388.03 samples/sec Loss 1.1092 LearningRate 0.0027 Epoch: 16 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:32,743-Speed 3386.59 samples/sec Loss 1.1525 LearningRate 0.0027 Epoch: 16 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:35,767-Speed 3386.87 samples/sec Loss 1.0696 LearningRate 0.0027 Epoch: 16 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:38,889-Speed 3280.77 samples/sec Loss 1.1271 LearningRate 0.0027 Epoch: 16 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:41,964-Speed 3330.94 samples/sec Loss 1.0765 LearningRate 0.0027 Epoch: 16 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:44,995-Speed 3380.12 samples/sec Loss 1.2175 LearningRate 0.0027 Epoch: 16 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:48,004-Speed 3403.26 samples/sec Loss 1.0925 LearningRate 0.0027 Epoch: 16 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:51,028-Speed 3387.99 samples/sec Loss 1.2361 LearningRate 0.0027 Epoch: 16 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:54,049-Speed 3389.82 samples/sec Loss 1.1223 LearningRate 0.0027 Epoch: 16 Global Step: 95100 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:16:57,079-Speed 3380.05 samples/sec Loss 1.0909 LearningRate 0.0027 Epoch: 16 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:17:00,107-Speed 3382.61 samples/sec Loss 1.0441 LearningRate 0.0027 Epoch: 16 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:17:03,119-Speed 3401.53 samples/sec Loss 1.1325 LearningRate 0.0027 Epoch: 16 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:06,144-Speed 3385.66 samples/sec Loss 1.1655 LearningRate 0.0027 Epoch: 16 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:09,170-Speed 3384.39 samples/sec Loss 1.2057 LearningRate 0.0027 Epoch: 16 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:12,196-Speed 3385.37 samples/sec Loss 1.1540 LearningRate 0.0027 Epoch: 16 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:15,228-Speed 3378.35 samples/sec Loss 1.0425 LearningRate 0.0027 Epoch: 16 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:18,253-Speed 3385.52 samples/sec Loss 1.1939 LearningRate 0.0027 Epoch: 16 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:21,283-Speed 3380.64 samples/sec Loss 1.2048 LearningRate 0.0027 Epoch: 16 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:24,307-Speed 3386.94 samples/sec Loss 1.1918 LearningRate 0.0027 Epoch: 16 Global Step: 95200 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:27,333-Speed 3386.10 samples/sec Loss 1.1149 LearningRate 0.0026 Epoch: 16 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:30,358-Speed 3385.44 samples/sec Loss 1.2532 LearningRate 0.0026 Epoch: 16 Global Step: 95220 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:33,393-Speed 3374.69 samples/sec Loss 1.1238 LearningRate 0.0026 Epoch: 16 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:17:36,403-Speed 3402.44 samples/sec Loss 1.1176 LearningRate 0.0026 Epoch: 16 Global Step: 95240 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:39,435-Speed 3378.86 samples/sec Loss 1.2607 LearningRate 0.0026 Epoch: 16 Global Step: 95250 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:42,459-Speed 3386.44 samples/sec Loss 1.1576 LearningRate 0.0026 Epoch: 16 Global Step: 95260 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:45,495-Speed 3373.86 samples/sec Loss 1.1532 LearningRate 0.0026 Epoch: 16 Global Step: 95270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:48,524-Speed 3381.76 samples/sec Loss 1.0530 LearningRate 0.0026 Epoch: 16 Global Step: 95280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:51,557-Speed 3377.09 samples/sec Loss 1.1882 LearningRate 0.0026 Epoch: 16 Global Step: 95290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:54,587-Speed 3379.83 samples/sec Loss 1.1207 LearningRate 0.0026 Epoch: 16 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:17:57,665-Speed 3327.89 samples/sec Loss 1.1434 LearningRate 0.0026 Epoch: 16 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:18:00,692-Speed 3383.68 samples/sec Loss 1.1472 LearningRate 0.0026 Epoch: 16 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:18:03,719-Speed 3382.90 samples/sec Loss 1.1300 LearningRate 0.0026 Epoch: 16 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:18:06,766-Speed 3362.17 samples/sec Loss 1.1681 LearningRate 0.0026 Epoch: 16 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:09,893-Speed 3275.65 samples/sec Loss 1.1550 LearningRate 0.0026 Epoch: 16 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:12,946-Speed 3354.37 samples/sec Loss 1.1003 LearningRate 0.0026 Epoch: 16 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:15,980-Speed 3376.33 samples/sec Loss 1.1793 LearningRate 0.0026 Epoch: 16 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:19,008-Speed 3382.29 samples/sec Loss 1.0572 LearningRate 0.0026 Epoch: 16 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:22,040-Speed 3377.65 samples/sec Loss 1.1558 LearningRate 0.0026 Epoch: 16 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:25,078-Speed 3372.26 samples/sec Loss 1.0856 LearningRate 0.0026 Epoch: 16 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:28,110-Speed 3378.40 samples/sec Loss 1.1043 LearningRate 0.0026 Epoch: 16 Global Step: 95410 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:31,133-Speed 3387.57 samples/sec Loss 1.1950 LearningRate 0.0026 Epoch: 16 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:34,172-Speed 3370.11 samples/sec Loss 1.1293 LearningRate 0.0026 Epoch: 16 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:37,185-Speed 3399.94 samples/sec Loss 1.0737 LearningRate 0.0026 Epoch: 16 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:40,216-Speed 3379.64 samples/sec Loss 1.2071 LearningRate 0.0026 Epoch: 16 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:43,242-Speed 3385.24 samples/sec Loss 1.2554 LearningRate 0.0026 Epoch: 16 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:46,267-Speed 3385.78 samples/sec Loss 1.1660 LearningRate 0.0026 Epoch: 16 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:49,292-Speed 3385.00 samples/sec Loss 1.1161 LearningRate 0.0026 Epoch: 16 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:52,317-Speed 3385.59 samples/sec Loss 1.0973 LearningRate 0.0026 Epoch: 16 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:55,350-Speed 3378.07 samples/sec Loss 1.1405 LearningRate 0.0026 Epoch: 16 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:18:58,394-Speed 3364.70 samples/sec Loss 1.1138 LearningRate 0.0026 Epoch: 16 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:01,432-Speed 3370.94 samples/sec Loss 1.1622 LearningRate 0.0026 Epoch: 16 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:04,460-Speed 3382.86 samples/sec Loss 1.2153 LearningRate 0.0026 Epoch: 16 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:07,485-Speed 3386.17 samples/sec Loss 1.1577 LearningRate 0.0026 Epoch: 16 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:19:10,498-Speed 3398.65 samples/sec Loss 1.2353 LearningRate 0.0026 Epoch: 16 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:13,523-Speed 3386.98 samples/sec Loss 1.1515 LearningRate 0.0026 Epoch: 16 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:16,565-Speed 3365.97 samples/sec Loss 1.1498 LearningRate 0.0025 Epoch: 16 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:19,586-Speed 3390.45 samples/sec Loss 1.1691 LearningRate 0.0025 Epoch: 16 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:22,617-Speed 3379.53 samples/sec Loss 1.0668 LearningRate 0.0025 Epoch: 16 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:25,655-Speed 3371.03 samples/sec Loss 1.1376 LearningRate 0.0025 Epoch: 16 Global Step: 95600 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:28,713-Speed 3349.40 samples/sec Loss 1.1850 LearningRate 0.0025 Epoch: 16 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:31,736-Speed 3389.43 samples/sec Loss 1.1036 LearningRate 0.0025 Epoch: 16 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:34,764-Speed 3381.65 samples/sec Loss 1.2407 LearningRate 0.0025 Epoch: 16 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:37,788-Speed 3387.71 samples/sec Loss 1.0612 LearningRate 0.0025 Epoch: 16 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:40,808-Speed 3391.46 samples/sec Loss 1.0820 LearningRate 0.0025 Epoch: 16 Global Step: 95650 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:19:43,828-Speed 3390.93 samples/sec Loss 1.0818 LearningRate 0.0025 Epoch: 16 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:46,853-Speed 3385.94 samples/sec Loss 1.1522 LearningRate 0.0025 Epoch: 16 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:49,876-Speed 3387.69 samples/sec Loss 1.2030 LearningRate 0.0025 Epoch: 16 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:52,905-Speed 3382.27 samples/sec Loss 1.1790 LearningRate 0.0025 Epoch: 16 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:55,924-Speed 3392.50 samples/sec Loss 1.0790 LearningRate 0.0025 Epoch: 16 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:19:58,953-Speed 3381.48 samples/sec Loss 1.1992 LearningRate 0.0025 Epoch: 16 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:01,984-Speed 3379.58 samples/sec Loss 1.1750 LearningRate 0.0025 Epoch: 16 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:05,121-Speed 3264.84 samples/sec Loss 1.1459 LearningRate 0.0025 Epoch: 16 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:08,153-Speed 3378.24 samples/sec Loss 1.1201 LearningRate 0.0025 Epoch: 16 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:11,190-Speed 3372.12 samples/sec Loss 1.1500 LearningRate 0.0025 Epoch: 16 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:14,219-Speed 3381.71 samples/sec Loss 1.1437 LearningRate 0.0025 Epoch: 16 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:17,245-Speed 3384.80 samples/sec Loss 1.2002 LearningRate 0.0025 Epoch: 16 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:20,278-Speed 3376.59 samples/sec Loss 1.1507 LearningRate 0.0025 Epoch: 16 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:23,314-Speed 3374.12 samples/sec Loss 1.1594 LearningRate 0.0025 Epoch: 16 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:26,344-Speed 3380.55 samples/sec Loss 1.0676 LearningRate 0.0025 Epoch: 16 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:29,480-Speed 3265.33 samples/sec Loss 1.2086 LearningRate 0.0025 Epoch: 16 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:32,508-Speed 3382.59 samples/sec Loss 1.1265 LearningRate 0.0025 Epoch: 16 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:35,609-Speed 3302.73 samples/sec Loss 1.2075 LearningRate 0.0025 Epoch: 16 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:38,653-Speed 3365.28 samples/sec Loss 1.1239 LearningRate 0.0025 Epoch: 16 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:41,681-Speed 3382.66 samples/sec Loss 1.2465 LearningRate 0.0025 Epoch: 16 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:44,695-Speed 3398.19 samples/sec Loss 1.0938 LearningRate 0.0025 Epoch: 16 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:47,726-Speed 3379.18 samples/sec Loss 1.1082 LearningRate 0.0025 Epoch: 16 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:50,758-Speed 3378.29 samples/sec Loss 1.2203 LearningRate 0.0025 Epoch: 16 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:53,788-Speed 3380.34 samples/sec Loss 1.1023 LearningRate 0.0025 Epoch: 16 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:56,815-Speed 3384.21 samples/sec Loss 1.1488 LearningRate 0.0025 Epoch: 16 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:20:59,843-Speed 3381.81 samples/sec Loss 1.1815 LearningRate 0.0025 Epoch: 16 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:21:02,879-Speed 3374.25 samples/sec Loss 1.1002 LearningRate 0.0025 Epoch: 16 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:21:05,897-Speed 3393.68 samples/sec Loss 1.2135 LearningRate 0.0024 Epoch: 16 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:08,944-Speed 3361.26 samples/sec Loss 1.1731 LearningRate 0.0024 Epoch: 16 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:11,986-Speed 3366.57 samples/sec Loss 1.1789 LearningRate 0.0024 Epoch: 16 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:15,029-Speed 3365.63 samples/sec Loss 1.1989 LearningRate 0.0024 Epoch: 16 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:18,059-Speed 3380.54 samples/sec Loss 1.1200 LearningRate 0.0024 Epoch: 16 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:21,088-Speed 3381.59 samples/sec Loss 1.1181 LearningRate 0.0024 Epoch: 16 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:24,133-Speed 3363.75 samples/sec Loss 1.0332 LearningRate 0.0024 Epoch: 16 Global Step: 95990 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:21:27,165-Speed 3378.09 samples/sec Loss 1.0504 LearningRate 0.0024 Epoch: 16 Global Step: 96000 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:22:10,523-[lfw][96000]XNorm: 22.444819
Training: 2022-04-27 11:22:10,524-[lfw][96000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-27 11:22:10,524-[lfw][96000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:23:01,139-[cfp_fp][96000]XNorm: 21.908401
Training: 2022-04-27 11:23:01,140-[cfp_fp][96000]Accuracy-Flip: 0.98286+-0.00688
Training: 2022-04-27 11:23:01,140-[cfp_fp][96000]Accuracy-Highest: 0.98386
Training: 2022-04-27 11:23:44,485-[agedb_30][96000]XNorm: 22.459557
Training: 2022-04-27 11:23:44,486-[agedb_30][96000]Accuracy-Flip: 0.98233+-0.00779
Training: 2022-04-27 11:23:44,486-[agedb_30][96000]Accuracy-Highest: 0.98233
Training: 2022-04-27 11:23:47,507-Speed 72.97 samples/sec Loss 1.1703 LearningRate 0.0024 Epoch: 16 Global Step: 96010 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:23:50,533-Speed 3384.75 samples/sec Loss 1.1087 LearningRate 0.0024 Epoch: 16 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:23:53,539-Speed 3408.11 samples/sec Loss 1.1369 LearningRate 0.0024 Epoch: 16 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:23:56,547-Speed 3404.04 samples/sec Loss 1.1461 LearningRate 0.0024 Epoch: 16 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:23:59,608-Speed 3346.57 samples/sec Loss 1.1152 LearningRate 0.0024 Epoch: 16 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:02,624-Speed 3396.10 samples/sec Loss 1.1299 LearningRate 0.0024 Epoch: 16 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:05,643-Speed 3393.03 samples/sec Loss 1.1687 LearningRate 0.0024 Epoch: 16 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:08,659-Speed 3395.58 samples/sec Loss 1.0954 LearningRate 0.0024 Epoch: 16 Global Step: 96080 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:11,676-Speed 3395.06 samples/sec Loss 1.1969 LearningRate 0.0024 Epoch: 16 Global Step: 96090 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:14,699-Speed 3388.08 samples/sec Loss 1.0688 LearningRate 0.0024 Epoch: 16 Global Step: 96100 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:17,720-Speed 3390.04 samples/sec Loss 1.1047 LearningRate 0.0024 Epoch: 16 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:20,740-Speed 3391.49 samples/sec Loss 1.1751 LearningRate 0.0024 Epoch: 16 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:23,773-Speed 3376.75 samples/sec Loss 1.1876 LearningRate 0.0024 Epoch: 16 Global Step: 96130 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:24:26,774-Speed 3413.62 samples/sec Loss 1.0949 LearningRate 0.0024 Epoch: 16 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:29,792-Speed 3393.80 samples/sec Loss 1.0637 LearningRate 0.0024 Epoch: 16 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:32,808-Speed 3396.21 samples/sec Loss 1.1670 LearningRate 0.0024 Epoch: 16 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:35,825-Speed 3394.59 samples/sec Loss 1.1146 LearningRate 0.0024 Epoch: 16 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:38,840-Speed 3397.04 samples/sec Loss 1.0924 LearningRate 0.0024 Epoch: 16 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:41,849-Speed 3404.18 samples/sec Loss 1.1564 LearningRate 0.0024 Epoch: 16 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:44,861-Speed 3400.46 samples/sec Loss 1.1612 LearningRate 0.0024 Epoch: 16 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:47,871-Speed 3402.25 samples/sec Loss 1.1603 LearningRate 0.0024 Epoch: 16 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:50,887-Speed 3396.10 samples/sec Loss 1.1539 LearningRate 0.0024 Epoch: 16 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:53,913-Speed 3385.20 samples/sec Loss 1.1165 LearningRate 0.0024 Epoch: 16 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:24:56,934-Speed 3390.27 samples/sec Loss 1.1168 LearningRate 0.0024 Epoch: 16 Global Step: 96240 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:24:59,924-Speed 3425.16 samples/sec Loss 1.1487 LearningRate 0.0024 Epoch: 16 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:25:02,955-Speed 3379.98 samples/sec Loss 1.1368 LearningRate 0.0024 Epoch: 16 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:25:05,953-Speed 3416.59 samples/sec Loss 1.1639 LearningRate 0.0024 Epoch: 16 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:08,968-Speed 3396.65 samples/sec Loss 1.0394 LearningRate 0.0024 Epoch: 16 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:11,985-Speed 3395.24 samples/sec Loss 1.1419 LearningRate 0.0023 Epoch: 16 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:14,996-Speed 3401.79 samples/sec Loss 1.2004 LearningRate 0.0023 Epoch: 16 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:18,010-Speed 3397.68 samples/sec Loss 1.1062 LearningRate 0.0023 Epoch: 16 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:21,025-Speed 3397.27 samples/sec Loss 1.1485 LearningRate 0.0023 Epoch: 16 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:24,046-Speed 3390.58 samples/sec Loss 1.1507 LearningRate 0.0023 Epoch: 16 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:27,061-Speed 3396.90 samples/sec Loss 1.1785 LearningRate 0.0023 Epoch: 16 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:30,072-Speed 3401.23 samples/sec Loss 1.1854 LearningRate 0.0023 Epoch: 16 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:33,102-Speed 3381.20 samples/sec Loss 1.1627 LearningRate 0.0023 Epoch: 16 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:36,101-Speed 3414.95 samples/sec Loss 1.1423 LearningRate 0.0023 Epoch: 16 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:39,117-Speed 3396.05 samples/sec Loss 1.1711 LearningRate 0.0023 Epoch: 16 Global Step: 96380 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:42,136-Speed 3393.18 samples/sec Loss 1.1325 LearningRate 0.0023 Epoch: 16 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:45,152-Speed 3395.54 samples/sec Loss 1.1214 LearningRate 0.0023 Epoch: 16 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:48,162-Speed 3402.32 samples/sec Loss 1.1313 LearningRate 0.0023 Epoch: 16 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:51,176-Speed 3398.28 samples/sec Loss 1.1620 LearningRate 0.0023 Epoch: 16 Global Step: 96420 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:54,189-Speed 3399.25 samples/sec Loss 1.0531 LearningRate 0.0023 Epoch: 16 Global Step: 96430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:25:57,203-Speed 3399.30 samples/sec Loss 1.1078 LearningRate 0.0023 Epoch: 16 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:00,241-Speed 3370.86 samples/sec Loss 1.1697 LearningRate 0.0023 Epoch: 16 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:03,254-Speed 3400.10 samples/sec Loss 1.2284 LearningRate 0.0023 Epoch: 16 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:06,265-Speed 3401.34 samples/sec Loss 1.1204 LearningRate 0.0023 Epoch: 16 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:09,288-Speed 3388.33 samples/sec Loss 1.1268 LearningRate 0.0023 Epoch: 16 Global Step: 96480 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:12,417-Speed 3273.04 samples/sec Loss 1.1115 LearningRate 0.0023 Epoch: 16 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:15,460-Speed 3366.09 samples/sec Loss 1.1871 LearningRate 0.0023 Epoch: 16 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:18,476-Speed 3395.53 samples/sec Loss 1.1425 LearningRate 0.0023 Epoch: 16 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:21,488-Speed 3400.55 samples/sec Loss 1.1595 LearningRate 0.0023 Epoch: 16 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:24,501-Speed 3399.81 samples/sec Loss 1.1843 LearningRate 0.0023 Epoch: 16 Global Step: 96530 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:27,519-Speed 3393.87 samples/sec Loss 1.1374 LearningRate 0.0023 Epoch: 16 Global Step: 96540 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:30,535-Speed 3396.07 samples/sec Loss 1.1351 LearningRate 0.0023 Epoch: 16 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:33,549-Speed 3398.47 samples/sec Loss 1.1188 LearningRate 0.0023 Epoch: 16 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:36,565-Speed 3395.32 samples/sec Loss 1.0857 LearningRate 0.0023 Epoch: 16 Global Step: 96570 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:26:39,658-Speed 3311.87 samples/sec Loss 1.1543 LearningRate 0.0023 Epoch: 16 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:26:42,665-Speed 3406.64 samples/sec Loss 1.1386 LearningRate 0.0023 Epoch: 16 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:45,691-Speed 3384.05 samples/sec Loss 1.1846 LearningRate 0.0023 Epoch: 16 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:48,733-Speed 3367.02 samples/sec Loss 1.1319 LearningRate 0.0023 Epoch: 16 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:51,753-Speed 3391.38 samples/sec Loss 1.0955 LearningRate 0.0023 Epoch: 16 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:54,774-Speed 3390.62 samples/sec Loss 1.1120 LearningRate 0.0023 Epoch: 16 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:26:57,790-Speed 3396.32 samples/sec Loss 1.1280 LearningRate 0.0023 Epoch: 16 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:27:00,806-Speed 3396.52 samples/sec Loss 1.1298 LearningRate 0.0023 Epoch: 16 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:27:03,897-Speed 3313.21 samples/sec Loss 1.0689 LearningRate 0.0023 Epoch: 16 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:27:17,151-Speed 772.66 samples/sec Loss 0.8309 LearningRate 0.0022 Epoch: 17 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 
2022-04-27 11:27:20,161-Speed 3402.86 samples/sec Loss 0.8191 LearningRate 0.0022 Epoch: 17 Global Step: 96680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:27:23,183-Speed 3389.94 samples/sec Loss 0.7564 LearningRate 0.0022 Epoch: 17 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:26,225-Speed 3366.34 samples/sec Loss 0.8170 LearningRate 0.0022 Epoch: 17 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:29,260-Speed 3374.78 samples/sec Loss 0.7908 LearningRate 0.0022 Epoch: 17 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:32,291-Speed 3378.68 samples/sec Loss 0.7564 LearningRate 0.0022 Epoch: 17 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:35,331-Speed 3369.55 samples/sec Loss 0.7287 LearningRate 0.0022 Epoch: 17 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:38,373-Speed 3367.43 samples/sec Loss 0.8058 LearningRate 0.0022 Epoch: 17 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:41,391-Speed 3394.34 samples/sec Loss 0.7069 LearningRate 0.0022 Epoch: 17 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:44,418-Speed 3383.34 samples/sec Loss 0.7949 LearningRate 0.0022 Epoch: 17 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:47,444-Speed 3384.13 samples/sec Loss 0.7713 LearningRate 0.0022 Epoch: 17 Global Step: 96770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:50,479-Speed 3374.64 samples/sec Loss 0.8263 LearningRate 0.0022 Epoch: 17 Global Step: 96780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:27:53,505-Speed 3385.31 samples/sec Loss 0.8016 LearningRate 0.0022 Epoch: 17 Global Step: 96790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:27:56,534-Speed 3381.26 samples/sec Loss 
0.7392 LearningRate 0.0022 Epoch: 17 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:27:59,540-Speed 3407.49 samples/sec Loss 0.8107 LearningRate 0.0022 Epoch: 17 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:02,562-Speed 3388.94 samples/sec Loss 0.7245 LearningRate 0.0022 Epoch: 17 Global Step: 96820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:05,584-Speed 3389.92 samples/sec Loss 0.8274 LearningRate 0.0022 Epoch: 17 Global Step: 96830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:08,608-Speed 3386.57 samples/sec Loss 0.8283 LearningRate 0.0022 Epoch: 17 Global Step: 96840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:11,692-Speed 3321.27 samples/sec Loss 0.7840 LearningRate 0.0022 Epoch: 17 Global Step: 96850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:14,707-Speed 3397.01 samples/sec Loss 0.8065 LearningRate 0.0022 Epoch: 17 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:17,731-Speed 3387.19 samples/sec Loss 0.7830 LearningRate 0.0022 Epoch: 17 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:20,735-Speed 3408.93 samples/sec Loss 0.8186 LearningRate 0.0022 Epoch: 17 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:23,752-Speed 3395.24 samples/sec Loss 0.7403 LearningRate 0.0022 Epoch: 17 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:26,770-Speed 3394.07 samples/sec Loss 0.8370 LearningRate 0.0022 Epoch: 17 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:29,786-Speed 3396.44 samples/sec Loss 0.8804 LearningRate 0.0022 Epoch: 17 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:32,812-Speed 3385.06 samples/sec Loss 0.7808 LearningRate 0.0022 Epoch: 17 Global Step: 
96920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:35,872-Speed 3346.31 samples/sec Loss 0.7608 LearningRate 0.0022 Epoch: 17 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:38,900-Speed 3382.38 samples/sec Loss 0.7514 LearningRate 0.0022 Epoch: 17 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:41,921-Speed 3390.39 samples/sec Loss 0.8459 LearningRate 0.0022 Epoch: 17 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:44,964-Speed 3365.98 samples/sec Loss 0.8356 LearningRate 0.0022 Epoch: 17 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:48,016-Speed 3356.07 samples/sec Loss 0.7859 LearningRate 0.0022 Epoch: 17 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:28:51,062-Speed 3365.38 samples/sec Loss 0.8722 LearningRate 0.0022 Epoch: 17 Global Step: 96980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:54,168-Speed 3297.46 samples/sec Loss 0.7171 LearningRate 0.0022 Epoch: 17 Global Step: 96990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:28:57,196-Speed 3382.95 samples/sec Loss 0.9039 LearningRate 0.0022 Epoch: 17 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:00,234-Speed 3371.86 samples/sec Loss 0.8146 LearningRate 0.0022 Epoch: 17 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:03,268-Speed 3375.60 samples/sec Loss 0.8004 LearningRate 0.0022 Epoch: 17 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:06,305-Speed 3372.64 samples/sec Loss 0.8258 LearningRate 0.0022 Epoch: 17 Global Step: 97030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:09,324-Speed 3391.80 samples/sec Loss 0.7963 LearningRate 0.0022 Epoch: 17 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-27 11:29:12,392-Speed 3338.47 samples/sec Loss 0.7942 LearningRate 0.0021 Epoch: 17 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:15,421-Speed 3381.59 samples/sec Loss 0.8383 LearningRate 0.0021 Epoch: 17 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:18,451-Speed 3380.49 samples/sec Loss 0.7884 LearningRate 0.0021 Epoch: 17 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:21,478-Speed 3384.36 samples/sec Loss 0.8048 LearningRate 0.0021 Epoch: 17 Global Step: 97080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:29:24,482-Speed 3409.70 samples/sec Loss 0.8359 LearningRate 0.0021 Epoch: 17 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:27,559-Speed 3328.91 samples/sec Loss 0.7714 LearningRate 0.0021 Epoch: 17 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:30,638-Speed 3325.54 samples/sec Loss 0.7600 LearningRate 0.0021 Epoch: 17 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:33,660-Speed 3389.51 samples/sec Loss 0.7788 LearningRate 0.0021 Epoch: 17 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:36,682-Speed 3389.63 samples/sec Loss 0.8414 LearningRate 0.0021 Epoch: 17 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:39,707-Speed 3385.34 samples/sec Loss 0.8785 LearningRate 0.0021 Epoch: 17 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:29:42,719-Speed 3400.43 samples/sec Loss 0.7712 LearningRate 0.0021 Epoch: 17 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:29:45,742-Speed 3388.69 samples/sec Loss 0.8237 LearningRate 0.0021 Epoch: 17 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:29:48,818-Speed 3329.88 
samples/sec Loss 0.8051 LearningRate 0.0021 Epoch: 17 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:29:51,849-Speed 3378.72 samples/sec Loss 0.8229 LearningRate 0.0021 Epoch: 17 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:29:54,880-Speed 3379.66 samples/sec Loss 0.8337 LearningRate 0.0021 Epoch: 17 Global Step: 97190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:29:57,910-Speed 3380.45 samples/sec Loss 0.7940 LearningRate 0.0021 Epoch: 17 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:30:00,933-Speed 3388.37 samples/sec Loss 0.7311 LearningRate 0.0021 Epoch: 17 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:30:03,964-Speed 3378.84 samples/sec Loss 0.8310 LearningRate 0.0021 Epoch: 17 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:30:06,998-Speed 3375.83 samples/sec Loss 0.8356 LearningRate 0.0021 Epoch: 17 Global Step: 97230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:30:10,032-Speed 3376.01 samples/sec Loss 0.7604 LearningRate 0.0021 Epoch: 17 Global Step: 97240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:30:13,069-Speed 3372.85 samples/sec Loss 0.8463 LearningRate 0.0021 Epoch: 17 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:16,095-Speed 3384.01 samples/sec Loss 0.7939 LearningRate 0.0021 Epoch: 17 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:19,118-Speed 3388.39 samples/sec Loss 0.8148 LearningRate 0.0021 Epoch: 17 Global Step: 97270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:22,150-Speed 3378.45 samples/sec Loss 0.8159 LearningRate 0.0021 Epoch: 17 Global Step: 97280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:25,174-Speed 3386.61 samples/sec Loss 0.8591 LearningRate 0.0021 Epoch: 17 
Global Step: 97290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:28,201-Speed 3383.38 samples/sec Loss 0.7709 LearningRate 0.0021 Epoch: 17 Global Step: 97300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:31,228-Speed 3383.62 samples/sec Loss 0.8051 LearningRate 0.0021 Epoch: 17 Global Step: 97310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:34,264-Speed 3374.16 samples/sec Loss 0.7583 LearningRate 0.0021 Epoch: 17 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:37,393-Speed 3273.62 samples/sec Loss 0.8117 LearningRate 0.0021 Epoch: 17 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:40,492-Speed 3305.23 samples/sec Loss 0.8159 LearningRate 0.0021 Epoch: 17 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:43,498-Speed 3407.32 samples/sec Loss 0.7016 LearningRate 0.0021 Epoch: 17 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:46,522-Speed 3386.57 samples/sec Loss 0.8588 LearningRate 0.0021 Epoch: 17 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:49,561-Speed 3369.68 samples/sec Loss 0.7942 LearningRate 0.0021 Epoch: 17 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:52,584-Speed 3388.57 samples/sec Loss 0.8794 LearningRate 0.0021 Epoch: 17 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:55,606-Speed 3388.69 samples/sec Loss 0.8018 LearningRate 0.0021 Epoch: 17 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:30:58,633-Speed 3383.68 samples/sec Loss 0.8181 LearningRate 0.0021 Epoch: 17 Global Step: 97400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:01,645-Speed 3401.66 samples/sec Loss 0.7939 LearningRate 0.0021 Epoch: 17 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 11:31:04,673-Speed 3382.29 samples/sec Loss 0.7785 LearningRate 0.0021 Epoch: 17 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:07,698-Speed 3386.19 samples/sec Loss 0.8327 LearningRate 0.0021 Epoch: 17 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:10,723-Speed 3385.84 samples/sec Loss 0.8686 LearningRate 0.0020 Epoch: 17 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:13,755-Speed 3377.82 samples/sec Loss 0.8489 LearningRate 0.0020 Epoch: 17 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:16,777-Speed 3388.86 samples/sec Loss 0.8021 LearningRate 0.0020 Epoch: 17 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:19,805-Speed 3382.70 samples/sec Loss 0.8079 LearningRate 0.0020 Epoch: 17 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:22,850-Speed 3363.65 samples/sec Loss 0.8359 LearningRate 0.0020 Epoch: 17 Global Step: 97480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:25,927-Speed 3328.95 samples/sec Loss 0.8265 LearningRate 0.0020 Epoch: 17 Global Step: 97490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:28,953-Speed 3384.36 samples/sec Loss 0.7852 LearningRate 0.0020 Epoch: 17 Global Step: 97500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:31:32,019-Speed 3341.26 samples/sec Loss 0.7613 LearningRate 0.0020 Epoch: 17 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:35,068-Speed 3358.84 samples/sec Loss 0.7547 LearningRate 0.0020 Epoch: 17 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:38,094-Speed 3384.68 samples/sec Loss 0.7852 LearningRate 0.0020 Epoch: 17 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:41,118-Speed 3387.03 
samples/sec Loss 0.8086 LearningRate 0.0020 Epoch: 17 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:44,144-Speed 3385.13 samples/sec Loss 0.7879 LearningRate 0.0020 Epoch: 17 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:47,173-Speed 3381.41 samples/sec Loss 0.8322 LearningRate 0.0020 Epoch: 17 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:50,204-Speed 3379.42 samples/sec Loss 0.7877 LearningRate 0.0020 Epoch: 17 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:53,232-Speed 3382.02 samples/sec Loss 0.8197 LearningRate 0.0020 Epoch: 17 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:56,257-Speed 3386.47 samples/sec Loss 0.7897 LearningRate 0.0020 Epoch: 17 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:31:59,292-Speed 3374.71 samples/sec Loss 0.8231 LearningRate 0.0020 Epoch: 17 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:02,312-Speed 3391.49 samples/sec Loss 0.8300 LearningRate 0.0020 Epoch: 17 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:05,349-Speed 3372.05 samples/sec Loss 0.7348 LearningRate 0.0020 Epoch: 17 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:08,371-Speed 3389.28 samples/sec Loss 0.8104 LearningRate 0.0020 Epoch: 17 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:11,404-Speed 3376.87 samples/sec Loss 0.8523 LearningRate 0.0020 Epoch: 17 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:14,428-Speed 3387.26 samples/sec Loss 0.8280 LearningRate 0.0020 Epoch: 17 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:17,451-Speed 3388.22 samples/sec Loss 0.7855 LearningRate 0.0020 Epoch: 17 
Global Step: 97660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:20,490-Speed 3370.72 samples/sec Loss 0.7757 LearningRate 0.0020 Epoch: 17 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:23,516-Speed 3384.91 samples/sec Loss 0.8272 LearningRate 0.0020 Epoch: 17 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:26,543-Speed 3382.92 samples/sec Loss 0.8387 LearningRate 0.0020 Epoch: 17 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:29,570-Speed 3384.29 samples/sec Loss 0.7754 LearningRate 0.0020 Epoch: 17 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:32,598-Speed 3382.85 samples/sec Loss 0.8228 LearningRate 0.0020 Epoch: 17 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:32:35,605-Speed 3405.26 samples/sec Loss 0.7825 LearningRate 0.0020 Epoch: 17 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:38,671-Speed 3340.83 samples/sec Loss 0.7712 LearningRate 0.0020 Epoch: 17 Global Step: 97730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:41,702-Speed 3379.23 samples/sec Loss 0.7272 LearningRate 0.0020 Epoch: 17 Global Step: 97740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:44,735-Speed 3376.67 samples/sec Loss 0.8576 LearningRate 0.0020 Epoch: 17 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:47,761-Speed 3385.26 samples/sec Loss 0.7548 LearningRate 0.0020 Epoch: 17 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:50,791-Speed 3380.50 samples/sec Loss 0.8174 LearningRate 0.0020 Epoch: 17 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:53,824-Speed 3377.20 samples/sec Loss 0.8804 LearningRate 0.0020 Epoch: 17 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 11:32:56,846-Speed 3389.40 samples/sec Loss 0.8023 LearningRate 0.0020 Epoch: 17 Global Step: 97790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:32:59,881-Speed 3374.44 samples/sec Loss 0.7718 LearningRate 0.0020 Epoch: 17 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:02,955-Speed 3331.54 samples/sec Loss 0.8363 LearningRate 0.0020 Epoch: 17 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:05,971-Speed 3396.37 samples/sec Loss 0.8267 LearningRate 0.0020 Epoch: 17 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:08,997-Speed 3384.23 samples/sec Loss 0.7385 LearningRate 0.0020 Epoch: 17 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:12,032-Speed 3375.64 samples/sec Loss 0.7919 LearningRate 0.0019 Epoch: 17 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:15,075-Speed 3365.77 samples/sec Loss 0.8298 LearningRate 0.0019 Epoch: 17 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:18,104-Speed 3380.73 samples/sec Loss 0.7695 LearningRate 0.0019 Epoch: 17 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:21,129-Speed 3386.86 samples/sec Loss 0.8285 LearningRate 0.0019 Epoch: 17 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:24,164-Speed 3374.37 samples/sec Loss 0.8209 LearningRate 0.0019 Epoch: 17 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:27,198-Speed 3375.90 samples/sec Loss 0.7795 LearningRate 0.0019 Epoch: 17 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:30,225-Speed 3383.49 samples/sec Loss 0.8110 LearningRate 0.0019 Epoch: 17 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:33,252-Speed 3383.71 
samples/sec Loss 0.8109 LearningRate 0.0019 Epoch: 17 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:36,262-Speed 3402.79 samples/sec Loss 0.7961 LearningRate 0.0019 Epoch: 17 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:39,297-Speed 3374.87 samples/sec Loss 0.7930 LearningRate 0.0019 Epoch: 17 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:42,322-Speed 3385.04 samples/sec Loss 0.8051 LearningRate 0.0019 Epoch: 17 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:45,349-Speed 3384.27 samples/sec Loss 0.7941 LearningRate 0.0019 Epoch: 17 Global Step: 97950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:48,385-Speed 3373.88 samples/sec Loss 0.7763 LearningRate 0.0019 Epoch: 17 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:51,412-Speed 3384.24 samples/sec Loss 0.8450 LearningRate 0.0019 Epoch: 17 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:54,437-Speed 3384.96 samples/sec Loss 0.7952 LearningRate 0.0019 Epoch: 17 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:33:57,488-Speed 3357.68 samples/sec Loss 0.8232 LearningRate 0.0019 Epoch: 17 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:34:00,591-Speed 3302.58 samples/sec Loss 0.7851 LearningRate 0.0019 Epoch: 17 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:34:44,358-[lfw][98000]XNorm: 21.285736 Training: 2022-04-27 11:34:44,358-[lfw][98000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-27 11:34:44,359-[lfw][98000]Accuracy-Highest: 0.99817 Training: 2022-04-27 11:35:35,015-[cfp_fp][98000]XNorm: 21.325338 Training: 2022-04-27 11:35:35,016-[cfp_fp][98000]Accuracy-Flip: 0.98471+-0.00564 Training: 2022-04-27 11:35:35,016-[cfp_fp][98000]Accuracy-Highest: 
0.98471 Training: 2022-04-27 11:36:18,550-[agedb_30][98000]XNorm: 22.087762 Training: 2022-04-27 11:36:18,551-[agedb_30][98000]Accuracy-Flip: 0.98100+-0.00923 Training: 2022-04-27 11:36:18,551-[agedb_30][98000]Accuracy-Highest: 0.98233 Training: 2022-04-27 11:36:21,586-Speed 72.63 samples/sec Loss 0.7787 LearningRate 0.0019 Epoch: 17 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:36:24,581-Speed 3419.39 samples/sec Loss 0.8522 LearningRate 0.0019 Epoch: 17 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:36:27,588-Speed 3407.37 samples/sec Loss 0.8012 LearningRate 0.0019 Epoch: 17 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:36:30,597-Speed 3403.19 samples/sec Loss 0.7725 LearningRate 0.0019 Epoch: 17 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:36:33,590-Speed 3422.53 samples/sec Loss 0.8154 LearningRate 0.0019 Epoch: 17 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:36,679-Speed 3314.73 samples/sec Loss 0.8335 LearningRate 0.0019 Epoch: 17 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:39,688-Speed 3403.79 samples/sec Loss 0.8766 LearningRate 0.0019 Epoch: 17 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:42,696-Speed 3405.33 samples/sec Loss 0.7831 LearningRate 0.0019 Epoch: 17 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:45,704-Speed 3404.75 samples/sec Loss 0.8351 LearningRate 0.0019 Epoch: 17 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:48,715-Speed 3402.25 samples/sec Loss 0.7524 LearningRate 0.0019 Epoch: 17 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:51,741-Speed 3385.03 samples/sec Loss 0.8130 LearningRate 0.0019 Epoch: 17 Global Step: 98110 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 11:36:54,769-Speed 3382.59 samples/sec Loss 0.7786 LearningRate 0.0019 Epoch: 17 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:36:57,784-Speed 3396.46 samples/sec Loss 0.8162 LearningRate 0.0019 Epoch: 17 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:37:00,803-Speed 3393.72 samples/sec Loss 0.7367 LearningRate 0.0019 Epoch: 17 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 11:37:03,884-Speed 3323.86 samples/sec Loss 0.7505 LearningRate 0.0019 Epoch: 17 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:06,925-Speed 3367.63 samples/sec Loss 0.8464 LearningRate 0.0019 Epoch: 17 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:09,937-Speed 3400.35 samples/sec Loss 0.8288 LearningRate 0.0019 Epoch: 17 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:12,964-Speed 3383.75 samples/sec Loss 0.7568 LearningRate 0.0019 Epoch: 17 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:15,974-Speed 3402.78 samples/sec Loss 0.8101 LearningRate 0.0019 Epoch: 17 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:18,991-Speed 3396.02 samples/sec Loss 0.8236 LearningRate 0.0019 Epoch: 17 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:22,017-Speed 3383.96 samples/sec Loss 0.8002 LearningRate 0.0019 Epoch: 17 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:25,043-Speed 3384.98 samples/sec Loss 0.8382 LearningRate 0.0019 Epoch: 17 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:28,061-Speed 3393.87 samples/sec Loss 0.7616 LearningRate 0.0019 Epoch: 17 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 
11:37:31,079-Speed 3394.30 samples/sec Loss 0.8188 LearningRate 0.0019 Epoch: 17 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:34,089-Speed 3401.94 samples/sec Loss 0.8271 LearningRate 0.0019 Epoch: 17 Global Step: 98250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 11:37:37,092-Speed 3410.49 samples/sec Loss 0.8310 LearningRate 0.0018 Epoch: 17 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:40,112-Speed 3392.01 samples/sec Loss 0.8261 LearningRate 0.0018 Epoch: 17 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:43,127-Speed 3397.07 samples/sec Loss 0.7760 LearningRate 0.0018 Epoch: 17 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:46,138-Speed 3401.98 samples/sec Loss 0.8620 LearningRate 0.0018 Epoch: 17 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:49,152-Speed 3398.41 samples/sec Loss 0.7657 LearningRate 0.0018 Epoch: 17 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:52,167-Speed 3397.44 samples/sec Loss 0.7681 LearningRate 0.0018 Epoch: 17 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:55,192-Speed 3385.41 samples/sec Loss 0.8107 LearningRate 0.0018 Epoch: 17 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:37:58,211-Speed 3392.84 samples/sec Loss 0.8378 LearningRate 0.0018 Epoch: 17 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:38:01,248-Speed 3372.32 samples/sec Loss 0.8533 LearningRate 0.0018 Epoch: 17 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:38:04,263-Speed 3397.89 samples/sec Loss 0.7978 LearningRate 0.0018 Epoch: 17 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 11:38:07,281-Speed 3394.21 samples/sec Loss 0.9189 
LearningRate 0.0018 Epoch: 17 Global Step: 98360 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:38:10,280-Speed 3415.06 samples/sec Loss 0.8280 LearningRate 0.0018 Epoch: 17 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:13,292-Speed 3400.98 samples/sec Loss 0.8180 LearningRate 0.0018 Epoch: 17 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:16,311-Speed 3392.37 samples/sec Loss 0.8023 LearningRate 0.0018 Epoch: 17 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:19,330-Speed 3392.92 samples/sec Loss 0.7937 LearningRate 0.0018 Epoch: 17 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:22,343-Speed 3398.86 samples/sec Loss 0.8844 LearningRate 0.0018 Epoch: 17 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:25,361-Speed 3393.76 samples/sec Loss 0.8103 LearningRate 0.0018 Epoch: 17 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:38:28,361-Speed 3414.21 samples/sec Loss 0.7305 LearningRate 0.0018 Epoch: 17 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:31,372-Speed 3402.08 samples/sec Loss 0.8654 LearningRate 0.0018 Epoch: 17 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:34,387-Speed 3396.46 samples/sec Loss 0.7039 LearningRate 0.0018 Epoch: 17 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:37,402-Speed 3396.74 samples/sec Loss 0.8918 LearningRate 0.0018 Epoch: 17 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:40,416-Speed 3398.80 samples/sec Loss 0.7929 LearningRate 0.0018 Epoch: 17 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:43,863-Speed 3393.57 samples/sec Loss 0.8379 LearningRate 0.0018 Epoch: 17 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:46,880-Speed 3395.90 samples/sec Loss 0.8004 LearningRate 0.0018 Epoch: 17 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:49,896-Speed 3395.87 samples/sec Loss 0.8051 LearningRate 0.0018 Epoch: 17 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:52,910-Speed 3398.29 samples/sec Loss 0.8313 LearningRate 0.0018 Epoch: 17 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:55,920-Speed 3401.90 samples/sec Loss 0.8362 LearningRate 0.0018 Epoch: 17 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:38:58,927-Speed 3406.53 samples/sec Loss 0.8328 LearningRate 0.0018 Epoch: 17 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:39:01,951-Speed 3386.84 samples/sec Loss 0.7415 LearningRate 0.0018 Epoch: 17 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:39:04,948-Speed 3417.95 samples/sec Loss 0.7755 LearningRate 0.0018 Epoch: 17 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:07,964-Speed 3395.22 samples/sec Loss 0.8355 LearningRate 0.0018 Epoch: 17 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:10,984-Speed 3392.47 samples/sec Loss 0.9174 LearningRate 0.0018 Epoch: 17 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:13,998-Speed 3398.35 samples/sec Loss 0.8423 LearningRate 0.0018 Epoch: 17 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:17,007-Speed 3403.49 samples/sec Loss 0.8650 LearningRate 0.0018 Epoch: 17 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:20,018-Speed 3401.11 samples/sec Loss 0.8118 LearningRate 0.0018 Epoch: 17 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:23,033-Speed 3397.89 samples/sec Loss 0.8233 LearningRate 0.0018 Epoch: 17 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:26,046-Speed 3399.55 samples/sec Loss 0.8221 LearningRate 0.0018 Epoch: 17 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:29,072-Speed 3384.51 samples/sec Loss 0.8493 LearningRate 0.0018 Epoch: 17 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:32,084-Speed 3400.18 samples/sec Loss 0.6967 LearningRate 0.0018 Epoch: 17 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:35,098-Speed 3398.90 samples/sec Loss 0.7510 LearningRate 0.0018 Epoch: 17 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:39:38,101-Speed 3410.17 samples/sec Loss 0.8314 LearningRate 0.0018 Epoch: 17 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:41,116-Speed 3397.22 samples/sec Loss 0.7981 LearningRate 0.0018 Epoch: 17 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:44,132-Speed 3396.63 samples/sec Loss 0.8682 LearningRate 0.0017 Epoch: 17 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:47,153-Speed 3390.17 samples/sec Loss 0.8739 LearningRate 0.0017 Epoch: 17 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:50,168-Speed 3396.79 samples/sec Loss 0.8338 LearningRate 0.0017 Epoch: 17 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:53,185-Speed 3394.57 samples/sec Loss 0.8777 LearningRate 0.0017 Epoch: 17 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:56,217-Speed 3378.69 samples/sec Loss 0.7962 LearningRate 0.0017 Epoch: 17 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:39:59,282-Speed 3340.85 samples/sec Loss 0.8131 LearningRate 0.0017 Epoch: 17 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:02,302-Speed 3391.42 samples/sec Loss 0.8077 LearningRate 0.0017 Epoch: 17 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:05,318-Speed 3396.49 samples/sec Loss 0.7828 LearningRate 0.0017 Epoch: 17 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:08,334-Speed 3396.93 samples/sec Loss 0.8342 LearningRate 0.0017 Epoch: 17 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:11,370-Speed 3373.35 samples/sec Loss 0.8703 LearningRate 0.0017 Epoch: 17 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:14,446-Speed 3328.96 samples/sec Loss 0.7781 LearningRate 0.0017 Epoch: 17 Global Step: 98780 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:17,482-Speed 3373.80 samples/sec Loss 0.8334 LearningRate 0.0017 Epoch: 17 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:20,497-Speed 3397.88 samples/sec Loss 0.8074 LearningRate 0.0017 Epoch: 17 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:23,514-Speed 3394.74 samples/sec Loss 0.8988 LearningRate 0.0017 Epoch: 17 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:26,536-Speed 3388.37 samples/sec Loss 0.8271 LearningRate 0.0017 Epoch: 17 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:29,555-Speed 3392.92 samples/sec Loss 0.8522 LearningRate 0.0017 Epoch: 17 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:40:32,557-Speed 3412.38 samples/sec Loss 0.8143 LearningRate 0.0017 Epoch: 17 Global Step: 98840 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:35,576-Speed 3393.21 samples/sec Loss 0.8744 LearningRate 0.0017 Epoch: 17 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:38,603-Speed 3383.25 samples/sec Loss 0.8181 LearningRate 0.0017 Epoch: 17 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:41,623-Speed 3390.96 samples/sec Loss 0.8865 LearningRate 0.0017 Epoch: 17 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:44,643-Speed 3392.03 samples/sec Loss 0.7637 LearningRate 0.0017 Epoch: 17 Global Step: 98880 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:47,664-Speed 3389.83 samples/sec Loss 0.7711 LearningRate 0.0017 Epoch: 17 Global Step: 98890 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:50,685-Speed 3390.48 samples/sec Loss 0.8365 LearningRate 0.0017 Epoch: 17 Global Step: 98900 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:53,705-Speed 3391.88 samples/sec Loss 0.8233 LearningRate 0.0017 Epoch: 17 Global Step: 98910 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:56,728-Speed 3387.40 samples/sec Loss 0.8421 LearningRate 0.0017 Epoch: 17 Global Step: 98920 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:40:59,757-Speed 3381.59 samples/sec Loss 0.8449 LearningRate 0.0017 Epoch: 17 Global Step: 98930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 11:41:02,829-Speed 3334.49 samples/sec Loss 0.7784 LearningRate 0.0017 Epoch: 17 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:05,907-Speed 3328.08 samples/sec Loss 0.7573 LearningRate 0.0017 Epoch: 17 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:08,924-Speed 3395.09 samples/sec Loss 0.8482 LearningRate 0.0017 Epoch: 17 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:11,942-Speed 3393.57 samples/sec Loss 0.8138 LearningRate 0.0017 Epoch: 17 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:14,967-Speed 3385.33 samples/sec Loss 0.7575 LearningRate 0.0017 Epoch: 17 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:17,986-Speed 3393.12 samples/sec Loss 0.8395 LearningRate 0.0017 Epoch: 17 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:21,001-Speed 3396.44 samples/sec Loss 0.8014 LearningRate 0.0017 Epoch: 17 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:24,022-Speed 3390.53 samples/sec Loss 0.8331 LearningRate 0.0017 Epoch: 17 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:27,053-Speed 3379.37 samples/sec Loss 0.7519 LearningRate 0.0017 Epoch: 17 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:30,147-Speed 3310.59 samples/sec Loss 0.8537 LearningRate 0.0017 Epoch: 17 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:33,180-Speed 3376.35 samples/sec Loss 0.8041 LearningRate 0.0017 Epoch: 17 Global Step: 99040 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 11:41:36,184-Speed 3410.66 samples/sec Loss 0.8082 LearningRate 0.0017 Epoch: 17 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:39,207-Speed 3387.42 samples/sec Loss 0.7949 LearningRate 0.0017 Epoch: 17 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:42,225-Speed 3393.79 samples/sec Loss 0.8227 LearningRate 0.0017 Epoch: 17 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:45,242-Speed 3394.37 samples/sec Loss 0.7808 LearningRate 0.0017 Epoch: 17 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:48,258-Speed 3396.25 samples/sec Loss 0.7792 LearningRate 0.0017 Epoch: 17 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:51,280-Speed 3389.30 samples/sec Loss 0.7896 LearningRate 0.0017 Epoch: 17 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:54,301-Speed 3391.20 samples/sec Loss 0.8153 LearningRate 0.0017 Epoch: 17 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:41:57,321-Speed 3390.74 samples/sec Loss 0.8677 LearningRate 0.0016 Epoch: 17 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:00,346-Speed 3386.76 samples/sec Loss 0.9056 LearningRate 0.0016 Epoch: 17 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:03,364-Speed 3393.31 samples/sec Loss 0.8052 LearningRate 0.0016 Epoch: 17 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:06,364-Speed 3413.66 samples/sec Loss 0.8314 LearningRate 0.0016 Epoch: 17 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:09,383-Speed 3392.76 samples/sec Loss 0.8558 LearningRate 0.0016 Epoch: 17 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:12,414-Speed 3378.77 samples/sec Loss 0.8401 LearningRate 0.0016 Epoch: 17 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:15,431-Speed 3395.60 samples/sec Loss 0.7526 LearningRate 0.0016 Epoch: 17 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 11:42:18,457-Speed 3384.82 samples/sec Loss 0.8001 LearningRate 0.0016 Epoch: 17 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:21,481-Speed 3388.11 samples/sec Loss 0.8051 LearningRate 0.0016 Epoch: 17 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:24,549-Speed 3337.94 samples/sec Loss 0.8187 LearningRate 0.0016 Epoch: 17 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:27,588-Speed 3370.75 samples/sec Loss 0.7827 LearningRate 0.0016 Epoch: 17 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:30,613-Speed 3385.06 samples/sec Loss 0.9193 LearningRate 0.0016 Epoch: 17 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:33,635-Speed 3389.24 samples/sec Loss 0.8814 LearningRate 0.0016 Epoch: 17 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:42:36,652-Speed 3395.78 samples/sec Loss 0.8269 LearningRate 0.0016 Epoch: 17 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:39,670-Speed 3393.10 samples/sec Loss 0.8331 LearningRate 0.0016 Epoch: 17 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:42,692-Speed 3389.50 samples/sec Loss 0.7834 LearningRate 0.0016 Epoch: 17 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:45,713-Speed 3390.01 samples/sec Loss 0.8623 LearningRate 0.0016 Epoch: 17 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:48,744-Speed 3380.17 samples/sec Loss 0.8657 LearningRate 0.0016 Epoch: 17 Global Step: 99290 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:51,767-Speed 3387.58 samples/sec Loss 0.8378 LearningRate 0.0016 Epoch: 17 Global Step: 99300 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:54,784-Speed 3395.44 samples/sec Loss 0.7932 LearningRate 0.0016 Epoch: 17 Global Step: 99310 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:42:57,833-Speed 3359.50 samples/sec Loss 0.8518 LearningRate 0.0016 Epoch: 17 Global Step: 99320 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:43:00,867-Speed 3375.15 samples/sec Loss 0.7850 LearningRate 0.0016 Epoch: 17 Global Step: 99330 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:43:03,920-Speed 3354.39 samples/sec Loss 0.8823 LearningRate 0.0016 Epoch: 17 Global Step: 99340 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:43:06,946-Speed 3385.81 samples/sec Loss 0.9058 LearningRate 0.0016 Epoch: 17 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:09,972-Speed 3384.04 samples/sec Loss 0.7666 LearningRate 0.0016 Epoch: 17 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:13,005-Speed 3376.75 samples/sec Loss 0.8661 LearningRate 0.0016 Epoch: 17 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:16,035-Speed 3380.55 samples/sec Loss 0.7471 LearningRate 0.0016 Epoch: 17 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:19,060-Speed 3386.07 samples/sec Loss 0.7727 LearningRate 0.0016 Epoch: 17 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:22,095-Speed 3374.97 samples/sec Loss 0.8243 LearningRate 0.0016 Epoch: 17 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:25,129-Speed 3376.31 samples/sec Loss 0.8707 LearningRate 0.0016 Epoch: 17 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:28,196-Speed 3339.29 samples/sec Loss 0.8344 LearningRate 0.0016 Epoch: 17 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:31,217-Speed 3389.64 samples/sec Loss 0.8733 LearningRate 0.0016 Epoch: 17 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:34,270-Speed 3355.15 samples/sec Loss 0.8355 LearningRate 0.0016 Epoch: 17 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:37,415-Speed 3256.41 samples/sec Loss 0.8387 LearningRate 0.0016 Epoch: 17 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:40,451-Speed 3374.06 samples/sec Loss 0.8282 LearningRate 0.0016 Epoch: 17 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:43,482-Speed 3379.64 samples/sec Loss 0.7353 LearningRate 0.0016 Epoch: 17 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:46,513-Speed 3379.13 samples/sec Loss 0.8784 LearningRate 0.0016 Epoch: 17 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:49,535-Speed 3389.09 samples/sec Loss 0.8289 LearningRate 0.0016 Epoch: 17 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:43:52,543-Speed 3404.79 samples/sec Loss 0.7944 LearningRate 0.0016 Epoch: 17 Global Step: 99500 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:43:55,565-Speed 3389.36 samples/sec Loss 0.8282 LearningRate 0.0016 Epoch: 17 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:43:58,590-Speed 3385.91 samples/sec Loss 0.8873 LearningRate 0.0016 Epoch: 17 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:01,617-Speed 3383.61 samples/sec Loss 0.8050 LearningRate 0.0016 Epoch: 17 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:04,645-Speed 3382.89 samples/sec Loss 0.8036 LearningRate 0.0016 Epoch: 17 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:07,674-Speed 3381.24 samples/sec Loss 0.7798 LearningRate 0.0016 Epoch: 17 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:10,696-Speed 3389.24 samples/sec Loss 0.7250 LearningRate 0.0016 Epoch: 17 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:13,773-Speed 3328.95 samples/sec Loss 0.8364 LearningRate 0.0015 Epoch: 17 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:16,796-Speed 3387.68 samples/sec Loss 0.8089 LearningRate 0.0015 Epoch: 17 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:19,820-Speed 3387.68 samples/sec Loss 0.8137 LearningRate 0.0015 Epoch: 17 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:44:22,847-Speed 3383.75 samples/sec Loss 0.8714 LearningRate 0.0015 Epoch: 17 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:25,899-Speed 3355.66 samples/sec Loss 0.8478 LearningRate 0.0015 Epoch: 17 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:28,942-Speed 3366.05 samples/sec Loss 0.7931 LearningRate 0.0015 Epoch: 17 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:31,969-Speed 3383.53 samples/sec Loss 0.7688 LearningRate 0.0015 Epoch: 17 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:34,998-Speed 3380.71 samples/sec Loss 0.8183 LearningRate 0.0015 Epoch: 17 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:38,027-Speed 3382.54 samples/sec Loss 0.8743 LearningRate 0.0015 Epoch: 17 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:41,071-Speed 3364.78 samples/sec Loss 0.7793 LearningRate 0.0015 Epoch: 17 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:44,107-Speed 3372.93 samples/sec Loss 0.8914 LearningRate 0.0015 Epoch: 17 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:47,133-Speed 3384.98 samples/sec Loss 0.8480 LearningRate 0.0015 Epoch: 17 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:50,268-Speed 3266.88 samples/sec Loss 0.8167 LearningRate 0.0015 Epoch: 17 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:53,344-Speed 3330.18 samples/sec Loss 0.8091 LearningRate 0.0015 Epoch: 17 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:44:56,356-Speed 3399.71 samples/sec Loss 0.9036 LearningRate 0.0015 Epoch: 17 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:44:59,379-Speed 3388.55 samples/sec Loss 0.7759 LearningRate 0.0015 Epoch: 17 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:02,443-Speed 3343.29 samples/sec Loss 0.7540 LearningRate 0.0015 Epoch: 17 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:05,542-Speed 3305.15 samples/sec Loss 0.8012 LearningRate 0.0015 Epoch: 17 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:08,564-Speed 3388.57 samples/sec Loss 0.8358 LearningRate 0.0015 Epoch: 17 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:11,583-Speed 3393.53 samples/sec Loss 0.8736 LearningRate 0.0015 Epoch: 17 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:14,638-Speed 3351.98 samples/sec Loss 0.7465 LearningRate 0.0015 Epoch: 17 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:17,662-Speed 3386.78 samples/sec Loss 0.8649 LearningRate 0.0015 Epoch: 17 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:20,686-Speed 3387.14 samples/sec Loss 0.7552 LearningRate 0.0015 Epoch: 17 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:23,800-Speed 3289.36 samples/sec Loss 0.7903 LearningRate 0.0015 Epoch: 17 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:26,872-Speed 3333.46 samples/sec Loss 0.9074 LearningRate 0.0015 Epoch: 17 Global Step: 99810 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:45:29,893-Speed 3391.32 samples/sec Loss 0.8327 LearningRate 0.0015 Epoch: 17 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:32,938-Speed 3363.96 samples/sec Loss 0.8823 LearningRate 0.0015 Epoch: 17 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:35,965-Speed 3383.16 samples/sec Loss 0.8429 LearningRate 0.0015 Epoch: 17 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:38,992-Speed 3384.24 samples/sec Loss 0.8276 LearningRate 0.0015 Epoch: 17 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:42,016-Speed 3386.86 samples/sec Loss 0.7808 LearningRate 0.0015 Epoch: 17 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:45,039-Speed 3387.72 samples/sec Loss 0.8565 LearningRate 0.0015 Epoch: 17 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:48,092-Speed 3354.32 samples/sec Loss 0.8761 LearningRate 0.0015 Epoch: 17 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:51,137-Speed 3363.86 samples/sec Loss 0.7561 LearningRate 0.0015 Epoch: 17 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:54,162-Speed 3386.40 samples/sec Loss 0.8701 LearningRate 0.0015 Epoch: 17 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:45:57,186-Speed 3386.48 samples/sec Loss 0.8443 LearningRate 0.0015 Epoch: 17 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:00,202-Speed 3397.16 samples/sec Loss 0.8330 LearningRate 0.0015 Epoch: 17 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:03,224-Speed 3389.47 samples/sec Loss 0.8380 LearningRate 0.0015 Epoch: 17 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:06,252-Speed 3381.74 samples/sec Loss 0.7886 LearningRate 0.0015 Epoch: 17 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:09,274-Speed 3389.09 samples/sec Loss 0.8672 LearningRate 0.0015 Epoch: 17 Global Step: 99950 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:12,298-Speed 3387.27 samples/sec Loss 0.8146 LearningRate 0.0015 Epoch: 17 Global Step: 99960 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:15,329-Speed 3379.72 samples/sec Loss 0.8554 LearningRate 0.0015 Epoch: 17 Global Step: 99970 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:18,354-Speed 3384.91 samples/sec Loss 0.8065 LearningRate 0.0015 Epoch: 17 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:21,378-Speed 3387.18 samples/sec Loss 0.7915 LearningRate 0.0015 Epoch: 17 Global Step: 99990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:46:24,404-Speed 3384.98 samples/sec Loss 0.7611 LearningRate 0.0015 Epoch: 17 Global Step: 100000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:47:07,682-[lfw][100000]XNorm: 21.826810
Training: 2022-04-27 11:47:07,682-[lfw][100000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-27 11:47:07,683-[lfw][100000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:47:58,371-[cfp_fp][100000]XNorm: 21.727874
Training: 2022-04-27 11:47:58,372-[cfp_fp][100000]Accuracy-Flip: 0.98614+-0.00593
Training: 2022-04-27 11:47:58,372-[cfp_fp][100000]Accuracy-Highest: 0.98614
Training: 2022-04-27 11:48:41,865-[agedb_30][100000]XNorm: 22.020445
Training: 2022-04-27 11:48:41,866-[agedb_30][100000]Accuracy-Flip: 0.97983+-0.00883
Training: 2022-04-27 11:48:41,866-[agedb_30][100000]Accuracy-Highest: 0.98233
Training: 2022-04-27 11:48:44,856-Speed 72.91 samples/sec Loss 0.8884 LearningRate 0.0015 Epoch: 17 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:48:47,879-Speed 3388.39 samples/sec Loss 0.7642 LearningRate 0.0015 Epoch: 17 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:48:50,882-Speed 3411.19 samples/sec Loss 0.7448 LearningRate 0.0014 Epoch: 17 Global Step: 100030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:48:53,881-Speed 3415.48 samples/sec Loss 0.8352 LearningRate 0.0014 Epoch: 17 Global Step: 100040 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:48:56,884-Speed 3409.82 samples/sec Loss 0.7856 LearningRate 0.0014 Epoch: 17 Global Step: 100050 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:48:59,927-Speed 3365.95 samples/sec Loss 0.8482 LearningRate 0.0014 Epoch: 17 Global Step: 100060 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:02,938-Speed 3401.35 samples/sec Loss 0.7606 LearningRate 0.0014 Epoch: 17 Global Step: 100070 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:05,946-Speed 3405.54 samples/sec Loss 0.8731 LearningRate 0.0014 Epoch: 17 Global Step: 100080 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:08,959-Speed 3399.33 samples/sec Loss 0.8663 LearningRate 0.0014 Epoch: 17 Global Step: 100090 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:11,976-Speed 3396.05 samples/sec Loss 0.8777 LearningRate 0.0014 Epoch: 17 Global Step: 100100 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:14,995-Speed 3392.67 samples/sec Loss 0.7943 LearningRate 0.0014 Epoch: 17 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:18,003-Speed 3404.22 samples/sec Loss 0.8326 LearningRate 0.0014 Epoch: 17 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:21,014-Speed 3402.16 samples/sec Loss 0.8284 LearningRate 0.0014 Epoch: 17 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:24,035-Speed 3389.96 samples/sec Loss 0.8357 LearningRate 0.0014 Epoch: 17 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:27,059-Speed 3387.21 samples/sec Loss 0.8420 LearningRate 0.0014 Epoch: 17 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:30,075-Speed 3396.12 samples/sec Loss 0.8482 LearningRate 0.0014 Epoch: 17 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:33,096-Speed 3389.92 samples/sec Loss 0.8337 LearningRate 0.0014 Epoch: 17 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:36,123-Speed 3384.35 samples/sec Loss 0.7482 LearningRate 0.0014 Epoch: 17 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:39,150-Speed 3383.55 samples/sec Loss 0.8048 LearningRate 0.0014 Epoch: 17 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:42,173-Speed 3387.87 samples/sec Loss 0.8378 LearningRate 0.0014 Epoch: 17 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:45,172-Speed 3415.86 samples/sec Loss 0.8195 LearningRate 0.0014 Epoch: 17 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:48,208-Speed 3373.65 samples/sec Loss 0.8747 LearningRate 0.0014 Epoch: 17 Global Step: 100220 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:49:51,223-Speed 3397.15 samples/sec Loss 0.8567 LearningRate 0.0014 Epoch: 17 Global Step: 100230 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:54,258-Speed 3374.74 samples/sec Loss 0.7500 LearningRate 0.0014 Epoch: 17 Global Step: 100240 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:49:57,292-Speed 3375.16 samples/sec Loss 0.8279 LearningRate 0.0014 Epoch: 17 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:00,321-Speed 3381.01 samples/sec Loss 0.7461 LearningRate 0.0014 Epoch: 17 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:03,357-Speed 3373.60 samples/sec Loss 0.7825 LearningRate 0.0014 Epoch: 17 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:06,382-Speed 3386.82 samples/sec Loss 0.7691 LearningRate 0.0014 Epoch: 17 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:09,410-Speed 3382.80 samples/sec Loss 0.8660 LearningRate 0.0014 Epoch: 17 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:12,431-Speed 3389.87 samples/sec Loss 0.8802 LearningRate 0.0014 Epoch: 17 Global Step: 100300 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:15,468-Speed 3372.32 samples/sec Loss 0.8422 LearningRate 0.0014 Epoch: 17 Global Step: 100310 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:18,491-Speed 3387.96 samples/sec Loss 0.8290 LearningRate 0.0014 Epoch: 17 Global Step: 100320 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 11:50:21,506-Speed 3397.90 samples/sec Loss 0.7841 LearningRate 0.0014 Epoch: 17 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:24,523-Speed 3394.68 samples/sec Loss 0.8964 LearningRate 0.0014 Epoch: 17 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:27,531-Speed 3404.76 samples/sec Loss 0.8165 LearningRate 0.0014 Epoch: 17 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:30,542-Speed 3401.19 samples/sec Loss 0.8559 LearningRate 0.0014 Epoch: 17 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:33,563-Speed 3390.88 samples/sec Loss 0.8668 LearningRate 0.0014 Epoch: 17 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:36,580-Speed 3394.79 samples/sec Loss 0.8846 LearningRate 0.0014 Epoch: 17 Global Step: 100380 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:39,598-Speed 3394.12 samples/sec Loss 0.8907 LearningRate 0.0014 Epoch: 17 Global Step: 100390 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:42,609-Speed 3402.11 samples/sec Loss 0.8233 LearningRate 0.0014 Epoch: 17 Global Step: 100400 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:45,633-Speed 3386.37 samples/sec Loss 0.9233 LearningRate 0.0014 Epoch: 17 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:48,725-Speed 3312.30 samples/sec Loss 0.8133 LearningRate 0.0014 Epoch: 17 Global Step: 100420 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:51,749-Speed 3387.39 samples/sec Loss 0.8182 LearningRate 0.0014 Epoch: 17 Global Step: 100430 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:50:54,748-Speed 3415.38 samples/sec Loss 0.8318 LearningRate 0.0014 Epoch: 17 Global Step: 100440 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:50:57,774-Speed 3384.30 samples/sec Loss 0.8675 LearningRate 0.0014 Epoch: 17 Global Step: 100450 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:00,787-Speed 3399.93 samples/sec Loss 0.7902 LearningRate 0.0014 Epoch: 17 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:03,807-Speed 3391.34 samples/sec Loss 0.8391 LearningRate 0.0014 Epoch: 17 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:06,822-Speed 3397.03 samples/sec Loss 0.8260 LearningRate 0.0014 Epoch: 17 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:09,831-Speed 3404.76 samples/sec Loss 0.8124 LearningRate 0.0014 Epoch: 17 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:12,858-Speed 3383.00 samples/sec Loss 0.8456 LearningRate 0.0014 Epoch: 17 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:15,897-Speed 3370.13 samples/sec Loss 0.7673 LearningRate 0.0013 Epoch: 17 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:18,914-Speed 3394.84 samples/sec Loss 0.8450 LearningRate 0.0013 Epoch: 17 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:21,928-Speed 3398.82 samples/sec Loss 0.8891 LearningRate 0.0013 Epoch: 17 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:24,946-Speed 3393.91 samples/sec Loss 0.8323 LearningRate 0.0013 Epoch: 17 Global Step: 100540 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:51:27,956-Speed 3402.46 samples/sec Loss 0.8043 LearningRate 0.0013 Epoch: 17 Global Step: 100550 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:51:30,961-Speed 3408.21 samples/sec Loss 0.7945 LearningRate 0.0013 Epoch: 17 Global Step: 100560 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:51:33,950-Speed 3427.30 samples/sec Loss 0.7975 LearningRate 0.0013 Epoch: 17 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:36,964-Speed 3397.66 samples/sec Loss 0.8391 LearningRate 0.0013 Epoch: 17 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:39,979-Speed 3397.63 samples/sec Loss 0.7407 LearningRate 0.0013 Epoch: 17 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:42,990-Speed 3401.66 samples/sec Loss 0.7932 LearningRate 0.0013 Epoch: 17 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:46,022-Speed 3377.51 samples/sec Loss 0.8654 LearningRate 0.0013 Epoch: 17 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:49,037-Speed 3396.89 samples/sec Loss 0.8166 LearningRate 0.0013 Epoch: 17 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:52,057-Speed 3391.60 samples/sec Loss 0.7692 LearningRate 0.0013 Epoch: 17 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:55,068-Speed 3402.59 samples/sec Loss 0.8255 LearningRate 0.0013 Epoch: 17 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:51:58,079-Speed 3401.38 samples/sec Loss 0.7992 LearningRate 0.0013 Epoch: 17 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:01,116-Speed 3372.13 samples/sec Loss 0.8548 LearningRate 0.0013 Epoch: 17 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:04,129-Speed 3400.38 samples/sec Loss 0.7973 LearningRate 0.0013 Epoch: 17 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 11:52:07,120-Speed 3423.51 samples/sec Loss 0.7402 LearningRate 0.0013 Epoch: 17 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:10,128-Speed 3405.99 samples/sec Loss 0.7782 LearningRate 0.0013 Epoch: 17 Global Step: 100690 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:13,155-Speed 3383.46 samples/sec Loss 0.7576 LearningRate 0.0013 Epoch: 17 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:16,169-Speed 3398.12 samples/sec Loss 0.8671 LearningRate 0.0013 Epoch: 17 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:19,177-Speed 3405.15 samples/sec Loss 0.8331 LearningRate 0.0013 Epoch: 17 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:22,208-Speed 3378.80 samples/sec Loss 0.8185 LearningRate 0.0013 Epoch: 17 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:25,219-Speed 3402.39 samples/sec Loss 0.7798 LearningRate 0.0013 Epoch: 17 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:28,232-Speed 3399.29 samples/sec Loss 0.8212 LearningRate 0.0013 Epoch: 17 Global Step: 100750 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:31,244-Speed 3399.93 samples/sec Loss 0.8982 LearningRate 0.0013 Epoch: 17 Global Step: 100760 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:34,259-Speed 3396.92 samples/sec Loss 0.8437 LearningRate 0.0013 Epoch: 17 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 11:52:37,288-Speed 3381.94 samples/sec
Loss 0.8422 LearningRate 0.0013 Epoch: 17 Global Step: 100780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:52:40,303-Speed 3397.62 samples/sec Loss 0.8949 LearningRate 0.0013 Epoch: 17 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:43,317-Speed 3398.39 samples/sec Loss 0.7821 LearningRate 0.0013 Epoch: 17 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:46,333-Speed 3395.28 samples/sec Loss 0.8334 LearningRate 0.0013 Epoch: 17 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:49,344-Speed 3401.97 samples/sec Loss 0.8211 LearningRate 0.0013 Epoch: 17 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:52,359-Speed 3397.09 samples/sec Loss 0.7736 LearningRate 0.0013 Epoch: 17 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:55,372-Speed 3399.46 samples/sec Loss 0.8380 LearningRate 0.0013 Epoch: 17 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:52:58,384-Speed 3400.43 samples/sec Loss 0.8379 LearningRate 0.0013 Epoch: 17 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:01,421-Speed 3373.34 samples/sec Loss 0.8139 LearningRate 0.0013 Epoch: 17 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:04,440-Speed 3392.31 samples/sec Loss 0.8167 LearningRate 0.0013 Epoch: 17 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:07,463-Speed 3388.23 samples/sec Loss 0.8690 LearningRate 0.0013 Epoch: 17 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:10,475-Speed 3400.16 samples/sec Loss 0.8178 LearningRate 0.0013 Epoch: 17 Global Step: 100890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:53:13,490-Speed 3397.55 samples/sec Loss 0.8319 LearningRate 0.0013 Epoch: 17 
Global Step: 100900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:16,549-Speed 3348.34 samples/sec Loss 0.8026 LearningRate 0.0013 Epoch: 17 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:19,565-Speed 3395.63 samples/sec Loss 0.7908 LearningRate 0.0013 Epoch: 17 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:22,583-Speed 3394.10 samples/sec Loss 0.8556 LearningRate 0.0013 Epoch: 17 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:25,597-Speed 3398.68 samples/sec Loss 0.8375 LearningRate 0.0013 Epoch: 17 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:28,615-Speed 3393.45 samples/sec Loss 0.8799 LearningRate 0.0013 Epoch: 17 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:31,631-Speed 3395.57 samples/sec Loss 0.7904 LearningRate 0.0013 Epoch: 17 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:34,645-Speed 3398.43 samples/sec Loss 0.8148 LearningRate 0.0013 Epoch: 17 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:37,665-Speed 3391.59 samples/sec Loss 0.7602 LearningRate 0.0013 Epoch: 17 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:40,684-Speed 3392.62 samples/sec Loss 0.8088 LearningRate 0.0013 Epoch: 17 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:43,689-Speed 3409.17 samples/sec Loss 0.7536 LearningRate 0.0013 Epoch: 17 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:46,714-Speed 3385.31 samples/sec Loss 0.8203 LearningRate 0.0012 Epoch: 17 Global Step: 101010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:49,730-Speed 3396.13 samples/sec Loss 0.8146 LearningRate 0.0012 Epoch: 17 Global Step: 101020 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-27 11:53:52,748-Speed 3393.20 samples/sec Loss 0.7910 LearningRate 0.0012 Epoch: 17 Global Step: 101030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:55,765-Speed 3395.26 samples/sec Loss 0.8152 LearningRate 0.0012 Epoch: 17 Global Step: 101040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:53:58,798-Speed 3377.28 samples/sec Loss 0.8045 LearningRate 0.0012 Epoch: 17 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:01,834-Speed 3373.32 samples/sec Loss 0.8115 LearningRate 0.0012 Epoch: 17 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:04,851-Speed 3394.86 samples/sec Loss 0.8140 LearningRate 0.0012 Epoch: 17 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:07,881-Speed 3380.38 samples/sec Loss 0.8001 LearningRate 0.0012 Epoch: 17 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:10,900-Speed 3392.85 samples/sec Loss 0.8156 LearningRate 0.0012 Epoch: 17 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:13,906-Speed 3407.47 samples/sec Loss 0.8108 LearningRate 0.0012 Epoch: 17 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:16,905-Speed 3415.64 samples/sec Loss 0.8196 LearningRate 0.0012 Epoch: 17 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:19,920-Speed 3397.34 samples/sec Loss 0.7964 LearningRate 0.0012 Epoch: 17 Global Step: 101120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:22,936-Speed 3395.60 samples/sec Loss 0.8185 LearningRate 0.0012 Epoch: 17 Global Step: 101130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:25,957-Speed 3390.59 samples/sec Loss 0.8283 LearningRate 0.0012 Epoch: 17 Global Step: 101140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
11:54:28,977-Speed 3390.97 samples/sec Loss 0.8276 LearningRate 0.0012 Epoch: 17 Global Step: 101150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:31,995-Speed 3393.77 samples/sec Loss 0.8258 LearningRate 0.0012 Epoch: 17 Global Step: 101160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:35,026-Speed 3380.21 samples/sec Loss 0.7968 LearningRate 0.0012 Epoch: 17 Global Step: 101170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:38,127-Speed 3302.80 samples/sec Loss 0.8414 LearningRate 0.0012 Epoch: 17 Global Step: 101180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:41,145-Speed 3393.35 samples/sec Loss 0.8219 LearningRate 0.0012 Epoch: 17 Global Step: 101190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:44,173-Speed 3382.80 samples/sec Loss 0.8260 LearningRate 0.0012 Epoch: 17 Global Step: 101200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:54:47,201-Speed 3382.80 samples/sec Loss 0.7888 LearningRate 0.0012 Epoch: 17 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:50,219-Speed 3393.97 samples/sec Loss 0.8290 LearningRate 0.0012 Epoch: 17 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:53,237-Speed 3393.29 samples/sec Loss 0.7300 LearningRate 0.0012 Epoch: 17 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:56,260-Speed 3388.34 samples/sec Loss 0.8057 LearningRate 0.0012 Epoch: 17 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:54:59,280-Speed 3391.91 samples/sec Loss 0.8412 LearningRate 0.0012 Epoch: 17 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:02,370-Speed 3314.73 samples/sec Loss 0.8483 LearningRate 0.0012 Epoch: 17 Global Step: 101260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:05,490-Speed 3282.55 samples/sec Loss 
0.7395 LearningRate 0.0012 Epoch: 17 Global Step: 101270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:08,515-Speed 3386.11 samples/sec Loss 0.8123 LearningRate 0.0012 Epoch: 17 Global Step: 101280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:11,543-Speed 3381.98 samples/sec Loss 0.8455 LearningRate 0.0012 Epoch: 17 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:14,557-Speed 3398.93 samples/sec Loss 0.8428 LearningRate 0.0012 Epoch: 17 Global Step: 101300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:17,581-Speed 3386.06 samples/sec Loss 0.8110 LearningRate 0.0012 Epoch: 17 Global Step: 101310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:20,600-Speed 3393.12 samples/sec Loss 0.7896 LearningRate 0.0012 Epoch: 17 Global Step: 101320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:23,625-Speed 3386.30 samples/sec Loss 0.8669 LearningRate 0.0012 Epoch: 17 Global Step: 101330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:26,665-Speed 3369.01 samples/sec Loss 0.7545 LearningRate 0.0012 Epoch: 17 Global Step: 101340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:29,690-Speed 3386.35 samples/sec Loss 0.8096 LearningRate 0.0012 Epoch: 17 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:32,703-Speed 3398.36 samples/sec Loss 0.7605 LearningRate 0.0012 Epoch: 17 Global Step: 101360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:35,722-Speed 3393.26 samples/sec Loss 0.7813 LearningRate 0.0012 Epoch: 17 Global Step: 101370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:38,750-Speed 3382.17 samples/sec Loss 0.7517 LearningRate 0.0012 Epoch: 17 Global Step: 101380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:41,774-Speed 3387.66 samples/sec Loss 0.8452 LearningRate 0.0012 Epoch: 17 Global 
Step: 101390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:44,793-Speed 3392.13 samples/sec Loss 0.8597 LearningRate 0.0012 Epoch: 17 Global Step: 101400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:47,812-Speed 3392.52 samples/sec Loss 0.7872 LearningRate 0.0012 Epoch: 17 Global Step: 101410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:55:50,812-Speed 3414.63 samples/sec Loss 0.8731 LearningRate 0.0012 Epoch: 17 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:53,833-Speed 3389.82 samples/sec Loss 0.7535 LearningRate 0.0012 Epoch: 17 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:56,859-Speed 3385.26 samples/sec Loss 0.8005 LearningRate 0.0012 Epoch: 17 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:55:59,876-Speed 3395.26 samples/sec Loss 0.8139 LearningRate 0.0012 Epoch: 17 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:02,900-Speed 3386.87 samples/sec Loss 0.7925 LearningRate 0.0012 Epoch: 17 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:05,923-Speed 3387.45 samples/sec Loss 0.8048 LearningRate 0.0012 Epoch: 17 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:08,940-Speed 3395.21 samples/sec Loss 0.8440 LearningRate 0.0012 Epoch: 17 Global Step: 101480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:11,960-Speed 3390.94 samples/sec Loss 0.8099 LearningRate 0.0012 Epoch: 17 Global Step: 101490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:15,023-Speed 3344.21 samples/sec Loss 0.8146 LearningRate 0.0012 Epoch: 17 Global Step: 101500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:18,053-Speed 3380.06 samples/sec Loss 0.8510 LearningRate 0.0012 Epoch: 17 Global Step: 101510 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 11:56:21,072-Speed 3393.12 samples/sec Loss 0.8318 LearningRate 0.0012 Epoch: 17 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:24,115-Speed 3366.10 samples/sec Loss 0.8483 LearningRate 0.0011 Epoch: 17 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:27,138-Speed 3388.73 samples/sec Loss 0.8318 LearningRate 0.0011 Epoch: 17 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:30,158-Speed 3391.54 samples/sec Loss 0.7857 LearningRate 0.0011 Epoch: 17 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:33,181-Speed 3387.03 samples/sec Loss 0.8432 LearningRate 0.0011 Epoch: 17 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:36,214-Speed 3377.87 samples/sec Loss 0.6957 LearningRate 0.0011 Epoch: 17 Global Step: 101570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:39,237-Speed 3388.26 samples/sec Loss 0.8131 LearningRate 0.0011 Epoch: 17 Global Step: 101580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:42,257-Speed 3390.75 samples/sec Loss 0.8238 LearningRate 0.0011 Epoch: 17 Global Step: 101590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:45,274-Speed 3395.18 samples/sec Loss 0.7793 LearningRate 0.0011 Epoch: 17 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:48,302-Speed 3382.06 samples/sec Loss 0.7968 LearningRate 0.0011 Epoch: 17 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:56:51,318-Speed 3396.93 samples/sec Loss 0.8872 LearningRate 0.0011 Epoch: 17 Global Step: 101620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:56:54,348-Speed 3380.11 samples/sec Loss 0.8185 LearningRate 0.0011 Epoch: 17 Global Step: 101630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
11:56:57,373-Speed 3386.25 samples/sec Loss 0.8429 LearningRate 0.0011 Epoch: 17 Global Step: 101640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:00,406-Speed 3376.94 samples/sec Loss 0.8568 LearningRate 0.0011 Epoch: 17 Global Step: 101650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:03,473-Speed 3338.96 samples/sec Loss 0.8154 LearningRate 0.0011 Epoch: 17 Global Step: 101660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:06,575-Speed 3302.20 samples/sec Loss 0.8398 LearningRate 0.0011 Epoch: 17 Global Step: 101670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:09,594-Speed 3393.03 samples/sec Loss 0.7585 LearningRate 0.0011 Epoch: 17 Global Step: 101680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:12,620-Speed 3384.44 samples/sec Loss 0.7701 LearningRate 0.0011 Epoch: 17 Global Step: 101690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:15,646-Speed 3384.11 samples/sec Loss 0.7594 LearningRate 0.0011 Epoch: 17 Global Step: 101700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:18,686-Speed 3369.83 samples/sec Loss 0.8101 LearningRate 0.0011 Epoch: 17 Global Step: 101710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 11:57:21,710-Speed 3387.82 samples/sec Loss 0.8366 LearningRate 0.0011 Epoch: 17 Global Step: 101720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:24,736-Speed 3384.60 samples/sec Loss 0.8166 LearningRate 0.0011 Epoch: 17 Global Step: 101730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:27,767-Speed 3378.38 samples/sec Loss 0.8379 LearningRate 0.0011 Epoch: 17 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:30,790-Speed 3387.83 samples/sec Loss 0.7880 LearningRate 0.0011 Epoch: 17 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:33,813-Speed 3388.67 samples/sec Loss 
0.7359 LearningRate 0.0011 Epoch: 17 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:36,840-Speed 3383.98 samples/sec Loss 0.8079 LearningRate 0.0011 Epoch: 17 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:39,867-Speed 3383.06 samples/sec Loss 0.8959 LearningRate 0.0011 Epoch: 17 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:42,891-Speed 3387.66 samples/sec Loss 0.8621 LearningRate 0.0011 Epoch: 17 Global Step: 101790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:45,916-Speed 3385.55 samples/sec Loss 0.8097 LearningRate 0.0011 Epoch: 17 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:48,942-Speed 3384.90 samples/sec Loss 0.7981 LearningRate 0.0011 Epoch: 17 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:57:51,967-Speed 3386.56 samples/sec Loss 0.8330 LearningRate 0.0011 Epoch: 17 Global Step: 101820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:57:54,989-Speed 3389.27 samples/sec Loss 0.8612 LearningRate 0.0011 Epoch: 17 Global Step: 101830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:57:57,998-Speed 3403.64 samples/sec Loss 0.8519 LearningRate 0.0011 Epoch: 17 Global Step: 101840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:01,028-Speed 3380.05 samples/sec Loss 0.8025 LearningRate 0.0011 Epoch: 17 Global Step: 101850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:04,061-Speed 3376.89 samples/sec Loss 0.8107 LearningRate 0.0011 Epoch: 17 Global Step: 101860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:07,091-Speed 3380.37 samples/sec Loss 0.7677 LearningRate 0.0011 Epoch: 17 Global Step: 101870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:10,120-Speed 3381.11 samples/sec Loss 0.7251 LearningRate 0.0011 Epoch: 17 
Global Step: 101880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:13,151-Speed 3378.94 samples/sec Loss 0.7737 LearningRate 0.0011 Epoch: 17 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:16,214-Speed 3344.30 samples/sec Loss 0.8188 LearningRate 0.0011 Epoch: 17 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:19,243-Speed 3381.62 samples/sec Loss 0.8005 LearningRate 0.0011 Epoch: 17 Global Step: 101910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:22,268-Speed 3385.97 samples/sec Loss 0.8383 LearningRate 0.0011 Epoch: 17 Global Step: 101920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:25,295-Speed 3384.11 samples/sec Loss 0.7585 LearningRate 0.0011 Epoch: 17 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:28,324-Speed 3380.81 samples/sec Loss 0.7511 LearningRate 0.0011 Epoch: 17 Global Step: 101940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:58:31,363-Speed 3370.77 samples/sec Loss 0.8269 LearningRate 0.0011 Epoch: 17 Global Step: 101950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:58:34,396-Speed 3376.76 samples/sec Loss 0.8369 LearningRate 0.0011 Epoch: 17 Global Step: 101960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 11:58:37,410-Speed 3398.48 samples/sec Loss 0.7689 LearningRate 0.0011 Epoch: 17 Global Step: 101970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:40,444-Speed 3375.64 samples/sec Loss 0.8358 LearningRate 0.0011 Epoch: 17 Global Step: 101980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:43,470-Speed 3385.29 samples/sec Loss 0.7846 LearningRate 0.0011 Epoch: 17 Global Step: 101990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 11:58:46,493-Speed 3388.06 samples/sec Loss 0.8409 LearningRate 0.0011 Epoch: 17 Global Step: 102000 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-27 11:59:29,952-[lfw][102000]XNorm: 21.864571 Training: 2022-04-27 11:59:29,952-[lfw][102000]Accuracy-Flip: 0.99733+-0.00318 Training: 2022-04-27 11:59:29,953-[lfw][102000]Accuracy-Highest: 0.99817 Training: 2022-04-27 12:00:20,388-[cfp_fp][102000]XNorm: 21.876868 Training: 2022-04-27 12:00:20,388-[cfp_fp][102000]Accuracy-Flip: 0.98586+-0.00445 Training: 2022-04-27 12:00:20,389-[cfp_fp][102000]Accuracy-Highest: 0.98614 Training: 2022-04-27 12:01:03,778-[agedb_30][102000]XNorm: 22.357217 Training: 2022-04-27 12:01:03,779-[agedb_30][102000]Accuracy-Flip: 0.98183+-0.00724 Training: 2022-04-27 12:01:03,779-[agedb_30][102000]Accuracy-Highest: 0.98233 Training: 2022-04-27 12:01:06,792-Speed 72.99 samples/sec Loss 0.7178 LearningRate 0.0011 Epoch: 17 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:09,795-Speed 3411.82 samples/sec Loss 0.7886 LearningRate 0.0011 Epoch: 17 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:12,802-Speed 3406.32 samples/sec Loss 0.8263 LearningRate 0.0011 Epoch: 17 Global Step: 102030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:15,807-Speed 3407.62 samples/sec Loss 0.8298 LearningRate 0.0011 Epoch: 17 Global Step: 102040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:18,814-Speed 3406.67 samples/sec Loss 0.7546 LearningRate 0.0011 Epoch: 17 Global Step: 102050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:21,830-Speed 3395.65 samples/sec Loss 0.8317 LearningRate 0.0011 Epoch: 17 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:24,843-Speed 3399.49 samples/sec Loss 0.7555 LearningRate 0.0010 Epoch: 17 Global Step: 102070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:01:27,868-Speed 3385.39 samples/sec Loss 0.8746 LearningRate 0.0010 Epoch: 17 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 12:01:30,887-Speed 3392.87 samples/sec Loss 0.9242 LearningRate 0.0010 Epoch: 17 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:33,906-Speed 3392.42 samples/sec Loss 0.8185 LearningRate 0.0010 Epoch: 17 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:36,918-Speed 3401.22 samples/sec Loss 0.7864 LearningRate 0.0010 Epoch: 17 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:39,929-Speed 3401.69 samples/sec Loss 0.7417 LearningRate 0.0010 Epoch: 17 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:42,946-Speed 3394.47 samples/sec Loss 0.8191 LearningRate 0.0010 Epoch: 17 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:45,961-Speed 3398.01 samples/sec Loss 0.9015 LearningRate 0.0010 Epoch: 17 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:48,983-Speed 3388.81 samples/sec Loss 0.8249 LearningRate 0.0010 Epoch: 17 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:51,997-Speed 3398.71 samples/sec Loss 0.7549 LearningRate 0.0010 Epoch: 17 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:01:54,996-Speed 3414.79 samples/sec Loss 0.7541 LearningRate 0.0010 Epoch: 17 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:01:58,009-Speed 3399.08 samples/sec Loss 0.7012 LearningRate 0.0010 Epoch: 17 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:01,037-Speed 3382.94 samples/sec Loss 0.8964 LearningRate 0.0010 Epoch: 17 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:04,057-Speed 3391.61 samples/sec Loss 0.7932 LearningRate 0.0010 Epoch: 17 Global Step: 102200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:07,077-Speed 
3391.55 samples/sec Loss 0.7876 LearningRate 0.0010 Epoch: 17 Global Step: 102210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:10,103-Speed 3384.81 samples/sec Loss 0.8208 LearningRate 0.0010 Epoch: 17 Global Step: 102220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:13,134-Speed 3378.90 samples/sec Loss 0.7571 LearningRate 0.0010 Epoch: 17 Global Step: 102230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:16,183-Speed 3359.74 samples/sec Loss 0.8648 LearningRate 0.0010 Epoch: 17 Global Step: 102240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:19,196-Speed 3398.84 samples/sec Loss 0.8535 LearningRate 0.0010 Epoch: 17 Global Step: 102250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:22,214-Speed 3394.48 samples/sec Loss 0.8247 LearningRate 0.0010 Epoch: 17 Global Step: 102260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:02:25,237-Speed 3388.11 samples/sec Loss 0.8420 LearningRate 0.0010 Epoch: 17 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:28,252-Speed 3396.95 samples/sec Loss 0.8066 LearningRate 0.0010 Epoch: 17 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:31,275-Speed 3388.69 samples/sec Loss 0.8079 LearningRate 0.0010 Epoch: 17 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:34,296-Speed 3390.15 samples/sec Loss 0.8466 LearningRate 0.0010 Epoch: 17 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:37,319-Speed 3387.70 samples/sec Loss 0.7315 LearningRate 0.0010 Epoch: 17 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:40,380-Speed 3346.51 samples/sec Loss 0.7711 LearningRate 0.0010 Epoch: 17 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:43,395-Speed 3397.25 samples/sec Loss 0.8595 
LearningRate 0.0010 Epoch: 17 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:46,493-Speed 3305.33 samples/sec Loss 0.8394 LearningRate 0.0010 Epoch: 17 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:02:59,627-Speed 779.73 samples/sec Loss 0.7455 LearningRate 0.0010 Epoch: 18 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:02,733-Speed 3298.62 samples/sec Loss 0.6029 LearningRate 0.0010 Epoch: 18 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:05,789-Speed 3350.95 samples/sec Loss 0.6102 LearningRate 0.0010 Epoch: 18 Global Step: 102370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:03:08,812-Speed 3388.16 samples/sec Loss 0.5861 LearningRate 0.0010 Epoch: 18 Global Step: 102380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:03:11,814-Speed 3412.76 samples/sec Loss 0.6070 LearningRate 0.0010 Epoch: 18 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:14,834-Speed 3391.61 samples/sec Loss 0.6093 LearningRate 0.0010 Epoch: 18 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:17,855-Speed 3389.86 samples/sec Loss 0.5643 LearningRate 0.0010 Epoch: 18 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:20,871-Speed 3396.23 samples/sec Loss 0.5832 LearningRate 0.0010 Epoch: 18 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:23,888-Speed 3394.15 samples/sec Loss 0.5994 LearningRate 0.0010 Epoch: 18 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:26,917-Speed 3382.08 samples/sec Loss 0.5514 LearningRate 0.0010 Epoch: 18 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:29,940-Speed 3388.46 samples/sec Loss 0.5307 LearningRate 0.0010 Epoch: 18 Global Step: 
102450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:32,952-Speed 3399.85 samples/sec Loss 0.6105 LearningRate 0.0010 Epoch: 18 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:03:35,953-Speed 3413.11 samples/sec Loss 0.5696 LearningRate 0.0010 Epoch: 18 Global Step: 102470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:39,005-Speed 3355.60 samples/sec Loss 0.6684 LearningRate 0.0010 Epoch: 18 Global Step: 102480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:42,024-Speed 3393.40 samples/sec Loss 0.5501 LearningRate 0.0010 Epoch: 18 Global Step: 102490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:45,075-Speed 3357.24 samples/sec Loss 0.5844 LearningRate 0.0010 Epoch: 18 Global Step: 102500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:48,092-Speed 3394.42 samples/sec Loss 0.5839 LearningRate 0.0010 Epoch: 18 Global Step: 102510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:51,112-Speed 3391.43 samples/sec Loss 0.6407 LearningRate 0.0010 Epoch: 18 Global Step: 102520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:54,132-Speed 3391.71 samples/sec Loss 0.6625 LearningRate 0.0010 Epoch: 18 Global Step: 102530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:03:57,163-Speed 3379.21 samples/sec Loss 0.5468 LearningRate 0.0010 Epoch: 18 Global Step: 102540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:04:00,190-Speed 3383.93 samples/sec Loss 0.5838 LearningRate 0.0010 Epoch: 18 Global Step: 102550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:04:03,257-Speed 3339.17 samples/sec Loss 0.6883 LearningRate 0.0010 Epoch: 18 Global Step: 102560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:04:06,276-Speed 3393.43 samples/sec Loss 0.5760 LearningRate 0.0010 Epoch: 18 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 12:04:09,335-Speed 3348.12 samples/sec Loss 0.6140 LearningRate 0.0010 Epoch: 18 Global Step: 102580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:12,357-Speed 3389.07 samples/sec Loss 0.6207 LearningRate 0.0010 Epoch: 18 Global Step: 102590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:15,398-Speed 3367.55 samples/sec Loss 0.5683 LearningRate 0.0010 Epoch: 18 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:18,423-Speed 3386.55 samples/sec Loss 0.5398 LearningRate 0.0010 Epoch: 18 Global Step: 102610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:21,456-Speed 3376.87 samples/sec Loss 0.5661 LearningRate 0.0010 Epoch: 18 Global Step: 102620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:24,483-Speed 3383.81 samples/sec Loss 0.5837 LearningRate 0.0010 Epoch: 18 Global Step: 102630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:27,524-Speed 3368.18 samples/sec Loss 0.5581 LearningRate 0.0009 Epoch: 18 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:30,549-Speed 3386.04 samples/sec Loss 0.5799 LearningRate 0.0009 Epoch: 18 Global Step: 102650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:33,575-Speed 3385.25 samples/sec Loss 0.5869 LearningRate 0.0009 Epoch: 18 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:36,602-Speed 3383.31 samples/sec Loss 0.5870 LearningRate 0.0009 Epoch: 18 Global Step: 102670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:04:39,613-Speed 3401.61 samples/sec Loss 0.6253 LearningRate 0.0009 Epoch: 18 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:42,654-Speed 3368.60 samples/sec Loss 0.5276 LearningRate 0.0009 Epoch: 18 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
12:04:45,716-Speed 3344.66 samples/sec Loss 0.5981 LearningRate 0.0009 Epoch: 18 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:48,771-Speed 3352.90 samples/sec Loss 0.5983 LearningRate 0.0009 Epoch: 18 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:51,810-Speed 3370.17 samples/sec Loss 0.5757 LearningRate 0.0009 Epoch: 18 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:54,841-Speed 3379.48 samples/sec Loss 0.6346 LearningRate 0.0009 Epoch: 18 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:04:57,865-Speed 3387.21 samples/sec Loss 0.6059 LearningRate 0.0009 Epoch: 18 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:00,907-Speed 3367.89 samples/sec Loss 0.5598 LearningRate 0.0009 Epoch: 18 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:03,932-Speed 3385.51 samples/sec Loss 0.6182 LearningRate 0.0009 Epoch: 18 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:06,978-Speed 3362.79 samples/sec Loss 0.5649 LearningRate 0.0009 Epoch: 18 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:09,982-Speed 3409.04 samples/sec Loss 0.6232 LearningRate 0.0009 Epoch: 18 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:13,147-Speed 3236.14 samples/sec Loss 0.5253 LearningRate 0.0009 Epoch: 18 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:16,191-Speed 3365.36 samples/sec Loss 0.6410 LearningRate 0.0009 Epoch: 18 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:19,210-Speed 3392.58 samples/sec Loss 0.6398 LearningRate 0.0009 Epoch: 18 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:22,223-Speed 3399.72 samples/sec Loss 
0.6238 LearningRate 0.0009 Epoch: 18 Global Step: 102820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:25,260-Speed 3372.76 samples/sec Loss 0.6539 LearningRate 0.0009 Epoch: 18 Global Step: 102830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:28,335-Speed 3330.51 samples/sec Loss 0.6683 LearningRate 0.0009 Epoch: 18 Global Step: 102840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:31,363-Speed 3381.89 samples/sec Loss 0.5666 LearningRate 0.0009 Epoch: 18 Global Step: 102850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:34,468-Speed 3299.50 samples/sec Loss 0.5715 LearningRate 0.0009 Epoch: 18 Global Step: 102860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:37,502-Speed 3375.19 samples/sec Loss 0.5971 LearningRate 0.0009 Epoch: 18 Global Step: 102870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:40,535-Speed 3377.47 samples/sec Loss 0.6394 LearningRate 0.0009 Epoch: 18 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:43,557-Speed 3388.97 samples/sec Loss 0.6026 LearningRate 0.0009 Epoch: 18 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:46,602-Speed 3364.54 samples/sec Loss 0.6492 LearningRate 0.0009 Epoch: 18 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:49,627-Speed 3385.68 samples/sec Loss 0.5496 LearningRate 0.0009 Epoch: 18 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:05:52,655-Speed 3381.89 samples/sec Loss 0.5817 LearningRate 0.0009 Epoch: 18 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:55,685-Speed 3380.05 samples/sec Loss 0.5683 LearningRate 0.0009 Epoch: 18 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:05:58,713-Speed 3383.09 samples/sec Loss 0.5470 LearningRate 0.0009 Epoch: 18 Global 
Step: 102940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:01,740-Speed 3383.48 samples/sec Loss 0.5641 LearningRate 0.0009 Epoch: 18 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:04,768-Speed 3382.94 samples/sec Loss 0.5889 LearningRate 0.0009 Epoch: 18 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:07,799-Speed 3378.53 samples/sec Loss 0.5531 LearningRate 0.0009 Epoch: 18 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:10,824-Speed 3386.10 samples/sec Loss 0.6519 LearningRate 0.0009 Epoch: 18 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:13,846-Speed 3389.11 samples/sec Loss 0.6672 LearningRate 0.0009 Epoch: 18 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:16,870-Speed 3387.71 samples/sec Loss 0.5950 LearningRate 0.0009 Epoch: 18 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:19,889-Speed 3392.56 samples/sec Loss 0.5271 LearningRate 0.0009 Epoch: 18 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:22,918-Speed 3381.77 samples/sec Loss 0.6224 LearningRate 0.0009 Epoch: 18 Global Step: 103020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:06:25,925-Speed 3405.28 samples/sec Loss 0.6218 LearningRate 0.0009 Epoch: 18 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:28,945-Speed 3391.53 samples/sec Loss 0.5755 LearningRate 0.0009 Epoch: 18 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:31,967-Speed 3389.77 samples/sec Loss 0.5425 LearningRate 0.0009 Epoch: 18 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:34,999-Speed 3377.90 samples/sec Loss 0.5940 LearningRate 0.0009 Epoch: 18 Global Step: 103060 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-27 12:06:38,028-Speed 3381.44 samples/sec Loss 0.5869 LearningRate 0.0009 Epoch: 18 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:41,063-Speed 3375.34 samples/sec Loss 0.5423 LearningRate 0.0009 Epoch: 18 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:44,091-Speed 3381.68 samples/sec Loss 0.5114 LearningRate 0.0009 Epoch: 18 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:47,116-Speed 3385.94 samples/sec Loss 0.6864 LearningRate 0.0009 Epoch: 18 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:50,200-Speed 3321.92 samples/sec Loss 0.6472 LearningRate 0.0009 Epoch: 18 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:53,225-Speed 3386.03 samples/sec Loss 0.5857 LearningRate 0.0009 Epoch: 18 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:56,228-Speed 3410.31 samples/sec Loss 0.6540 LearningRate 0.0009 Epoch: 18 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:06:59,258-Speed 3379.68 samples/sec Loss 0.6385 LearningRate 0.0009 Epoch: 18 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:02,287-Speed 3381.79 samples/sec Loss 0.5794 LearningRate 0.0009 Epoch: 18 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:05,336-Speed 3359.72 samples/sec Loss 0.6476 LearningRate 0.0009 Epoch: 18 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:08,364-Speed 3383.07 samples/sec Loss 0.5998 LearningRate 0.0009 Epoch: 18 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:11,367-Speed 3409.62 samples/sec Loss 0.6345 LearningRate 0.0009 Epoch: 18 Global Step: 103180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
12:07:14,391-Speed 3387.76 samples/sec Loss 0.5966 LearningRate 0.0009 Epoch: 18 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:17,409-Speed 3393.05 samples/sec Loss 0.5682 LearningRate 0.0009 Epoch: 18 Global Step: 103200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:20,433-Speed 3387.21 samples/sec Loss 0.5549 LearningRate 0.0009 Epoch: 18 Global Step: 103210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:23,822-Speed 3022.07 samples/sec Loss 0.5950 LearningRate 0.0009 Epoch: 18 Global Step: 103220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:26,854-Speed 3378.07 samples/sec Loss 0.5975 LearningRate 0.0009 Epoch: 18 Global Step: 103230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:29,882-Speed 3383.08 samples/sec Loss 0.5700 LearningRate 0.0008 Epoch: 18 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:32,918-Speed 3373.81 samples/sec Loss 0.5957 LearningRate 0.0008 Epoch: 18 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:35,943-Speed 3385.59 samples/sec Loss 0.5508 LearningRate 0.0008 Epoch: 18 Global Step: 103260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:38,969-Speed 3384.76 samples/sec Loss 0.5868 LearningRate 0.0008 Epoch: 18 Global Step: 103270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:07:42,074-Speed 3298.81 samples/sec Loss 0.6115 LearningRate 0.0008 Epoch: 18 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:45,112-Speed 3371.69 samples/sec Loss 0.6520 LearningRate 0.0008 Epoch: 18 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:48,152-Speed 3368.33 samples/sec Loss 0.6168 LearningRate 0.0008 Epoch: 18 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:51,254-Speed 3302.48 samples/sec Loss 
0.5960 LearningRate 0.0008 Epoch: 18 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:54,308-Speed 3353.80 samples/sec Loss 0.5897 LearningRate 0.0008 Epoch: 18 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:07:57,337-Speed 3381.70 samples/sec Loss 0.5849 LearningRate 0.0008 Epoch: 18 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:00,385-Speed 3359.88 samples/sec Loss 0.6350 LearningRate 0.0008 Epoch: 18 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:03,473-Speed 3316.76 samples/sec Loss 0.5957 LearningRate 0.0008 Epoch: 18 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:06,492-Speed 3393.31 samples/sec Loss 0.5882 LearningRate 0.0008 Epoch: 18 Global Step: 103360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:09,518-Speed 3384.47 samples/sec Loss 0.5781 LearningRate 0.0008 Epoch: 18 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:12,555-Speed 3372.13 samples/sec Loss 0.6161 LearningRate 0.0008 Epoch: 18 Global Step: 103380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:08:15,576-Speed 3390.76 samples/sec Loss 0.6599 LearningRate 0.0008 Epoch: 18 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:18,605-Speed 3381.79 samples/sec Loss 0.6011 LearningRate 0.0008 Epoch: 18 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:21,639-Speed 3375.55 samples/sec Loss 0.5866 LearningRate 0.0008 Epoch: 18 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:24,666-Speed 3383.66 samples/sec Loss 0.5823 LearningRate 0.0008 Epoch: 18 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:27,688-Speed 3389.85 samples/sec Loss 0.5942 LearningRate 0.0008 Epoch: 18 
Global Step: 103430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:30,717-Speed 3380.50 samples/sec Loss 0.5627 LearningRate 0.0008 Epoch: 18 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:33,735-Speed 3394.16 samples/sec Loss 0.6585 LearningRate 0.0008 Epoch: 18 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:36,764-Speed 3381.43 samples/sec Loss 0.6116 LearningRate 0.0008 Epoch: 18 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:39,795-Speed 3379.13 samples/sec Loss 0.5854 LearningRate 0.0008 Epoch: 18 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:42,826-Speed 3379.67 samples/sec Loss 0.5463 LearningRate 0.0008 Epoch: 18 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:45,863-Speed 3372.18 samples/sec Loss 0.5966 LearningRate 0.0008 Epoch: 18 Global Step: 103490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:08:48,871-Speed 3404.95 samples/sec Loss 0.5805 LearningRate 0.0008 Epoch: 18 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:51,894-Speed 3388.81 samples/sec Loss 0.5996 LearningRate 0.0008 Epoch: 18 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:54,914-Speed 3391.69 samples/sec Loss 0.6394 LearningRate 0.0008 Epoch: 18 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:08:57,931-Speed 3394.39 samples/sec Loss 0.5758 LearningRate 0.0008 Epoch: 18 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:09:00,953-Speed 3389.05 samples/sec Loss 0.6161 LearningRate 0.0008 Epoch: 18 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:09:03,954-Speed 3412.99 samples/sec Loss 0.5935 LearningRate 0.0008 Epoch: 18 Global Step: 103550 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 12:09:06,973-Speed 3392.70 samples/sec Loss 0.5783 LearningRate 0.0008 Epoch: 18 Global Step: 103560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:09,996-Speed 3387.89 samples/sec Loss 0.6053 LearningRate 0.0008 Epoch: 18 Global Step: 103570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:13,020-Speed 3387.39 samples/sec Loss 0.5408 LearningRate 0.0008 Epoch: 18 Global Step: 103580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:16,048-Speed 3382.88 samples/sec Loss 0.6119 LearningRate 0.0008 Epoch: 18 Global Step: 103590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:19,077-Speed 3381.17 samples/sec Loss 0.5935 LearningRate 0.0008 Epoch: 18 Global Step: 103600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:22,113-Speed 3373.69 samples/sec Loss 0.5507 LearningRate 0.0008 Epoch: 18 Global Step: 103610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:25,146-Speed 3377.60 samples/sec Loss 0.5986 LearningRate 0.0008 Epoch: 18 Global Step: 103620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:28,210-Speed 3342.06 samples/sec Loss 0.5157 LearningRate 0.0008 Epoch: 18 Global Step: 103630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:31,239-Speed 3381.23 samples/sec Loss 0.6011 LearningRate 0.0008 Epoch: 18 Global Step: 103640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:34,258-Speed 3393.14 samples/sec Loss 0.5937 LearningRate 0.0008 Epoch: 18 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:09:37,259-Speed 3412.95 samples/sec Loss 0.6388 LearningRate 0.0008 Epoch: 18 Global Step: 103660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:40,280-Speed 3390.44 samples/sec Loss 0.6274 LearningRate 0.0008 Epoch: 18 Global Step: 103670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
12:09:43,306-Speed 3384.40 samples/sec Loss 0.5540 LearningRate 0.0008 Epoch: 18 Global Step: 103680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:46,325-Speed 3392.59 samples/sec Loss 0.6037 LearningRate 0.0008 Epoch: 18 Global Step: 103690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:49,348-Speed 3388.92 samples/sec Loss 0.6142 LearningRate 0.0008 Epoch: 18 Global Step: 103700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:52,373-Speed 3385.85 samples/sec Loss 0.5735 LearningRate 0.0008 Epoch: 18 Global Step: 103710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:55,402-Speed 3381.56 samples/sec Loss 0.5742 LearningRate 0.0008 Epoch: 18 Global Step: 103720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:09:58,422-Speed 3391.01 samples/sec Loss 0.6581 LearningRate 0.0008 Epoch: 18 Global Step: 103730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:10:01,442-Speed 3391.60 samples/sec Loss 0.6585 LearningRate 0.0008 Epoch: 18 Global Step: 103740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:10:04,466-Speed 3387.65 samples/sec Loss 0.6121 LearningRate 0.0008 Epoch: 18 Global Step: 103750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:10:07,485-Speed 3392.12 samples/sec Loss 0.5988 LearningRate 0.0008 Epoch: 18 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:10,510-Speed 3385.45 samples/sec Loss 0.6008 LearningRate 0.0008 Epoch: 18 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:13,532-Speed 3389.06 samples/sec Loss 0.6081 LearningRate 0.0008 Epoch: 18 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:16,559-Speed 3384.06 samples/sec Loss 0.6549 LearningRate 0.0008 Epoch: 18 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:19,579-Speed 3392.25 samples/sec Loss 
0.5812 LearningRate 0.0008 Epoch: 18 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:22,606-Speed 3383.91 samples/sec Loss 0.5618 LearningRate 0.0008 Epoch: 18 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:25,634-Speed 3381.94 samples/sec Loss 0.6175 LearningRate 0.0008 Epoch: 18 Global Step: 103820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:28,654-Speed 3391.71 samples/sec Loss 0.6474 LearningRate 0.0008 Epoch: 18 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:31,673-Speed 3392.38 samples/sec Loss 0.5732 LearningRate 0.0008 Epoch: 18 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:34,692-Speed 3392.77 samples/sec Loss 0.6113 LearningRate 0.0008 Epoch: 18 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:37,715-Speed 3388.26 samples/sec Loss 0.6235 LearningRate 0.0008 Epoch: 18 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:40,737-Speed 3388.46 samples/sec Loss 0.5494 LearningRate 0.0008 Epoch: 18 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:43,769-Speed 3378.33 samples/sec Loss 0.5973 LearningRate 0.0007 Epoch: 18 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:46,786-Speed 3396.01 samples/sec Loss 0.6015 LearningRate 0.0007 Epoch: 18 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:49,814-Speed 3381.72 samples/sec Loss 0.6953 LearningRate 0.0007 Epoch: 18 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:52,830-Speed 3396.33 samples/sec Loss 0.5865 LearningRate 0.0007 Epoch: 18 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:55,851-Speed 3390.95 samples/sec Loss 0.5423 LearningRate 0.0007 Epoch: 18 Global 
Step: 103920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:10:58,882-Speed 3379.02 samples/sec Loss 0.6611 LearningRate 0.0007 Epoch: 18 Global Step: 103930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:01,912-Speed 3380.11 samples/sec Loss 0.5234 LearningRate 0.0007 Epoch: 18 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:04,936-Speed 3386.35 samples/sec Loss 0.6275 LearningRate 0.0007 Epoch: 18 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:07,969-Speed 3376.84 samples/sec Loss 0.5837 LearningRate 0.0007 Epoch: 18 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:11:10,981-Speed 3401.26 samples/sec Loss 0.5897 LearningRate 0.0007 Epoch: 18 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:13,998-Speed 3394.76 samples/sec Loss 0.6858 LearningRate 0.0007 Epoch: 18 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:17,024-Speed 3385.34 samples/sec Loss 0.6322 LearningRate 0.0007 Epoch: 18 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:11:20,042-Speed 3393.48 samples/sec Loss 0.6417 LearningRate 0.0007 Epoch: 18 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:12:03,285-[lfw][104000]XNorm: 21.637584 Training: 2022-04-27 12:12:03,285-[lfw][104000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-04-27 12:12:03,286-[lfw][104000]Accuracy-Highest: 0.99817 Training: 2022-04-27 12:12:53,380-[cfp_fp][104000]XNorm: 21.790685 Training: 2022-04-27 12:12:53,381-[cfp_fp][104000]Accuracy-Flip: 0.98371+-0.00520 Training: 2022-04-27 12:12:53,381-[cfp_fp][104000]Accuracy-Highest: 0.98614 Training: 2022-04-27 12:13:36,544-[agedb_30][104000]XNorm: 22.020270 Training: 2022-04-27 12:13:36,545-[agedb_30][104000]Accuracy-Flip: 0.98183+-0.00851 Training: 2022-04-27 
12:13:36,545-[agedb_30][104000]Accuracy-Highest: 0.98233 Training: 2022-04-27 12:13:39,562-Speed 73.39 samples/sec Loss 0.5285 LearningRate 0.0007 Epoch: 18 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:42,572-Speed 3403.34 samples/sec Loss 0.5770 LearningRate 0.0007 Epoch: 18 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:45,576-Speed 3409.15 samples/sec Loss 0.5167 LearningRate 0.0007 Epoch: 18 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:48,587-Speed 3401.72 samples/sec Loss 0.5947 LearningRate 0.0007 Epoch: 18 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:51,604-Speed 3394.33 samples/sec Loss 0.6518 LearningRate 0.0007 Epoch: 18 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:54,635-Speed 3380.06 samples/sec Loss 0.5671 LearningRate 0.0007 Epoch: 18 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:13:57,633-Speed 3415.78 samples/sec Loss 0.5888 LearningRate 0.0007 Epoch: 18 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:00,653-Speed 3391.75 samples/sec Loss 0.6008 LearningRate 0.0007 Epoch: 18 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:03,674-Speed 3390.69 samples/sec Loss 0.5500 LearningRate 0.0007 Epoch: 18 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:06,694-Speed 3391.03 samples/sec Loss 0.5561 LearningRate 0.0007 Epoch: 18 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:09,717-Speed 3388.28 samples/sec Loss 0.6480 LearningRate 0.0007 Epoch: 18 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:12,757-Speed 3369.80 samples/sec Loss 0.6303 LearningRate 0.0007 Epoch: 18 Global Step: 104120 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-27 12:14:15,826-Speed 3336.39 samples/sec Loss 0.5931 LearningRate 0.0007 Epoch: 18 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:18,841-Speed 3397.21 samples/sec Loss 0.6069 LearningRate 0.0007 Epoch: 18 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:21,864-Speed 3388.98 samples/sec Loss 0.5490 LearningRate 0.0007 Epoch: 18 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:24,879-Speed 3396.53 samples/sec Loss 0.6045 LearningRate 0.0007 Epoch: 18 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:27,892-Speed 3399.48 samples/sec Loss 0.5702 LearningRate 0.0007 Epoch: 18 Global Step: 104170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:14:30,902-Speed 3402.33 samples/sec Loss 0.6519 LearningRate 0.0007 Epoch: 18 Global Step: 104180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:14:33,904-Speed 3412.58 samples/sec Loss 0.5965 LearningRate 0.0007 Epoch: 18 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:36,919-Speed 3396.99 samples/sec Loss 0.6643 LearningRate 0.0007 Epoch: 18 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:39,933-Speed 3398.76 samples/sec Loss 0.5940 LearningRate 0.0007 Epoch: 18 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:42,946-Speed 3398.68 samples/sec Loss 0.6140 LearningRate 0.0007 Epoch: 18 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:45,956-Speed 3402.58 samples/sec Loss 0.5288 LearningRate 0.0007 Epoch: 18 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:14:48,967-Speed 3402.20 samples/sec Loss 0.5850 LearningRate 0.0007 Epoch: 18 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
12:14:51,961-Speed 3420.80 samples/sec Loss 0.5909 LearningRate 0.0007 Epoch: 18 Global Step: 104250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:14:54,972-Speed 3401.79 samples/sec Loss 0.5831 LearningRate 0.0007 Epoch: 18 Global Step: 104260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:14:58,018-Speed 3361.86 samples/sec Loss 0.6446 LearningRate 0.0007 Epoch: 18 Global Step: 104270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:01,042-Speed 3388.19 samples/sec Loss 0.6298 LearningRate 0.0007 Epoch: 18 Global Step: 104280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:04,064-Speed 3389.10 samples/sec Loss 0.6546 LearningRate 0.0007 Epoch: 18 Global Step: 104290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:07,074-Speed 3402.14 samples/sec Loss 0.5733 LearningRate 0.0007 Epoch: 18 Global Step: 104300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:10,087-Speed 3400.18 samples/sec Loss 0.6481 LearningRate 0.0007 Epoch: 18 Global Step: 104310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:13,099-Speed 3400.16 samples/sec Loss 0.5793 LearningRate 0.0007 Epoch: 18 Global Step: 104320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:16,132-Speed 3376.36 samples/sec Loss 0.5466 LearningRate 0.0007 Epoch: 18 Global Step: 104330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:19,146-Speed 3398.70 samples/sec Loss 0.5688 LearningRate 0.0007 Epoch: 18 Global Step: 104340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:22,176-Speed 3380.57 samples/sec Loss 0.6144 LearningRate 0.0007 Epoch: 18 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:15:25,248-Speed 3333.52 samples/sec Loss 0.5972 LearningRate 0.0007 Epoch: 18 Global Step: 104360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:15:28,271-Speed 3388.23 samples/sec Loss 
0.5835 LearningRate 0.0007 Epoch: 18 Global Step: 104370 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:31,281-Speed 3403.97 samples/sec Loss 0.5512 LearningRate 0.0007 Epoch: 18 Global Step: 104380 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:34,313-Speed 3377.61 samples/sec Loss 0.5659 LearningRate 0.0007 Epoch: 18 Global Step: 104390 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:37,334-Speed 3390.29 samples/sec Loss 0.5927 LearningRate 0.0007 Epoch: 18 Global Step: 104400 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:40,342-Speed 3404.24 samples/sec Loss 0.5569 LearningRate 0.0007 Epoch: 18 Global Step: 104410 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:43,352-Speed 3403.12 samples/sec Loss 0.6885 LearningRate 0.0007 Epoch: 18 Global Step: 104420 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:46,367-Speed 3397.34 samples/sec Loss 0.6014 LearningRate 0.0007 Epoch: 18 Global Step: 104430 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:49,382-Speed 3397.16 samples/sec Loss 0.5873 LearningRate 0.0007 Epoch: 18 Global Step: 104440 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:52,404-Speed 3389.49 samples/sec Loss 0.5947 LearningRate 0.0007 Epoch: 18 Global Step: 104450 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:15:55,424-Speed 3391.80 samples/sec Loss 0.5872 LearningRate 0.0007 Epoch: 18 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:15:58,436-Speed 3400.78 samples/sec Loss 0.5917 LearningRate 0.0007 Epoch: 18 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:01,451-Speed 3396.34 samples/sec Loss 0.5986 LearningRate 0.0007 Epoch: 18 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:04,466-Speed 3397.19 samples/sec Loss 0.6169 LearningRate 0.0007 Epoch: 18 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:07,481-Speed 3397.01 samples/sec Loss 0.5708 LearningRate 0.0007 Epoch: 18 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:10,516-Speed 3374.61 samples/sec Loss 0.6147 LearningRate 0.0007 Epoch: 18 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:13,532-Speed 3397.02 samples/sec Loss 0.5725 LearningRate 0.0007 Epoch: 18 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:16,551-Speed 3392.11 samples/sec Loss 0.5628 LearningRate 0.0007 Epoch: 18 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:19,547-Speed 3419.02 samples/sec Loss 0.6504 LearningRate 0.0007 Epoch: 18 Global Step: 104540 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:22,568-Speed 3390.10 samples/sec Loss 0.6117 LearningRate 0.0007 Epoch: 18 Global Step: 104550 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:25,583-Speed 3397.12 samples/sec Loss 0.6500 LearningRate 0.0006 Epoch: 18 Global Step: 104560 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:28,592-Speed 3404.21 samples/sec Loss 0.6206 LearningRate 0.0006 Epoch: 18 Global Step: 104570 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:31,606-Speed 3398.07 samples/sec Loss 0.5598 LearningRate 0.0006 Epoch: 18 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:34,627-Speed 3390.99 samples/sec Loss 0.5680 LearningRate 0.0006 Epoch: 18 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:37,744-Speed 3285.25 samples/sec Loss 0.5806 LearningRate 0.0006 Epoch: 18 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:40,860-Speed 3286.76 samples/sec Loss 0.6875 LearningRate 0.0006 Epoch: 18 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:43,872-Speed 3400.46 samples/sec Loss 0.5978 LearningRate 0.0006 Epoch: 18 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:46,884-Speed 3401.33 samples/sec Loss 0.5403 LearningRate 0.0006 Epoch: 18 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:16:49,900-Speed 3395.68 samples/sec Loss 0.5939 LearningRate 0.0006 Epoch: 18 Global Step: 104640 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:52,912-Speed 3400.92 samples/sec Loss 0.5681 LearningRate 0.0006 Epoch: 18 Global Step: 104650 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:55,938-Speed 3384.11 samples/sec Loss 0.6553 LearningRate 0.0006 Epoch: 18 Global Step: 104660 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:16:58,964-Speed 3385.44 samples/sec Loss 0.6221 LearningRate 0.0006 Epoch: 18 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:01,986-Speed 3389.77 samples/sec Loss 0.6069 LearningRate 0.0006 Epoch: 18 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:05,005-Speed 3392.41 samples/sec Loss 0.6107 LearningRate 0.0006 Epoch: 18 Global Step: 104690 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:08,016-Speed 3400.90 samples/sec Loss 0.6002 LearningRate 0.0006 Epoch: 18 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:11,035-Speed 3393.19 samples/sec Loss 0.6479 LearningRate 0.0006 Epoch: 18 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:14,047-Speed 3400.16 samples/sec Loss 0.5805 LearningRate 0.0006 Epoch: 18 Global Step: 104720 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:17,058-Speed 3401.37 samples/sec Loss 0.6626 LearningRate 0.0006 Epoch: 18 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:20,052-Speed 3422.31 samples/sec Loss 0.5504 LearningRate 0.0006 Epoch: 18 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:23,050-Speed 3416.79 samples/sec Loss 0.6180 LearningRate 0.0006 Epoch: 18 Global Step: 104750 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:26,083-Speed 3377.07 samples/sec Loss 0.5961 LearningRate 0.0006 Epoch: 18 Global Step: 104760 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:29,115-Speed 3377.55 samples/sec Loss 0.6685 LearningRate 0.0006 Epoch: 18 Global Step: 104770 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:32,129-Speed 3398.93 samples/sec Loss 0.5190 LearningRate 0.0006 Epoch: 18 Global Step: 104780 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:35,147-Speed 3393.44 samples/sec Loss 0.5802 LearningRate 0.0006 Epoch: 18 Global Step: 104790 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:38,157-Speed 3402.63 samples/sec Loss 0.5733 LearningRate 0.0006 Epoch: 18 Global Step: 104800 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:41,172-Speed 3396.58 samples/sec Loss 0.6223 LearningRate 0.0006 Epoch: 18 Global Step: 104810 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:44,189-Speed 3396.09 samples/sec Loss 0.5617 LearningRate 0.0006 Epoch: 18 Global Step: 104820 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:47,234-Speed 3363.45 samples/sec Loss 0.5581 LearningRate 0.0006 Epoch: 18 Global Step: 104830 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:50,329-Speed 3309.33 samples/sec Loss 0.6080 LearningRate 0.0006 Epoch: 18 Global Step: 104840 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:17:53,365-Speed 3373.55 samples/sec Loss 0.6323 LearningRate 0.0006 Epoch: 18 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:56,377-Speed 3400.70 samples/sec Loss 0.6567 LearningRate 0.0006 Epoch: 18 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:17:59,388-Speed 3401.15 samples/sec Loss 0.6073 LearningRate 0.0006 Epoch: 18 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:02,471-Speed 3322.23 samples/sec Loss 0.6809 LearningRate 0.0006 Epoch: 18 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:05,485-Speed 3398.10 samples/sec Loss 0.6175 LearningRate 0.0006 Epoch: 18 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:08,505-Speed 3391.89 samples/sec Loss 0.6053 LearningRate 0.0006 Epoch: 18 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:11,541-Speed 3374.06 samples/sec Loss 0.5269 LearningRate 0.0006 Epoch: 18 Global Step: 104910 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:14,569-Speed 3382.84 samples/sec Loss 0.5614 LearningRate 0.0006 Epoch: 18 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:17,590-Speed 3389.79 samples/sec Loss 0.5874 LearningRate 0.0006 Epoch: 18 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:20,617-Speed 3383.36 samples/sec Loss 0.5827 LearningRate 0.0006 Epoch: 18 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:23,629-Speed 3400.53 samples/sec Loss 0.5887 LearningRate 0.0006 Epoch: 18 Global Step: 104950 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:18:26,628-Speed 3415.47 samples/sec Loss 0.5761 LearningRate 0.0006 Epoch: 18 Global Step: 104960 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:29,643-Speed 3397.50 samples/sec Loss 0.5480 LearningRate 0.0006 Epoch: 18 Global Step: 104970 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:32,659-Speed 3395.96 samples/sec Loss 0.5587 LearningRate 0.0006 Epoch: 18 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:35,670-Speed 3401.08 samples/sec Loss 0.5609 LearningRate 0.0006 Epoch: 18 Global Step: 104990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:38,687-Speed 3394.91 samples/sec Loss 0.5838 LearningRate 0.0006 Epoch: 18 Global Step: 105000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:41,705-Speed 3394.39 samples/sec Loss 0.6025 LearningRate 0.0006 Epoch: 18 Global Step: 105010 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:44,721-Speed 3396.08 samples/sec Loss 0.6471 LearningRate 0.0006 Epoch: 18 Global Step: 105020 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:47,757-Speed 3373.39 samples/sec Loss 0.6121 LearningRate 0.0006 Epoch: 18 Global Step: 105030 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:50,774-Speed 3394.85 samples/sec Loss 0.6398 LearningRate 0.0006 Epoch: 18 Global Step: 105040 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:53,789-Speed 3396.66 samples/sec Loss 0.5585 LearningRate 0.0006 Epoch: 18 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:56,787-Speed 3417.29 samples/sec Loss 0.5861 LearningRate 0.0006 Epoch: 18 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:18:59,818-Speed 3378.74 samples/sec Loss 0.5848 LearningRate 0.0006 Epoch: 18 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:02,854-Speed 3373.82 samples/sec Loss 0.6027 LearningRate 0.0006 Epoch: 18 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:05,871-Speed 3394.97 samples/sec Loss 0.5582 LearningRate 0.0006 Epoch: 18 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:08,895-Speed 3386.74 samples/sec Loss 0.6629 LearningRate 0.0006 Epoch: 18 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:11,911-Speed 3396.86 samples/sec Loss 0.5780 LearningRate 0.0006 Epoch: 18 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:14,942-Speed 3379.44 samples/sec Loss 0.6139 LearningRate 0.0006 Epoch: 18 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:17,958-Speed 3395.00 samples/sec Loss 0.5909 LearningRate 0.0006 Epoch: 18 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:20,965-Speed 3407.03 samples/sec Loss 0.6164 LearningRate 0.0006 Epoch: 18 Global Step: 105140 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:23,990-Speed 3385.76 samples/sec Loss 0.6747 LearningRate 0.0006 Epoch: 18 Global Step: 105150 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:27,011-Speed 3389.91 samples/sec Loss 0.6098 LearningRate 0.0006 Epoch: 18 Global Step: 105160 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:30,027-Speed 3396.63 samples/sec Loss 0.5900 LearningRate 0.0006 Epoch: 18 Global Step: 105170 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:33,047-Speed 3391.70 samples/sec Loss 0.5977 LearningRate 0.0006 Epoch: 18 Global Step: 105180 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:36,075-Speed 3382.21 samples/sec Loss 0.5474 LearningRate 0.0006 Epoch: 18 Global Step: 105190 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:39,090-Speed 3397.55 samples/sec Loss 0.5994 LearningRate 0.0006 Epoch: 18 Global Step: 105200 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:42,105-Speed 3396.63 samples/sec Loss 0.6339 LearningRate 0.0006 Epoch: 18 Global Step: 105210 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:45,119-Speed 3398.58 samples/sec Loss 0.6657 LearningRate 0.0006 Epoch: 18 Global Step: 105220 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:48,149-Speed 3380.09 samples/sec Loss 0.5962 LearningRate 0.0006 Epoch: 18 Global Step: 105230 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:19:51,180-Speed 3379.11 samples/sec Loss 0.5259 LearningRate 0.0006 Epoch: 18 Global Step: 105240 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:54,210-Speed 3380.39 samples/sec Loss 0.5721 LearningRate 0.0006 Epoch: 18 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:19:57,231-Speed 3389.67 samples/sec Loss 0.6507 LearningRate 0.0006 Epoch: 18 Global Step: 105260 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:00,254-Speed 3389.18 samples/sec Loss 0.6270 LearningRate 0.0006 Epoch: 18 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:03,301-Speed 3361.13 samples/sec Loss 0.6167 LearningRate 0.0006 Epoch: 18 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:06,328-Speed 3383.32 samples/sec Loss 0.6212 LearningRate 0.0005 Epoch: 18 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:09,364-Speed 3373.95 samples/sec Loss 0.5445 LearningRate 0.0005 Epoch: 18 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:12,382-Speed 3393.57 samples/sec Loss 0.5250 LearningRate 0.0005 Epoch: 18 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:15,488-Speed 3297.12 samples/sec Loss 0.5951 LearningRate 0.0005 Epoch: 18 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:18,579-Speed 3314.02 samples/sec Loss 0.6182 LearningRate 0.0005 Epoch: 18 Global Step: 105330 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:21,578-Speed 3414.56 samples/sec Loss 0.5671 LearningRate 0.0005 Epoch: 18 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:24,607-Speed 3382.12 samples/sec Loss 0.5920 LearningRate 0.0005 Epoch: 18 Global Step: 105350 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:27,636-Speed 3382.04 samples/sec Loss 0.5918 LearningRate 0.0005 Epoch: 18 Global Step: 105360 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:30,656-Speed 3391.60 samples/sec Loss 0.6157 LearningRate 0.0005 Epoch: 18 Global Step: 105370 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:33,676-Speed 3391.41 samples/sec Loss 0.5373 LearningRate 0.0005 Epoch: 18 Global Step: 105380 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:36,697-Speed 3390.29 samples/sec Loss 0.4980 LearningRate 0.0005 Epoch: 18 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:39,714-Speed 3394.04 samples/sec Loss 0.6164 LearningRate 0.0005 Epoch: 18 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:42,730-Speed 3396.62 samples/sec Loss 0.5121 LearningRate 0.0005 Epoch: 18 Global Step: 105410 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:45,749-Speed 3392.18 samples/sec Loss 0.5556 LearningRate 0.0005 Epoch: 18 Global Step: 105420 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:48,771-Speed 3389.14 samples/sec Loss 0.5991 LearningRate 0.0005 Epoch: 18 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:51,784-Speed 3399.18 samples/sec Loss 0.5819 LearningRate 0.0005 Epoch: 18 Global Step: 105440 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:20:54,784-Speed 3414.61 samples/sec Loss 0.5830 LearningRate 0.0005 Epoch: 18 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:20:57,803-Speed 3393.06 samples/sec Loss 0.6139 LearningRate 0.0005 Epoch: 18 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:00,832-Speed 3381.40 samples/sec Loss 0.6402 LearningRate 0.0005 Epoch: 18 Global Step: 105470 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:03,851-Speed 3393.29 samples/sec Loss 0.6183 LearningRate 0.0005 Epoch: 18 Global Step: 105480 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:06,870-Speed 3391.88 samples/sec Loss 0.6575 LearningRate 0.0005 Epoch: 18 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:09,892-Speed 3388.94 samples/sec Loss 0.4956 LearningRate 0.0005 Epoch: 18 Global Step: 105500 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:12,908-Speed 3396.41 samples/sec Loss 0.6224 LearningRate 0.0005 Epoch: 18 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:15,907-Speed 3415.66 samples/sec Loss 0.5893 LearningRate 0.0005 Epoch: 18 Global Step: 105520 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:18,930-Speed 3387.29 samples/sec Loss 0.6551 LearningRate 0.0005 Epoch: 18 Global Step: 105530 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:21,945-Speed 3397.52 samples/sec Loss 0.6160 LearningRate 0.0005 Epoch: 18 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:24,970-Speed 3386.15 samples/sec Loss 0.6223 LearningRate 0.0005 Epoch: 18 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:27,989-Speed 3392.77 samples/sec Loss 0.6046 LearningRate 0.0005 Epoch: 18 Global Step: 105560 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:31,004-Speed 3396.88 samples/sec Loss 0.5704 LearningRate 0.0005 Epoch: 18 Global Step: 105570 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:34,025-Speed 3390.71 samples/sec Loss 0.5035 LearningRate 0.0005 Epoch: 18 Global Step: 105580 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:37,054-Speed 3381.56 samples/sec Loss 0.5809 LearningRate 0.0005 Epoch: 18 Global Step: 105590 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:40,070-Speed 3395.85 samples/sec Loss 0.6013 LearningRate 0.0005 Epoch: 18 Global Step: 105600 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:43,085-Speed 3397.44 samples/sec Loss 0.6247 LearningRate 0.0005 Epoch: 18 Global Step: 105610 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:21:46,102-Speed 3394.99 samples/sec Loss 0.6018 LearningRate 0.0005 Epoch: 18 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:49,128-Speed 3384.07 samples/sec Loss 0.5938 LearningRate 0.0005 Epoch: 18 Global Step: 105630 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:52,152-Speed 3387.54 samples/sec Loss 0.6241 LearningRate 0.0005 Epoch: 18 Global Step: 105640 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:55,180-Speed 3383.01 samples/sec Loss 0.5693 LearningRate 0.0005 Epoch: 18 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:21:58,197-Speed 3394.68 samples/sec Loss 0.5480 LearningRate 0.0005 Epoch: 18 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:01,212-Speed 3397.50 samples/sec Loss 0.5926 LearningRate 0.0005 Epoch: 18 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:04,229-Speed 3394.05 samples/sec Loss 0.6216 LearningRate 0.0005 Epoch: 18 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:07,244-Speed 3397.56 samples/sec Loss 0.6223 LearningRate 0.0005 Epoch: 18 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:10,262-Speed 3393.51 samples/sec Loss 0.5817 LearningRate 0.0005 Epoch: 18 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:13,302-Speed 3369.47 samples/sec Loss 0.5861 LearningRate 0.0005 Epoch: 18 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:16,300-Speed 3416.04 samples/sec Loss 0.5405 LearningRate 0.0005 Epoch: 18 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:19,319-Speed 3392.64 samples/sec Loss 0.5835 LearningRate 0.0005 Epoch: 18 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:22,334-Speed 3397.33 samples/sec Loss 0.5642 LearningRate 0.0005 Epoch: 18 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:25,355-Speed 3390.71 samples/sec Loss 0.6542 LearningRate 0.0005 Epoch: 18 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:28,370-Speed 3397.15 samples/sec Loss 0.6514 LearningRate 0.0005 Epoch: 18 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:31,386-Speed 3395.68 samples/sec Loss 0.5986 LearningRate 0.0005 Epoch: 18 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:34,401-Speed 3397.06 samples/sec Loss 0.5923 LearningRate 0.0005 Epoch: 18 Global Step: 105780 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:37,445-Speed 3365.18 samples/sec Loss 0.5839 LearningRate 0.0005 Epoch: 18 Global Step: 105790 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:40,603-Speed 3243.39 samples/sec Loss 0.5943 LearningRate 0.0005 Epoch: 18 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:43,626-Speed 3388.54 samples/sec Loss 0.5620 LearningRate 0.0005 Epoch: 18 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:22:46,683-Speed 3350.18 samples/sec Loss 0.6675 LearningRate 0.0005 Epoch: 18 Global Step: 105820 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:22:49,702-Speed 3392.76 samples/sec Loss 0.6673 LearningRate 0.0005 Epoch: 18 Global Step: 105830 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:22:52,727-Speed 3386.24 samples/sec Loss 0.6141 LearningRate 0.0005 Epoch: 18 Global Step: 105840 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:22:55,744-Speed 3394.84 samples/sec Loss 0.6596 LearningRate 0.0005 Epoch: 18 Global Step: 105850 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:22:58,742-Speed 3416.03 samples/sec Loss 0.6068 LearningRate 0.0005 Epoch: 18 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:01,765-Speed 3388.15 samples/sec Loss 0.5339 LearningRate 0.0005 Epoch: 18 Global Step: 105870 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:04,787-Speed 3390.94 samples/sec Loss 0.6013 LearningRate 0.0005 Epoch: 18 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:07,811-Speed 3386.31 samples/sec Loss 0.5442 LearningRate 0.0005 Epoch: 18 Global Step: 105890 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:10,832-Speed 3390.66 samples/sec Loss 0.6665 LearningRate 0.0005 Epoch: 18 Global Step: 105900 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:13,851-Speed 3392.93 samples/sec Loss 0.6018 LearningRate 0.0005 Epoch: 18 Global Step: 105910 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:16,880-Speed 3381.83 samples/sec Loss 0.5510 LearningRate 0.0005 Epoch: 18 Global Step: 105920 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:19,897-Speed 3394.36 samples/sec Loss 0.5895 LearningRate 0.0005 Epoch: 18 Global Step: 105930 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:22,916-Speed 3392.50 samples/sec Loss 0.6196 LearningRate 0.0005 Epoch: 18 Global Step: 105940 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:25,937-Speed 3390.51 samples/sec Loss 0.6709 LearningRate 0.0005 Epoch: 18 Global Step: 105950 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:28,945-Speed 3405.34 samples/sec Loss 0.6533 LearningRate 0.0005 Epoch: 18 Global Step: 105960 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:31,970-Speed 3385.19 samples/sec Loss 0.6275 LearningRate 0.0005 Epoch: 18 Global Step: 105970 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:34,987-Speed 3395.78 samples/sec Loss 0.6161 LearningRate 0.0005 Epoch: 18 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:38,006-Speed 3393.01 samples/sec Loss 0.6282 LearningRate 0.0005 Epoch: 18 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:23:41,032-Speed 3384.14 samples/sec Loss 0.6342 LearningRate 0.0005 Epoch: 18 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:24:24,513-[lfw][106000]XNorm: 21.776894
Training: 2022-04-27 12:24:24,514-[lfw][106000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-27 12:24:24,514-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:25:15,057-[cfp_fp][106000]XNorm: 22.000450
Training: 2022-04-27 12:25:15,057-[cfp_fp][106000]Accuracy-Flip: 0.98486+-0.00492
Training: 2022-04-27 12:25:15,058-[cfp_fp][106000]Accuracy-Highest: 0.98614
Training: 2022-04-27 12:25:58,465-[agedb_30][106000]XNorm: 22.311786
Training: 2022-04-27 12:25:58,466-[agedb_30][106000]Accuracy-Flip: 0.98167+-0.00734
Training: 2022-04-27 12:25:58,466-[agedb_30][106000]Accuracy-Highest: 0.98233
Training: 2022-04-27 12:26:01,470-Speed 72.92 samples/sec Loss 0.5434 LearningRate 0.0005 Epoch: 18 Global Step: 106010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:04,495-Speed 3386.61 samples/sec Loss 0.6617 LearningRate 0.0005 Epoch: 18 Global Step: 106020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:07,523-Speed 3382.44 samples/sec Loss 0.6195 LearningRate 0.0005 Epoch: 18 Global Step: 106030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:10,539-Speed 3395.70 samples/sec Loss 0.5708 LearningRate 0.0005 Epoch: 18 Global Step: 106040 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:13,581-Speed 3366.77 samples/sec Loss 0.5582 LearningRate 0.0005 Epoch: 18 Global Step: 106050 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:16,882-Speed 3102.92 samples/sec Loss 0.5975 LearningRate 0.0005 Epoch: 18 Global Step: 106060 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:19,901-Speed 3392.39 samples/sec Loss 0.6512 LearningRate 0.0005 Epoch: 18 Global Step: 106070 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:22,943-Speed 3366.49 samples/sec Loss 0.5753 LearningRate 0.0005 Epoch: 18 Global Step: 106080 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:26,086-Speed 3259.08 samples/sec Loss 0.6347 LearningRate 0.0005 Epoch: 18 Global Step: 106090 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:29,103-Speed 3395.27 samples/sec Loss 0.6185 LearningRate 0.0004 Epoch: 18 Global Step: 106100 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:26:32,121-Speed 3394.56 samples/sec Loss 0.5854 LearningRate 0.0004 Epoch: 18 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:35,138-Speed 3394.43 samples/sec Loss 0.5694 LearningRate 0.0004 Epoch: 18 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:38,166-Speed 3382.54 samples/sec Loss 0.6336 LearningRate 0.0004 Epoch: 18 Global Step: 106130 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:41,210-Speed 3364.20 samples/sec Loss 0.6458 LearningRate 0.0004 Epoch: 18 Global Step: 106140 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:44,233-Speed 3388.60 samples/sec Loss 0.5697 LearningRate 0.0004 Epoch: 18 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:47,260-Speed 3383.32 samples/sec Loss 0.5703 LearningRate 0.0004 Epoch: 18 Global Step: 106160 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:50,285-Speed 3386.17 samples/sec Loss 0.5790 LearningRate 0.0004 Epoch: 18 Global Step: 106170 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:53,323-Speed 3372.10 samples/sec Loss 0.5234 LearningRate 0.0004 Epoch: 18 Global Step: 106180 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:56,348-Speed 3385.79 samples/sec Loss 0.6642 LearningRate 0.0004 Epoch: 18 Global Step: 106190 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:26:59,369-Speed 3391.00 samples/sec Loss 0.5631 LearningRate 0.0004 Epoch: 18 Global Step: 106200 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:02,392-Speed 3387.73 samples/sec Loss 0.5595 LearningRate 0.0004 Epoch: 18 Global Step: 106210 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:27:05,414-Speed 3388.60 samples/sec Loss 0.6224 LearningRate 0.0004 Epoch: 18 Global Step: 106220 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:27:08,431-Speed 3395.56 samples/sec Loss 0.6697 LearningRate 0.0004 Epoch: 18 Global Step: 106230 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:27:11,453-Speed 3388.49 samples/sec Loss 0.6279 LearningRate 0.0004 Epoch: 18 Global Step: 106240 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:27:14,448-Speed 3419.94 samples/sec Loss 0.5547 LearningRate 0.0004 Epoch: 18 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:17,472-Speed 3386.60 samples/sec Loss 0.6187 LearningRate 0.0004 Epoch: 18 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:20,498-Speed 3385.98 samples/sec Loss 0.6137 LearningRate 0.0004 Epoch: 18 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:23,516-Speed 3393.30 samples/sec Loss 0.6186 LearningRate 0.0004 Epoch: 18 Global Step: 106280 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:26,560-Speed 3364.91 samples/sec Loss 0.6418 LearningRate 0.0004 Epoch: 18 Global Step: 106290 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:29,570-Speed 3402.62 samples/sec Loss 0.5466 LearningRate 0.0004 Epoch: 18 Global Step: 106300 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:32,580-Speed 3403.32 samples/sec Loss 0.5999 LearningRate 0.0004 Epoch: 18 Global Step: 106310 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:35,595-Speed 3397.19 samples/sec Loss 0.6134 LearningRate 0.0004 Epoch: 18 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:38,609-Speed 3398.17 samples/sec Loss 0.5812 LearningRate 0.0004 Epoch: 18 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:41,626-Speed 3395.13 samples/sec Loss 0.6422 LearningRate 0.0004 Epoch: 18 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:44,640-Speed 3397.32 samples/sec Loss 0.5906 LearningRate 0.0004 Epoch: 18 Global Step: 106350 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:27:47,638-Speed 3416.37 samples/sec Loss 0.6160 LearningRate 0.0004 Epoch: 18 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:50,661-Speed 3388.79 samples/sec Loss 0.5023 LearningRate 0.0004 Epoch: 18 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:53,667-Speed 3407.34 samples/sec Loss 0.6052 LearningRate 0.0004 Epoch: 18 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:56,681-Speed 3398.07 samples/sec Loss 0.6352 LearningRate 0.0004 Epoch: 18 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:27:59,720-Speed 3370.34 samples/sec Loss 0.5674 LearningRate 0.0004 Epoch: 18 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:02,742-Speed 3389.33 samples/sec Loss 0.6180 LearningRate 0.0004 Epoch: 18 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:05,748-Speed 3407.01 samples/sec Loss 0.6401 LearningRate 0.0004 Epoch: 18 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:08,756-Speed 3405.34 samples/sec Loss 0.5945 LearningRate 0.0004 Epoch: 18 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:11,837-Speed 3324.33 samples/sec Loss 0.6243 LearningRate 0.0004 Epoch: 18 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:14,863-Speed 3385.08 samples/sec Loss 0.6127 LearningRate 0.0004 Epoch: 18 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:17,873-Speed 3402.92 samples/sec Loss 0.6137 LearningRate 0.0004 Epoch: 18 Global Step: 106460 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:28:20,863-Speed 3425.54 samples/sec Loss 0.6039 LearningRate 0.0004 Epoch: 18 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:23,916-Speed 3354.54 samples/sec Loss 0.6219 LearningRate 0.0004 Epoch: 18 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:26,973-Speed 3350.65 samples/sec Loss 0.6501 LearningRate 0.0004 Epoch: 18 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:29,980-Speed 3406.28 samples/sec Loss 0.5812 LearningRate 0.0004 Epoch: 18 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:32,991-Speed 3401.87 samples/sec Loss 0.5829 LearningRate 0.0004 Epoch: 18 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:36,016-Speed 3385.40 samples/sec Loss 0.6255 LearningRate 0.0004 Epoch: 18 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:39,035-Speed 3392.95 samples/sec Loss 0.5995 LearningRate 0.0004 Epoch: 18 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:42,046-Speed 3401.29 samples/sec Loss 0.6136 LearningRate 0.0004 Epoch: 18 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:45,056-Speed 3402.71 samples/sec Loss 0.5317 LearningRate 0.0004 Epoch: 18 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:48,067-Speed 3402.05 samples/sec Loss 0.6079 LearningRate 0.0004 Epoch: 18 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:51,059-Speed 3423.72 samples/sec Loss 0.6375 LearningRate 0.0004 Epoch: 18 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:54,068-Speed 3403.64 samples/sec Loss 0.5648 LearningRate 0.0004 Epoch: 18 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:28:57,080-Speed 3400.56 samples/sec Loss 0.5678 LearningRate 0.0004 Epoch: 18 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:00,105-Speed 3385.69 samples/sec Loss 0.6671 LearningRate 0.0004 Epoch: 18 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:03,120-Speed 3397.54 samples/sec Loss 0.5999 LearningRate 0.0004 Epoch: 18 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:06,134-Speed 3397.34 samples/sec Loss 0.5884 LearningRate 0.0004 Epoch: 18 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:09,148-Speed 3398.17 samples/sec Loss 0.6503 LearningRate 0.0004 Epoch: 18 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:12,189-Speed 3368.52 samples/sec Loss 0.6299 LearningRate 0.0004 Epoch: 18 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:15,203-Speed 3399.58 samples/sec Loss 0.5647 LearningRate 0.0004 Epoch: 18 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:18,216-Speed 3399.11 samples/sec Loss 0.6323 LearningRate 0.0004 Epoch: 18 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:21,240-Speed 3387.29 samples/sec Loss 0.6697 LearningRate 0.0004 Epoch: 18 Global Step: 106670 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:29:24,242-Speed 3411.60 samples/sec Loss 0.6041 LearningRate 0.0004 Epoch: 18 Global Step: 106680 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:27,264-Speed 3389.58 samples/sec Loss 0.5961 LearningRate 0.0004 Epoch: 18 Global Step: 106690 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:30,286-Speed 3388.56 samples/sec Loss 0.5790 LearningRate 0.0004 Epoch: 18 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:33,296-Speed 3402.96 samples/sec Loss 0.6024 LearningRate 0.0004 Epoch: 18 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:36,306-Speed 3402.60 samples/sec Loss 0.6367 LearningRate 0.0004 Epoch: 18 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:39,327-Speed 3390.73 samples/sec Loss 0.6058 LearningRate 0.0004 Epoch: 18 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:42,339-Speed 3401.35 samples/sec Loss 0.6519 LearningRate 0.0004 Epoch: 18 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:45,348-Speed 3403.72 samples/sec Loss 0.5740 LearningRate 0.0004 Epoch: 18 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:48,366-Speed 3394.22 samples/sec Loss 0.6174 LearningRate 0.0004 Epoch: 18 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:51,380-Speed 3398.10 samples/sec Loss 0.5494 LearningRate 0.0004 Epoch: 18 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:29:54,388-Speed 3404.04 samples/sec Loss 0.6457 LearningRate 0.0004 Epoch: 18 Global Step: 106780 Fp16 Grad Scale: 131072 Required: 1 hours
Training: 2022-04-27 12:29:57,381-Speed 3422.81 samples/sec Loss 0.5790 LearningRate 0.0004 Epoch: 18 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:00,399-Speed 3394.05 samples/sec Loss 0.5904 LearningRate 0.0004 Epoch: 18 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:03,414-Speed 3396.50 samples/sec Loss 0.5915 LearningRate 0.0004 Epoch: 18 Global Step: 106810 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:06,434-Speed 3392.61 samples/sec Loss 0.6136 LearningRate 0.0004 Epoch: 18 Global Step: 106820 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:09,466-Speed 3377.49 samples/sec Loss 0.6643 LearningRate 0.0004 Epoch: 18 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:12,484-Speed 3394.36 samples/sec Loss 0.6016 LearningRate 0.0004 Epoch: 18 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 12:30:15,476-Speed 3422.85 samples/sec Loss 0.6530 LearningRate 0.0004 Epoch: 18 Global Step: 106850 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:30:18,489-Speed 3399.66 samples/sec Loss 0.5486 LearningRate 0.0004 Epoch: 18 Global Step: 106860 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:30:21,499-Speed 3401.79 samples/sec Loss 0.5648 LearningRate 0.0004 Epoch: 18 Global Step: 106870 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:30:24,509-Speed 3403.19 samples/sec Loss 0.6523 LearningRate 0.0004 Epoch: 18 Global Step: 106880 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:30:27,520-Speed 3401.66 samples/sec Loss 0.5780 LearningRate 0.0004 Epoch: 18 Global Step: 106890 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 12:30:30,534-Speed 3398.67 samples/sec
Loss 0.5621 LearningRate 0.0004 Epoch: 18 Global Step: 106900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:30:33,545-Speed 3401.48 samples/sec Loss 0.5922 LearningRate 0.0004 Epoch: 18 Global Step: 106910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:30:36,555-Speed 3402.97 samples/sec Loss 0.6578 LearningRate 0.0004 Epoch: 18 Global Step: 106920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:30:39,567-Speed 3400.90 samples/sec Loss 0.6059 LearningRate 0.0004 Epoch: 18 Global Step: 106930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:30:42,574-Speed 3405.89 samples/sec Loss 0.6606 LearningRate 0.0004 Epoch: 18 Global Step: 106940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:30:45,588-Speed 3398.09 samples/sec Loss 0.6701 LearningRate 0.0004 Epoch: 18 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:30:48,595-Speed 3405.62 samples/sec Loss 0.5547 LearningRate 0.0004 Epoch: 18 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:30:51,612-Speed 3395.27 samples/sec Loss 0.6207 LearningRate 0.0004 Epoch: 18 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:30:54,620-Speed 3405.07 samples/sec Loss 0.5507 LearningRate 0.0004 Epoch: 18 Global Step: 106980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:30:57,628-Speed 3404.92 samples/sec Loss 0.6766 LearningRate 0.0004 Epoch: 18 Global Step: 106990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:00,658-Speed 3380.73 samples/sec Loss 0.6180 LearningRate 0.0003 Epoch: 18 Global Step: 107000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:03,666-Speed 3404.82 samples/sec Loss 0.5907 LearningRate 0.0003 Epoch: 18 Global Step: 107010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:06,676-Speed 3402.70 samples/sec Loss 0.5963 LearningRate 0.0003 Epoch: 18 
Global Step: 107020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:09,690-Speed 3398.93 samples/sec Loss 0.6297 LearningRate 0.0003 Epoch: 18 Global Step: 107030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:12,698-Speed 3404.56 samples/sec Loss 0.6261 LearningRate 0.0003 Epoch: 18 Global Step: 107040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:15,704-Speed 3407.11 samples/sec Loss 0.5946 LearningRate 0.0003 Epoch: 18 Global Step: 107050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:18,720-Speed 3396.93 samples/sec Loss 0.5774 LearningRate 0.0003 Epoch: 18 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:21,744-Speed 3386.01 samples/sec Loss 0.5872 LearningRate 0.0003 Epoch: 18 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:24,765-Speed 3391.24 samples/sec Loss 0.5807 LearningRate 0.0003 Epoch: 18 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:31:27,760-Speed 3420.29 samples/sec Loss 0.5878 LearningRate 0.0003 Epoch: 18 Global Step: 107090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:30,768-Speed 3404.29 samples/sec Loss 0.6088 LearningRate 0.0003 Epoch: 18 Global Step: 107100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:33,778-Speed 3403.48 samples/sec Loss 0.5763 LearningRate 0.0003 Epoch: 18 Global Step: 107110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:36,787-Speed 3403.66 samples/sec Loss 0.6347 LearningRate 0.0003 Epoch: 18 Global Step: 107120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:39,795-Speed 3404.48 samples/sec Loss 0.6351 LearningRate 0.0003 Epoch: 18 Global Step: 107130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:42,808-Speed 3399.57 samples/sec Loss 0.6031 LearningRate 0.0003 Epoch: 18 Global Step: 107140 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 12:31:45,837-Speed 3381.76 samples/sec Loss 0.5696 LearningRate 0.0003 Epoch: 18 Global Step: 107150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:48,849-Speed 3399.97 samples/sec Loss 0.6700 LearningRate 0.0003 Epoch: 18 Global Step: 107160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:51,856-Speed 3406.19 samples/sec Loss 0.5515 LearningRate 0.0003 Epoch: 18 Global Step: 107170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:54,867-Speed 3401.90 samples/sec Loss 0.6163 LearningRate 0.0003 Epoch: 18 Global Step: 107180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:31:57,877-Speed 3403.02 samples/sec Loss 0.5933 LearningRate 0.0003 Epoch: 18 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:00,867-Speed 3425.63 samples/sec Loss 0.5821 LearningRate 0.0003 Epoch: 18 Global Step: 107200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:03,898-Speed 3379.77 samples/sec Loss 0.6184 LearningRate 0.0003 Epoch: 18 Global Step: 107210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:06,914-Speed 3395.95 samples/sec Loss 0.6247 LearningRate 0.0003 Epoch: 18 Global Step: 107220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:09,927-Speed 3399.21 samples/sec Loss 0.5735 LearningRate 0.0003 Epoch: 18 Global Step: 107230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:12,957-Speed 3380.31 samples/sec Loss 0.5668 LearningRate 0.0003 Epoch: 18 Global Step: 107240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:15,988-Speed 3378.89 samples/sec Loss 0.6134 LearningRate 0.0003 Epoch: 18 Global Step: 107250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:19,014-Speed 3384.28 samples/sec Loss 0.5809 LearningRate 0.0003 Epoch: 18 Global Step: 107260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
12:32:22,023-Speed 3404.58 samples/sec Loss 0.5740 LearningRate 0.0003 Epoch: 18 Global Step: 107270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:25,038-Speed 3397.18 samples/sec Loss 0.5818 LearningRate 0.0003 Epoch: 18 Global Step: 107280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:28,063-Speed 3386.33 samples/sec Loss 0.5725 LearningRate 0.0003 Epoch: 18 Global Step: 107290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:32:31,079-Speed 3395.54 samples/sec Loss 0.6012 LearningRate 0.0003 Epoch: 18 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:34,095-Speed 3395.97 samples/sec Loss 0.5632 LearningRate 0.0003 Epoch: 18 Global Step: 107310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:37,107-Speed 3400.59 samples/sec Loss 0.5798 LearningRate 0.0003 Epoch: 18 Global Step: 107320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:40,119-Speed 3400.42 samples/sec Loss 0.5828 LearningRate 0.0003 Epoch: 18 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:43,134-Speed 3397.78 samples/sec Loss 0.5884 LearningRate 0.0003 Epoch: 18 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:46,155-Speed 3390.41 samples/sec Loss 0.5946 LearningRate 0.0003 Epoch: 18 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:49,187-Speed 3377.13 samples/sec Loss 0.5344 LearningRate 0.0003 Epoch: 18 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:52,203-Speed 3396.79 samples/sec Loss 0.5396 LearningRate 0.0003 Epoch: 18 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:55,214-Speed 3401.66 samples/sec Loss 0.5806 LearningRate 0.0003 Epoch: 18 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:32:58,245-Speed 3378.90 samples/sec Loss 
0.5654 LearningRate 0.0003 Epoch: 18 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:01,257-Speed 3400.61 samples/sec Loss 0.6277 LearningRate 0.0003 Epoch: 18 Global Step: 107400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:33:04,286-Speed 3381.78 samples/sec Loss 0.5329 LearningRate 0.0003 Epoch: 18 Global Step: 107410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:33:07,287-Speed 3413.07 samples/sec Loss 0.6411 LearningRate 0.0003 Epoch: 18 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:10,311-Speed 3387.01 samples/sec Loss 0.6046 LearningRate 0.0003 Epoch: 18 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:13,338-Speed 3383.50 samples/sec Loss 0.5413 LearningRate 0.0003 Epoch: 18 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:16,349-Speed 3401.52 samples/sec Loss 0.5455 LearningRate 0.0003 Epoch: 18 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:19,374-Speed 3386.33 samples/sec Loss 0.6245 LearningRate 0.0003 Epoch: 18 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:22,399-Speed 3386.45 samples/sec Loss 0.5914 LearningRate 0.0003 Epoch: 18 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:25,424-Speed 3385.05 samples/sec Loss 0.5589 LearningRate 0.0003 Epoch: 18 Global Step: 107480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:28,449-Speed 3386.60 samples/sec Loss 0.5898 LearningRate 0.0003 Epoch: 18 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:31,461-Speed 3400.56 samples/sec Loss 0.5764 LearningRate 0.0003 Epoch: 18 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:34,472-Speed 3400.94 samples/sec Loss 0.6164 LearningRate 0.0003 Epoch: 18 
Global Step: 107510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:37,497-Speed 3385.95 samples/sec Loss 0.6677 LearningRate 0.0003 Epoch: 18 Global Step: 107520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:33:40,499-Speed 3411.67 samples/sec Loss 0.5746 LearningRate 0.0003 Epoch: 18 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:43,512-Speed 3399.99 samples/sec Loss 0.6453 LearningRate 0.0003 Epoch: 18 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:46,527-Speed 3397.64 samples/sec Loss 0.6408 LearningRate 0.0003 Epoch: 18 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:49,612-Speed 3319.23 samples/sec Loss 0.6679 LearningRate 0.0003 Epoch: 18 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:52,631-Speed 3392.83 samples/sec Loss 0.5962 LearningRate 0.0003 Epoch: 18 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:55,646-Speed 3398.00 samples/sec Loss 0.6097 LearningRate 0.0003 Epoch: 18 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:33:58,671-Speed 3385.46 samples/sec Loss 0.6279 LearningRate 0.0003 Epoch: 18 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:01,692-Speed 3390.40 samples/sec Loss 0.5393 LearningRate 0.0003 Epoch: 18 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:04,688-Speed 3418.48 samples/sec Loss 0.5760 LearningRate 0.0003 Epoch: 18 Global Step: 107610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:07,702-Speed 3398.64 samples/sec Loss 0.6010 LearningRate 0.0003 Epoch: 18 Global Step: 107620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:10,714-Speed 3400.60 samples/sec Loss 0.5606 LearningRate 0.0003 Epoch: 18 Global Step: 107630 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 12:34:13,754-Speed 3369.55 samples/sec Loss 0.5878 LearningRate 0.0003 Epoch: 18 Global Step: 107640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:16,821-Speed 3339.25 samples/sec Loss 0.5504 LearningRate 0.0003 Epoch: 18 Global Step: 107650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:19,838-Speed 3395.28 samples/sec Loss 0.6284 LearningRate 0.0003 Epoch: 18 Global Step: 107660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:22,849-Speed 3401.03 samples/sec Loss 0.5674 LearningRate 0.0003 Epoch: 18 Global Step: 107670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:25,861-Speed 3401.62 samples/sec Loss 0.6287 LearningRate 0.0003 Epoch: 18 Global Step: 107680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:28,884-Speed 3388.12 samples/sec Loss 0.6172 LearningRate 0.0003 Epoch: 18 Global Step: 107690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:31,897-Speed 3399.06 samples/sec Loss 0.5855 LearningRate 0.0003 Epoch: 18 Global Step: 107700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:34:34,912-Speed 3397.81 samples/sec Loss 0.6072 LearningRate 0.0003 Epoch: 18 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:37,937-Speed 3386.64 samples/sec Loss 0.5678 LearningRate 0.0003 Epoch: 18 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:40,950-Speed 3399.20 samples/sec Loss 0.5613 LearningRate 0.0003 Epoch: 18 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:43,963-Speed 3399.11 samples/sec Loss 0.6311 LearningRate 0.0003 Epoch: 18 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:46,981-Speed 3394.15 samples/sec Loss 0.5471 LearningRate 0.0003 Epoch: 18 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
12:34:50,004-Speed 3387.48 samples/sec Loss 0.5738 LearningRate 0.0003 Epoch: 18 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:53,019-Speed 3397.06 samples/sec Loss 0.5878 LearningRate 0.0003 Epoch: 18 Global Step: 107770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:56,044-Speed 3386.57 samples/sec Loss 0.6354 LearningRate 0.0003 Epoch: 18 Global Step: 107780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:34:59,063-Speed 3392.49 samples/sec Loss 0.6211 LearningRate 0.0003 Epoch: 18 Global Step: 107790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:02,094-Speed 3379.44 samples/sec Loss 0.5872 LearningRate 0.0003 Epoch: 18 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:05,113-Speed 3392.16 samples/sec Loss 0.5458 LearningRate 0.0003 Epoch: 18 Global Step: 107810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:35:08,108-Speed 3420.02 samples/sec Loss 0.6214 LearningRate 0.0003 Epoch: 18 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:11,130-Speed 3390.38 samples/sec Loss 0.6277 LearningRate 0.0003 Epoch: 18 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:14,143-Speed 3399.42 samples/sec Loss 0.5888 LearningRate 0.0003 Epoch: 18 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:17,160-Speed 3395.07 samples/sec Loss 0.6286 LearningRate 0.0003 Epoch: 18 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:20,173-Speed 3398.62 samples/sec Loss 0.6234 LearningRate 0.0003 Epoch: 18 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:23,208-Speed 3375.43 samples/sec Loss 0.6166 LearningRate 0.0003 Epoch: 18 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:26,236-Speed 3382.10 samples/sec 
Loss 0.6123 LearningRate 0.0003 Epoch: 18 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:29,425-Speed 3211.55 samples/sec Loss 0.5591 LearningRate 0.0003 Epoch: 18 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:32,440-Speed 3398.11 samples/sec Loss 0.5812 LearningRate 0.0003 Epoch: 18 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:35,455-Speed 3397.20 samples/sec Loss 0.6540 LearningRate 0.0003 Epoch: 18 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:38,473-Speed 3393.91 samples/sec Loss 0.6088 LearningRate 0.0003 Epoch: 18 Global Step: 107920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:35:41,468-Speed 3418.66 samples/sec Loss 0.5477 LearningRate 0.0003 Epoch: 18 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:44,484-Speed 3396.63 samples/sec Loss 0.5745 LearningRate 0.0003 Epoch: 18 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:47,503-Speed 3392.19 samples/sec Loss 0.5935 LearningRate 0.0003 Epoch: 18 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:50,520-Speed 3395.23 samples/sec Loss 0.6218 LearningRate 0.0003 Epoch: 18 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:53,546-Speed 3384.99 samples/sec Loss 0.5347 LearningRate 0.0003 Epoch: 18 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:56,564-Speed 3393.77 samples/sec Loss 0.6456 LearningRate 0.0003 Epoch: 18 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:35:59,578-Speed 3398.89 samples/sec Loss 0.5836 LearningRate 0.0003 Epoch: 18 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:36:02,593-Speed 3396.57 samples/sec Loss 0.5949 LearningRate 0.0003 Epoch: 18 
Global Step: 108000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:36:45,708-[lfw][108000]XNorm: 21.752915 Training: 2022-04-27 12:36:45,709-[lfw][108000]Accuracy-Flip: 0.99750+-0.00300 Training: 2022-04-27 12:36:45,709-[lfw][108000]Accuracy-Highest: 0.99817 Training: 2022-04-27 12:37:36,252-[cfp_fp][108000]XNorm: 22.033458 Training: 2022-04-27 12:37:36,253-[cfp_fp][108000]Accuracy-Flip: 0.98586+-0.00517 Training: 2022-04-27 12:37:36,253-[cfp_fp][108000]Accuracy-Highest: 0.98614 Training: 2022-04-27 12:38:19,611-[agedb_30][108000]XNorm: 22.249081 Training: 2022-04-27 12:38:19,611-[agedb_30][108000]Accuracy-Flip: 0.98083+-0.00837 Training: 2022-04-27 12:38:19,612-[agedb_30][108000]Accuracy-Highest: 0.98233 Training: 2022-04-27 12:38:22,632-Speed 73.12 samples/sec Loss 0.6393 LearningRate 0.0003 Epoch: 18 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:25,652-Speed 3391.94 samples/sec Loss 0.5140 LearningRate 0.0003 Epoch: 18 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:28,844-Speed 3209.31 samples/sec Loss 0.5625 LearningRate 0.0003 Epoch: 18 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:42,033-Speed 776.43 samples/sec Loss 0.4842 LearningRate 0.0002 Epoch: 19 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:45,064-Speed 3379.68 samples/sec Loss 0.4742 LearningRate 0.0002 Epoch: 19 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:48,156-Speed 3312.93 samples/sec Loss 0.5031 LearningRate 0.0002 Epoch: 19 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:51,258-Speed 3302.35 samples/sec Loss 0.4644 LearningRate 0.0002 Epoch: 19 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:54,280-Speed 3388.67 samples/sec Loss 0.5099 LearningRate 0.0002 Epoch: 19 Global Step: 108080 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:38:57,307-Speed 3384.58 samples/sec Loss 0.4506 LearningRate 0.0002 Epoch: 19 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:00,413-Speed 3297.41 samples/sec Loss 0.4759 LearningRate 0.0002 Epoch: 19 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:03,448-Speed 3374.60 samples/sec Loss 0.5270 LearningRate 0.0002 Epoch: 19 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:06,477-Speed 3382.09 samples/sec Loss 0.4726 LearningRate 0.0002 Epoch: 19 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:09,493-Speed 3395.14 samples/sec Loss 0.5547 LearningRate 0.0002 Epoch: 19 Global Step: 108130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:39:12,509-Speed 3396.77 samples/sec Loss 0.4903 LearningRate 0.0002 Epoch: 19 Global Step: 108140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:39:15,523-Speed 3398.38 samples/sec Loss 0.4811 LearningRate 0.0002 Epoch: 19 Global Step: 108150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:39:18,542-Speed 3392.70 samples/sec Loss 0.5179 LearningRate 0.0002 Epoch: 19 Global Step: 108160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:39:21,558-Speed 3396.01 samples/sec Loss 0.4855 LearningRate 0.0002 Epoch: 19 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:24,608-Speed 3357.80 samples/sec Loss 0.5058 LearningRate 0.0002 Epoch: 19 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:27,652-Speed 3365.08 samples/sec Loss 0.4820 LearningRate 0.0002 Epoch: 19 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:30,681-Speed 3381.36 samples/sec Loss 0.5246 LearningRate 0.0002 Epoch: 19 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 12:39:33,720-Speed 3370.58 samples/sec Loss 0.5415 LearningRate 0.0002 Epoch: 19 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:36,745-Speed 3386.18 samples/sec Loss 0.4780 LearningRate 0.0002 Epoch: 19 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:39,779-Speed 3376.30 samples/sec Loss 0.4737 LearningRate 0.0002 Epoch: 19 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:42,810-Speed 3379.65 samples/sec Loss 0.4860 LearningRate 0.0002 Epoch: 19 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:45,870-Speed 3346.55 samples/sec Loss 0.4707 LearningRate 0.0002 Epoch: 19 Global Step: 108250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:48,906-Speed 3374.11 samples/sec Loss 0.5058 LearningRate 0.0002 Epoch: 19 Global Step: 108260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:51,915-Speed 3404.95 samples/sec Loss 0.5341 LearningRate 0.0002 Epoch: 19 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:54,938-Speed 3388.51 samples/sec Loss 0.5408 LearningRate 0.0002 Epoch: 19 Global Step: 108280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:39:57,961-Speed 3388.29 samples/sec Loss 0.5431 LearningRate 0.0002 Epoch: 19 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:01,005-Speed 3364.72 samples/sec Loss 0.4884 LearningRate 0.0002 Epoch: 19 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:04,051-Speed 3362.69 samples/sec Loss 0.4828 LearningRate 0.0002 Epoch: 19 Global Step: 108310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:07,076-Speed 3384.97 samples/sec Loss 0.5295 LearningRate 0.0002 Epoch: 19 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:10,106-Speed 
3380.75 samples/sec Loss 0.5060 LearningRate 0.0002 Epoch: 19 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:13,129-Speed 3388.49 samples/sec Loss 0.3917 LearningRate 0.0002 Epoch: 19 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:16,197-Speed 3337.86 samples/sec Loss 0.4942 LearningRate 0.0002 Epoch: 19 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:19,228-Speed 3380.54 samples/sec Loss 0.5013 LearningRate 0.0002 Epoch: 19 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:22,250-Speed 3388.13 samples/sec Loss 0.4746 LearningRate 0.0002 Epoch: 19 Global Step: 108370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:40:25,252-Speed 3411.95 samples/sec Loss 0.4412 LearningRate 0.0002 Epoch: 19 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:28,271-Speed 3392.57 samples/sec Loss 0.4859 LearningRate 0.0002 Epoch: 19 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:31,289-Speed 3394.31 samples/sec Loss 0.5219 LearningRate 0.0002 Epoch: 19 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:34,331-Speed 3367.00 samples/sec Loss 0.4790 LearningRate 0.0002 Epoch: 19 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:37,362-Speed 3378.90 samples/sec Loss 0.5384 LearningRate 0.0002 Epoch: 19 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:40,384-Speed 3389.41 samples/sec Loss 0.5091 LearningRate 0.0002 Epoch: 19 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:43,402-Speed 3393.37 samples/sec Loss 0.4665 LearningRate 0.0002 Epoch: 19 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:46,428-Speed 3385.34 samples/sec Loss 0.4816 
LearningRate 0.0002 Epoch: 19 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:49,446-Speed 3393.43 samples/sec Loss 0.4269 LearningRate 0.0002 Epoch: 19 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:52,465-Speed 3392.96 samples/sec Loss 0.5007 LearningRate 0.0002 Epoch: 19 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:55,469-Speed 3409.14 samples/sec Loss 0.5360 LearningRate 0.0002 Epoch: 19 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:40:58,504-Speed 3375.15 samples/sec Loss 0.4842 LearningRate 0.0002 Epoch: 19 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:01,539-Speed 3375.00 samples/sec Loss 0.4946 LearningRate 0.0002 Epoch: 19 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:04,562-Speed 3387.88 samples/sec Loss 0.5333 LearningRate 0.0002 Epoch: 19 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:07,590-Speed 3382.98 samples/sec Loss 0.4684 LearningRate 0.0002 Epoch: 19 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:10,610-Speed 3392.23 samples/sec Loss 0.4896 LearningRate 0.0002 Epoch: 19 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:13,628-Speed 3394.32 samples/sec Loss 0.4735 LearningRate 0.0002 Epoch: 19 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:16,643-Speed 3396.28 samples/sec Loss 0.4904 LearningRate 0.0002 Epoch: 19 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:19,661-Speed 3394.79 samples/sec Loss 0.4798 LearningRate 0.0002 Epoch: 19 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:22,685-Speed 3386.47 samples/sec Loss 0.4698 LearningRate 0.0002 Epoch: 19 Global Step: 
108570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:25,799-Speed 3289.16 samples/sec Loss 0.4807 LearningRate 0.0002 Epoch: 19 Global Step: 108580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 12:41:28,800-Speed 3412.09 samples/sec Loss 0.5164 LearningRate 0.0002 Epoch: 19 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:31,817-Speed 3396.01 samples/sec Loss 0.5243 LearningRate 0.0002 Epoch: 19 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:34,835-Speed 3393.43 samples/sec Loss 0.5058 LearningRate 0.0002 Epoch: 19 Global Step: 108610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 12:41:37,832-Speed 3417.06 samples/sec Loss 0.4543 LearningRate 0.0002 Epoch: 19 Global Step: 108620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:40,858-Speed 3385.38 samples/sec Loss 0.4887 LearningRate 0.0002 Epoch: 19 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:43,881-Speed 3388.60 samples/sec Loss 0.5257 LearningRate 0.0002 Epoch: 19 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:46,904-Speed 3387.31 samples/sec Loss 0.4827 LearningRate 0.0002 Epoch: 19 Global Step: 108650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:49,932-Speed 3383.50 samples/sec Loss 0.5586 LearningRate 0.0002 Epoch: 19 Global Step: 108660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:52,946-Speed 3397.27 samples/sec Loss 0.5379 LearningRate 0.0002 Epoch: 19 Global Step: 108670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:55,964-Speed 3393.82 samples/sec Loss 0.5505 LearningRate 0.0002 Epoch: 19 Global Step: 108680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:41:58,980-Speed 3396.13 samples/sec Loss 0.5053 LearningRate 0.0002 Epoch: 19 Global Step: 108690 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 12:42:01,993-Speed 3400.23 samples/sec Loss 0.5146 LearningRate 0.0002 Epoch: 19 Global Step: 108700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:05,009-Speed 3395.08 samples/sec Loss 0.4984 LearningRate 0.0002 Epoch: 19 Global Step: 108710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:08,027-Speed 3394.37 samples/sec Loss 0.4366 LearningRate 0.0002 Epoch: 19 Global Step: 108720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:11,075-Speed 3359.67 samples/sec Loss 0.4365 LearningRate 0.0002 Epoch: 19 Global Step: 108730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:14,101-Speed 3385.08 samples/sec Loss 0.4985 LearningRate 0.0002 Epoch: 19 Global Step: 108740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:17,135-Speed 3376.16 samples/sec Loss 0.4982 LearningRate 0.0002 Epoch: 19 Global Step: 108750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:20,163-Speed 3382.44 samples/sec Loss 0.5130 LearningRate 0.0002 Epoch: 19 Global Step: 108760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:23,181-Speed 3393.89 samples/sec Loss 0.4593 LearningRate 0.0002 Epoch: 19 Global Step: 108770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:26,199-Speed 3393.49 samples/sec Loss 0.5629 LearningRate 0.0002 Epoch: 19 Global Step: 108780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:29,222-Speed 3388.50 samples/sec Loss 0.4796 LearningRate 0.0002 Epoch: 19 Global Step: 108790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 12:42:32,241-Speed 3392.62 samples/sec Loss 0.4621 LearningRate 0.0002 Epoch: 19 Global Step: 108800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:35,267-Speed 3385.24 samples/sec Loss 0.5100 LearningRate 0.0002 Epoch: 19 Global Step: 108810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 
12:42:38,290-Speed 3388.36 samples/sec Loss 0.4740 LearningRate 0.0002 Epoch: 19 Global Step: 108820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:41,310-Speed 3391.51 samples/sec Loss 0.5060 LearningRate 0.0002 Epoch: 19 Global Step: 108830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:44,344-Speed 3376.01 samples/sec Loss 0.5251 LearningRate 0.0002 Epoch: 19 Global Step: 108840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:47,365-Speed 3390.65 samples/sec Loss 0.5171 LearningRate 0.0002 Epoch: 19 Global Step: 108850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:50,382-Speed 3394.47 samples/sec Loss 0.4384 LearningRate 0.0002 Epoch: 19 Global Step: 108860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:53,459-Speed 3328.83 samples/sec Loss 0.4540 LearningRate 0.0002 Epoch: 19 Global Step: 108870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 12:42:56,511-Speed 3356.46 samples/sec Loss 0.5274 LearningRate 0.0002 Epoch: 19 Global Step: 108880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:42:59,569-Speed 3349.24 samples/sec Loss 0.5171 LearningRate 0.0002 Epoch: 19 Global Step: 108890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:02,609-Speed 3369.33 samples/sec Loss 0.5589 LearningRate 0.0002 Epoch: 19 Global Step: 108900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:05,611-Speed 3411.67 samples/sec Loss 0.5649 LearningRate 0.0002 Epoch: 19 Global Step: 108910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:08,642-Speed 3378.79 samples/sec Loss 0.5174 LearningRate 0.0002 Epoch: 19 Global Step: 108920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:11,660-Speed 3393.35 samples/sec Loss 0.5303 LearningRate 0.0002 Epoch: 19 Global Step: 108930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:14,682-Speed 3389.75 samples/sec Loss 
0.5200 LearningRate 0.0002 Epoch: 19 Global Step: 108940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:17,703-Speed 3390.15 samples/sec Loss 0.5112 LearningRate 0.0002 Epoch: 19 Global Step: 108950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:20,725-Speed 3389.78 samples/sec Loss 0.5703 LearningRate 0.0002 Epoch: 19 Global Step: 108960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:23,752-Speed 3383.98 samples/sec Loss 0.5713 LearningRate 0.0002 Epoch: 19 Global Step: 108970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:26,776-Speed 3387.31 samples/sec Loss 0.5036 LearningRate 0.0002 Epoch: 19 Global Step: 108980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:29,793-Speed 3394.55 samples/sec Loss 0.4988 LearningRate 0.0002 Epoch: 19 Global Step: 108990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:32,819-Speed 3385.60 samples/sec Loss 0.5347 LearningRate 0.0002 Epoch: 19 Global Step: 109000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:43:35,842-Speed 3388.03 samples/sec Loss 0.4790 LearningRate 0.0002 Epoch: 19 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:38,868-Speed 3384.40 samples/sec Loss 0.5218 LearningRate 0.0002 Epoch: 19 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:41,885-Speed 3394.10 samples/sec Loss 0.5220 LearningRate 0.0002 Epoch: 19 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:44,917-Speed 3388.04 samples/sec Loss 0.4875 LearningRate 0.0002 Epoch: 19 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:47,940-Speed 3388.45 samples/sec Loss 0.4447 LearningRate 0.0002 Epoch: 19 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:50,962-Speed 3388.75 samples/sec Loss 0.4852 LearningRate 0.0002 Epoch: 19 Global 
Step: 109060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:53,986-Speed 3386.63 samples/sec Loss 0.4731 LearningRate 0.0002 Epoch: 19 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:43:57,007-Speed 3390.32 samples/sec Loss 0.5853 LearningRate 0.0002 Epoch: 19 Global Step: 109080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:00,039-Speed 3377.95 samples/sec Loss 0.5245 LearningRate 0.0002 Epoch: 19 Global Step: 109090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:03,060-Speed 3390.52 samples/sec Loss 0.4868 LearningRate 0.0002 Epoch: 19 Global Step: 109100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:06,055-Speed 3420.30 samples/sec Loss 0.4519 LearningRate 0.0002 Epoch: 19 Global Step: 109110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:09,075-Speed 3391.71 samples/sec Loss 0.5110 LearningRate 0.0002 Epoch: 19 Global Step: 109120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:12,142-Speed 3339.01 samples/sec Loss 0.5528 LearningRate 0.0002 Epoch: 19 Global Step: 109130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:15,225-Speed 3322.67 samples/sec Loss 0.4937 LearningRate 0.0002 Epoch: 19 Global Step: 109140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:18,249-Speed 3387.13 samples/sec Loss 0.4596 LearningRate 0.0002 Epoch: 19 Global Step: 109150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:44:21,256-Speed 3406.41 samples/sec Loss 0.5127 LearningRate 0.0002 Epoch: 19 Global Step: 109160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:24,276-Speed 3390.70 samples/sec Loss 0.4595 LearningRate 0.0002 Epoch: 19 Global Step: 109170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:27,299-Speed 3387.99 samples/sec Loss 0.5248 LearningRate 0.0002 Epoch: 19 Global Step: 109180 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-27 12:44:30,324-Speed 3386.09 samples/sec Loss 0.4646 LearningRate 0.0002 Epoch: 19 Global Step: 109190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:33,344-Speed 3391.89 samples/sec Loss 0.4434 LearningRate 0.0002 Epoch: 19 Global Step: 109200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:36,429-Speed 3319.36 samples/sec Loss 0.4647 LearningRate 0.0002 Epoch: 19 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:39,500-Speed 3336.06 samples/sec Loss 0.5236 LearningRate 0.0002 Epoch: 19 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:42,519-Speed 3392.68 samples/sec Loss 0.5332 LearningRate 0.0002 Epoch: 19 Global Step: 109230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:45,558-Speed 3371.33 samples/sec Loss 0.5206 LearningRate 0.0002 Epoch: 19 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:48,585-Speed 3384.74 samples/sec Loss 0.4524 LearningRate 0.0002 Epoch: 19 Global Step: 109250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:51,593-Speed 3405.17 samples/sec Loss 0.4540 LearningRate 0.0002 Epoch: 19 Global Step: 109260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:54,614-Speed 3390.65 samples/sec Loss 0.4303 LearningRate 0.0002 Epoch: 19 Global Step: 109270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:44:57,735-Speed 3282.90 samples/sec Loss 0.5467 LearningRate 0.0002 Epoch: 19 Global Step: 109280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:00,760-Speed 3386.23 samples/sec Loss 0.4858 LearningRate 0.0002 Epoch: 19 Global Step: 109290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:03,824-Speed 3343.10 samples/sec Loss 0.4715 LearningRate 0.0002 Epoch: 19 Global Step: 109300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 
12:45:06,848-Speed 3386.12 samples/sec Loss 0.4666 LearningRate 0.0002 Epoch: 19 Global Step: 109310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:09,872-Speed 3387.04 samples/sec Loss 0.4950 LearningRate 0.0001 Epoch: 19 Global Step: 109320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:12,901-Speed 3383.80 samples/sec Loss 0.4503 LearningRate 0.0001 Epoch: 19 Global Step: 109330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:15,923-Speed 3390.18 samples/sec Loss 0.5369 LearningRate 0.0001 Epoch: 19 Global Step: 109340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:18,949-Speed 3384.56 samples/sec Loss 0.5380 LearningRate 0.0001 Epoch: 19 Global Step: 109350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:45:21,976-Speed 3382.74 samples/sec Loss 0.4940 LearningRate 0.0001 Epoch: 19 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:25,003-Speed 3384.47 samples/sec Loss 0.5292 LearningRate 0.0001 Epoch: 19 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:28,034-Speed 3379.06 samples/sec Loss 0.5454 LearningRate 0.0001 Epoch: 19 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:31,051-Speed 3394.55 samples/sec Loss 0.4868 LearningRate 0.0001 Epoch: 19 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:34,079-Speed 3382.77 samples/sec Loss 0.5200 LearningRate 0.0001 Epoch: 19 Global Step: 109400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:37,100-Speed 3390.93 samples/sec Loss 0.5105 LearningRate 0.0001 Epoch: 19 Global Step: 109410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:40,119-Speed 3392.30 samples/sec Loss 0.4729 LearningRate 0.0001 Epoch: 19 Global Step: 109420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:43,144-Speed 3386.33 samples/sec Loss 
0.4492 LearningRate 0.0001 Epoch: 19 Global Step: 109430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:46,166-Speed 3389.34 samples/sec Loss 0.5427 LearningRate 0.0001 Epoch: 19 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:49,191-Speed 3385.86 samples/sec Loss 0.4763 LearningRate 0.0001 Epoch: 19 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:52,196-Speed 3407.92 samples/sec Loss 0.5077 LearningRate 0.0001 Epoch: 19 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:55,222-Speed 3384.71 samples/sec Loss 0.5060 LearningRate 0.0001 Epoch: 19 Global Step: 109470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:45:58,241-Speed 3393.07 samples/sec Loss 0.5214 LearningRate 0.0001 Epoch: 19 Global Step: 109480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:01,261-Speed 3391.85 samples/sec Loss 0.4956 LearningRate 0.0001 Epoch: 19 Global Step: 109490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:04,269-Speed 3404.48 samples/sec Loss 0.4741 LearningRate 0.0001 Epoch: 19 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:07,290-Speed 3391.51 samples/sec Loss 0.4822 LearningRate 0.0001 Epoch: 19 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:10,336-Speed 3361.68 samples/sec Loss 0.4802 LearningRate 0.0001 Epoch: 19 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:13,390-Speed 3354.58 samples/sec Loss 0.4726 LearningRate 0.0001 Epoch: 19 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:16,422-Speed 3377.65 samples/sec Loss 0.4852 LearningRate 0.0001 Epoch: 19 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:19,473-Speed 3357.09 samples/sec Loss 0.4646 LearningRate 0.0001 Epoch: 19 Global 
Step: 109550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:22,509-Speed 3373.26 samples/sec Loss 0.4884 LearningRate 0.0001 Epoch: 19 Global Step: 109560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:25,532-Speed 3387.74 samples/sec Loss 0.4781 LearningRate 0.0001 Epoch: 19 Global Step: 109570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:28,557-Speed 3386.50 samples/sec Loss 0.5155 LearningRate 0.0001 Epoch: 19 Global Step: 109580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:31,583-Speed 3384.88 samples/sec Loss 0.5366 LearningRate 0.0001 Epoch: 19 Global Step: 109590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:46:34,609-Speed 3384.93 samples/sec Loss 0.4590 LearningRate 0.0001 Epoch: 19 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:37,674-Speed 3341.47 samples/sec Loss 0.5146 LearningRate 0.0001 Epoch: 19 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:40,713-Speed 3370.43 samples/sec Loss 0.4973 LearningRate 0.0001 Epoch: 19 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:43,745-Speed 3378.88 samples/sec Loss 0.4979 LearningRate 0.0001 Epoch: 19 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:46,769-Speed 3386.76 samples/sec Loss 0.4895 LearningRate 0.0001 Epoch: 19 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:49,793-Speed 3386.61 samples/sec Loss 0.4858 LearningRate 0.0001 Epoch: 19 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:52,813-Speed 3391.95 samples/sec Loss 0.4718 LearningRate 0.0001 Epoch: 19 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:46:55,837-Speed 3386.93 samples/sec Loss 0.5548 LearningRate 0.0001 Epoch: 19 Global Step: 109670 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 12:46:58,867-Speed 3380.11 samples/sec Loss 0.4902 LearningRate 0.0001 Epoch: 19 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:01,902-Speed 3374.81 samples/sec Loss 0.4550 LearningRate 0.0001 Epoch: 19 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:04,962-Speed 3347.98 samples/sec Loss 0.5120 LearningRate 0.0001 Epoch: 19 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:07,984-Speed 3388.94 samples/sec Loss 0.5086 LearningRate 0.0001 Epoch: 19 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:11,010-Speed 3384.62 samples/sec Loss 0.4855 LearningRate 0.0001 Epoch: 19 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:14,012-Speed 3412.12 samples/sec Loss 0.5396 LearningRate 0.0001 Epoch: 19 Global Step: 109730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:17,038-Speed 3384.52 samples/sec Loss 0.5630 LearningRate 0.0001 Epoch: 19 Global Step: 109740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:20,074-Speed 3374.11 samples/sec Loss 0.4685 LearningRate 0.0001 Epoch: 19 Global Step: 109750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:23,106-Speed 3377.39 samples/sec Loss 0.5152 LearningRate 0.0001 Epoch: 19 Global Step: 109760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:26,129-Speed 3388.15 samples/sec Loss 0.5313 LearningRate 0.0001 Epoch: 19 Global Step: 109770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:29,156-Speed 3383.83 samples/sec Loss 0.4748 LearningRate 0.0001 Epoch: 19 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:32,208-Speed 3357.09 samples/sec Loss 0.4694 LearningRate 0.0001 Epoch: 19 Global Step: 109790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 
12:47:35,233-Speed 3385.52 samples/sec Loss 0.5061 LearningRate 0.0001 Epoch: 19 Global Step: 109800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:38,252-Speed 3392.61 samples/sec Loss 0.4788 LearningRate 0.0001 Epoch: 19 Global Step: 109810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:41,328-Speed 3328.97 samples/sec Loss 0.4560 LearningRate 0.0001 Epoch: 19 Global Step: 109820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:47:44,350-Speed 3389.86 samples/sec Loss 0.4990 LearningRate 0.0001 Epoch: 19 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:47,372-Speed 3389.41 samples/sec Loss 0.4680 LearningRate 0.0001 Epoch: 19 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:50,400-Speed 3382.21 samples/sec Loss 0.5500 LearningRate 0.0001 Epoch: 19 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:53,475-Speed 3331.21 samples/sec Loss 0.5249 LearningRate 0.0001 Epoch: 19 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:56,498-Speed 3388.66 samples/sec Loss 0.4936 LearningRate 0.0001 Epoch: 19 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:47:59,521-Speed 3387.44 samples/sec Loss 0.4559 LearningRate 0.0001 Epoch: 19 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:02,544-Speed 3388.94 samples/sec Loss 0.5043 LearningRate 0.0001 Epoch: 19 Global Step: 109890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:05,566-Speed 3388.70 samples/sec Loss 0.4812 LearningRate 0.0001 Epoch: 19 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:08,585-Speed 3392.47 samples/sec Loss 0.4971 LearningRate 0.0001 Epoch: 19 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:11,609-Speed 3386.78 samples/sec Loss 
0.4502 LearningRate 0.0001 Epoch: 19 Global Step: 109920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:14,659-Speed 3358.71 samples/sec Loss 0.5171 LearningRate 0.0001 Epoch: 19 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:17,684-Speed 3385.23 samples/sec Loss 0.5336 LearningRate 0.0001 Epoch: 19 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:20,709-Speed 3386.48 samples/sec Loss 0.5000 LearningRate 0.0001 Epoch: 19 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:23,732-Speed 3387.87 samples/sec Loss 0.4773 LearningRate 0.0001 Epoch: 19 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:26,759-Speed 3384.61 samples/sec Loss 0.4959 LearningRate 0.0001 Epoch: 19 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:29,785-Speed 3384.64 samples/sec Loss 0.5141 LearningRate 0.0001 Epoch: 19 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:48:32,795-Speed 3402.82 samples/sec Loss 0.5339 LearningRate 0.0001 Epoch: 19 Global Step: 109990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:48:35,872-Speed 3328.58 samples/sec Loss 0.5863 LearningRate 0.0001 Epoch: 19 Global Step: 110000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:49:19,167-[lfw][110000]XNorm: 21.717442 Training: 2022-04-27 12:49:19,168-[lfw][110000]Accuracy-Flip: 0.99750+-0.00281 Training: 2022-04-27 12:49:19,168-[lfw][110000]Accuracy-Highest: 0.99817 Training: 2022-04-27 12:50:09,451-[cfp_fp][110000]XNorm: 22.112701 Training: 2022-04-27 12:50:09,452-[cfp_fp][110000]Accuracy-Flip: 0.98629+-0.00483 Training: 2022-04-27 12:50:09,452-[cfp_fp][110000]Accuracy-Highest: 0.98629 Training: 2022-04-27 12:50:52,847-[agedb_30][110000]XNorm: 22.248529 Training: 2022-04-27 12:50:52,848-[agedb_30][110000]Accuracy-Flip: 0.98250+-0.00786 
Training: 2022-04-27 12:50:52,848-[agedb_30][110000]Accuracy-Highest: 0.98250 Training: 2022-04-27 12:50:55,859-Speed 73.15 samples/sec Loss 0.4784 LearningRate 0.0001 Epoch: 19 Global Step: 110010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:50:58,915-Speed 3351.91 samples/sec Loss 0.4128 LearningRate 0.0001 Epoch: 19 Global Step: 110020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:01,931-Speed 3395.84 samples/sec Loss 0.4833 LearningRate 0.0001 Epoch: 19 Global Step: 110030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:04,957-Speed 3385.47 samples/sec Loss 0.5313 LearningRate 0.0001 Epoch: 19 Global Step: 110040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:07,967-Speed 3402.10 samples/sec Loss 0.5218 LearningRate 0.0001 Epoch: 19 Global Step: 110050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:10,987-Speed 3391.91 samples/sec Loss 0.4865 LearningRate 0.0001 Epoch: 19 Global Step: 110060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:13,998-Speed 3401.01 samples/sec Loss 0.4624 LearningRate 0.0001 Epoch: 19 Global Step: 110070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:17,020-Speed 3389.18 samples/sec Loss 0.5127 LearningRate 0.0001 Epoch: 19 Global Step: 110080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:20,034-Speed 3398.67 samples/sec Loss 0.5065 LearningRate 0.0001 Epoch: 19 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:51:23,075-Speed 3368.49 samples/sec Loss 0.5087 LearningRate 0.0001 Epoch: 19 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:51:26,138-Speed 3343.75 samples/sec Loss 0.5029 LearningRate 0.0001 Epoch: 19 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:51:29,150-Speed 3400.41 samples/sec Loss 0.4988 LearningRate 0.0001 Epoch: 19 Global Step: 110120 Fp16 
Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:51:32,164-Speed 3398.52 samples/sec Loss 0.5102 LearningRate 0.0001 Epoch: 19 Global Step: 110130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:51:35,159-Speed 3420.15 samples/sec Loss 0.4808 LearningRate 0.0001 Epoch: 19 Global Step: 110140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:51:38,161-Speed 3411.05 samples/sec Loss 0.4585 LearningRate 0.0001 Epoch: 19 Global Step: 110150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:41,188-Speed 3384.25 samples/sec Loss 0.4695 LearningRate 0.0001 Epoch: 19 Global Step: 110160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:44,202-Speed 3397.85 samples/sec Loss 0.4780 LearningRate 0.0001 Epoch: 19 Global Step: 110170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:47,236-Speed 3377.46 samples/sec Loss 0.4889 LearningRate 0.0001 Epoch: 19 Global Step: 110180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:50,261-Speed 3384.91 samples/sec Loss 0.4613 LearningRate 0.0001 Epoch: 19 Global Step: 110190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:53,280-Speed 3392.52 samples/sec Loss 0.4618 LearningRate 0.0001 Epoch: 19 Global Step: 110200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:56,292-Speed 3400.47 samples/sec Loss 0.5747 LearningRate 0.0001 Epoch: 19 Global Step: 110210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:51:59,312-Speed 3392.03 samples/sec Loss 0.4793 LearningRate 0.0001 Epoch: 19 Global Step: 110220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:52:02,328-Speed 3395.92 samples/sec Loss 0.4823 LearningRate 0.0001 Epoch: 19 Global Step: 110230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 12:52:05,345-Speed 3395.31 samples/sec Loss 0.4783 LearningRate 0.0001 Epoch: 19 Global Step: 110240 Fp16 Grad Scale: 16384 Required: 0 hours 
Training: 2022-04-27 12:52:08,374-Speed 3381.52 samples/sec Loss 0.5310 LearningRate 0.0001 Epoch: 19 Global Step: 110250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:11,393-Speed 3392.69 samples/sec Loss 0.4883 LearningRate 0.0001 Epoch: 19 Global Step: 110260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:14,421-Speed 3383.02 samples/sec Loss 0.4603 LearningRate 0.0001 Epoch: 19 Global Step: 110270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:17,443-Speed 3389.15 samples/sec Loss 0.4850 LearningRate 0.0001 Epoch: 19 Global Step: 110280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:20,463-Speed 3391.21 samples/sec Loss 0.4969 LearningRate 0.0001 Epoch: 19 Global Step: 110290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:23,480-Speed 3395.02 samples/sec Loss 0.5141 LearningRate 0.0001 Epoch: 19 Global Step: 110300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:26,496-Speed 3395.83 samples/sec Loss 0.5374 LearningRate 0.0001 Epoch: 19 Global Step: 110310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:29,510-Speed 3398.24 samples/sec Loss 0.4371 LearningRate 0.0001 Epoch: 19 Global Step: 110320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:32,530-Speed 3391.32 samples/sec Loss 0.4708 LearningRate 0.0001 Epoch: 19 Global Step: 110330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:35,574-Speed 3364.83 samples/sec Loss 0.5141 LearningRate 0.0001 Epoch: 19 Global Step: 110340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 12:52:38,609-Speed 3374.72 samples/sec Loss 0.4971 LearningRate 0.0001 Epoch: 19 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:41,648-Speed 3370.73 samples/sec Loss 0.4949 LearningRate 0.0001 Epoch: 19 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:44,664-Speed 
3396.04 samples/sec Loss 0.5710 LearningRate 0.0001 Epoch: 19 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:47,679-Speed 3397.59 samples/sec Loss 0.4961 LearningRate 0.0001 Epoch: 19 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:50,692-Speed 3398.82 samples/sec Loss 0.5692 LearningRate 0.0001 Epoch: 19 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:53,713-Speed 3390.55 samples/sec Loss 0.4760 LearningRate 0.0001 Epoch: 19 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:56,729-Speed 3396.51 samples/sec Loss 0.5217 LearningRate 0.0001 Epoch: 19 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:52:59,757-Speed 3381.94 samples/sec Loss 0.5428 LearningRate 0.0001 Epoch: 19 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:02,778-Speed 3390.13 samples/sec Loss 0.5053 LearningRate 0.0001 Epoch: 19 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:05,799-Speed 3390.80 samples/sec Loss 0.4802 LearningRate 0.0001 Epoch: 19 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:08,812-Speed 3400.08 samples/sec Loss 0.5224 LearningRate 0.0001 Epoch: 19 Global Step: 110450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:11,826-Speed 3397.79 samples/sec Loss 0.5234 LearningRate 0.0001 Epoch: 19 Global Step: 110460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:14,836-Speed 3403.84 samples/sec Loss 0.5488 LearningRate 0.0001 Epoch: 19 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:17,850-Speed 3398.33 samples/sec Loss 0.4670 LearningRate 0.0001 Epoch: 19 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 12:53:20,866-Speed 3396.13 samples/sec Loss 0.4612 
LearningRate 0.0001 Epoch: 19 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:23,882-Speed 3395.65 samples/sec Loss 0.4754 LearningRate 0.0001 Epoch: 19 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:26,896-Speed 3398.36 samples/sec Loss 0.4549 LearningRate 0.0001 Epoch: 19 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:29,931-Speed 3374.98 samples/sec Loss 0.4651 LearningRate 0.0001 Epoch: 19 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:32,945-Speed 3397.95 samples/sec Loss 0.5477 LearningRate 0.0001 Epoch: 19 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:35,960-Speed 3397.82 samples/sec Loss 0.4929 LearningRate 0.0001 Epoch: 19 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:38,985-Speed 3386.06 samples/sec Loss 0.5105 LearningRate 0.0001 Epoch: 19 Global Step: 110550 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:53:41,999-Speed 3398.10 samples/sec Loss 0.4809 LearningRate 0.0001 Epoch: 19 Global Step: 110560 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:53:45,012-Speed 3399.50 samples/sec Loss 0.5257 LearningRate 0.0001 Epoch: 19 Global Step: 110570 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:53:48,057-Speed 3363.15 samples/sec Loss 0.5260 LearningRate 0.0001 Epoch: 19 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:51,071-Speed 3398.14 samples/sec Loss 0.4650 LearningRate 0.0001 Epoch: 19 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:54,088-Speed 3394.87 samples/sec Loss 0.5345 LearningRate 0.0001 Epoch: 19 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:53:57,102-Speed 3398.39 samples/sec Loss 0.4980 LearningRate 0.0001 Epoch: 19 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:00,103-Speed 3413.45 samples/sec Loss 0.5402 LearningRate 0.0001 Epoch: 19 Global Step: 110620 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:03,117-Speed 3397.44 samples/sec Loss 0.5325 LearningRate 0.0001 Epoch: 19 Global Step: 110630 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:06,135-Speed 3394.69 samples/sec Loss 0.4897 LearningRate 0.0001 Epoch: 19 Global Step: 110640 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:09,150-Speed 3396.93 samples/sec Loss 0.5026 LearningRate 0.0001 Epoch: 19 Global Step: 110650 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:12,169-Speed 3392.88 samples/sec Loss 0.5302 LearningRate 0.0001 Epoch: 19 Global Step: 110660 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:15,217-Speed 3360.24 samples/sec Loss 0.4980 LearningRate 0.0001 Epoch: 19 Global Step: 110670 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:18,236-Speed 3392.13 samples/sec Loss 0.4609 LearningRate 0.0001 Epoch: 19 Global Step: 110680 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:21,252-Speed 3396.30 samples/sec Loss 0.4957 LearningRate 0.0001 Epoch: 19 Global Step: 110690 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:24,295-Speed 3366.22 samples/sec Loss 0.4855 LearningRate 0.0001 Epoch: 19 Global Step: 110700 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:27,313-Speed 3393.27 samples/sec Loss 0.4571 LearningRate 0.0001 Epoch: 19 Global Step: 110710 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:54:30,341-Speed 3383.38 samples/sec Loss 0.5657 LearningRate 0.0001 Epoch: 19 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:33,371-Speed 3379.68 samples/sec Loss 0.4373 LearningRate 0.0001 Epoch: 19 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:36,404-Speed 3377.01 samples/sec Loss 0.5074 LearningRate 0.0001 Epoch: 19 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:39,433-Speed 3381.58 samples/sec Loss 0.4946 LearningRate 0.0001 Epoch: 19 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:42,462-Speed 3382.13 samples/sec Loss 0.5168 LearningRate 0.0001 Epoch: 19 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:45,483-Speed 3389.53 samples/sec Loss 0.4914 LearningRate 0.0001 Epoch: 19 Global Step: 110770 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:48,501-Speed 3393.89 samples/sec Loss 0.4733 LearningRate 0.0001 Epoch: 19 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:51,530-Speed 3381.31 samples/sec Loss 0.5003 LearningRate 0.0001 Epoch: 19 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:54,560-Speed 3380.52 samples/sec Loss 0.5239 LearningRate 0.0001 Epoch: 19 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:54:57,576-Speed 3396.46 samples/sec Loss 0.5012 LearningRate 0.0001 Epoch: 19 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:00,624-Speed 3360.25 samples/sec Loss 0.4875 LearningRate 0.0001 Epoch: 19 Global Step: 110820 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:55:03,750-Speed 3276.98 samples/sec Loss 0.4907 LearningRate 0.0001 Epoch: 19 Global Step: 110830 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:55:06,757-Speed 3406.56 samples/sec Loss 0.5084 LearningRate 0.0001 Epoch: 19 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:09,780-Speed 3388.20 samples/sec Loss 0.5614 LearningRate 0.0001 Epoch: 19 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:12,799-Speed 3392.13 samples/sec Loss 0.5098 LearningRate 0.0001 Epoch: 19 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:15,820-Speed 3389.85 samples/sec Loss 0.4830 LearningRate 0.0001 Epoch: 19 Global Step: 110870 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:18,840-Speed 3392.34 samples/sec Loss 0.4930 LearningRate 0.0001 Epoch: 19 Global Step: 110880 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:21,880-Speed 3368.89 samples/sec Loss 0.4938 LearningRate 0.0001 Epoch: 19 Global Step: 110890 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:24,918-Speed 3371.72 samples/sec Loss 0.5408 LearningRate 0.0001 Epoch: 19 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:27,945-Speed 3383.09 samples/sec Loss 0.5151 LearningRate 0.0001 Epoch: 19 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:30,971-Speed 3385.16 samples/sec Loss 0.5152 LearningRate 0.0001 Epoch: 19 Global Step: 110920 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:33,990-Speed 3392.14 samples/sec Loss 0.5343 LearningRate 0.0001 Epoch: 19 Global Step: 110930 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:37,012-Speed 3389.57 samples/sec Loss 0.4520 LearningRate 0.0001 Epoch: 19 Global Step: 110940 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:55:40,017-Speed 3408.15 samples/sec Loss 0.5144 LearningRate 0.0001 Epoch: 19 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:43,040-Speed 3388.50 samples/sec Loss 0.4705 LearningRate 0.0001 Epoch: 19 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:46,088-Speed 3360.39 samples/sec Loss 0.5399 LearningRate 0.0001 Epoch: 19 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:49,192-Speed 3299.59 samples/sec Loss 0.5129 LearningRate 0.0001 Epoch: 19 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:52,213-Speed 3391.10 samples/sec Loss 0.4904 LearningRate 0.0001 Epoch: 19 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:55,236-Speed 3388.58 samples/sec Loss 0.5398 LearningRate 0.0001 Epoch: 19 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:55:58,260-Speed 3387.61 samples/sec Loss 0.5182 LearningRate 0.0001 Epoch: 19 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:01,278-Speed 3392.90 samples/sec Loss 0.4846 LearningRate 0.0001 Epoch: 19 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:04,301-Speed 3388.99 samples/sec Loss 0.4613 LearningRate 0.0001 Epoch: 19 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:07,317-Speed 3395.41 samples/sec Loss 0.5202 LearningRate 0.0001 Epoch: 19 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:10,321-Speed 3409.46 samples/sec Loss 0.5069 LearningRate 0.0001 Epoch: 19 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:13,344-Speed 3388.64 samples/sec Loss 0.5014 LearningRate 0.0001 Epoch: 19 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:16,394-Speed 3358.13 samples/sec Loss 0.5301 LearningRate 0.0001 Epoch: 19 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:19,412-Speed 3393.51 samples/sec Loss 0.5574 LearningRate 0.0001 Epoch: 19 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:22,436-Speed 3386.97 samples/sec Loss 0.4919 LearningRate 0.0001 Epoch: 19 Global Step: 111090 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:25,456-Speed 3391.90 samples/sec Loss 0.5441 LearningRate 0.0001 Epoch: 19 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:28,477-Speed 3390.02 samples/sec Loss 0.4871 LearningRate 0.0001 Epoch: 19 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:31,496-Speed 3393.06 samples/sec Loss 0.5027 LearningRate 0.0001 Epoch: 19 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:34,513-Speed 3394.48 samples/sec Loss 0.4739 LearningRate 0.0001 Epoch: 19 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:37,556-Speed 3365.80 samples/sec Loss 0.4191 LearningRate 0.0001 Epoch: 19 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:40,601-Speed 3364.25 samples/sec Loss 0.4679 LearningRate 0.0001 Epoch: 19 Global Step: 111150 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:56:43,604-Speed 3410.06 samples/sec Loss 0.5062 LearningRate 0.0001 Epoch: 19 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:46,630-Speed 3385.49 samples/sec Loss 0.4907 LearningRate 0.0001 Epoch: 19 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:49,696-Speed 3340.02 samples/sec Loss 0.4902 LearningRate 0.0000 Epoch: 19 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:52,725-Speed 3382.55 samples/sec Loss 0.5534 LearningRate 0.0000 Epoch: 19 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:55,761-Speed 3373.15 samples/sec Loss 0.5144 LearningRate 0.0000 Epoch: 19 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:56:58,780-Speed 3391.88 samples/sec Loss 0.5698 LearningRate 0.0000 Epoch: 19 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:01,836-Speed 3352.41 samples/sec Loss 0.5045 LearningRate 0.0000 Epoch: 19 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:04,896-Speed 3347.14 samples/sec Loss 0.4953 LearningRate 0.0000 Epoch: 19 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:07,912-Speed 3395.23 samples/sec Loss 0.5246 LearningRate 0.0000 Epoch: 19 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:10,914-Speed 3412.60 samples/sec Loss 0.4033 LearningRate 0.0000 Epoch: 19 Global Step: 111250 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:13,931-Speed 3394.37 samples/sec Loss 0.4670 LearningRate 0.0000 Epoch: 19 Global Step: 111260 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:16,955-Speed 3387.84 samples/sec Loss 0.4691 LearningRate 0.0000 Epoch: 19 Global Step: 111270 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:19,974-Speed 3392.86 samples/sec Loss 0.5240 LearningRate 0.0000 Epoch: 19 Global Step: 111280 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:23,014-Speed 3368.71 samples/sec Loss 0.4992 LearningRate 0.0000 Epoch: 19 Global Step: 111290 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:26,079-Speed 3342.15 samples/sec Loss 0.4791 LearningRate 0.0000 Epoch: 19 Global Step: 111300 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:29,103-Speed 3386.18 samples/sec Loss 0.5117 LearningRate 0.0000 Epoch: 19 Global Step: 111310 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:32,119-Speed 3396.14 samples/sec Loss 0.4740 LearningRate 0.0000 Epoch: 19 Global Step: 111320 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:35,148-Speed 3381.65 samples/sec Loss 0.4406 LearningRate 0.0000 Epoch: 19 Global Step: 111330 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:38,173-Speed 3385.63 samples/sec Loss 0.5700 LearningRate 0.0000 Epoch: 19 Global Step: 111340 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:57:41,193-Speed 3391.69 samples/sec Loss 0.5507 LearningRate 0.0000 Epoch: 19 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:44,220-Speed 3384.45 samples/sec Loss 0.4852 LearningRate 0.0000 Epoch: 19 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:47,238-Speed 3393.54 samples/sec Loss 0.4815 LearningRate 0.0000 Epoch: 19 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:50,322-Speed 3320.96 samples/sec Loss 0.5156 LearningRate 0.0000 Epoch: 19 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:53,340-Speed 3394.39 samples/sec Loss 0.5351 LearningRate 0.0000 Epoch: 19 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:56,378-Speed 3371.45 samples/sec Loss 0.5280 LearningRate 0.0000 Epoch: 19 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:57:59,407-Speed 3380.69 samples/sec Loss 0.4265 LearningRate 0.0000 Epoch: 19 Global Step: 111410 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:02,425-Speed 3393.88 samples/sec Loss 0.5175 LearningRate 0.0000 Epoch: 19 Global Step: 111420 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:05,441-Speed 3396.29 samples/sec Loss 0.4921 LearningRate 0.0000 Epoch: 19 Global Step: 111430 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:08,515-Speed 3332.11 samples/sec Loss 0.5164 LearningRate 0.0000 Epoch: 19 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:11,686-Speed 3230.18 samples/sec Loss 0.5342 LearningRate 0.0000 Epoch: 19 Global Step: 111450 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:14,707-Speed 3391.35 samples/sec Loss 0.4904 LearningRate 0.0000 Epoch: 19 Global Step: 111460 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:17,724-Speed 3393.92 samples/sec Loss 0.5206 LearningRate 0.0000 Epoch: 19 Global Step: 111470 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:20,741-Speed 3394.86 samples/sec Loss 0.5216 LearningRate 0.0000 Epoch: 19 Global Step: 111480 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:23,762-Speed 3391.36 samples/sec Loss 0.4251 LearningRate 0.0000 Epoch: 19 Global Step: 111490 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:26,780-Speed 3393.07 samples/sec Loss 0.4223 LearningRate 0.0000 Epoch: 19 Global Step: 111500 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:29,805-Speed 3386.04 samples/sec Loss 0.5527 LearningRate 0.0000 Epoch: 19 Global Step: 111510 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:32,827-Speed 3388.59 samples/sec Loss 0.5306 LearningRate 0.0000 Epoch: 19 Global Step: 111520 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:35,855-Speed 3385.84 samples/sec Loss 0.4729 LearningRate 0.0000 Epoch: 19 Global Step: 111530 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:38,886-Speed 3379.39 samples/sec Loss 0.4870 LearningRate 0.0000 Epoch: 19 Global Step: 111540 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:58:41,912-Speed 3385.53 samples/sec Loss 0.5130 LearningRate 0.0000 Epoch: 19 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:44,928-Speed 3395.82 samples/sec Loss 0.5202 LearningRate 0.0000 Epoch: 19 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:47,941-Speed 3398.80 samples/sec Loss 0.5189 LearningRate 0.0000 Epoch: 19 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:50,957-Speed 3395.69 samples/sec Loss 0.4531 LearningRate 0.0000 Epoch: 19 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:53,987-Speed 3380.32 samples/sec Loss 0.4684 LearningRate 0.0000 Epoch: 19 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:58:57,016-Speed 3381.43 samples/sec Loss 0.5266 LearningRate 0.0000 Epoch: 19 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:00,038-Speed 3389.85 samples/sec Loss 0.4794 LearningRate 0.0000 Epoch: 19 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:03,065-Speed 3383.30 samples/sec Loss 0.4340 LearningRate 0.0000 Epoch: 19 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:06,089-Speed 3387.01 samples/sec Loss 0.5012 LearningRate 0.0000 Epoch: 19 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:09,115-Speed 3384.75 samples/sec Loss 0.4467 LearningRate 0.0000 Epoch: 19 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:12,134-Speed 3393.34 samples/sec Loss 0.5298 LearningRate 0.0000 Epoch: 19 Global Step: 111650 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 12:59:15,134-Speed 3414.04 samples/sec Loss 0.4789 LearningRate 0.0000 Epoch: 19 Global Step: 111660 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:18,156-Speed 3389.33 samples/sec Loss 0.5303 LearningRate 0.0000 Epoch: 19 Global Step: 111670 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:21,182-Speed 3384.69 samples/sec Loss 0.4937 LearningRate 0.0000 Epoch: 19 Global Step: 111680 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:24,202-Speed 3391.01 samples/sec Loss 0.4729 LearningRate 0.0000 Epoch: 19 Global Step: 111690 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:27,233-Speed 3379.34 samples/sec Loss 0.5106 LearningRate 0.0000 Epoch: 19 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:30,271-Speed 3372.15 samples/sec Loss 0.4731 LearningRate 0.0000 Epoch: 19 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:33,289-Speed 3393.78 samples/sec Loss 0.4386 LearningRate 0.0000 Epoch: 19 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:36,314-Speed 3385.67 samples/sec Loss 0.4863 LearningRate 0.0000 Epoch: 19 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:39,337-Speed 3388.82 samples/sec Loss 0.5366 LearningRate 0.0000 Epoch: 19 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 12:59:42,351-Speed 3397.73 samples/sec Loss 0.4884 LearningRate 0.0000 Epoch: 19 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:59:45,370-Speed 3393.00 samples/sec Loss 0.5265 LearningRate 0.0000 Epoch: 19 Global Step: 111760 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:59:48,401-Speed 3379.10 samples/sec Loss 0.5461 LearningRate 0.0000 Epoch: 19 Global Step: 111770 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:59:51,469-Speed 3338.22 samples/sec Loss 0.4366 LearningRate 0.0000 Epoch: 19 Global Step: 111780 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:59:54,523-Speed 3354.11 samples/sec Loss 0.4759 LearningRate 0.0000 Epoch: 19 Global Step: 111790 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 12:59:57,546-Speed 3388.37 samples/sec Loss 0.5336 LearningRate 0.0000 Epoch: 19 Global Step: 111800 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:00,569-Speed 3388.99 samples/sec Loss 0.4419 LearningRate 0.0000 Epoch: 19 Global Step: 111810 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:03,638-Speed 3337.22 samples/sec Loss 0.4732 LearningRate 0.0000 Epoch: 19 Global Step: 111820 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:06,661-Speed 3388.24 samples/sec Loss 0.4435 LearningRate 0.0000 Epoch: 19 Global Step: 111830 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:09,708-Speed 3361.21 samples/sec Loss 0.5759 LearningRate 0.0000 Epoch: 19 Global Step: 111840 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:12,745-Speed 3372.50 samples/sec Loss 0.5213 LearningRate 0.0000 Epoch: 19 Global Step: 111850 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:15,766-Speed 3390.17 samples/sec Loss 0.4750 LearningRate 0.0000 Epoch: 19 Global Step: 111860 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:18,789-Speed 3388.48 samples/sec Loss 0.5077 LearningRate 0.0000 Epoch: 19 Global Step: 111870 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:21,804-Speed 3397.24 samples/sec Loss 0.4399 LearningRate 0.0000 Epoch: 19 Global Step: 111880 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:24,819-Speed 3397.15 samples/sec Loss 0.4788 LearningRate 0.0000 Epoch: 19 Global Step: 111890 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:27,853-Speed 3375.17 samples/sec Loss 0.5482 LearningRate 0.0000 Epoch: 19 Global Step: 111900 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:30,876-Speed 3388.99 samples/sec Loss 0.4931 LearningRate 0.0000 Epoch: 19 Global Step: 111910 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:33,899-Speed 3387.89 samples/sec Loss 0.4325 LearningRate 0.0000 Epoch: 19 Global Step: 111920 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:36,920-Speed 3391.16 samples/sec Loss 0.4626 LearningRate 0.0000 Epoch: 19 Global Step: 111930 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:00:39,947-Speed 3382.62 samples/sec Loss 0.4696 LearningRate 0.0000 Epoch: 19 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:42,971-Speed 3387.71 samples/sec Loss 0.5706 LearningRate 0.0000 Epoch: 19 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:45,997-Speed 3384.11 samples/sec Loss 0.5059 LearningRate 0.0000 Epoch: 19 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:49,014-Speed 3395.28 samples/sec Loss 0.5090 LearningRate 0.0000 Epoch: 19 Global Step: 111970 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:52,048-Speed 3375.21 samples/sec Loss 0.4806 LearningRate 0.0000 Epoch: 19 Global Step: 111980 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:55,065-Speed 3395.62 samples/sec Loss 0.4926 LearningRate 0.0000 Epoch: 19 Global Step: 111990 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:00:58,124-Speed 3347.78 samples/sec Loss 0.5099 LearningRate 0.0000 Epoch: 19 Global Step: 112000 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:01:41,424-[lfw][112000]XNorm: 21.779004
Training: 2022-04-27 13:01:41,424-[lfw][112000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 13:01:41,425-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-04-27 13:02:31,736-[cfp_fp][112000]XNorm: 22.111145
Training: 2022-04-27 13:02:31,737-[cfp_fp][112000]Accuracy-Flip: 0.98529+-0.00542
Training: 2022-04-27 13:02:31,737-[cfp_fp][112000]Accuracy-Highest: 0.98629
Training: 2022-04-27 13:03:15,249-[agedb_30][112000]XNorm: 22.232803
Training: 2022-04-27 13:03:15,250-[agedb_30][112000]Accuracy-Flip: 0.98167+-0.00882
Training: 2022-04-27 13:03:15,250-[agedb_30][112000]Accuracy-Highest: 0.98250
Training: 2022-04-27 13:03:18,266-Speed 73.07 samples/sec Loss 0.4502 LearningRate 0.0000 Epoch: 19 Global Step: 112010 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:03:21,274-Speed 3405.36 samples/sec Loss 0.5108 LearningRate 0.0000 Epoch: 19 Global Step: 112020 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:03:24,281-Speed 3406.76 samples/sec Loss 0.4599 LearningRate 0.0000 Epoch: 19 Global Step: 112030 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:03:27,298-Speed 3394.45 samples/sec Loss 0.4756 LearningRate 0.0000 Epoch: 19 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:30,309-Speed 3401.65 samples/sec Loss 0.4858 LearningRate 0.0000 Epoch: 19 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:33,320-Speed 3401.16 samples/sec Loss 0.4681 LearningRate 0.0000 Epoch: 19 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:36,334-Speed 3399.12 samples/sec Loss 0.4237 LearningRate 0.0000 Epoch: 19 Global Step: 112070 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:39,348-Speed 3398.27 samples/sec Loss 0.5001 LearningRate 0.0000 Epoch: 19 Global Step: 112080 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:42,365-Speed 3393.91 samples/sec Loss 0.5034 LearningRate 0.0000 Epoch: 19 Global Step: 112090 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:45,383-Speed 3394.42 samples/sec Loss 0.4726 LearningRate 0.0000 Epoch: 19 Global Step: 112100 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:48,488-Speed 3298.01 samples/sec Loss 0.4948 LearningRate 0.0000 Epoch: 19 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:51,575-Speed 3318.37 samples/sec Loss 0.4561 LearningRate 0.0000 Epoch: 19 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:54,598-Speed 3388.08 samples/sec Loss 0.4937 LearningRate 0.0000 Epoch: 19 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:03:57,635-Speed 3372.44 samples/sec Loss 0.5214 LearningRate 0.0000 Epoch: 19 Global Step: 112140 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:04:00,657-Speed 3389.27 samples/sec Loss 0.5138 LearningRate 0.0000 Epoch: 19 Global Step: 112150 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:04:03,660-Speed 3410.53 samples/sec Loss 0.5814 LearningRate 0.0000 Epoch: 19 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:06,677-Speed 3395.15 samples/sec Loss 0.5404 LearningRate 0.0000 Epoch: 19 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:09,704-Speed 3383.94 samples/sec Loss 0.5261 LearningRate 0.0000 Epoch: 19 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:12,726-Speed 3389.62 samples/sec Loss 0.4456 LearningRate 0.0000 Epoch: 19 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:15,744-Speed 3393.44 samples/sec Loss 0.5125 LearningRate 0.0000 Epoch: 19 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:18,758-Speed 3398.22 samples/sec Loss 0.5533 LearningRate 0.0000 Epoch: 19 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:21,768-Speed 3403.50 samples/sec Loss 0.4777 LearningRate 0.0000 Epoch: 19 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:24,783-Speed 3396.19 samples/sec Loss 0.5504 LearningRate 0.0000 Epoch: 19 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:27,795-Speed 3401.70 samples/sec Loss 0.4904 LearningRate 0.0000 Epoch: 19 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:30,805-Speed 3401.76 samples/sec Loss 0.5899 LearningRate 0.0000 Epoch: 19 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:33,799-Speed 3421.38 samples/sec Loss 0.4784 LearningRate 0.0000 Epoch: 19 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:36,815-Speed 3395.26 samples/sec Loss 0.4487 LearningRate 0.0000 Epoch: 19 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:39,847-Speed 3378.63 samples/sec Loss 0.5125 LearningRate 0.0000 Epoch: 19 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:42,866-Speed 3392.70 samples/sec Loss 0.5163 LearningRate 0.0000 Epoch: 19 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:45,890-Speed 3386.94 samples/sec Loss 0.4780 LearningRate 0.0000 Epoch: 19 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:48,900-Speed 3402.98 samples/sec Loss 0.5244 LearningRate 0.0000 Epoch: 19 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:51,912-Speed 3401.12 samples/sec Loss 0.5469 LearningRate 0.0000 Epoch: 19 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:54,922-Speed 3402.27 samples/sec Loss 0.4883 LearningRate 0.0000 Epoch: 19 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:04:57,930-Speed 3404.84 samples/sec Loss 0.5511 LearningRate 0.0000 Epoch: 19 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:00,944-Speed 3398.18 samples/sec Loss 0.5141 LearningRate 0.0000 Epoch: 19 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:03,965-Speed 3390.26 samples/sec Loss 0.5498 LearningRate 0.0000 Epoch: 19 Global Step: 112360 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:05:06,942-Speed 3440.85 samples/sec Loss 0.4767 LearningRate 0.0000 Epoch: 19 Global Step: 112370 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:09,958-Speed 3396.80 samples/sec Loss 0.4909 LearningRate 0.0000 Epoch: 19 Global Step: 112380 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:12,976-Speed 3393.74 samples/sec Loss 0.4841 LearningRate 0.0000 Epoch: 19 Global Step: 112390 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:15,991-Speed 3396.96 samples/sec Loss 0.4926 LearningRate 0.0000 Epoch: 19 Global Step: 112400 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:19,005-Speed 3397.89 samples/sec Loss 0.4858 LearningRate 0.0000 Epoch: 19 Global Step: 112410 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:22,029-Speed 3387.33 samples/sec Loss 0.5290 LearningRate 0.0000 Epoch: 19 Global Step: 112420 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:25,046-Speed 3394.98 samples/sec Loss 0.4372 LearningRate 0.0000 Epoch: 19 Global Step: 112430 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:28,054-Speed 3405.20 samples/sec Loss 0.5088 LearningRate 0.0000 Epoch: 19 Global Step: 112440 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:31,064-Speed 3402.02 samples/sec Loss 0.5039 LearningRate 0.0000 Epoch: 19 Global Step: 112450 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:34,075-Speed 3402.35 samples/sec Loss 0.5022 LearningRate 0.0000 Epoch: 19 Global Step: 112460 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:05:37,084-Speed 3403.59 samples/sec Loss 0.5019 LearningRate 0.0000 Epoch: 19 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:40,100-Speed 3396.31 samples/sec Loss 0.5266 LearningRate 0.0000 Epoch: 19 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:43,109-Speed 3403.70 samples/sec Loss 0.4984 LearningRate 0.0000 Epoch: 19 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:46,121-Speed 3401.23 samples/sec Loss 0.4802 LearningRate 0.0000 Epoch: 19 Global Step: 112500 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:49,140-Speed 3392.04 samples/sec Loss 0.4515 LearningRate 0.0000 Epoch: 19 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:52,158-Speed 3394.72 samples/sec Loss 0.5168 LearningRate 0.0000 Epoch: 19 Global Step: 112520 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:55,168-Speed 3402.23 samples/sec Loss 0.5103 LearningRate 0.0000 Epoch: 19 Global Step: 112530 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:05:58,204-Speed 3373.22 samples/sec Loss 0.5572 LearningRate 0.0000 Epoch: 19 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:01,217-Speed 3400.35 samples/sec Loss 0.5128 LearningRate 0.0000 Epoch: 19 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:04,210-Speed 3421.76 samples/sec Loss 0.5285 LearningRate 0.0000 Epoch: 19 Global Step: 112560 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:07,220-Speed 3402.72 samples/sec Loss 0.5091 LearningRate 0.0000 Epoch: 19 Global Step: 112570 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:10,255-Speed 3376.12 samples/sec Loss 0.4617 LearningRate 0.0000 Epoch: 19 Global Step: 112580 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:13,300-Speed 3363.73 samples/sec Loss 0.5086 LearningRate 0.0000 Epoch: 19 Global Step: 112590 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:16,315-Speed 3396.79 samples/sec Loss 0.5198 LearningRate 0.0000 Epoch: 19 Global Step: 112600 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:19,328-Speed 3398.66 samples/sec Loss 0.5105 LearningRate 0.0000 Epoch: 19 Global Step: 112610 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:22,338-Speed 3402.89 samples/sec Loss 0.4613 LearningRate 0.0000 Epoch: 19 Global Step: 112620 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:25,368-Speed 3380.33 samples/sec Loss 0.4610 LearningRate 0.0000 Epoch: 19 Global Step: 112630 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:28,381-Speed 3399.84 samples/sec Loss 0.5206 LearningRate 0.0000 Epoch: 19 Global Step: 112640 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:31,393-Speed 3401.13 samples/sec Loss 0.5515 LearningRate 0.0000 Epoch: 19 Global Step: 112650 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:06:34,406-Speed 3399.39 samples/sec Loss 0.4820 LearningRate 0.0000 Epoch: 19 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:37,418-Speed 3399.97 samples/sec Loss 0.4697 LearningRate 0.0000 Epoch: 19 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:40,440-Speed 3389.76 samples/sec Loss 0.5368 LearningRate 0.0000 Epoch: 19 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:43,448-Speed 3405.07 samples/sec Loss 0.4746 LearningRate 0.0000 Epoch: 19 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:46,460-Speed 3400.71 samples/sec Loss 0.5521 LearningRate 0.0000 Epoch: 19 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:49,474-Speed 3397.87 samples/sec Loss 0.4674 LearningRate 0.0000 Epoch: 19 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:52,485-Speed 3401.24 samples/sec Loss 0.5480 LearningRate 0.0000 Epoch: 19 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:55,498-Speed 3400.11 samples/sec Loss 0.5256 LearningRate 0.0000 Epoch: 19 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:06:58,526-Speed 3382.84 samples/sec Loss 0.4880 LearningRate 0.0000 Epoch: 19 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:01,538-Speed 3400.06 samples/sec Loss 0.4571 LearningRate 0.0000 Epoch: 19 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:04,551-Speed 3400.07 samples/sec Loss 0.4666 LearningRate 0.0000 Epoch: 19 Global Step: 112760 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:07:07,543-Speed 3422.84 samples/sec Loss 0.5273 LearningRate 0.0000 Epoch: 19 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:10,533-Speed 3425.73 samples/sec Loss 0.4601 LearningRate 0.0000 Epoch: 19 Global Step: 112780 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:13,545-Speed 3399.93 samples/sec Loss 0.5046 LearningRate 0.0000 Epoch: 19 Global Step: 112790 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:16,561-Speed 3396.41 samples/sec Loss 0.4541 LearningRate 0.0000 Epoch: 19 Global Step: 112800 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:19,575-Speed 3398.21 samples/sec Loss 0.4385 LearningRate 0.0000 Epoch: 19 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:22,595-Speed 3391.44 samples/sec Loss 0.4823 LearningRate 0.0000 Epoch: 19 Global Step: 112820 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:25,619-Speed 3386.57 samples/sec Loss 0.5323 LearningRate 0.0000 Epoch: 19 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:28,637-Speed 3395.15 samples/sec Loss 0.4875 LearningRate 0.0000 Epoch: 19 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:31,678-Speed 3367.61 samples/sec Loss 0.5219 LearningRate 0.0000 Epoch: 19 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:34,690-Speed 3400.40 samples/sec Loss 0.5681 LearningRate 0.0000 Epoch: 19 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:37,715-Speed 3385.53 samples/sec Loss 0.5367 LearningRate 0.0000 Epoch: 19 Global Step: 112870 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-04-27 13:07:40,751-Speed 3374.19 samples/sec Loss 0.5630 LearningRate 0.0000 Epoch: 19 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:43,767-Speed 3395.96 samples/sec Loss 0.4742 LearningRate 0.0000 Epoch: 19 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:46,866-Speed 3305.20 samples/sec Loss 0.4790 LearningRate 0.0000 Epoch: 19 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:49,889-Speed 3388.18 samples/sec Loss 0.4941 LearningRate 0.0000 Epoch: 19 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:52,905-Speed 3395.79 samples/sec Loss 0.4283 LearningRate 0.0000 Epoch: 19 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:55,918-Speed 3399.14 samples/sec Loss 0.5187 LearningRate 0.0000 Epoch: 19 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:07:58,936-Speed 3393.91 samples/sec Loss 0.5653 LearningRate 0.0000 Epoch: 19 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:08:01,949-Speed 3399.82 samples/sec Loss 0.4886 LearningRate 0.0000 Epoch: 19 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:08:04,970-Speed 3390.34 samples/sec Loss 0.4689 LearningRate 0.0000 Epoch: 19 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:08:07,986-Speed 3395.75 samples/sec Loss 0.4902 LearningRate 0.0000 Epoch: 19 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 13:08:11,015-Speed 3381.19 samples/sec Loss 0.5034 LearningRate 0.0000 Epoch: 19 Global Step: 112980 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:08:14,030-Speed 3397.77 samples/sec Loss 0.4765 LearningRate 0.0000 Epoch: 19 Global Step: 112990 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:08:17,070-Speed 3369.25 samples/sec Loss 0.5304 LearningRate 0.0000 Epoch: 19 Global Step: 113000 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:08:20,089-Speed 3392.35 samples/sec Loss 0.5531 LearningRate 0.0000 Epoch: 19 Global Step: 113010 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 13:08:23,100-Speed 3401.51 samples/sec
Loss 0.5087 LearningRate 0.0000 Epoch: 19 Global Step: 113020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:08:26,114-Speed 3398.21 samples/sec Loss 0.5288 LearningRate 0.0000 Epoch: 19 Global Step: 113030 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:08:29,106-Speed 3423.80 samples/sec Loss 0.4663 LearningRate 0.0000 Epoch: 19 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:32,123-Speed 3395.35 samples/sec Loss 0.4987 LearningRate 0.0000 Epoch: 19 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:35,136-Speed 3398.97 samples/sec Loss 0.4570 LearningRate 0.0000 Epoch: 19 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:38,156-Speed 3391.52 samples/sec Loss 0.5141 LearningRate 0.0000 Epoch: 19 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:41,196-Speed 3369.68 samples/sec Loss 0.5039 LearningRate 0.0000 Epoch: 19 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:44,210-Speed 3397.60 samples/sec Loss 0.4947 LearningRate 0.0000 Epoch: 19 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:47,224-Speed 3398.60 samples/sec Loss 0.4683 LearningRate 0.0000 Epoch: 19 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:50,237-Speed 3399.72 samples/sec Loss 0.4888 LearningRate 0.0000 Epoch: 19 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:53,256-Speed 3392.47 samples/sec Loss 0.4694 LearningRate 0.0000 Epoch: 19 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:56,269-Speed 3398.73 samples/sec Loss 0.5267 LearningRate 0.0000 Epoch: 19 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:08:59,287-Speed 3394.58 samples/sec Loss 0.4763 LearningRate 0.0000 Epoch: 19 
Global Step: 113140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:09:02,305-Speed 3393.57 samples/sec Loss 0.4862 LearningRate 0.0000 Epoch: 19 Global Step: 113150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:09:05,344-Speed 3370.52 samples/sec Loss 0.4515 LearningRate 0.0000 Epoch: 19 Global Step: 113160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:09:08,344-Speed 3414.08 samples/sec Loss 0.4520 LearningRate 0.0000 Epoch: 19 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:11,371-Speed 3384.45 samples/sec Loss 0.5087 LearningRate 0.0000 Epoch: 19 Global Step: 113180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:14,384-Speed 3398.77 samples/sec Loss 0.4977 LearningRate 0.0000 Epoch: 19 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:17,397-Speed 3400.10 samples/sec Loss 0.4423 LearningRate 0.0000 Epoch: 19 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:20,409-Speed 3401.07 samples/sec Loss 0.5068 LearningRate 0.0000 Epoch: 19 Global Step: 113210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:23,424-Speed 3395.96 samples/sec Loss 0.4507 LearningRate 0.0000 Epoch: 19 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:26,463-Speed 3370.76 samples/sec Loss 0.5165 LearningRate 0.0000 Epoch: 19 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:29,481-Speed 3393.34 samples/sec Loss 0.5161 LearningRate 0.0000 Epoch: 19 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:32,499-Speed 3393.75 samples/sec Loss 0.5633 LearningRate 0.0000 Epoch: 19 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:35,526-Speed 3384.55 samples/sec Loss 0.5298 LearningRate 0.0000 Epoch: 19 Global Step: 113260 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-27 13:09:38,546-Speed 3390.81 samples/sec Loss 0.4612 LearningRate 0.0000 Epoch: 19 Global Step: 113270 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:09:41,551-Speed 3409.43 samples/sec Loss 0.5142 LearningRate 0.0000 Epoch: 19 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:44,565-Speed 3397.74 samples/sec Loss 0.5273 LearningRate 0.0000 Epoch: 19 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:47,602-Speed 3373.04 samples/sec Loss 0.4878 LearningRate 0.0000 Epoch: 19 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:50,635-Speed 3376.71 samples/sec Loss 0.5184 LearningRate 0.0000 Epoch: 19 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:53,650-Speed 3396.91 samples/sec Loss 0.4259 LearningRate 0.0000 Epoch: 19 Global Step: 113320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:56,664-Speed 3398.77 samples/sec Loss 0.5359 LearningRate 0.0000 Epoch: 19 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:09:59,682-Speed 3393.23 samples/sec Loss 0.5062 LearningRate 0.0000 Epoch: 19 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:02,698-Speed 3396.64 samples/sec Loss 0.5403 LearningRate 0.0000 Epoch: 19 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:05,722-Speed 3386.63 samples/sec Loss 0.5279 LearningRate 0.0000 Epoch: 19 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:08,737-Speed 3397.91 samples/sec Loss 0.5114 LearningRate 0.0000 Epoch: 19 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:11,759-Speed 3388.68 samples/sec Loss 0.4573 LearningRate 0.0000 Epoch: 19 Global Step: 113380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 
2022-04-27 13:10:14,760-Speed 3413.03 samples/sec Loss 0.5706 LearningRate 0.0000 Epoch: 19 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:17,779-Speed 3392.77 samples/sec Loss 0.5397 LearningRate 0.0000 Epoch: 19 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:20,795-Speed 3395.92 samples/sec Loss 0.5255 LearningRate 0.0000 Epoch: 19 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:23,822-Speed 3383.71 samples/sec Loss 0.4858 LearningRate 0.0000 Epoch: 19 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:26,836-Speed 3398.16 samples/sec Loss 0.4894 LearningRate 0.0000 Epoch: 19 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:29,850-Speed 3398.48 samples/sec Loss 0.4380 LearningRate 0.0000 Epoch: 19 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:10:32,854-Speed 3408.85 samples/sec Loss 0.5289 LearningRate 0.0000 Epoch: 19 Global Step: 113450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:35,892-Speed 3372.38 samples/sec Loss 0.4786 LearningRate 0.0000 Epoch: 19 Global Step: 113460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:38,926-Speed 3375.46 samples/sec Loss 0.5205 LearningRate 0.0000 Epoch: 19 Global Step: 113470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:41,943-Speed 3395.45 samples/sec Loss 0.5217 LearningRate 0.0000 Epoch: 19 Global Step: 113480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:44,958-Speed 3396.67 samples/sec Loss 0.4637 LearningRate 0.0000 Epoch: 19 Global Step: 113490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:47,972-Speed 3398.12 samples/sec Loss 0.5180 LearningRate 0.0000 Epoch: 19 Global Step: 113500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:50,990-Speed 3394.29 
samples/sec Loss 0.4811 LearningRate 0.0000 Epoch: 19 Global Step: 113510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:54,027-Speed 3372.81 samples/sec Loss 0.4937 LearningRate 0.0000 Epoch: 19 Global Step: 113520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:10:57,041-Speed 3397.42 samples/sec Loss 0.4477 LearningRate 0.0000 Epoch: 19 Global Step: 113530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:11:00,066-Speed 3385.84 samples/sec Loss 0.4777 LearningRate 0.0000 Epoch: 19 Global Step: 113540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 13:11:03,098-Speed 3379.03 samples/sec Loss 0.5145 LearningRate 0.0000 Epoch: 19 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:06,192-Speed 3309.67 samples/sec Loss 0.4934 LearningRate 0.0000 Epoch: 19 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:09,208-Speed 3397.18 samples/sec Loss 0.4894 LearningRate 0.0000 Epoch: 19 Global Step: 113570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:12,282-Speed 3331.38 samples/sec Loss 0.5240 LearningRate 0.0000 Epoch: 19 Global Step: 113580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:15,301-Speed 3391.87 samples/sec Loss 0.5036 LearningRate 0.0000 Epoch: 19 Global Step: 113590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:18,321-Speed 3392.44 samples/sec Loss 0.4426 LearningRate 0.0000 Epoch: 19 Global Step: 113600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:21,345-Speed 3386.30 samples/sec Loss 0.5241 LearningRate 0.0000 Epoch: 19 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:24,398-Speed 3355.26 samples/sec Loss 0.5259 LearningRate 0.0000 Epoch: 19 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:27,517-Speed 3283.48 samples/sec Loss 0.4634 LearningRate 0.0000 
Epoch: 19 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:30,540-Speed 3388.83 samples/sec Loss 0.4484 LearningRate 0.0000 Epoch: 19 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:33,557-Speed 3395.37 samples/sec Loss 0.5249 LearningRate 0.0000 Epoch: 19 Global Step: 113650 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 13:11:36,556-Speed 3415.35 samples/sec Loss 0.4938 LearningRate 0.0000 Epoch: 19 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:39,574-Speed 3394.40 samples/sec Loss 0.5447 LearningRate 0.0000 Epoch: 19 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:42,594-Speed 3391.35 samples/sec Loss 0.4329 LearningRate 0.0000 Epoch: 19 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:45,615-Speed 3390.07 samples/sec Loss 0.4785 LearningRate 0.0000 Epoch: 19 Global Step: 113690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:48,639-Speed 3386.47 samples/sec Loss 0.5062 LearningRate 0.0000 Epoch: 19 Global Step: 113700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:51,741-Speed 3302.55 samples/sec Loss 0.4741 LearningRate 0.0000 Epoch: 19 Global Step: 113710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 13:11:54,754-Speed 3398.57 samples/sec Loss 0.4556 LearningRate 0.0000 Epoch: 19 Global Step: 113720 Fp16 Grad Scale: 65536 Required: -0 hours