Training: 2022-04-27 01:27:04,685-rank_id: 0
Training: 2022-04-27 01:27:18,502-: margin_list [1.0, 0.5, 0.0]
Training: 2022-04-27 01:27:18,503-: network r50
Training: 2022-04-27 01:27:18,503-: resume False
Training: 2022-04-27 01:27:18,503-: output work_dirs/ms1mv2_r50
Training: 2022-04-27 01:27:18,503-: embedding_size 512
Training: 2022-04-27 01:27:18,503-: sample_rate 1.0
Training: 2022-04-27 01:27:18,503-: interclass_filtering_threshold 0
Training: 2022-04-27 01:27:18,503-: fp16 True
Training: 2022-04-27 01:27:18,503-: batch_size 128
Training: 2022-04-27 01:27:18,503-: optimizer sgd
Training: 2022-04-27 01:27:18,503-: lr 0.1
Training: 2022-04-27 01:27:18,503-: momentum 0.9
Training: 2022-04-27 01:27:18,503-: weight_decay 0.0005
Training: 2022-04-27 01:27:18,503-: verbose 2000
Training: 2022-04-27 01:27:18,504-: frequent 10
Training: 2022-04-27 01:27:18,504-: dali False
Training: 2022-04-27 01:27:18,504-: rec /train_tmp/faces_emore
Training: 2022-04-27 01:27:18,504-: num_classes 85742
Training: 2022-04-27 01:27:18,504-: num_image 5822653
Training: 2022-04-27 01:27:18,504-: num_epoch 20
Training: 2022-04-27 01:27:18,504-: warmup_epoch 0
Training: 2022-04-27 01:27:18,504-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-27 01:27:18,504-: total_batch_size 1024
Training: 2022-04-27 01:27:18,504-: warmup_step 0
Training: 2022-04-27 01:27:18,504-: total_step 113720
Training: 2022-04-27 01:28:27,042-Reducer buckets have been rebuilt in this iteration.
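Note: the derived values in the configuration above can be cross-checked from the logged ones. The sketch below is a reader-side sanity check, not code from the training script. It assumes that total_step is the floor of num_image / total_batch_size multiplied by num_epoch, and that total_batch_size is batch_size times the number of ranks. The rank count (world_size) is not logged and is inferred here.

```python
# Values copied from the configuration dump above.
num_image = 5822653
batch_size = 128          # per-rank batch size
total_batch_size = 1024   # global batch size
num_epoch = 20

# Assumption: the global batch is the per-rank batch times the number of ranks.
world_size = total_batch_size // batch_size

# Assumption: the trailing partial batch of each epoch is dropped (floor division).
steps_per_epoch = num_image // total_batch_size
total_step = steps_per_epoch * num_epoch

print(f"world_size={world_size} steps_per_epoch={steps_per_epoch} total_step={total_step}")
```

Under these assumptions the computed total_step agrees with the logged total_step of 113720.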
Training: 2022-04-27 01:28:30,477-Speed 5360.42 samples/sec Loss 46.3456 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 01:28:32,319-Speed 5564.04 samples/sec Loss 47.0751 LearningRate 0.0999 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 01:28:35,036-Speed 3769.84 samples/sec Loss 47.3905 LearningRate 0.0999 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 01:28:36,888-Speed 5534.23 samples/sec Loss 47.9100 LearningRate 0.0999 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 01:28:38,706-Speed 5636.48 samples/sec Loss 47.6235 LearningRate 0.0999 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 01:28:40,521-Speed 5644.10 samples/sec Loss 47.0028 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 01:28:42,345-Speed 5616.69 samples/sec Loss 46.8146 LearningRate 0.0999 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 01:28:44,170-Speed 5615.92 samples/sec Loss 46.6020 LearningRate 0.0998 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 01:28:45,961-Speed 5719.33 samples/sec Loss 46.3931 LearningRate 0.0998 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 01:28:47,757-Speed 5705.15 samples/sec Loss 46.2863 LearningRate 0.0998 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 01:28:49,557-Speed 5692.15 samples/sec Loss 45.9337 LearningRate 0.0998 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 01:28:51,358-Speed 5689.73 samples/sec Loss 45.7566 LearningRate 0.0998 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:28:53,167-Speed 5665.25 samples/sec Loss 45.5772 
LearningRate 0.0998 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:28:54,962-Speed 5708.78 samples/sec Loss 45.3376 LearningRate 0.0997 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:28:56,789-Speed 5606.88 samples/sec Loss 45.1236 LearningRate 0.0997 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:28:58,619-Speed 5597.91 samples/sec Loss 45.0515 LearningRate 0.0997 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:29:00,464-Speed 5553.50 samples/sec Loss 44.8643 LearningRate 0.0997 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:29:02,338-Speed 5468.18 samples/sec Loss 44.5414 LearningRate 0.0997 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 01:29:04,150-Speed 5653.29 samples/sec Loss 44.4042 LearningRate 0.0996 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 01:29:05,952-Speed 5685.72 samples/sec Loss 44.2332 LearningRate 0.0996 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:07,740-Speed 5728.02 samples/sec Loss 43.8737 LearningRate 0.0996 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:09,561-Speed 5626.85 samples/sec Loss 43.8407 LearningRate 0.0996 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:11,384-Speed 5623.59 samples/sec Loss 43.6009 LearningRate 0.0996 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:13,271-Speed 5428.23 samples/sec Loss 43.2571 LearningRate 0.0996 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:15,101-Speed 5599.94 samples/sec Loss 43.2277 LearningRate 0.0995 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 
7 hours Training: 2022-04-27 01:29:16,901-Speed 5691.68 samples/sec Loss 43.0493 LearningRate 0.0995 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:18,728-Speed 5608.91 samples/sec Loss 42.8201 LearningRate 0.0995 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:20,572-Speed 5556.54 samples/sec Loss 42.6947 LearningRate 0.0995 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:22,408-Speed 5579.72 samples/sec Loss 42.4567 LearningRate 0.0995 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:29:24,208-Speed 5691.81 samples/sec Loss 42.2410 LearningRate 0.0995 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:25,993-Speed 5740.73 samples/sec Loss 42.0434 LearningRate 0.0994 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:27,778-Speed 5739.99 samples/sec Loss 41.8529 LearningRate 0.0994 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:29,586-Speed 5666.01 samples/sec Loss 41.7013 LearningRate 0.0994 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:31,392-Speed 5674.10 samples/sec Loss 41.4476 LearningRate 0.0994 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:33,226-Speed 5593.65 samples/sec Loss 41.1463 LearningRate 0.0994 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:35,066-Speed 5567.77 samples/sec Loss 41.0893 LearningRate 0.0994 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:36,898-Speed 5592.77 samples/sec Loss 40.8740 LearningRate 0.0993 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:38,704-Speed 5673.70 samples/sec 
Loss 40.7653 LearningRate 0.0993 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:40,504-Speed 5691.46 samples/sec Loss 40.5158 LearningRate 0.0993 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:42,311-Speed 5671.69 samples/sec Loss 40.3706 LearningRate 0.0993 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:29:44,124-Speed 5649.65 samples/sec Loss 40.2503 LearningRate 0.0993 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:29:45,925-Speed 5692.03 samples/sec Loss 39.9941 LearningRate 0.0992 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:47,749-Speed 5615.71 samples/sec Loss 39.8943 LearningRate 0.0992 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:49,582-Speed 5588.60 samples/sec Loss 39.5720 LearningRate 0.0992 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:29:51,412-Speed 5598.68 samples/sec Loss 39.4801 LearningRate 0.0992 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:29:53,228-Speed 5642.47 samples/sec Loss 39.3382 LearningRate 0.0992 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:29:55,025-Speed 5701.10 samples/sec Loss 39.0744 LearningRate 0.0992 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:29:56,840-Speed 5645.23 samples/sec Loss 38.9429 LearningRate 0.0991 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:29:58,656-Speed 5641.35 samples/sec Loss 38.7262 LearningRate 0.0991 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:00,477-Speed 5626.98 samples/sec Loss 38.5426 LearningRate 0.0991 Epoch: 0 Global Step: 510 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:02,287-Speed 5662.42 samples/sec Loss 38.3403 LearningRate 0.0991 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:04,097-Speed 5661.05 samples/sec Loss 38.1049 LearningRate 0.0991 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:30:05,876-Speed 5756.98 samples/sec Loss 37.8981 LearningRate 0.0991 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:07,704-Speed 5603.79 samples/sec Loss 37.7569 LearningRate 0.0990 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:09,496-Speed 5718.73 samples/sec Loss 37.6067 LearningRate 0.0990 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:11,319-Speed 5620.43 samples/sec Loss 37.3457 LearningRate 0.0990 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:13,129-Speed 5660.87 samples/sec Loss 37.2947 LearningRate 0.0990 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:14,925-Speed 5700.58 samples/sec Loss 37.0705 LearningRate 0.0990 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:16,720-Speed 5708.65 samples/sec Loss 36.8260 LearningRate 0.0989 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:18,507-Speed 5732.65 samples/sec Loss 36.6641 LearningRate 0.0989 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:20,364-Speed 5516.45 samples/sec Loss 36.4975 LearningRate 0.0989 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:22,170-Speed 5673.02 samples/sec Loss 36.2007 LearningRate 0.0989 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
01:30:23,995-Speed 5615.45 samples/sec Loss 36.1533 LearningRate 0.0989 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:30:25,820-Speed 5613.67 samples/sec Loss 35.9954 LearningRate 0.0989 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:27,634-Speed 5646.97 samples/sec Loss 35.6941 LearningRate 0.0988 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:29,455-Speed 5624.98 samples/sec Loss 35.5594 LearningRate 0.0988 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:31,275-Speed 5628.94 samples/sec Loss 35.2871 LearningRate 0.0988 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:33,117-Speed 5563.33 samples/sec Loss 35.0293 LearningRate 0.0988 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:34,939-Speed 5621.83 samples/sec Loss 34.9440 LearningRate 0.0988 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:36,781-Speed 5563.18 samples/sec Loss 34.7333 LearningRate 0.0988 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:38,615-Speed 5583.55 samples/sec Loss 34.6076 LearningRate 0.0987 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:40,401-Speed 5737.74 samples/sec Loss 34.5230 LearningRate 0.0987 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:42,220-Speed 5634.43 samples/sec Loss 34.2310 LearningRate 0.0987 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:44,007-Speed 5730.76 samples/sec Loss 34.1398 LearningRate 0.0987 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:45,850-Speed 5560.71 samples/sec Loss 33.7545 LearningRate 
0.0987 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:47,681-Speed 5593.58 samples/sec Loss 33.5612 LearningRate 0.0987 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:49,557-Speed 5459.62 samples/sec Loss 33.4901 LearningRate 0.0986 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:51,368-Speed 5656.97 samples/sec Loss 33.3739 LearningRate 0.0986 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:53,175-Speed 5670.17 samples/sec Loss 33.0390 LearningRate 0.0986 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:55,012-Speed 5576.92 samples/sec Loss 32.8303 LearningRate 0.0986 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:56,852-Speed 5569.41 samples/sec Loss 32.7205 LearningRate 0.0986 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:30:58,654-Speed 5684.57 samples/sec Loss 32.6235 LearningRate 0.0985 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:00,442-Speed 5727.37 samples/sec Loss 32.3061 LearningRate 0.0985 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:02,246-Speed 5682.62 samples/sec Loss 32.1047 LearningRate 0.0985 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:04,031-Speed 5738.87 samples/sec Loss 31.8824 LearningRate 0.0985 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:05,826-Speed 5706.15 samples/sec Loss 31.8638 LearningRate 0.0985 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:07,624-Speed 5695.89 samples/sec Loss 31.5507 LearningRate 0.0985 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 
6 hours Training: 2022-04-27 01:31:09,446-Speed 5624.57 samples/sec Loss 31.5783 LearningRate 0.0984 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:11,317-Speed 5475.16 samples/sec Loss 31.2269 LearningRate 0.0984 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:13,196-Speed 5452.76 samples/sec Loss 31.1127 LearningRate 0.0984 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:15,008-Speed 5653.30 samples/sec Loss 30.9387 LearningRate 0.0984 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:16,858-Speed 5538.43 samples/sec Loss 30.8575 LearningRate 0.0984 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:18,688-Speed 5595.83 samples/sec Loss 30.5190 LearningRate 0.0984 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:20,539-Speed 5535.77 samples/sec Loss 30.3220 LearningRate 0.0983 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:22,346-Speed 5671.37 samples/sec Loss 30.2798 LearningRate 0.0983 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:24,160-Speed 5646.44 samples/sec Loss 30.0458 LearningRate 0.0983 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:25,962-Speed 5686.34 samples/sec Loss 29.8282 LearningRate 0.0983 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:27,759-Speed 5698.86 samples/sec Loss 29.8006 LearningRate 0.0983 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:29,556-Speed 5700.82 samples/sec Loss 29.5134 LearningRate 0.0982 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:31,363-Speed 5670.29 
samples/sec Loss 29.4889 LearningRate 0.0982 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:33,177-Speed 5648.63 samples/sec Loss 29.1232 LearningRate 0.0982 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:34,961-Speed 5742.95 samples/sec Loss 29.1206 LearningRate 0.0982 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:36,805-Speed 5556.00 samples/sec Loss 28.8510 LearningRate 0.0982 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:38,612-Speed 5668.65 samples/sec Loss 28.5462 LearningRate 0.0982 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:31:40,417-Speed 5675.38 samples/sec Loss 28.3833 LearningRate 0.0981 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:42,254-Speed 5577.57 samples/sec Loss 28.2652 LearningRate 0.0981 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:44,079-Speed 5614.10 samples/sec Loss 28.1105 LearningRate 0.0981 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:45,945-Speed 5489.01 samples/sec Loss 27.9250 LearningRate 0.0981 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:47,819-Speed 5465.89 samples/sec Loss 27.8872 LearningRate 0.0981 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:49,635-Speed 5641.40 samples/sec Loss 27.6505 LearningRate 0.0981 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:51,461-Speed 5610.30 samples/sec Loss 27.4609 LearningRate 0.0980 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:53,295-Speed 5589.52 samples/sec Loss 27.3734 LearningRate 0.0980 Epoch: 0 
Global Step: 1130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:55,145-Speed 5541.51 samples/sec Loss 27.3261 LearningRate 0.0980 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:57,012-Speed 5486.54 samples/sec Loss 26.8758 LearningRate 0.0980 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:31:58,793-Speed 5749.02 samples/sec Loss 26.9413 LearningRate 0.0980 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:00,580-Speed 5733.93 samples/sec Loss 26.9153 LearningRate 0.0980 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:02,415-Speed 5584.69 samples/sec Loss 26.6429 LearningRate 0.0979 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:04,246-Speed 5595.38 samples/sec Loss 26.3613 LearningRate 0.0979 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:06,048-Speed 5683.09 samples/sec Loss 26.3372 LearningRate 0.0979 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:07,856-Speed 5665.79 samples/sec Loss 26.1279 LearningRate 0.0979 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:09,643-Speed 5733.30 samples/sec Loss 26.0724 LearningRate 0.0979 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:11,448-Speed 5675.07 samples/sec Loss 25.8484 LearningRate 0.0978 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:13,253-Speed 5678.31 samples/sec Loss 25.7585 LearningRate 0.0978 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:15,079-Speed 5608.86 samples/sec Loss 25.5395 LearningRate 0.0978 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 01:32:16,892-Speed 5648.37 samples/sec Loss 25.3461 LearningRate 0.0978 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:18,725-Speed 5588.87 samples/sec Loss 25.2559 LearningRate 0.0978 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:20,575-Speed 5538.12 samples/sec Loss 25.1085 LearningRate 0.0978 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:22,373-Speed 5699.20 samples/sec Loss 24.9766 LearningRate 0.0977 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:24,189-Speed 5640.46 samples/sec Loss 24.9944 LearningRate 0.0977 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:25,972-Speed 5746.87 samples/sec Loss 24.5523 LearningRate 0.0977 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:27,799-Speed 5606.16 samples/sec Loss 24.6014 LearningRate 0.0977 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:29,653-Speed 5527.19 samples/sec Loss 24.4833 LearningRate 0.0977 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:31,464-Speed 5656.09 samples/sec Loss 24.3917 LearningRate 0.0977 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:33,252-Speed 5731.19 samples/sec Loss 24.1957 LearningRate 0.0976 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:35,044-Speed 5715.23 samples/sec Loss 24.1251 LearningRate 0.0976 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:36,833-Speed 5726.18 samples/sec Loss 23.9014 LearningRate 0.0976 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:38,662-Speed 5601.21 
samples/sec Loss 23.7837 LearningRate 0.0976 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:40,466-Speed 5681.01 samples/sec Loss 23.8481 LearningRate 0.0976 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:42,279-Speed 5649.30 samples/sec Loss 23.6532 LearningRate 0.0976 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:44,077-Speed 5698.08 samples/sec Loss 23.3556 LearningRate 0.0975 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:45,858-Speed 5753.03 samples/sec Loss 23.3261 LearningRate 0.0975 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:47,653-Speed 5708.74 samples/sec Loss 23.2658 LearningRate 0.0975 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:49,438-Speed 5737.30 samples/sec Loss 23.0196 LearningRate 0.0975 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:51,216-Speed 5763.30 samples/sec Loss 22.9705 LearningRate 0.0975 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:53,046-Speed 5598.06 samples/sec Loss 22.9133 LearningRate 0.0974 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:32:54,842-Speed 5703.78 samples/sec Loss 22.7707 LearningRate 0.0974 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:56,649-Speed 5669.96 samples/sec Loss 22.5527 LearningRate 0.0974 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:32:58,456-Speed 5668.04 samples/sec Loss 22.8149 LearningRate 0.0974 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:00,274-Speed 5636.18 samples/sec Loss 22.4662 LearningRate 0.0974 Epoch: 0 
Global Step: 1500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:02,124-Speed 5536.62 samples/sec Loss 22.3601 LearningRate 0.0974 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:03,960-Speed 5582.75 samples/sec Loss 22.2294 LearningRate 0.0973 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:05,819-Speed 5509.12 samples/sec Loss 22.1268 LearningRate 0.0973 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:07,634-Speed 5644.80 samples/sec Loss 22.0232 LearningRate 0.0973 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:09,431-Speed 5702.16 samples/sec Loss 21.8800 LearningRate 0.0973 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:11,249-Speed 5636.45 samples/sec Loss 21.6714 LearningRate 0.0973 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:13,044-Speed 5707.05 samples/sec Loss 21.5479 LearningRate 0.0973 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:14,849-Speed 5676.57 samples/sec Loss 21.7890 LearningRate 0.0972 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:16,716-Speed 5485.03 samples/sec Loss 21.4092 LearningRate 0.0972 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:18,555-Speed 5571.77 samples/sec Loss 21.3742 LearningRate 0.0972 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:20,425-Speed 5477.48 samples/sec Loss 21.4461 LearningRate 0.0972 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:22,265-Speed 5568.26 samples/sec Loss 21.3889 LearningRate 0.0972 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 01:33:24,064-Speed 5697.26 samples/sec Loss 21.2927 LearningRate 0.0972 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:25,870-Speed 5672.53 samples/sec Loss 21.0575 LearningRate 0.0971 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:27,655-Speed 5736.93 samples/sec Loss 21.1780 LearningRate 0.0971 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:29,499-Speed 5555.67 samples/sec Loss 20.9113 LearningRate 0.0971 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:31,325-Speed 5608.96 samples/sec Loss 20.9050 LearningRate 0.0971 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:33,136-Speed 5657.85 samples/sec Loss 20.6957 LearningRate 0.0971 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:34,958-Speed 5623.96 samples/sec Loss 20.6057 LearningRate 0.0970 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:36,756-Speed 5697.64 samples/sec Loss 20.5681 LearningRate 0.0970 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:38,559-Speed 5682.21 samples/sec Loss 20.6389 LearningRate 0.0970 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:40,342-Speed 5745.56 samples/sec Loss 20.4193 LearningRate 0.0970 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:42,152-Speed 5662.49 samples/sec Loss 20.4045 LearningRate 0.0970 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:43,948-Speed 5703.15 samples/sec Loss 20.1147 LearningRate 0.0970 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:33:45,750-Speed 5683.34 
samples/sec Loss 20.2711 LearningRate 0.0969 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:47,558-Speed 5668.14 samples/sec Loss 20.1070 LearningRate 0.0969 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:49,406-Speed 5542.95 samples/sec Loss 20.0374 LearningRate 0.0969 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:51,266-Speed 5507.37 samples/sec Loss 19.8804 LearningRate 0.0969 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:53,133-Speed 5488.70 samples/sec Loss 19.8550 LearningRate 0.0969 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:54,928-Speed 5707.96 samples/sec Loss 19.8572 LearningRate 0.0969 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:56,738-Speed 5661.71 samples/sec Loss 19.7349 LearningRate 0.0968 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:33:58,558-Speed 5629.07 samples/sec Loss 19.6512 LearningRate 0.0968 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:34:00,359-Speed 5688.75 samples/sec Loss 19.7015 LearningRate 0.0968 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:34:02,202-Speed 5556.81 samples/sec Loss 19.5028 LearningRate 0.0968 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:34:04,020-Speed 5635.30 samples/sec Loss 19.5194 LearningRate 0.0968 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:05,821-Speed 5690.64 samples/sec Loss 19.4473 LearningRate 0.0968 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:07,615-Speed 5708.57 samples/sec Loss 19.5072 LearningRate 0.0967 Epoch: 0 Global 
Step: 1870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:09,455-Speed 5568.59 samples/sec Loss 19.2960 LearningRate 0.0967 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:11,275-Speed 5629.91 samples/sec Loss 19.0530 LearningRate 0.0967 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:13,076-Speed 5686.48 samples/sec Loss 19.1925 LearningRate 0.0967 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:14,879-Speed 5683.96 samples/sec Loss 18.8920 LearningRate 0.0967 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:16,712-Speed 5587.07 samples/sec Loss 18.9327 LearningRate 0.0967 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:18,524-Speed 5653.95 samples/sec Loss 19.0520 LearningRate 0.0966 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:20,333-Speed 5662.96 samples/sec Loss 18.7798 LearningRate 0.0966 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:22,138-Speed 5677.25 samples/sec Loss 18.8671 LearningRate 0.0966 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:34:23,957-Speed 5631.32 samples/sec Loss 18.8507 LearningRate 0.0966 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:25,784-Speed 5609.18 samples/sec Loss 18.6976 LearningRate 0.0966 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:27,615-Speed 5593.43 samples/sec Loss 18.5995 LearningRate 0.0965 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:34:29,447-Speed 5591.64 samples/sec Loss 18.7519 LearningRate 0.0965 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 01:34:31,241-Speed 5711.45 samples/sec Loss 18.6280 LearningRate 0.0965 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:35:00,033-[lfw][2000]XNorm: 20.626767 Training: 2022-04-27 01:35:00,033-[lfw][2000]Accuracy-Flip: 0.98117+-0.00597 Training: 2022-04-27 01:35:00,033-[lfw][2000]Accuracy-Highest: 0.98117 Training: 2022-04-27 01:35:30,987-[cfp_fp][2000]XNorm: 17.467478 Training: 2022-04-27 01:35:30,987-[cfp_fp][2000]Accuracy-Flip: 0.77686+-0.01619 Training: 2022-04-27 01:35:30,988-[cfp_fp][2000]Accuracy-Highest: 0.77686 Training: 2022-04-27 01:35:57,530-[agedb_30][2000]XNorm: 20.021395 Training: 2022-04-27 01:35:57,531-[agedb_30][2000]Accuracy-Flip: 0.89350+-0.02585 Training: 2022-04-27 01:35:57,532-[agedb_30][2000]Accuracy-Highest: 0.89350 Training: 2022-04-27 01:35:59,397-Speed 116.16 samples/sec Loss 18.3722 LearningRate 0.0965 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:01,213-Speed 5641.63 samples/sec Loss 18.2830 LearningRate 0.0965 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:03,047-Speed 5586.62 samples/sec Loss 18.2472 LearningRate 0.0965 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:04,855-Speed 5667.01 samples/sec Loss 18.3900 LearningRate 0.0964 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:06,690-Speed 5580.39 samples/sec Loss 18.1385 LearningRate 0.0964 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:08,544-Speed 5528.78 samples/sec Loss 18.3360 LearningRate 0.0964 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:10,411-Speed 5486.35 samples/sec Loss 18.5458 LearningRate 0.0964 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:12,276-Speed 5492.72 
samples/sec Loss 18.2607 LearningRate 0.0964 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:14,150-Speed 5467.99 samples/sec Loss 18.0803 LearningRate 0.0964 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:15,947-Speed 5698.32 samples/sec Loss 18.2470 LearningRate 0.0963 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:17,753-Speed 5672.66 samples/sec Loss 17.9455 LearningRate 0.0963 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:19,538-Speed 5739.74 samples/sec Loss 17.7721 LearningRate 0.0963 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:21,330-Speed 5717.43 samples/sec Loss 17.7421 LearningRate 0.0963 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:23,143-Speed 5651.35 samples/sec Loss 17.6527 LearningRate 0.0963 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:24,933-Speed 5723.58 samples/sec Loss 17.5630 LearningRate 0.0963 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:26,727-Speed 5708.87 samples/sec Loss 17.5829 LearningRate 0.0962 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:28,547-Speed 5632.39 samples/sec Loss 17.6403 LearningRate 0.0962 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:30,341-Speed 5710.66 samples/sec Loss 17.4217 LearningRate 0.0962 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:32,126-Speed 5739.03 samples/sec Loss 17.5170 LearningRate 0.0962 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:33,937-Speed 5660.62 samples/sec Loss 17.5091 LearningRate 0.0962 Epoch: 0 Global 
Step: 2200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:35,720-Speed 5746.38 samples/sec Loss 17.4380 LearningRate 0.0962 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:37,523-Speed 5679.62 samples/sec Loss 17.2893 LearningRate 0.0961 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:39,335-Speed 5657.40 samples/sec Loss 17.2331 LearningRate 0.0961 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:41,131-Speed 5703.93 samples/sec Loss 17.2973 LearningRate 0.0961 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:36:42,942-Speed 5657.25 samples/sec Loss 17.2356 LearningRate 0.0961 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:44,730-Speed 5730.21 samples/sec Loss 17.3028 LearningRate 0.0961 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:46,557-Speed 5611.03 samples/sec Loss 17.1295 LearningRate 0.0960 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:48,372-Speed 5646.54 samples/sec Loss 17.0288 LearningRate 0.0960 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:50,199-Speed 5607.84 samples/sec Loss 17.0225 LearningRate 0.0960 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:51,996-Speed 5701.16 samples/sec Loss 16.8710 LearningRate 0.0960 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:53,785-Speed 5725.42 samples/sec Loss 17.0499 LearningRate 0.0960 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:55,576-Speed 5718.00 samples/sec Loss 16.8542 LearningRate 0.0960 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-27 01:36:57,374-Speed 5699.16 samples/sec Loss 17.0558 LearningRate 0.0959 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:36:59,179-Speed 5678.37 samples/sec Loss 16.7291 LearningRate 0.0959 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:00,989-Speed 5661.18 samples/sec Loss 16.6541 LearningRate 0.0959 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:02,794-Speed 5673.27 samples/sec Loss 16.7534 LearningRate 0.0959 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:04,605-Speed 5657.81 samples/sec Loss 16.7782 LearningRate 0.0959 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:06,389-Speed 5741.24 samples/sec Loss 16.8834 LearningRate 0.0959 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:08,190-Speed 5689.88 samples/sec Loss 16.7004 LearningRate 0.0958 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:09,994-Speed 5677.93 samples/sec Loss 16.5012 LearningRate 0.0958 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:11,782-Speed 5730.96 samples/sec Loss 16.6023 LearningRate 0.0958 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:13,571-Speed 5726.84 samples/sec Loss 16.4872 LearningRate 0.0958 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:15,353-Speed 5750.67 samples/sec Loss 16.5775 LearningRate 0.0958 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:17,173-Speed 5629.39 samples/sec Loss 16.6165 LearningRate 0.0958 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:18,951-Speed 5761.46 
samples/sec Loss 16.4181 LearningRate 0.0957 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:20,760-Speed 5663.56 samples/sec Loss 16.4520 LearningRate 0.0957 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:22,563-Speed 5680.94 samples/sec Loss 16.2132 LearningRate 0.0957 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:24,349-Speed 5736.18 samples/sec Loss 16.1577 LearningRate 0.0957 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:26,141-Speed 5716.57 samples/sec Loss 16.2249 LearningRate 0.0957 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:27,928-Speed 5733.62 samples/sec Loss 16.1782 LearningRate 0.0957 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:29,751-Speed 5621.66 samples/sec Loss 16.3545 LearningRate 0.0956 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:31,551-Speed 5693.16 samples/sec Loss 16.2255 LearningRate 0.0956 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:33,347-Speed 5702.91 samples/sec Loss 16.1851 LearningRate 0.0956 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:35,155-Speed 5690.39 samples/sec Loss 15.9891 LearningRate 0.0956 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:36,962-Speed 5671.33 samples/sec Loss 16.1700 LearningRate 0.0956 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:38,787-Speed 5612.41 samples/sec Loss 16.1010 LearningRate 0.0955 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:40,572-Speed 5739.81 samples/sec Loss 16.0645 LearningRate 0.0955 Epoch: 0 
Global Step: 2570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:42,367-Speed 5706.58 samples/sec Loss 15.9780 LearningRate 0.0955 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:44,186-Speed 5632.82 samples/sec Loss 15.8056 LearningRate 0.0955 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:45,996-Speed 5659.87 samples/sec Loss 16.0167 LearningRate 0.0955 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:47,782-Speed 5735.08 samples/sec Loss 15.8945 LearningRate 0.0955 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:49,584-Speed 5687.84 samples/sec Loss 15.8537 LearningRate 0.0954 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:51,377-Speed 5713.52 samples/sec Loss 15.7436 LearningRate 0.0954 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:53,159-Speed 5749.43 samples/sec Loss 15.7922 LearningRate 0.0954 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:54,970-Speed 5655.36 samples/sec Loss 15.8897 LearningRate 0.0954 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:37:56,783-Speed 5651.12 samples/sec Loss 15.6177 LearningRate 0.0954 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:37:58,573-Speed 5721.72 samples/sec Loss 15.7162 LearningRate 0.0954 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:00,372-Speed 5695.53 samples/sec Loss 15.6855 LearningRate 0.0953 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:02,161-Speed 5726.68 samples/sec Loss 15.7855 LearningRate 0.0953 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-27 01:38:03,966-Speed 5689.08 samples/sec Loss 15.8962 LearningRate 0.0953 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:05,758-Speed 5716.65 samples/sec Loss 15.5118 LearningRate 0.0953 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:07,548-Speed 5723.06 samples/sec Loss 15.6303 LearningRate 0.0953 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:09,349-Speed 5688.09 samples/sec Loss 15.6411 LearningRate 0.0953 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:11,138-Speed 5726.96 samples/sec Loss 15.4358 LearningRate 0.0952 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:12,949-Speed 5698.70 samples/sec Loss 15.4013 LearningRate 0.0952 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:14,739-Speed 5724.68 samples/sec Loss 15.4621 LearningRate 0.0952 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:16,571-Speed 5590.99 samples/sec Loss 15.4899 LearningRate 0.0952 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:18,404-Speed 5589.43 samples/sec Loss 15.4873 LearningRate 0.0952 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:20,216-Speed 5661.39 samples/sec Loss 15.5077 LearningRate 0.0952 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:22,035-Speed 5630.46 samples/sec Loss 15.5228 LearningRate 0.0951 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:23,849-Speed 5648.78 samples/sec Loss 15.3901 LearningRate 0.0951 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:25,636-Speed 5730.20 
samples/sec Loss 15.3192 LearningRate 0.0951 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:27,425-Speed 5727.77 samples/sec Loss 15.3243 LearningRate 0.0951 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:29,223-Speed 5696.26 samples/sec Loss 15.4703 LearningRate 0.0951 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:31,026-Speed 5683.16 samples/sec Loss 15.2768 LearningRate 0.0951 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:32,830-Speed 5704.38 samples/sec Loss 15.1758 LearningRate 0.0950 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:38:34,613-Speed 5746.34 samples/sec Loss 15.0984 LearningRate 0.0950 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:36,396-Speed 5746.25 samples/sec Loss 15.1071 LearningRate 0.0950 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:38,186-Speed 5723.16 samples/sec Loss 14.8835 LearningRate 0.0950 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:39,969-Speed 5747.54 samples/sec Loss 15.0346 LearningRate 0.0950 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:41,777-Speed 5664.77 samples/sec Loss 15.3479 LearningRate 0.0949 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:43,598-Speed 5634.59 samples/sec Loss 14.8801 LearningRate 0.0949 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:45,411-Speed 5648.42 samples/sec Loss 15.0555 LearningRate 0.0949 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:47,205-Speed 5710.99 samples/sec Loss 15.1311 LearningRate 0.0949 Epoch: 0 Global 
Step: 2940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:49,027-Speed 5624.00 samples/sec Loss 15.0430 LearningRate 0.0949 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:50,854-Speed 5625.23 samples/sec Loss 14.9190 LearningRate 0.0949 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:52,664-Speed 5661.20 samples/sec Loss 14.9401 LearningRate 0.0948 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:54,484-Speed 5625.92 samples/sec Loss 15.1299 LearningRate 0.0948 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:56,274-Speed 5724.18 samples/sec Loss 14.9492 LearningRate 0.0948 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:58,084-Speed 5660.36 samples/sec Loss 14.9079 LearningRate 0.0948 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:38:59,927-Speed 5558.86 samples/sec Loss 14.8103 LearningRate 0.0948 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:01,713-Speed 5735.60 samples/sec Loss 14.9088 LearningRate 0.0948 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:03,539-Speed 5630.78 samples/sec Loss 14.9134 LearningRate 0.0947 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:05,341-Speed 5685.98 samples/sec Loss 15.0781 LearningRate 0.0947 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:07,121-Speed 5754.44 samples/sec Loss 14.8235 LearningRate 0.0947 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:08,913-Speed 5714.64 samples/sec Loss 14.8098 LearningRate 0.0947 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-27 01:39:10,720-Speed 5672.00 samples/sec Loss 14.7361 LearningRate 0.0947 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:12,519-Speed 5693.26 samples/sec Loss 14.6106 LearningRate 0.0947 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:14,338-Speed 5690.85 samples/sec Loss 14.6268 LearningRate 0.0946 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:16,135-Speed 5700.38 samples/sec Loss 14.6599 LearningRate 0.0946 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:39:17,944-Speed 5660.62 samples/sec Loss 14.7569 LearningRate 0.0946 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:19,742-Speed 5700.14 samples/sec Loss 14.7106 LearningRate 0.0946 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:21,564-Speed 5653.16 samples/sec Loss 14.6159 LearningRate 0.0946 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:23,372-Speed 5664.32 samples/sec Loss 14.6502 LearningRate 0.0946 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:25,203-Speed 5595.44 samples/sec Loss 14.4723 LearningRate 0.0945 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:27,015-Speed 5653.28 samples/sec Loss 14.5233 LearningRate 0.0945 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:28,825-Speed 5660.93 samples/sec Loss 14.4528 LearningRate 0.0945 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:30,672-Speed 5547.76 samples/sec Loss 14.6016 LearningRate 0.0945 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:32,482-Speed 5658.35 samples/sec 
Loss 14.4854 LearningRate 0.0945 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:34,337-Speed 5626.94 samples/sec Loss 14.5896 LearningRate 0.0945 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:39:36,214-Speed 5456.79 samples/sec Loss 14.4261 LearningRate 0.0944 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:39:38,025-Speed 5657.21 samples/sec Loss 14.3774 LearningRate 0.0944 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:39:39,822-Speed 5701.94 samples/sec Loss 14.3545 LearningRate 0.0944 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:41,603-Speed 5753.09 samples/sec Loss 14.2725 LearningRate 0.0944 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:43,426-Speed 5639.92 samples/sec Loss 14.5015 LearningRate 0.0944 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:45,210-Speed 5743.61 samples/sec Loss 14.2725 LearningRate 0.0943 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:47,032-Speed 5623.24 samples/sec Loss 14.2288 LearningRate 0.0943 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:48,831-Speed 5693.74 samples/sec Loss 14.2481 LearningRate 0.0943 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:50,653-Speed 5671.25 samples/sec Loss 14.4050 LearningRate 0.0943 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:52,445-Speed 5715.74 samples/sec Loss 14.3579 LearningRate 0.0943 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:54,239-Speed 5712.47 samples/sec Loss 14.1832 LearningRate 0.0943 Epoch: 0 Global Step: 
3310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:56,042-Speed 5682.36 samples/sec Loss 14.2983 LearningRate 0.0942 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:57,838-Speed 5700.97 samples/sec Loss 14.1763 LearningRate 0.0942 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:39:59,649-Speed 5658.60 samples/sec Loss 14.2785 LearningRate 0.0942 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:01,481-Speed 5590.40 samples/sec Loss 14.1675 LearningRate 0.0942 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:03,284-Speed 5700.67 samples/sec Loss 14.0374 LearningRate 0.0942 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:05,107-Speed 5618.99 samples/sec Loss 14.3008 LearningRate 0.0942 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:06,905-Speed 5697.36 samples/sec Loss 14.1350 LearningRate 0.0941 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:08,718-Speed 5651.04 samples/sec Loss 14.1431 LearningRate 0.0941 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:10,540-Speed 5624.11 samples/sec Loss 14.0206 LearningRate 0.0941 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:12,321-Speed 5750.66 samples/sec Loss 14.1414 LearningRate 0.0941 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:14,136-Speed 5647.43 samples/sec Loss 14.1572 LearningRate 0.0941 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:15,939-Speed 5680.12 samples/sec Loss 13.8228 LearningRate 0.0941 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 01:40:17,732-Speed 5716.05 samples/sec Loss 14.1407 LearningRate 0.0940 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:19,533-Speed 5688.07 samples/sec Loss 14.0596 LearningRate 0.0940 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:21,336-Speed 5702.70 samples/sec Loss 13.8486 LearningRate 0.0940 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:23,123-Speed 5731.07 samples/sec Loss 14.1734 LearningRate 0.0940 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:24,920-Speed 5700.47 samples/sec Loss 14.0695 LearningRate 0.0940 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:26,719-Speed 5697.25 samples/sec Loss 13.8443 LearningRate 0.0940 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:28,502-Speed 5743.63 samples/sec Loss 14.1173 LearningRate 0.0939 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:30,291-Speed 5726.08 samples/sec Loss 14.1101 LearningRate 0.0939 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:32,081-Speed 5723.42 samples/sec Loss 13.9457 LearningRate 0.0939 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:33,884-Speed 5713.08 samples/sec Loss 14.0378 LearningRate 0.0939 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:40:35,674-Speed 5722.39 samples/sec Loss 13.8730 LearningRate 0.0939 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:37,474-Speed 5693.18 samples/sec Loss 13.7425 LearningRate 0.0939 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:39,261-Speed 5734.25 
samples/sec Loss 13.8816 LearningRate 0.0938 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:41,052-Speed 5719.05 samples/sec Loss 13.8947 LearningRate 0.0938 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:42,863-Speed 5657.35 samples/sec Loss 13.8119 LearningRate 0.0938 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:44,675-Speed 5671.58 samples/sec Loss 13.5636 LearningRate 0.0938 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:46,474-Speed 5720.62 samples/sec Loss 13.4996 LearningRate 0.0938 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:48,284-Speed 5660.00 samples/sec Loss 13.8682 LearningRate 0.0938 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:50,099-Speed 5643.33 samples/sec Loss 13.9597 LearningRate 0.0937 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:51,921-Speed 5656.38 samples/sec Loss 13.7929 LearningRate 0.0937 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:53,739-Speed 5635.59 samples/sec Loss 13.6289 LearningRate 0.0937 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:55,528-Speed 5725.66 samples/sec Loss 13.8986 LearningRate 0.0937 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:57,343-Speed 5643.06 samples/sec Loss 13.7506 LearningRate 0.0937 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:40:59,153-Speed 5657.99 samples/sec Loss 13.7127 LearningRate 0.0936 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:00,962-Speed 5665.55 samples/sec Loss 13.7760 LearningRate 0.0936 Epoch: 0 
Global Step: 3680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:02,781-Speed 5632.51 samples/sec Loss 13.7209 LearningRate 0.0936 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:04,590-Speed 5717.54 samples/sec Loss 13.7394 LearningRate 0.0936 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:06,378-Speed 5727.61 samples/sec Loss 13.6975 LearningRate 0.0936 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:08,177-Speed 5695.63 samples/sec Loss 13.7584 LearningRate 0.0936 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:09,972-Speed 5706.28 samples/sec Loss 13.3949 LearningRate 0.0935 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:11,783-Speed 5655.79 samples/sec Loss 13.6693 LearningRate 0.0935 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:13,576-Speed 5742.67 samples/sec Loss 13.5243 LearningRate 0.0935 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:15,361-Speed 5739.10 samples/sec Loss 13.7714 LearningRate 0.0935 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:17,169-Speed 5664.65 samples/sec Loss 13.8382 LearningRate 0.0935 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:18,959-Speed 5724.89 samples/sec Loss 13.6976 LearningRate 0.0935 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:20,786-Speed 5625.79 samples/sec Loss 13.3342 LearningRate 0.0934 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:22,592-Speed 5670.55 samples/sec Loss 13.5139 LearningRate 0.0934 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 01:41:24,429-Speed 5578.98 samples/sec Loss 13.4156 LearningRate 0.0934 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:41:26,257-Speed 5603.64 samples/sec Loss 13.4666 LearningRate 0.0934 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:28,057-Speed 5692.22 samples/sec Loss 13.3318 LearningRate 0.0934 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:29,886-Speed 5599.71 samples/sec Loss 13.5268 LearningRate 0.0934 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:31,698-Speed 5654.19 samples/sec Loss 13.4212 LearningRate 0.0933 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:33,491-Speed 5735.62 samples/sec Loss 13.5047 LearningRate 0.0933 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:35,287-Speed 5702.77 samples/sec Loss 13.5762 LearningRate 0.0933 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:37,084-Speed 5697.91 samples/sec Loss 13.5844 LearningRate 0.0933 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:38,876-Speed 5717.36 samples/sec Loss 13.2985 LearningRate 0.0933 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:40,694-Speed 5635.47 samples/sec Loss 13.4698 LearningRate 0.0933 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:42,483-Speed 5726.21 samples/sec Loss 13.5311 LearningRate 0.0932 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:44,290-Speed 5705.43 samples/sec Loss 13.2960 LearningRate 0.0932 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:46,087-Speed 5699.78 
samples/sec Loss 13.3494 LearningRate 0.0932 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:47,872-Speed 5742.96 samples/sec Loss 13.3692 LearningRate 0.0932 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:49,720-Speed 5622.20 samples/sec Loss 13.3668 LearningRate 0.0932 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:51,524-Speed 5678.99 samples/sec Loss 13.5243 LearningRate 0.0932 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:53,342-Speed 5637.41 samples/sec Loss 13.2178 LearningRate 0.0931 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:55,137-Speed 5706.86 samples/sec Loss 13.3711 LearningRate 0.0931 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:56,974-Speed 5615.31 samples/sec Loss 13.4024 LearningRate 0.0931 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:41:58,778-Speed 5679.26 samples/sec Loss 13.2602 LearningRate 0.0931 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:42:25,482-[lfw][4000]XNorm: 21.498628 Training: 2022-04-27 01:42:25,506-[lfw][4000]Accuracy-Flip: 0.98983+-0.00425 Training: 2022-04-27 01:42:25,507-[lfw][4000]Accuracy-Highest: 0.98983 Training: 2022-04-27 01:42:56,338-[cfp_fp][4000]XNorm: 18.556288 Training: 2022-04-27 01:42:56,340-[cfp_fp][4000]Accuracy-Flip: 0.86286+-0.01287 Training: 2022-04-27 01:42:56,341-[cfp_fp][4000]Accuracy-Highest: 0.86286 Training: 2022-04-27 01:43:22,866-[agedb_30][4000]XNorm: 21.040543 Training: 2022-04-27 01:43:22,936-[agedb_30][4000]Accuracy-Flip: 0.93167+-0.01412 Training: 2022-04-27 01:43:22,936-[agedb_30][4000]Accuracy-Highest: 0.93167 Training: 2022-04-27 01:43:24,738-Speed 119.13 samples/sec Loss 13.2713 LearningRate 0.0931 
Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:26,553-Speed 5645.25 samples/sec Loss 13.2576 LearningRate 0.0931 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:43:28,377-Speed 5616.87 samples/sec Loss 13.2427 LearningRate 0.0930 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:30,190-Speed 5650.72 samples/sec Loss 13.2277 LearningRate 0.0930 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:31,987-Speed 5708.39 samples/sec Loss 13.2277 LearningRate 0.0930 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:33,784-Speed 5700.63 samples/sec Loss 13.1456 LearningRate 0.0930 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:35,567-Speed 5745.03 samples/sec Loss 13.1526 LearningRate 0.0930 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:37,365-Speed 5695.96 samples/sec Loss 13.0568 LearningRate 0.0930 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:39,202-Speed 5578.33 samples/sec Loss 13.2792 LearningRate 0.0929 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:41,105-Speed 5383.24 samples/sec Loss 13.1169 LearningRate 0.0929 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:42,901-Speed 5701.79 samples/sec Loss 13.1134 LearningRate 0.0929 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:44,693-Speed 5729.05 samples/sec Loss 13.3161 LearningRate 0.0929 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:46,483-Speed 5725.73 samples/sec Loss 13.0973 LearningRate 0.0929 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-04-27 01:43:48,265-Speed 5746.19 samples/sec Loss 13.2733 LearningRate 0.0929 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:50,059-Speed 5731.72 samples/sec Loss 12.9995 LearningRate 0.0928 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:51,866-Speed 5669.03 samples/sec Loss 13.0621 LearningRate 0.0928 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:53,657-Speed 5719.84 samples/sec Loss 13.0668 LearningRate 0.0928 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:55,452-Speed 5708.41 samples/sec Loss 12.9115 LearningRate 0.0928 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:57,263-Speed 5686.64 samples/sec Loss 12.9041 LearningRate 0.0928 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:43:59,080-Speed 5637.01 samples/sec Loss 12.9927 LearningRate 0.0927 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:00,875-Speed 5706.98 samples/sec Loss 13.1281 LearningRate 0.0927 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:02,678-Speed 5682.15 samples/sec Loss 12.6766 LearningRate 0.0927 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:04,468-Speed 5724.09 samples/sec Loss 13.0092 LearningRate 0.0927 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:06,254-Speed 5735.55 samples/sec Loss 12.9966 LearningRate 0.0927 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:08,041-Speed 5732.82 samples/sec Loss 13.0313 LearningRate 0.0927 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:09,842-Speed 
5707.06 samples/sec Loss 12.9197 LearningRate 0.0926 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:11,641-Speed 5695.60 samples/sec Loss 12.8718 LearningRate 0.0926 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:13,477-Speed 5578.33 samples/sec Loss 13.0006 LearningRate 0.0926 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:15,336-Speed 5510.57 samples/sec Loss 13.0317 LearningRate 0.0926 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:17,143-Speed 5668.33 samples/sec Loss 13.0785 LearningRate 0.0926 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:18,946-Speed 5683.60 samples/sec Loss 12.7946 LearningRate 0.0926 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:20,736-Speed 5722.75 samples/sec Loss 12.9522 LearningRate 0.0925 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:22,552-Speed 5642.98 samples/sec Loss 12.8313 LearningRate 0.0925 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:24,363-Speed 5665.25 samples/sec Loss 12.9213 LearningRate 0.0925 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:26,157-Speed 5710.05 samples/sec Loss 12.9417 LearningRate 0.0925 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:27,945-Speed 5729.78 samples/sec Loss 12.7878 LearningRate 0.0925 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:29,790-Speed 5551.05 samples/sec Loss 12.8949 LearningRate 0.0925 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:31,616-Speed 5613.83 samples/sec Loss 12.8251 LearningRate 0.0924 Epoch: 0 
Global Step: 4380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:33,433-Speed 5637.72 samples/sec Loss 12.9767 LearningRate 0.0924 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:35,243-Speed 5658.57 samples/sec Loss 12.7866 LearningRate 0.0924 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:37,061-Speed 5636.22 samples/sec Loss 12.7060 LearningRate 0.0924 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:38,857-Speed 5705.36 samples/sec Loss 12.7602 LearningRate 0.0924 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:40,646-Speed 5726.22 samples/sec Loss 12.8291 LearningRate 0.0924 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:42,443-Speed 5698.72 samples/sec Loss 12.7631 LearningRate 0.0923 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:44,262-Speed 5634.36 samples/sec Loss 12.7748 LearningRate 0.0923 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:44:46,081-Speed 5631.35 samples/sec Loss 12.5993 LearningRate 0.0923 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:47,896-Speed 5644.73 samples/sec Loss 12.5352 LearningRate 0.0923 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:49,687-Speed 5720.38 samples/sec Loss 12.6741 LearningRate 0.0923 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:51,496-Speed 5662.56 samples/sec Loss 12.5767 LearningRate 0.0923 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:53,302-Speed 5671.84 samples/sec Loss 12.7922 LearningRate 0.0922 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-27 01:44:55,117-Speed 5647.85 samples/sec Loss 12.7052 LearningRate 0.0922 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:56,937-Speed 5627.01 samples/sec Loss 12.6796 LearningRate 0.0922 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:44:58,796-Speed 5510.37 samples/sec Loss 12.7016 LearningRate 0.0922 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:00,590-Speed 5713.18 samples/sec Loss 12.6243 LearningRate 0.0922 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:02,394-Speed 5683.33 samples/sec Loss 12.6991 LearningRate 0.0922 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:04,214-Speed 5630.91 samples/sec Loss 12.5648 LearningRate 0.0921 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:45:06,003-Speed 5726.63 samples/sec Loss 12.4926 LearningRate 0.0921 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:07,820-Speed 5637.94 samples/sec Loss 12.7147 LearningRate 0.0921 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:09,607-Speed 5731.73 samples/sec Loss 12.6041 LearningRate 0.0921 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:11,395-Speed 5728.57 samples/sec Loss 12.5444 LearningRate 0.0921 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:13,191-Speed 5705.52 samples/sec Loss 12.7611 LearningRate 0.0921 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:14,996-Speed 5675.76 samples/sec Loss 12.5239 LearningRate 0.0920 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:16,816-Speed 5628.70 samples/sec 
Loss 12.4999 LearningRate 0.0920 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:18,604-Speed 5730.61 samples/sec Loss 12.5114 LearningRate 0.0920 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:20,430-Speed 5608.82 samples/sec Loss 12.5924 LearningRate 0.0920 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:22,234-Speed 5680.65 samples/sec Loss 12.7173 LearningRate 0.0920 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:24,025-Speed 5719.19 samples/sec Loss 12.6523 LearningRate 0.0920 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:25,821-Speed 5704.67 samples/sec Loss 12.4798 LearningRate 0.0919 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:27,618-Speed 5700.31 samples/sec Loss 12.5356 LearningRate 0.0919 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:29,441-Speed 5621.63 samples/sec Loss 12.5079 LearningRate 0.0919 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:31,280-Speed 5572.27 samples/sec Loss 12.5542 LearningRate 0.0919 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:33,093-Speed 5649.34 samples/sec Loss 12.4672 LearningRate 0.0919 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:34,894-Speed 5687.34 samples/sec Loss 12.5747 LearningRate 0.0919 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:36,709-Speed 5646.38 samples/sec Loss 12.5108 LearningRate 0.0918 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:38,514-Speed 5672.58 samples/sec Loss 12.4374 LearningRate 0.0918 Epoch: 0 Global Step: 4750 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:40,356-Speed 5564.39 samples/sec Loss 12.4691 LearningRate 0.0918 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:42,184-Speed 5602.71 samples/sec Loss 12.3608 LearningRate 0.0918 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:43,982-Speed 5698.10 samples/sec Loss 12.6885 LearningRate 0.0918 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:45:45,802-Speed 5629.99 samples/sec Loss 12.3994 LearningRate 0.0918 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:47,612-Speed 5661.06 samples/sec Loss 12.5682 LearningRate 0.0917 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:49,417-Speed 5673.84 samples/sec Loss 12.3157 LearningRate 0.0917 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:51,224-Speed 5671.35 samples/sec Loss 12.5211 LearningRate 0.0917 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:53,025-Speed 5686.19 samples/sec Loss 12.4565 LearningRate 0.0917 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:54,823-Speed 5700.48 samples/sec Loss 12.1828 LearningRate 0.0917 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:56,625-Speed 5683.59 samples/sec Loss 12.4118 LearningRate 0.0917 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:45:58,457-Speed 5593.95 samples/sec Loss 12.5142 LearningRate 0.0916 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:00,264-Speed 5669.75 samples/sec Loss 12.3388 LearningRate 0.0916 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 
01:46:02,083-Speed 5632.20 samples/sec Loss 12.3123 LearningRate 0.0916 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:03,899-Speed 5642.17 samples/sec Loss 12.4451 LearningRate 0.0916 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:05,721-Speed 5621.89 samples/sec Loss 12.3494 LearningRate 0.0916 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:07,529-Speed 5664.86 samples/sec Loss 12.6278 LearningRate 0.0916 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:09,329-Speed 5691.85 samples/sec Loss 12.4653 LearningRate 0.0915 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:11,124-Speed 5707.87 samples/sec Loss 12.2095 LearningRate 0.0915 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:12,932-Speed 5667.65 samples/sec Loss 12.2476 LearningRate 0.0915 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:14,787-Speed 5521.33 samples/sec Loss 12.4260 LearningRate 0.0915 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:16,591-Speed 5680.63 samples/sec Loss 12.3417 LearningRate 0.0915 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:18,383-Speed 5714.36 samples/sec Loss 12.3979 LearningRate 0.0915 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:20,176-Speed 5714.20 samples/sec Loss 12.3522 LearningRate 0.0914 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:21,967-Speed 5721.50 samples/sec Loss 12.3673 LearningRate 0.0914 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:23,760-Speed 5710.88 samples/sec Loss 12.2655 
LearningRate 0.0914 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:25,553-Speed 5715.73 samples/sec Loss 12.1534 LearningRate 0.0914 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:27,346-Speed 5711.02 samples/sec Loss 12.2785 LearningRate 0.0914 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:29,132-Speed 5737.79 samples/sec Loss 12.2490 LearningRate 0.0913 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:30,922-Speed 5724.79 samples/sec Loss 12.2661 LearningRate 0.0913 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:32,729-Speed 5668.91 samples/sec Loss 12.3663 LearningRate 0.0913 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:34,514-Speed 5739.03 samples/sec Loss 12.1940 LearningRate 0.0913 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:36,299-Speed 5738.11 samples/sec Loss 12.1720 LearningRate 0.0913 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:38,101-Speed 5688.02 samples/sec Loss 12.2789 LearningRate 0.0913 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:39,890-Speed 5725.31 samples/sec Loss 12.0730 LearningRate 0.0912 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:41,691-Speed 5688.23 samples/sec Loss 12.1125 LearningRate 0.0912 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:43,488-Speed 5702.38 samples/sec Loss 12.1093 LearningRate 0.0912 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:45,297-Speed 5660.57 samples/sec Loss 12.2699 LearningRate 0.0912 Epoch: 0 Global Step: 5120 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:47,141-Speed 5558.09 samples/sec Loss 12.1064 LearningRate 0.0912 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:48,926-Speed 5738.09 samples/sec Loss 12.2239 LearningRate 0.0912 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:50,747-Speed 5627.01 samples/sec Loss 12.1771 LearningRate 0.0911 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:52,562-Speed 5643.42 samples/sec Loss 12.2539 LearningRate 0.0911 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:54,403-Speed 5567.21 samples/sec Loss 12.2019 LearningRate 0.0911 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:46:56,223-Speed 5627.75 samples/sec Loss 12.1189 LearningRate 0.0911 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:58,016-Speed 5713.48 samples/sec Loss 12.1945 LearningRate 0.0911 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:46:59,820-Speed 5681.54 samples/sec Loss 12.2827 LearningRate 0.0911 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:47:01,638-Speed 5635.89 samples/sec Loss 12.2747 LearningRate 0.0910 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:47:03,432-Speed 5708.47 samples/sec Loss 12.1566 LearningRate 0.0910 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:47:05,222-Speed 5722.42 samples/sec Loss 12.1403 LearningRate 0.0910 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:47:07,032-Speed 5661.71 samples/sec Loss 12.1642 LearningRate 0.0910 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
01:47:08,823-Speed 5719.76 samples/sec Loss 11.9315 LearningRate 0.0910 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:10,624-Speed 5690.74 samples/sec Loss 11.8864 LearningRate 0.0910 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:12,473-Speed 5538.80 samples/sec Loss 12.2182 LearningRate 0.0909 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:14,299-Speed 5612.37 samples/sec Loss 12.0049 LearningRate 0.0909 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:16,107-Speed 5666.08 samples/sec Loss 12.2923 LearningRate 0.0909 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:17,937-Speed 5598.00 samples/sec Loss 12.1841 LearningRate 0.0909 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:19,746-Speed 5664.56 samples/sec Loss 12.1385 LearningRate 0.0909 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:21,532-Speed 5735.52 samples/sec Loss 12.2674 LearningRate 0.0909 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:23,321-Speed 5727.00 samples/sec Loss 12.1192 LearningRate 0.0908 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:25,135-Speed 5647.00 samples/sec Loss 12.2730 LearningRate 0.0908 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:26,945-Speed 5660.02 samples/sec Loss 12.1155 LearningRate 0.0908 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:28,735-Speed 5725.74 samples/sec Loss 12.1033 LearningRate 0.0908 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:30,545-Speed 5660.89 samples/sec Loss 12.0922 LearningRate 
0.0908 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:32,328-Speed 5744.10 samples/sec Loss 12.0104 LearningRate 0.0908 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:34,123-Speed 5706.01 samples/sec Loss 11.9554 LearningRate 0.0907 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:35,948-Speed 5615.88 samples/sec Loss 12.0398 LearningRate 0.0907 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:37,755-Speed 5668.66 samples/sec Loss 11.9698 LearningRate 0.0907 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:47:39,572-Speed 5637.60 samples/sec Loss 12.0964 LearningRate 0.0907 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:41,375-Speed 5684.51 samples/sec Loss 11.9936 LearningRate 0.0907 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:43,174-Speed 5694.28 samples/sec Loss 12.0363 LearningRate 0.0907 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:44,995-Speed 5625.01 samples/sec Loss 12.0406 LearningRate 0.0906 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:46,800-Speed 5674.26 samples/sec Loss 12.0182 LearningRate 0.0906 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:48,603-Speed 5685.05 samples/sec Loss 11.9138 LearningRate 0.0906 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:50,412-Speed 5661.96 samples/sec Loss 11.9201 LearningRate 0.0906 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:52,226-Speed 5648.01 samples/sec Loss 12.0156 LearningRate 0.0906 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-04-27 01:47:54,068-Speed 5560.56 samples/sec Loss 12.0148 LearningRate 0.0906 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:55,871-Speed 5683.07 samples/sec Loss 11.9307 LearningRate 0.0905 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:47:57,716-Speed 5552.68 samples/sec Loss 11.8723 LearningRate 0.0905 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:47:59,538-Speed 5624.77 samples/sec Loss 12.0674 LearningRate 0.0905 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:01,332-Speed 5710.18 samples/sec Loss 11.9653 LearningRate 0.0905 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:03,145-Speed 5652.94 samples/sec Loss 12.0083 LearningRate 0.0905 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:04,941-Speed 5701.97 samples/sec Loss 11.8580 LearningRate 0.0905 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:06,792-Speed 5537.03 samples/sec Loss 11.9321 LearningRate 0.0904 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:08,645-Speed 5526.59 samples/sec Loss 11.8531 LearningRate 0.0904 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:10,487-Speed 5561.81 samples/sec Loss 11.7932 LearningRate 0.0904 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:12,294-Speed 5670.26 samples/sec Loss 11.9241 LearningRate 0.0904 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:14,110-Speed 5641.69 samples/sec Loss 11.9544 LearningRate 0.0904 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:15,903-Speed 5714.44 
samples/sec Loss 11.7989 LearningRate 0.0904 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:17,762-Speed 5510.07 samples/sec Loss 11.7006 LearningRate 0.0903 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:19,568-Speed 5671.63 samples/sec Loss 11.6925 LearningRate 0.0903 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:21,372-Speed 5679.14 samples/sec Loss 11.9825 LearningRate 0.0903 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:23,175-Speed 5681.21 samples/sec Loss 11.7854 LearningRate 0.0903 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:24,984-Speed 5662.55 samples/sec Loss 11.9105 LearningRate 0.0903 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:48:26,864-Speed 5450.73 samples/sec Loss 11.9494 LearningRate 0.0903 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:38,650-Speed 868.88 samples/sec Loss 11.3573 LearningRate 0.0902 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:40,471-Speed 5628.13 samples/sec Loss 11.1112 LearningRate 0.0902 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:48:42,278-Speed 5669.84 samples/sec Loss 11.0967 LearningRate 0.0902 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:44,229-Speed 5250.25 samples/sec Loss 11.0840 LearningRate 0.0902 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:46,065-Speed 5580.16 samples/sec Loss 10.9549 LearningRate 0.0902 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:47,907-Speed 5564.57 samples/sec Loss 11.1733 LearningRate 0.0902 Epoch: 1 Global 
Step: 5740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:49,697-Speed 5721.58 samples/sec Loss 11.1006 LearningRate 0.0901 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:51,514-Speed 5639.88 samples/sec Loss 11.0302 LearningRate 0.0901 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:53,340-Speed 5611.90 samples/sec Loss 11.0678 LearningRate 0.0901 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:55,143-Speed 5679.78 samples/sec Loss 11.1510 LearningRate 0.0901 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:56,990-Speed 5548.34 samples/sec Loss 11.1475 LearningRate 0.0901 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:48:58,817-Speed 5606.06 samples/sec Loss 11.2589 LearningRate 0.0901 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 01:49:00,609-Speed 5717.82 samples/sec Loss 11.1063 LearningRate 0.0900 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:02,448-Speed 5568.82 samples/sec Loss 11.1103 LearningRate 0.0900 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:04,297-Speed 5542.76 samples/sec Loss 11.2272 LearningRate 0.0900 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:06,143-Speed 5549.02 samples/sec Loss 11.2709 LearningRate 0.0900 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:07,968-Speed 5611.91 samples/sec Loss 11.2278 LearningRate 0.0900 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:09,813-Speed 5552.06 samples/sec Loss 11.0519 LearningRate 0.0900 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-27 01:49:11,635-Speed 5623.34 samples/sec Loss 11.2374 LearningRate 0.0899 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:13,443-Speed 5666.30 samples/sec Loss 11.2196 LearningRate 0.0899 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:15,302-Speed 5508.52 samples/sec Loss 11.2742 LearningRate 0.0899 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:17,165-Speed 5500.08 samples/sec Loss 11.3558 LearningRate 0.0899 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:18,982-Speed 5636.96 samples/sec Loss 11.2752 LearningRate 0.0899 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:49:20,805-Speed 5620.66 samples/sec Loss 11.2280 LearningRate 0.0899 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:49:22,651-Speed 5550.87 samples/sec Loss 11.3553 LearningRate 0.0898 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:49:24,516-Speed 5491.88 samples/sec Loss 11.2856 LearningRate 0.0898 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:49:26,303-Speed 5734.21 samples/sec Loss 11.5056 LearningRate 0.0898 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:49:28,124-Speed 5625.98 samples/sec Loss 11.3685 LearningRate 0.0898 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:29,935-Speed 5657.06 samples/sec Loss 11.2588 LearningRate 0.0898 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:31,720-Speed 5737.30 samples/sec Loss 11.3085 LearningRate 0.0898 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:33,568-Speed 5545.00 samples/sec Loss 11.2043 
LearningRate 0.0897 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:49:35,407-Speed 5568.78 samples/sec Loss 11.4039 LearningRate 0.0897 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:50:01,839-[lfw][6000]XNorm: 22.373225 Training: 2022-04-27 01:50:01,840-[lfw][6000]Accuracy-Flip: 0.99283+-0.00334 Training: 2022-04-27 01:50:01,840-[lfw][6000]Accuracy-Highest: 0.99283 Training: 2022-04-27 01:50:32,719-[cfp_fp][6000]XNorm: 18.730500 Training: 2022-04-27 01:50:32,720-[cfp_fp][6000]Accuracy-Flip: 0.88843+-0.01585 Training: 2022-04-27 01:50:32,720-[cfp_fp][6000]Accuracy-Highest: 0.88843 Training: 2022-04-27 01:50:59,279-[agedb_30][6000]XNorm: 21.276906 Training: 2022-04-27 01:50:59,280-[agedb_30][6000]Accuracy-Flip: 0.94550+-0.01085 Training: 2022-04-27 01:50:59,281-[agedb_30][6000]Accuracy-Highest: 0.94550 Training: 2022-04-27 01:51:01,112-Speed 119.48 samples/sec Loss 11.0510 LearningRate 0.0897 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:02,903-Speed 5721.40 samples/sec Loss 11.2775 LearningRate 0.0897 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:04,697-Speed 5708.85 samples/sec Loss 11.3413 LearningRate 0.0897 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:06,480-Speed 5744.04 samples/sec Loss 11.4747 LearningRate 0.0897 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:08,270-Speed 5723.67 samples/sec Loss 11.2591 LearningRate 0.0896 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:10,105-Speed 5584.16 samples/sec Loss 11.2510 LearningRate 0.0896 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:11,928-Speed 5618.88 samples/sec Loss 11.3043 LearningRate 0.0896 Epoch: 1 Global Step: 6070 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:13,730-Speed 5684.48 samples/sec Loss 11.2923 LearningRate 0.0896 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:15,538-Speed 5672.24 samples/sec Loss 11.3325 LearningRate 0.0896 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:17,339-Speed 5687.26 samples/sec Loss 11.2201 LearningRate 0.0896 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:19,135-Speed 5706.38 samples/sec Loss 11.2270 LearningRate 0.0895 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:20,937-Speed 5683.95 samples/sec Loss 11.3511 LearningRate 0.0895 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:22,759-Speed 5621.14 samples/sec Loss 11.3905 LearningRate 0.0895 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:24,570-Speed 5657.53 samples/sec Loss 11.4273 LearningRate 0.0895 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:26,355-Speed 5739.66 samples/sec Loss 11.4074 LearningRate 0.0895 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:28,180-Speed 5614.61 samples/sec Loss 11.3813 LearningRate 0.0895 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:29,981-Speed 5685.86 samples/sec Loss 11.3995 LearningRate 0.0894 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:31,770-Speed 5726.74 samples/sec Loss 11.3266 LearningRate 0.0894 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:33,549-Speed 5761.06 samples/sec Loss 11.2425 LearningRate 0.0894 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
01:51:35,360-Speed 5656.47 samples/sec Loss 11.2715 LearningRate 0.0894 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:37,168-Speed 5665.73 samples/sec Loss 11.2042 LearningRate 0.0894 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:38,989-Speed 5627.14 samples/sec Loss 11.2692 LearningRate 0.0894 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:40,789-Speed 5691.71 samples/sec Loss 11.4074 LearningRate 0.0893 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:42,599-Speed 5657.78 samples/sec Loss 11.1975 LearningRate 0.0893 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:44,401-Speed 5686.85 samples/sec Loss 11.3409 LearningRate 0.0893 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:46,186-Speed 5737.04 samples/sec Loss 11.1876 LearningRate 0.0893 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:47,998-Speed 5654.17 samples/sec Loss 11.4219 LearningRate 0.0893 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:49,810-Speed 5654.61 samples/sec Loss 11.2799 LearningRate 0.0893 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:51:51,628-Speed 5634.46 samples/sec Loss 11.1975 LearningRate 0.0892 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:53,436-Speed 5666.39 samples/sec Loss 11.2433 LearningRate 0.0892 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:55,265-Speed 5598.50 samples/sec Loss 11.2067 LearningRate 0.0892 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:57,082-Speed 5641.44 samples/sec Loss 11.3396 LearningRate 
0.0892 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:51:58,897-Speed 5642.95 samples/sec Loss 11.2254 LearningRate 0.0892 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:00,749-Speed 5532.07 samples/sec Loss 11.2959 LearningRate 0.0892 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:02,588-Speed 5568.11 samples/sec Loss 11.2941 LearningRate 0.0891 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:04,379-Speed 5720.28 samples/sec Loss 11.3416 LearningRate 0.0891 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:06,172-Speed 5714.77 samples/sec Loss 11.2192 LearningRate 0.0891 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:07,983-Speed 5659.68 samples/sec Loss 11.1959 LearningRate 0.0891 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:09,818-Speed 5582.61 samples/sec Loss 11.2934 LearningRate 0.0891 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:52:11,668-Speed 5538.06 samples/sec Loss 11.1807 LearningRate 0.0891 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:13,483-Speed 5643.43 samples/sec Loss 11.3378 LearningRate 0.0890 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:15,289-Speed 5675.63 samples/sec Loss 11.2219 LearningRate 0.0890 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:17,094-Speed 5672.93 samples/sec Loss 11.1728 LearningRate 0.0890 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:18,888-Speed 5711.59 samples/sec Loss 11.2328 LearningRate 0.0890 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-27 01:52:20,703-Speed 5643.39 samples/sec Loss 11.5090 LearningRate 0.0890 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:22,529-Speed 5611.60 samples/sec Loss 11.3111 LearningRate 0.0890 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:24,411-Speed 5442.19 samples/sec Loss 11.2294 LearningRate 0.0889 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:26,256-Speed 5554.28 samples/sec Loss 11.2745 LearningRate 0.0889 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:28,068-Speed 5654.13 samples/sec Loss 11.2603 LearningRate 0.0889 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:29,871-Speed 5680.58 samples/sec Loss 11.3234 LearningRate 0.0889 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:31,668-Speed 5700.64 samples/sec Loss 11.1353 LearningRate 0.0889 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:33,468-Speed 5693.41 samples/sec Loss 11.2485 LearningRate 0.0889 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:35,286-Speed 5635.48 samples/sec Loss 11.1149 LearningRate 0.0888 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:37,088-Speed 5683.44 samples/sec Loss 11.2530 LearningRate 0.0888 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:38,902-Speed 5648.65 samples/sec Loss 11.2261 LearningRate 0.0888 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:40,689-Speed 5734.17 samples/sec Loss 11.3109 LearningRate 0.0888 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 
01:52:42,502-Speed 5650.70 samples/sec Loss 11.2756 LearningRate 0.0888 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:44,314-Speed 5653.08 samples/sec Loss 11.3217 LearningRate 0.0888 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:46,122-Speed 5667.02 samples/sec Loss 11.3796 LearningRate 0.0887 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:47,957-Speed 5582.76 samples/sec Loss 11.4235 LearningRate 0.0887 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:49,783-Speed 5609.38 samples/sec Loss 11.2762 LearningRate 0.0887 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:51,650-Speed 5487.13 samples/sec Loss 11.0992 LearningRate 0.0887 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:53,502-Speed 5531.42 samples/sec Loss 11.3379 LearningRate 0.0887 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:55,285-Speed 5745.29 samples/sec Loss 10.9922 LearningRate 0.0887 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:57,083-Speed 5698.88 samples/sec Loss 11.2218 LearningRate 0.0886 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:52:58,920-Speed 5575.93 samples/sec Loss 11.2464 LearningRate 0.0886 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:00,700-Speed 5754.96 samples/sec Loss 11.1181 LearningRate 0.0886 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:02,493-Speed 5716.02 samples/sec Loss 11.2687 LearningRate 0.0886 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:04,279-Speed 5736.06 samples/sec Loss 11.1788 
LearningRate 0.0886 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:06,056-Speed 5762.88 samples/sec Loss 11.3461 LearningRate 0.0886 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:07,845-Speed 5727.41 samples/sec Loss 11.1914 LearningRate 0.0885 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:09,616-Speed 5784.44 samples/sec Loss 11.0404 LearningRate 0.0885 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:11,417-Speed 5685.84 samples/sec Loss 11.1649 LearningRate 0.0885 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:13,200-Speed 5747.24 samples/sec Loss 11.0962 LearningRate 0.0885 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:15,024-Speed 5617.70 samples/sec Loss 10.9999 LearningRate 0.0885 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:16,817-Speed 5712.29 samples/sec Loss 11.3177 LearningRate 0.0885 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:18,608-Speed 5720.47 samples/sec Loss 11.1720 LearningRate 0.0884 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:20,390-Speed 5749.18 samples/sec Loss 11.1120 LearningRate 0.0884 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:22,184-Speed 5709.58 samples/sec Loss 11.0454 LearningRate 0.0884 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:23,985-Speed 5688.51 samples/sec Loss 11.1783 LearningRate 0.0884 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:25,766-Speed 5754.15 samples/sec Loss 11.0639 LearningRate 0.0884 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-27 01:53:27,562-Speed 5702.45 samples/sec Loss 11.1266 LearningRate 0.0884 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:29,369-Speed 5669.88 samples/sec Loss 11.0998 LearningRate 0.0883 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:31,158-Speed 5727.03 samples/sec Loss 11.1105 LearningRate 0.0883 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:32,952-Speed 5712.56 samples/sec Loss 11.2177 LearningRate 0.0883 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:34,761-Speed 5660.72 samples/sec Loss 11.1769 LearningRate 0.0883 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:36,573-Speed 5654.23 samples/sec Loss 11.1269 LearningRate 0.0883 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:38,359-Speed 5738.37 samples/sec Loss 11.2548 LearningRate 0.0883 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:40,166-Speed 5667.57 samples/sec Loss 11.0303 LearningRate 0.0882 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:53:41,939-Speed 5777.81 samples/sec Loss 11.0603 LearningRate 0.0882 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:43,746-Speed 5668.80 samples/sec Loss 10.9575 LearningRate 0.0882 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:45,562-Speed 5642.66 samples/sec Loss 11.2014 LearningRate 0.0882 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:47,365-Speed 5681.22 samples/sec Loss 11.1431 LearningRate 0.0882 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
01:53:49,168-Speed 5683.38 samples/sec Loss 10.9604 LearningRate 0.0882 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:50,982-Speed 5646.13 samples/sec Loss 11.1932 LearningRate 0.0882 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:52,768-Speed 5736.04 samples/sec Loss 11.0837 LearningRate 0.0881 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:54,558-Speed 5722.43 samples/sec Loss 11.1581 LearningRate 0.0881 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:56,393-Speed 5583.35 samples/sec Loss 11.0668 LearningRate 0.0881 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:53:58,190-Speed 5701.18 samples/sec Loss 11.0263 LearningRate 0.0881 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:00,030-Speed 5569.39 samples/sec Loss 10.9930 LearningRate 0.0881 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:54:01,826-Speed 5703.11 samples/sec Loss 11.4268 LearningRate 0.0881 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:03,611-Speed 5740.19 samples/sec Loss 11.1477 LearningRate 0.0880 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:05,399-Speed 5727.53 samples/sec Loss 10.9051 LearningRate 0.0880 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:07,211-Speed 5656.38 samples/sec Loss 11.1771 LearningRate 0.0880 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:09,007-Speed 5705.14 samples/sec Loss 11.0153 LearningRate 0.0880 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:10,787-Speed 5754.43 samples/sec Loss 11.1046 LearningRate 
0.0880 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:12,571-Speed 5741.98 samples/sec Loss 11.1827 LearningRate 0.0880 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:14,374-Speed 5682.24 samples/sec Loss 11.1415 LearningRate 0.0879 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:16,158-Speed 5742.24 samples/sec Loss 11.1708 LearningRate 0.0879 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:17,958-Speed 5692.25 samples/sec Loss 10.9061 LearningRate 0.0879 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:19,781-Speed 5620.42 samples/sec Loss 11.0422 LearningRate 0.0879 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:54:21,570-Speed 5727.21 samples/sec Loss 11.0731 LearningRate 0.0879 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:54:23,358-Speed 5728.58 samples/sec Loss 10.9406 LearningRate 0.0879 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:54:25,158-Speed 5691.45 samples/sec Loss 11.0847 LearningRate 0.0878 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:26,982-Speed 5617.10 samples/sec Loss 11.0844 LearningRate 0.0878 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:54:28,791-Speed 5660.40 samples/sec Loss 10.9083 LearningRate 0.0878 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:30,600-Speed 5665.63 samples/sec Loss 11.1871 LearningRate 0.0878 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:32,409-Speed 5661.26 samples/sec Loss 11.0300 LearningRate 0.0878 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-27 01:54:34,203-Speed 5712.15 samples/sec Loss 11.0834 LearningRate 0.0878 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:35,995-Speed 5716.40 samples/sec Loss 11.1207 LearningRate 0.0877 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:37,794-Speed 5695.22 samples/sec Loss 11.0620 LearningRate 0.0877 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:39,589-Speed 5706.71 samples/sec Loss 10.7903 LearningRate 0.0877 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:41,375-Speed 5735.56 samples/sec Loss 11.0704 LearningRate 0.0877 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:43,158-Speed 5745.39 samples/sec Loss 10.9223 LearningRate 0.0877 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:54:44,949-Speed 5721.88 samples/sec Loss 10.9728 LearningRate 0.0877 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:54:46,745-Speed 5703.59 samples/sec Loss 10.8449 LearningRate 0.0876 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:54:48,541-Speed 5703.07 samples/sec Loss 10.9120 LearningRate 0.0876 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:50,336-Speed 5708.21 samples/sec Loss 10.9476 LearningRate 0.0876 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:52,133-Speed 5700.02 samples/sec Loss 10.7995 LearningRate 0.0876 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:53,951-Speed 5635.34 samples/sec Loss 10.8023 LearningRate 0.0876 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:55,752-Speed 
5689.30 samples/sec Loss 10.8641 LearningRate 0.0876 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:57,555-Speed 5682.63 samples/sec Loss 10.9109 LearningRate 0.0875 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:54:59,366-Speed 5657.05 samples/sec Loss 10.8680 LearningRate 0.0875 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:01,151-Speed 5737.07 samples/sec Loss 10.7438 LearningRate 0.0875 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:02,940-Speed 5725.04 samples/sec Loss 10.9618 LearningRate 0.0875 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:04,736-Speed 5707.05 samples/sec Loss 10.8771 LearningRate 0.0875 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:06,536-Speed 5690.43 samples/sec Loss 10.8339 LearningRate 0.0875 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:08,374-Speed 5575.25 samples/sec Loss 10.7793 LearningRate 0.0874 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:10,199-Speed 5612.60 samples/sec Loss 11.0669 LearningRate 0.0874 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:12,015-Speed 5639.77 samples/sec Loss 10.7874 LearningRate 0.0874 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:13,821-Speed 5674.65 samples/sec Loss 10.6885 LearningRate 0.0874 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:15,614-Speed 5712.77 samples/sec Loss 10.9283 LearningRate 0.0874 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:17,452-Speed 5575.00 samples/sec Loss 10.9000 LearningRate 0.0874 Epoch: 1 
Global Step: 7430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:19,261-Speed 5661.69 samples/sec Loss 10.7843 LearningRate 0.0873 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:21,063-Speed 5685.94 samples/sec Loss 10.8089 LearningRate 0.0873 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:22,860-Speed 5700.75 samples/sec Loss 10.7141 LearningRate 0.0873 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:24,658-Speed 5697.78 samples/sec Loss 10.7804 LearningRate 0.0873 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:26,442-Speed 5741.61 samples/sec Loss 10.9757 LearningRate 0.0873 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:28,227-Speed 5741.66 samples/sec Loss 10.8494 LearningRate 0.0873 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:30,022-Speed 5706.96 samples/sec Loss 10.8837 LearningRate 0.0872 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:31,824-Speed 5685.85 samples/sec Loss 10.4707 LearningRate 0.0872 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:33,616-Speed 5714.91 samples/sec Loss 10.7889 LearningRate 0.0872 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:35,404-Speed 5730.44 samples/sec Loss 10.6933 LearningRate 0.0872 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:37,191-Speed 5733.98 samples/sec Loss 10.8974 LearningRate 0.0872 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:39,043-Speed 5531.22 samples/sec Loss 10.7362 LearningRate 0.0872 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 01:55:40,833-Speed 5723.38 samples/sec Loss 10.7996 LearningRate 0.0871 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:55:42,636-Speed 5682.33 samples/sec Loss 10.8546 LearningRate 0.0871 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:44,425-Speed 5726.43 samples/sec Loss 10.9208 LearningRate 0.0871 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:46,209-Speed 5743.13 samples/sec Loss 10.8701 LearningRate 0.0871 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:48,022-Speed 5650.38 samples/sec Loss 10.8728 LearningRate 0.0871 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:49,841-Speed 5629.56 samples/sec Loss 10.8481 LearningRate 0.0871 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:51,665-Speed 5618.80 samples/sec Loss 10.9321 LearningRate 0.0870 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:53,450-Speed 5739.15 samples/sec Loss 10.8250 LearningRate 0.0870 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:55,233-Speed 5743.48 samples/sec Loss 10.7097 LearningRate 0.0870 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:57,023-Speed 5724.91 samples/sec Loss 10.7733 LearningRate 0.0870 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:55:58,822-Speed 5693.63 samples/sec Loss 10.7816 LearningRate 0.0870 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:00,605-Speed 5746.76 samples/sec Loss 10.6494 LearningRate 0.0870 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:02,429-Speed 5614.34 samples/sec 
Loss 10.8110 LearningRate 0.0869 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:04,226-Speed 5700.84 samples/sec Loss 10.6867 LearningRate 0.0869 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:06,023-Speed 5701.91 samples/sec Loss 10.9096 LearningRate 0.0869 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:07,818-Speed 5706.54 samples/sec Loss 10.8564 LearningRate 0.0869 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:09,614-Speed 5706.07 samples/sec Loss 10.8524 LearningRate 0.0869 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:11,395-Speed 5751.41 samples/sec Loss 10.7329 LearningRate 0.0869 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:13,202-Speed 5668.33 samples/sec Loss 10.7322 LearningRate 0.0869 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:15,007-Speed 5676.17 samples/sec Loss 10.6968 LearningRate 0.0868 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:16,812-Speed 5676.61 samples/sec Loss 10.9076 LearningRate 0.0868 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:18,612-Speed 5690.01 samples/sec Loss 10.7930 LearningRate 0.0868 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 01:56:20,404-Speed 5715.49 samples/sec Loss 10.7475 LearningRate 0.0868 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:22,223-Speed 5633.08 samples/sec Loss 10.8395 LearningRate 0.0868 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:24,010-Speed 5732.59 samples/sec Loss 10.7463 LearningRate 0.0868 Epoch: 1 Global Step: 
7800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:25,811-Speed 5687.06 samples/sec Loss 10.6187 LearningRate 0.0867 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:27,595-Speed 5742.51 samples/sec Loss 10.7630 LearningRate 0.0867 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:29,374-Speed 5758.30 samples/sec Loss 10.8406 LearningRate 0.0867 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:31,155-Speed 5752.40 samples/sec Loss 10.7879 LearningRate 0.0867 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:32,948-Speed 5712.24 samples/sec Loss 10.6702 LearningRate 0.0867 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:34,741-Speed 5712.38 samples/sec Loss 10.5769 LearningRate 0.0867 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:36,527-Speed 5735.41 samples/sec Loss 10.5497 LearningRate 0.0866 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:38,321-Speed 5714.30 samples/sec Loss 10.9506 LearningRate 0.0866 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:40,114-Speed 5715.38 samples/sec Loss 10.7795 LearningRate 0.0866 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:41,916-Speed 5684.35 samples/sec Loss 10.6737 LearningRate 0.0866 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:43,744-Speed 5604.09 samples/sec Loss 10.6655 LearningRate 0.0866 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 01:56:45,590-Speed 5549.52 samples/sec Loss 10.6377 LearningRate 0.0866 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-27 01:56:47,388-Speed 5697.25 samples/sec Loss 10.8435 LearningRate 0.0865 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:49,182-Speed 5708.97 samples/sec Loss 10.6438 LearningRate 0.0865 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:51,008-Speed 5610.38 samples/sec Loss 10.5556 LearningRate 0.0865 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:52,836-Speed 5604.02 samples/sec Loss 10.6327 LearningRate 0.0865 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:54,626-Speed 5720.28 samples/sec Loss 10.6593 LearningRate 0.0865 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:56,431-Speed 5674.32 samples/sec Loss 10.7046 LearningRate 0.0865 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:56:58,215-Speed 5745.13 samples/sec Loss 10.7072 LearningRate 0.0864 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:57:00,036-Speed 5624.04 samples/sec Loss 10.6941 LearningRate 0.0864 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 01:57:26,523-[lfw][8000]XNorm: 22.133128 Training: 2022-04-27 01:57:26,523-[lfw][8000]Accuracy-Flip: 0.99450+-0.00415 Training: 2022-04-27 01:57:26,524-[lfw][8000]Accuracy-Highest: 0.99450 Training: 2022-04-27 01:57:57,191-[cfp_fp][8000]XNorm: 18.497261 Training: 2022-04-27 01:57:57,191-[cfp_fp][8000]Accuracy-Flip: 0.91071+-0.01325 Training: 2022-04-27 01:57:57,192-[cfp_fp][8000]Accuracy-Highest: 0.91071 Training: 2022-04-27 01:58:23,671-[agedb_30][8000]XNorm: 21.640810 Training: 2022-04-27 01:58:23,672-[agedb_30][8000]Accuracy-Flip: 0.95567+-0.00961 Training: 2022-04-27 01:58:23,672-[agedb_30][8000]Accuracy-Highest: 0.95567 Training: 2022-04-27 01:58:25,486-Speed 119.84 samples/sec 
Loss 10.7057 LearningRate 0.0864 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:27,277-Speed 5721.68 samples/sec Loss 10.5941 LearningRate 0.0864 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:29,048-Speed 5783.11 samples/sec Loss 10.5033 LearningRate 0.0864 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:30,839-Speed 5720.21 samples/sec Loss 10.8357 LearningRate 0.0864 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:32,636-Speed 5699.13 samples/sec Loss 10.5151 LearningRate 0.0863 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:34,412-Speed 5769.04 samples/sec Loss 10.8805 LearningRate 0.0863 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:36,192-Speed 5757.46 samples/sec Loss 10.7316 LearningRate 0.0863 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:37,982-Speed 5722.39 samples/sec Loss 10.6967 LearningRate 0.0863 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:39,785-Speed 5682.57 samples/sec Loss 10.5257 LearningRate 0.0863 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:41,581-Speed 5705.27 samples/sec Loss 10.5432 LearningRate 0.0863 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:43,417-Speed 5579.67 samples/sec Loss 10.6007 LearningRate 0.0862 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:45,212-Speed 5705.79 samples/sec Loss 10.7053 LearningRate 0.0862 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:46,997-Speed 5741.65 samples/sec Loss 10.7008 LearningRate 0.0862 Epoch: 1 Global Step: 
8130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:58:48,797-Speed 5691.32 samples/sec Loss 10.5550 LearningRate 0.0862 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:50,588-Speed 5720.07 samples/sec Loss 10.4912 LearningRate 0.0862 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:52,382-Speed 5707.77 samples/sec Loss 10.5521 LearningRate 0.0862 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:54,175-Speed 5715.68 samples/sec Loss 10.6199 LearningRate 0.0861 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:55,986-Speed 5656.21 samples/sec Loss 10.6788 LearningRate 0.0861 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:57,767-Speed 5752.51 samples/sec Loss 10.6006 LearningRate 0.0861 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:58:59,549-Speed 5748.48 samples/sec Loss 10.5807 LearningRate 0.0861 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:01,356-Speed 5669.30 samples/sec Loss 10.5440 LearningRate 0.0861 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:03,137-Speed 5752.62 samples/sec Loss 10.5735 LearningRate 0.0861 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:04,924-Speed 5731.65 samples/sec Loss 10.5811 LearningRate 0.0860 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:06,723-Speed 5693.48 samples/sec Loss 10.5855 LearningRate 0.0860 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:08,509-Speed 5738.12 samples/sec Loss 10.4053 LearningRate 0.0860 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-27 01:59:10,319-Speed 5658.59 samples/sec Loss 10.5278 LearningRate 0.0860 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:12,173-Speed 5526.51 samples/sec Loss 10.4346 LearningRate 0.0860 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:13,988-Speed 5645.56 samples/sec Loss 10.5241 LearningRate 0.0860 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:15,800-Speed 5655.25 samples/sec Loss 10.5621 LearningRate 0.0860 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:17,619-Speed 5629.49 samples/sec Loss 10.5859 LearningRate 0.0859 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:19,419-Speed 5691.63 samples/sec Loss 10.2764 LearningRate 0.0859 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:21,213-Speed 5710.82 samples/sec Loss 10.4652 LearningRate 0.0859 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:23,023-Speed 5661.57 samples/sec Loss 10.3737 LearningRate 0.0859 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:24,811-Speed 5729.49 samples/sec Loss 10.5361 LearningRate 0.0859 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:26,614-Speed 5680.52 samples/sec Loss 10.5583 LearningRate 0.0859 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:28,406-Speed 5718.95 samples/sec Loss 10.6645 LearningRate 0.0858 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:30,221-Speed 5641.81 samples/sec Loss 10.5425 LearningRate 0.0858 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:32,003-Speed 5749.27 
samples/sec Loss 10.4493 LearningRate 0.0858 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:33,799-Speed 5706.11 samples/sec Loss 10.4997 LearningRate 0.0858 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:35,594-Speed 5707.89 samples/sec Loss 10.4716 LearningRate 0.0858 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:37,423-Speed 5600.07 samples/sec Loss 10.5574 LearningRate 0.0858 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:39,224-Speed 5689.88 samples/sec Loss 10.5970 LearningRate 0.0857 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:41,019-Speed 5704.19 samples/sec Loss 10.5970 LearningRate 0.0857 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:42,817-Speed 5699.76 samples/sec Loss 10.3551 LearningRate 0.0857 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-27 01:59:44,598-Speed 5751.20 samples/sec Loss 10.4548 LearningRate 0.0857 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:46,385-Speed 5734.36 samples/sec Loss 10.5520 LearningRate 0.0857 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:48,175-Speed 5722.34 samples/sec Loss 10.6178 LearningRate 0.0857 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 01:59:49,974-Speed 5695.97 samples/sec Loss 10.5082 LearningRate 0.0856 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:59:51,768-Speed 5708.56 samples/sec Loss 10.4952 LearningRate 0.0856 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:59:53,567-Speed 5694.50 samples/sec Loss 10.3849 LearningRate 0.0856 Epoch: 1 
Global Step: 8500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:59:55,370-Speed 5681.60 samples/sec Loss 10.5123 LearningRate 0.0856 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:59:57,178-Speed 5668.93 samples/sec Loss 10.3219 LearningRate 0.0856 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 01:59:58,968-Speed 5721.38 samples/sec Loss 10.4210 LearningRate 0.0856 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 02:00:00,770-Speed 5685.71 samples/sec Loss 10.5311 LearningRate 0.0855 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 02:00:02,570-Speed 5693.35 samples/sec Loss 10.2872 LearningRate 0.0855 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 02:00:04,356-Speed 5734.02 samples/sec Loss 10.3969 LearningRate 0.0855 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 02:00:06,157-Speed 5688.98 samples/sec Loss 10.4365 LearningRate 0.0855 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 02:00:07,963-Speed 5671.68 samples/sec Loss 10.5898 LearningRate 0.0855 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 02:00:09,745-Speed 5750.26 samples/sec Loss 10.2535 LearningRate 0.0855 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 02:00:11,540-Speed 5704.30 samples/sec Loss 10.4610 LearningRate 0.0854 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 02:00:13,324-Speed 5744.72 samples/sec Loss 10.6841 LearningRate 0.0854 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:15,118-Speed 5708.91 samples/sec Loss 10.3825 LearningRate 0.0854 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 02:00:16,906-Speed 5730.46 samples/sec Loss 10.3186 LearningRate 0.0854 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:18,704-Speed 5696.57 samples/sec Loss 10.5790 LearningRate 0.0854 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:20,497-Speed 5713.86 samples/sec Loss 10.4686 LearningRate 0.0854 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:22,332-Speed 5582.00 samples/sec Loss 10.2604 LearningRate 0.0853 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:24,142-Speed 5658.85 samples/sec Loss 10.3869 LearningRate 0.0853 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:25,915-Speed 5778.69 samples/sec Loss 10.3344 LearningRate 0.0853 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:27,764-Speed 5541.05 samples/sec Loss 10.2645 LearningRate 0.0853 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:29,553-Speed 5726.45 samples/sec Loss 10.2884 LearningRate 0.0853 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:31,347-Speed 5708.01 samples/sec Loss 10.3509 LearningRate 0.0853 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:33,146-Speed 5698.75 samples/sec Loss 10.4638 LearningRate 0.0853 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:34,926-Speed 5753.36 samples/sec Loss 10.3509 LearningRate 0.0852 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:36,729-Speed 5681.60 samples/sec Loss 10.4426 LearningRate 0.0852 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:38,540-Speed 5657.33 
samples/sec Loss 10.3474 LearningRate 0.0852 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:40,339-Speed 5694.30 samples/sec Loss 10.2945 LearningRate 0.0852 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:42,129-Speed 5724.34 samples/sec Loss 10.3960 LearningRate 0.0852 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:43,906-Speed 5766.14 samples/sec Loss 10.3700 LearningRate 0.0852 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:45,729-Speed 5617.85 samples/sec Loss 10.3692 LearningRate 0.0851 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:47,541-Speed 5654.96 samples/sec Loss 10.3639 LearningRate 0.0851 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:49,321-Speed 5753.53 samples/sec Loss 10.2020 LearningRate 0.0851 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:51,122-Speed 5689.16 samples/sec Loss 10.3628 LearningRate 0.0851 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:52,913-Speed 5720.25 samples/sec Loss 10.3381 LearningRate 0.0851 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:54,697-Speed 5744.30 samples/sec Loss 10.1951 LearningRate 0.0851 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:56,525-Speed 5603.18 samples/sec Loss 10.2233 LearningRate 0.0850 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:00:58,334-Speed 5662.57 samples/sec Loss 10.1685 LearningRate 0.0850 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:00,141-Speed 5669.63 samples/sec Loss 10.1977 LearningRate 0.0850 Epoch: 1 
Global Step: 8870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:01,940-Speed 5692.45 samples/sec Loss 10.3309 LearningRate 0.0850 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:03,796-Speed 5523.20 samples/sec Loss 10.0664 LearningRate 0.0850 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:05,664-Speed 5482.90 samples/sec Loss 10.1574 LearningRate 0.0850 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:07,486-Speed 5621.81 samples/sec Loss 10.3518 LearningRate 0.0849 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:09,298-Speed 5655.42 samples/sec Loss 10.2609 LearningRate 0.0849 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:11,101-Speed 5680.48 samples/sec Loss 10.1656 LearningRate 0.0849 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:12,897-Speed 5707.94 samples/sec Loss 10.3118 LearningRate 0.0849 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:14,690-Speed 5711.08 samples/sec Loss 10.2240 LearningRate 0.0849 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:16,488-Speed 5697.53 samples/sec Loss 10.2727 LearningRate 0.0849 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:18,276-Speed 5729.64 samples/sec Loss 10.4237 LearningRate 0.0848 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:20,070-Speed 5712.19 samples/sec Loss 10.2931 LearningRate 0.0848 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:01:21,905-Speed 5581.53 samples/sec Loss 10.0952 LearningRate 0.0848 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:01:23,701-Speed 5706.95 samples/sec Loss 10.2686 LearningRate 0.0848 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:25,508-Speed 5668.12 samples/sec Loss 10.1449 LearningRate 0.0848 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:27,306-Speed 5697.21 samples/sec Loss 10.2480 LearningRate 0.0848 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:29,095-Speed 5727.56 samples/sec Loss 10.2990 LearningRate 0.0847 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:30,898-Speed 5680.33 samples/sec Loss 10.3413 LearningRate 0.0847 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:32,762-Speed 5496.19 samples/sec Loss 10.2597 LearningRate 0.0847 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:34,594-Speed 5592.82 samples/sec Loss 10.1763 LearningRate 0.0847 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:36,402-Speed 5664.61 samples/sec Loss 10.4073 LearningRate 0.0847 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:38,225-Speed 5622.00 samples/sec Loss 10.3282 LearningRate 0.0847 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:40,022-Speed 5700.11 samples/sec Loss 10.3897 LearningRate 0.0847 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:41,844-Speed 5623.25 samples/sec Loss 10.2823 LearningRate 0.0846 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:43,640-Speed 5704.03 samples/sec Loss 10.3052 LearningRate 0.0846 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:45,442-Speed 5685.40 
samples/sec Loss 10.2770 LearningRate 0.0846 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:47,241-Speed 5692.65 samples/sec Loss 10.1909 LearningRate 0.0846 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:49,025-Speed 5743.95 samples/sec Loss 10.4263 LearningRate 0.0846 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:50,839-Speed 5649.87 samples/sec Loss 10.4187 LearningRate 0.0846 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:52,646-Speed 5667.99 samples/sec Loss 10.2975 LearningRate 0.0845 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:54,444-Speed 5697.82 samples/sec Loss 10.3456 LearningRate 0.0845 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:56,242-Speed 5695.74 samples/sec Loss 10.1928 LearningRate 0.0845 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:01:58,074-Speed 5591.59 samples/sec Loss 10.1625 LearningRate 0.0845 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:01:59,855-Speed 5752.66 samples/sec Loss 10.2324 LearningRate 0.0845 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:01,654-Speed 5695.55 samples/sec Loss 10.2123 LearningRate 0.0845 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:03,446-Speed 5717.78 samples/sec Loss 10.3022 LearningRate 0.0844 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:05,229-Speed 5745.49 samples/sec Loss 10.1491 LearningRate 0.0844 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:07,030-Speed 5689.49 samples/sec Loss 10.0279 LearningRate 0.0844 Epoch: 1 
Global Step: 9240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:08,831-Speed 5687.48 samples/sec Loss 10.1774 LearningRate 0.0844 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:10,623-Speed 5715.07 samples/sec Loss 10.1972 LearningRate 0.0844 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:12,412-Speed 5728.40 samples/sec Loss 10.3126 LearningRate 0.0844 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:14,204-Speed 5716.03 samples/sec Loss 10.2695 LearningRate 0.0843 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:16,048-Speed 5556.21 samples/sec Loss 10.2280 LearningRate 0.0843 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:17,861-Speed 5650.71 samples/sec Loss 10.1821 LearningRate 0.0843 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:19,652-Speed 5720.66 samples/sec Loss 10.1979 LearningRate 0.0843 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:21,460-Speed 5667.05 samples/sec Loss 10.4555 LearningRate 0.0843 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:23,263-Speed 5682.54 samples/sec Loss 10.2656 LearningRate 0.0843 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:25,051-Speed 5727.74 samples/sec Loss 10.2006 LearningRate 0.0842 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:26,838-Speed 5731.87 samples/sec Loss 10.2212 LearningRate 0.0842 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:28,630-Speed 5717.73 samples/sec Loss 10.0755 LearningRate 0.0842 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:02:30,427-Speed 5699.24 samples/sec Loss 10.1744 LearningRate 0.0842 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:32,242-Speed 5646.28 samples/sec Loss 10.1262 LearningRate 0.0842 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:34,032-Speed 5722.84 samples/sec Loss 10.1948 LearningRate 0.0842 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:35,820-Speed 5729.00 samples/sec Loss 10.1498 LearningRate 0.0842 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:37,615-Speed 5708.72 samples/sec Loss 10.2891 LearningRate 0.0841 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:39,403-Speed 5728.57 samples/sec Loss 10.2751 LearningRate 0.0841 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:41,201-Speed 5697.10 samples/sec Loss 10.1037 LearningRate 0.0841 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:42,991-Speed 5726.54 samples/sec Loss 10.2305 LearningRate 0.0841 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:44,782-Speed 5719.26 samples/sec Loss 10.1045 LearningRate 0.0841 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:46,575-Speed 5714.19 samples/sec Loss 10.1607 LearningRate 0.0841 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:48,378-Speed 5682.03 samples/sec Loss 10.1921 LearningRate 0.0840 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:50,162-Speed 5741.43 samples/sec Loss 10.0813 LearningRate 0.0840 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:51,955-Speed 5715.09 
samples/sec Loss 10.1892 LearningRate 0.0840 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:53,740-Speed 5737.77 samples/sec Loss 10.2081 LearningRate 0.0840 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:02:55,549-Speed 5665.76 samples/sec Loss 10.0631 LearningRate 0.0840 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:57,361-Speed 5652.09 samples/sec Loss 10.0688 LearningRate 0.0840 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:02:59,153-Speed 5717.71 samples/sec Loss 10.0985 LearningRate 0.0839 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:00,956-Speed 5678.50 samples/sec Loss 10.1688 LearningRate 0.0839 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:02,758-Speed 5686.62 samples/sec Loss 10.2486 LearningRate 0.0839 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:04,573-Speed 5644.08 samples/sec Loss 10.0515 LearningRate 0.0839 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:06,433-Speed 5507.73 samples/sec Loss 10.0551 LearningRate 0.0839 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:08,242-Speed 5663.88 samples/sec Loss 10.2417 LearningRate 0.0839 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:10,037-Speed 5706.83 samples/sec Loss 10.1305 LearningRate 0.0838 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:11,848-Speed 5657.79 samples/sec Loss 10.1608 LearningRate 0.0838 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:13,665-Speed 5637.04 samples/sec Loss 10.1411 LearningRate 0.0838 Epoch: 1 
Global Step: 9610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:15,505-Speed 5569.39 samples/sec Loss 10.0835 LearningRate 0.0838 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:17,312-Speed 5666.91 samples/sec Loss 10.1027 LearningRate 0.0838 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:19,102-Speed 5722.19 samples/sec Loss 10.2561 LearningRate 0.0838 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:20,892-Speed 5724.06 samples/sec Loss 10.1797 LearningRate 0.0837 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:22,688-Speed 5705.95 samples/sec Loss 10.1944 LearningRate 0.0837 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:24,493-Speed 5674.51 samples/sec Loss 10.0968 LearningRate 0.0837 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:26,288-Speed 5707.61 samples/sec Loss 9.8984 LearningRate 0.0837 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:28,083-Speed 5705.98 samples/sec Loss 10.2155 LearningRate 0.0837 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:29,899-Speed 5641.49 samples/sec Loss 10.2128 LearningRate 0.0837 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:31,690-Speed 5718.95 samples/sec Loss 10.0413 LearningRate 0.0837 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:03:33,467-Speed 5768.29 samples/sec Loss 9.9978 LearningRate 0.0836 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:35,264-Speed 5699.63 samples/sec Loss 10.1456 LearningRate 0.0836 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:03:37,054-Speed 5722.61 samples/sec Loss 9.9853 LearningRate 0.0836 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:38,852-Speed 5696.38 samples/sec Loss 10.0471 LearningRate 0.0836 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:40,655-Speed 5683.22 samples/sec Loss 10.0188 LearningRate 0.0836 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:42,469-Speed 5646.80 samples/sec Loss 10.1233 LearningRate 0.0836 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:44,274-Speed 5678.05 samples/sec Loss 10.2043 LearningRate 0.0835 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:03:46,073-Speed 5694.25 samples/sec Loss 9.9990 LearningRate 0.0835 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:47,865-Speed 5717.32 samples/sec Loss 10.0872 LearningRate 0.0835 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:49,648-Speed 5746.15 samples/sec Loss 9.8705 LearningRate 0.0835 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:51,440-Speed 5717.88 samples/sec Loss 10.0163 LearningRate 0.0835 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:53,233-Speed 5711.65 samples/sec Loss 10.0366 LearningRate 0.0835 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:55,042-Speed 5665.23 samples/sec Loss 10.2559 LearningRate 0.0834 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:56,844-Speed 5684.82 samples/sec Loss 10.0413 LearningRate 0.0834 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:03:58,636-Speed 5716.89 samples/sec 
Loss 10.1394 LearningRate 0.0834 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:04:00,436-Speed 5691.20 samples/sec Loss 9.8882 LearningRate 0.0834 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:04:02,254-Speed 5635.70 samples/sec Loss 9.8361 LearningRate 0.0834 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:04:04,074-Speed 5627.55 samples/sec Loss 9.9611 LearningRate 0.0834 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:04:05,880-Speed 5671.31 samples/sec Loss 10.0429 LearningRate 0.0833 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:04:07,692-Speed 5654.14 samples/sec Loss 10.0752 LearningRate 0.0833 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:04:09,506-Speed 5648.92 samples/sec Loss 10.1007 LearningRate 0.0833 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:04:11,317-Speed 5658.16 samples/sec Loss 9.9505 LearningRate 0.0833 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:13,116-Speed 5694.66 samples/sec Loss 9.9305 LearningRate 0.0833 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:14,934-Speed 5636.12 samples/sec Loss 10.1706 LearningRate 0.0833 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:16,717-Speed 5743.72 samples/sec Loss 10.0291 LearningRate 0.0833 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:18,528-Speed 5659.47 samples/sec Loss 10.1131 LearningRate 0.0832 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:20,326-Speed 5697.27 samples/sec Loss 9.9290 LearningRate 0.0832 Epoch: 1 Global Step: 9980 Fp16 Grad 
Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:22,110-Speed 5740.74 samples/sec Loss 10.1092 LearningRate 0.0832 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:23,909-Speed 5696.49 samples/sec Loss 10.1087 LearningRate 0.0832 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:04:50,724-[lfw][10000]XNorm: 21.238482 Training: 2022-04-27 02:04:50,724-[lfw][10000]Accuracy-Flip: 0.99517+-0.00311 Training: 2022-04-27 02:04:50,725-[lfw][10000]Accuracy-Highest: 0.99517 Training: 2022-04-27 02:05:21,743-[cfp_fp][10000]XNorm: 18.948939 Training: 2022-04-27 02:05:21,744-[cfp_fp][10000]Accuracy-Flip: 0.90586+-0.01211 Training: 2022-04-27 02:05:21,745-[cfp_fp][10000]Accuracy-Highest: 0.91071 Training: 2022-04-27 02:05:48,436-[agedb_30][10000]XNorm: 21.228598 Training: 2022-04-27 02:05:48,436-[agedb_30][10000]Accuracy-Flip: 0.95350+-0.01050 Training: 2022-04-27 02:05:48,437-[agedb_30][10000]Accuracy-Highest: 0.95567 Training: 2022-04-27 02:05:50,238-Speed 118.62 samples/sec Loss 9.9019 LearningRate 0.0832 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:05:52,052-Speed 5647.28 samples/sec Loss 9.8762 LearningRate 0.0832 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:05:53,851-Speed 5695.28 samples/sec Loss 9.9874 LearningRate 0.0831 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:05:55,661-Speed 5660.07 samples/sec Loss 9.9149 LearningRate 0.0831 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:05:57,449-Speed 5733.69 samples/sec Loss 10.0144 LearningRate 0.0831 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:05:59,239-Speed 5720.43 samples/sec Loss 9.9034 LearningRate 0.0831 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-27 02:06:01,026-Speed 5733.58 samples/sec Loss 9.9557 LearningRate 0.0831 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:02,810-Speed 5741.06 samples/sec Loss 10.0913 LearningRate 0.0831 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:04,608-Speed 5700.12 samples/sec Loss 10.1132 LearningRate 0.0830 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:06,392-Speed 5742.16 samples/sec Loss 10.1382 LearningRate 0.0830 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:08,173-Speed 5751.68 samples/sec Loss 9.9740 LearningRate 0.0830 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:09,961-Speed 5727.91 samples/sec Loss 9.9214 LearningRate 0.0830 Epoch: 1 Global Step: 10120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:06:11,740-Speed 5757.61 samples/sec Loss 10.0323 LearningRate 0.0830 Epoch: 1 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:13,523-Speed 5749.70 samples/sec Loss 10.0066 LearningRate 0.0830 Epoch: 1 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:15,314-Speed 5719.77 samples/sec Loss 10.0964 LearningRate 0.0829 Epoch: 1 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:17,125-Speed 5655.35 samples/sec Loss 10.0723 LearningRate 0.0829 Epoch: 1 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:18,915-Speed 5722.94 samples/sec Loss 9.9286 LearningRate 0.0829 Epoch: 1 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:20,705-Speed 5723.21 samples/sec Loss 9.9190 LearningRate 0.0829 Epoch: 1 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:22,496-Speed 5718.35 samples/sec Loss 
9.9333 LearningRate 0.0829 Epoch: 1 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:24,285-Speed 5729.13 samples/sec Loss 9.9572 LearningRate 0.0829 Epoch: 1 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:26,081-Speed 5704.70 samples/sec Loss 9.8279 LearningRate 0.0828 Epoch: 1 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:27,918-Speed 5576.27 samples/sec Loss 9.8614 LearningRate 0.0828 Epoch: 1 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:29,732-Speed 5647.80 samples/sec Loss 9.8638 LearningRate 0.0828 Epoch: 1 Global Step: 10230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:06:31,505-Speed 5777.46 samples/sec Loss 9.9583 LearningRate 0.0828 Epoch: 1 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:33,304-Speed 5696.64 samples/sec Loss 9.8560 LearningRate 0.0828 Epoch: 1 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:35,089-Speed 5738.93 samples/sec Loss 9.9179 LearningRate 0.0828 Epoch: 1 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:36,880-Speed 5718.83 samples/sec Loss 9.8349 LearningRate 0.0828 Epoch: 1 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:38,681-Speed 5689.05 samples/sec Loss 9.9059 LearningRate 0.0827 Epoch: 1 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:40,474-Speed 5714.30 samples/sec Loss 9.9685 LearningRate 0.0827 Epoch: 1 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:42,257-Speed 5743.13 samples/sec Loss 10.0370 LearningRate 0.0827 Epoch: 1 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:44,062-Speed 5678.44 samples/sec Loss 9.9204 LearningRate 0.0827 Epoch: 1 Global Step: 10310 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:45,847-Speed 5738.48 samples/sec Loss 10.0502 LearningRate 0.0827 Epoch: 1 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:47,631-Speed 5742.98 samples/sec Loss 9.8838 LearningRate 0.0827 Epoch: 1 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:49,413-Speed 5748.62 samples/sec Loss 9.7684 LearningRate 0.0826 Epoch: 1 Global Step: 10340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:06:51,193-Speed 5755.97 samples/sec Loss 9.9217 LearningRate 0.0826 Epoch: 1 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:53,002-Speed 5664.85 samples/sec Loss 10.0376 LearningRate 0.0826 Epoch: 1 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:54,784-Speed 5747.20 samples/sec Loss 10.0428 LearningRate 0.0826 Epoch: 1 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:56,613-Speed 5602.38 samples/sec Loss 10.0122 LearningRate 0.0826 Epoch: 1 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:06:58,430-Speed 5637.72 samples/sec Loss 10.0223 LearningRate 0.0826 Epoch: 1 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:00,237-Speed 5669.36 samples/sec Loss 9.9863 LearningRate 0.0825 Epoch: 1 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:02,029-Speed 5717.39 samples/sec Loss 9.7396 LearningRate 0.0825 Epoch: 1 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:03,814-Speed 5742.00 samples/sec Loss 9.9686 LearningRate 0.0825 Epoch: 1 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:05,610-Speed 5703.38 samples/sec Loss 9.8916 LearningRate 0.0825 Epoch: 1 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 02:07:07,403-Speed 5713.63 samples/sec Loss 9.8129 LearningRate 0.0825 Epoch: 1 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:09,202-Speed 5694.05 samples/sec Loss 9.9253 LearningRate 0.0825 Epoch: 1 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:07:10,991-Speed 5726.37 samples/sec Loss 10.0466 LearningRate 0.0824 Epoch: 1 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:12,799-Speed 5665.04 samples/sec Loss 10.0114 LearningRate 0.0824 Epoch: 1 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:14,622-Speed 5622.13 samples/sec Loss 9.7493 LearningRate 0.0824 Epoch: 1 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:16,418-Speed 5701.89 samples/sec Loss 9.8188 LearningRate 0.0824 Epoch: 1 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:18,205-Speed 5734.17 samples/sec Loss 9.8661 LearningRate 0.0824 Epoch: 1 Global Step: 10500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:19,992-Speed 5732.07 samples/sec Loss 9.7739 LearningRate 0.0824 Epoch: 1 Global Step: 10510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:21,785-Speed 5713.79 samples/sec Loss 9.7987 LearningRate 0.0824 Epoch: 1 Global Step: 10520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:23,596-Speed 5658.22 samples/sec Loss 9.7885 LearningRate 0.0823 Epoch: 1 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:25,377-Speed 5751.58 samples/sec Loss 9.8699 LearningRate 0.0823 Epoch: 1 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:27,169-Speed 5716.71 samples/sec Loss 9.9679 LearningRate 0.0823 Epoch: 1 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:28,980-Speed 5659.08 samples/sec 
Loss 9.7239 LearningRate 0.0823 Epoch: 1 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:30,763-Speed 5746.18 samples/sec Loss 9.8029 LearningRate 0.0823 Epoch: 1 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:32,544-Speed 5749.02 samples/sec Loss 9.6602 LearningRate 0.0823 Epoch: 1 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:34,364-Speed 5630.46 samples/sec Loss 9.8698 LearningRate 0.0822 Epoch: 1 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:36,182-Speed 5633.84 samples/sec Loss 9.6672 LearningRate 0.0822 Epoch: 1 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:37,992-Speed 5660.46 samples/sec Loss 9.6954 LearningRate 0.0822 Epoch: 1 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:39,780-Speed 5729.60 samples/sec Loss 9.8189 LearningRate 0.0822 Epoch: 1 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:41,584-Speed 5680.10 samples/sec Loss 9.6169 LearningRate 0.0822 Epoch: 1 Global Step: 10630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:43,367-Speed 5744.79 samples/sec Loss 9.8236 LearningRate 0.0822 Epoch: 1 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:45,182-Speed 5646.63 samples/sec Loss 9.8467 LearningRate 0.0821 Epoch: 1 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:46,965-Speed 5742.65 samples/sec Loss 9.9139 LearningRate 0.0821 Epoch: 1 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:48,747-Speed 5751.36 samples/sec Loss 9.9436 LearningRate 0.0821 Epoch: 1 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:50,547-Speed 5691.52 samples/sec Loss 9.8468 LearningRate 0.0821 Epoch: 1 Global Step: 10680 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:52,358-Speed 5654.64 samples/sec Loss 9.7217 LearningRate 0.0821 Epoch: 1 Global Step: 10690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:54,159-Speed 5690.17 samples/sec Loss 9.7731 LearningRate 0.0821 Epoch: 1 Global Step: 10700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:07:55,940-Speed 5753.10 samples/sec Loss 9.6341 LearningRate 0.0821 Epoch: 1 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:57,739-Speed 5694.90 samples/sec Loss 9.6540 LearningRate 0.0820 Epoch: 1 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:07:59,524-Speed 5737.70 samples/sec Loss 9.8116 LearningRate 0.0820 Epoch: 1 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:01,344-Speed 5630.40 samples/sec Loss 9.9032 LearningRate 0.0820 Epoch: 1 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:03,136-Speed 5719.31 samples/sec Loss 9.8360 LearningRate 0.0820 Epoch: 1 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:04,922-Speed 5735.40 samples/sec Loss 9.8885 LearningRate 0.0820 Epoch: 1 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:06,758-Speed 5578.42 samples/sec Loss 9.6768 LearningRate 0.0820 Epoch: 1 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:08,560-Speed 5686.02 samples/sec Loss 9.8426 LearningRate 0.0819 Epoch: 1 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:10,356-Speed 5704.01 samples/sec Loss 9.6664 LearningRate 0.0819 Epoch: 1 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:12,136-Speed 5755.49 samples/sec Loss 9.8293 LearningRate 0.0819 Epoch: 1 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:08:13,934-Speed 5697.87 samples/sec Loss 9.7266 LearningRate 0.0819 Epoch: 1 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:15,711-Speed 5764.57 samples/sec Loss 9.6517 LearningRate 0.0819 Epoch: 1 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:17,492-Speed 5751.54 samples/sec Loss 9.5228 LearningRate 0.0819 Epoch: 1 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:19,298-Speed 5671.88 samples/sec Loss 9.7144 LearningRate 0.0818 Epoch: 1 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:21,118-Speed 5631.21 samples/sec Loss 9.8718 LearningRate 0.0818 Epoch: 1 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:22,948-Speed 5597.10 samples/sec Loss 9.9407 LearningRate 0.0818 Epoch: 1 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:24,813-Speed 5495.01 samples/sec Loss 9.6680 LearningRate 0.0818 Epoch: 1 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:26,632-Speed 5629.16 samples/sec Loss 9.7796 LearningRate 0.0818 Epoch: 1 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:28,415-Speed 5747.90 samples/sec Loss 9.8308 LearningRate 0.0818 Epoch: 1 Global Step: 10890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:30,196-Speed 5752.24 samples/sec Loss 9.7361 LearningRate 0.0817 Epoch: 1 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:32,001-Speed 5674.32 samples/sec Loss 9.7923 LearningRate 0.0817 Epoch: 1 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:33,781-Speed 5755.30 samples/sec Loss 9.6319 LearningRate 0.0817 Epoch: 1 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:35,569-Speed 5729.09 samples/sec Loss 9.8211 
LearningRate 0.0817 Epoch: 1 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:37,388-Speed 5634.16 samples/sec Loss 9.8293 LearningRate 0.0817 Epoch: 1 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:39,239-Speed 5534.46 samples/sec Loss 9.9672 LearningRate 0.0817 Epoch: 1 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:41,046-Speed 5666.85 samples/sec Loss 9.7632 LearningRate 0.0817 Epoch: 1 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:42,847-Speed 5688.68 samples/sec Loss 9.7668 LearningRate 0.0816 Epoch: 1 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:08:44,644-Speed 5702.25 samples/sec Loss 9.8728 LearningRate 0.0816 Epoch: 1 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:46,435-Speed 5721.04 samples/sec Loss 9.7549 LearningRate 0.0816 Epoch: 1 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:48,217-Speed 5746.58 samples/sec Loss 9.9684 LearningRate 0.0816 Epoch: 1 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:50,010-Speed 5713.86 samples/sec Loss 9.8606 LearningRate 0.0816 Epoch: 1 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:51,806-Speed 5702.64 samples/sec Loss 9.8088 LearningRate 0.0816 Epoch: 1 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:53,600-Speed 5711.63 samples/sec Loss 9.5473 LearningRate 0.0815 Epoch: 1 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:55,389-Speed 5727.75 samples/sec Loss 9.6388 LearningRate 0.0815 Epoch: 1 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:57,222-Speed 5586.92 samples/sec Loss 9.8127 LearningRate 0.0815 Epoch: 1 Global Step: 11050 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-27 02:08:59,054-Speed 5591.72 samples/sec Loss 9.7477 LearningRate 0.0815 Epoch: 1 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:00,884-Speed 5597.74 samples/sec Loss 9.6857 LearningRate 0.0815 Epoch: 1 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:02,714-Speed 5599.92 samples/sec Loss 9.6408 LearningRate 0.0815 Epoch: 1 Global Step: 11080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:04,537-Speed 5618.70 samples/sec Loss 9.7677 LearningRate 0.0814 Epoch: 1 Global Step: 11090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:06,378-Speed 5564.01 samples/sec Loss 9.7061 LearningRate 0.0814 Epoch: 1 Global Step: 11100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:08,224-Speed 5549.96 samples/sec Loss 9.6771 LearningRate 0.0814 Epoch: 1 Global Step: 11110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:10,025-Speed 5687.94 samples/sec Loss 9.6602 LearningRate 0.0814 Epoch: 1 Global Step: 11120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:11,826-Speed 5688.79 samples/sec Loss 9.8005 LearningRate 0.0814 Epoch: 1 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:13,658-Speed 5592.60 samples/sec Loss 9.7799 LearningRate 0.0814 Epoch: 1 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:15,527-Speed 5482.37 samples/sec Loss 9.8196 LearningRate 0.0814 Epoch: 1 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:17,372-Speed 5550.20 samples/sec Loss 9.7291 LearningRate 0.0813 Epoch: 1 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:19,190-Speed 5638.18 samples/sec Loss 9.8940 LearningRate 0.0813 Epoch: 1 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:09:20,996-Speed 5669.80 samples/sec Loss 9.7434 LearningRate 0.0813 Epoch: 1 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:22,791-Speed 5708.78 samples/sec Loss 9.8405 LearningRate 0.0813 Epoch: 1 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:24,587-Speed 5705.59 samples/sec Loss 9.8786 LearningRate 0.0813 Epoch: 1 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:26,419-Speed 5591.23 samples/sec Loss 9.7683 LearningRate 0.0813 Epoch: 1 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:28,232-Speed 5649.76 samples/sec Loss 9.7171 LearningRate 0.0812 Epoch: 1 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:30,033-Speed 5688.78 samples/sec Loss 9.5069 LearningRate 0.0812 Epoch: 1 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:31,825-Speed 5717.90 samples/sec Loss 9.6315 LearningRate 0.0812 Epoch: 1 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:33,656-Speed 5595.38 samples/sec Loss 9.7714 LearningRate 0.0812 Epoch: 1 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:35,448-Speed 5716.26 samples/sec Loss 9.6129 LearningRate 0.0812 Epoch: 1 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:37,245-Speed 5700.77 samples/sec Loss 9.8072 LearningRate 0.0812 Epoch: 1 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:39,039-Speed 5708.82 samples/sec Loss 9.7780 LearningRate 0.0811 Epoch: 1 Global Step: 11280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:09:40,832-Speed 5715.37 samples/sec Loss 9.5819 LearningRate 0.0811 Epoch: 1 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:42,690-Speed 5510.79 samples/sec Loss 9.6404 
LearningRate 0.0811 Epoch: 1 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:44,511-Speed 5629.84 samples/sec Loss 9.7320 LearningRate 0.0811 Epoch: 1 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:46,352-Speed 5562.19 samples/sec Loss 9.6451 LearningRate 0.0811 Epoch: 1 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:48,142-Speed 5723.60 samples/sec Loss 9.7155 LearningRate 0.0811 Epoch: 1 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:09:49,934-Speed 5716.63 samples/sec Loss 9.6498 LearningRate 0.0811 Epoch: 1 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:51,798-Speed 5495.42 samples/sec Loss 9.6612 LearningRate 0.0810 Epoch: 1 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:53,605-Speed 5673.48 samples/sec Loss 9.8116 LearningRate 0.0810 Epoch: 1 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:09:55,485-Speed 5449.08 samples/sec Loss 9.5793 LearningRate 0.0810 Epoch: 1 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:08,206-Speed 805.04 samples/sec Loss 9.2617 LearningRate 0.0810 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:10,038-Speed 5594.23 samples/sec Loss 9.0670 LearningRate 0.0810 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:11,860-Speed 5630.21 samples/sec Loss 9.1420 LearningRate 0.0810 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:13,697-Speed 5576.55 samples/sec Loss 8.9711 LearningRate 0.0809 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:15,506-Speed 5661.46 samples/sec Loss 9.0038 LearningRate 0.0809 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-27 02:10:17,416-Speed 5365.68 samples/sec Loss 8.9337 LearningRate 0.0809 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:19,224-Speed 5667.18 samples/sec Loss 9.0977 LearningRate 0.0809 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:10:21,022-Speed 5700.53 samples/sec Loss 8.9034 LearningRate 0.0809 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:10:22,821-Speed 5692.57 samples/sec Loss 8.9371 LearningRate 0.0809 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:10:24,626-Speed 5677.02 samples/sec Loss 9.0995 LearningRate 0.0808 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:10:26,441-Speed 5644.92 samples/sec Loss 9.0033 LearningRate 0.0808 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:28,243-Speed 5686.07 samples/sec Loss 9.2327 LearningRate 0.0808 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:30,050-Speed 5667.80 samples/sec Loss 9.1152 LearningRate 0.0808 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:31,851-Speed 5690.09 samples/sec Loss 9.1559 LearningRate 0.0808 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:33,657-Speed 5672.29 samples/sec Loss 9.1925 LearningRate 0.0808 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:35,453-Speed 5703.94 samples/sec Loss 9.0908 LearningRate 0.0808 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:37,267-Speed 5648.04 samples/sec Loss 9.2335 LearningRate 0.0807 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:39,073-Speed 
5672.00 samples/sec Loss 9.2894 LearningRate 0.0807 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:40,886-Speed 5650.14 samples/sec Loss 9.3186 LearningRate 0.0807 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:10:42,667-Speed 5751.60 samples/sec Loss 9.2571 LearningRate 0.0807 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:44,477-Speed 5660.63 samples/sec Loss 9.1216 LearningRate 0.0807 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:46,274-Speed 5700.41 samples/sec Loss 9.3299 LearningRate 0.0807 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:48,085-Speed 5658.32 samples/sec Loss 9.3656 LearningRate 0.0806 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:49,893-Speed 5664.77 samples/sec Loss 9.3284 LearningRate 0.0806 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:51,697-Speed 5678.89 samples/sec Loss 9.0962 LearningRate 0.0806 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:53,501-Speed 5680.15 samples/sec Loss 9.1851 LearningRate 0.0806 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:55,353-Speed 5531.62 samples/sec Loss 9.1147 LearningRate 0.0806 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:57,185-Speed 5591.61 samples/sec Loss 9.2534 LearningRate 0.0806 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:10:58,979-Speed 5712.31 samples/sec Loss 9.2760 LearningRate 0.0805 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:11:00,813-Speed 5584.40 samples/sec Loss 9.2043 LearningRate 0.0805 Epoch: 2 Global 
Step: 11670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:02,638-Speed 5614.59 samples/sec Loss 9.2597 LearningRate 0.0805 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:04,439-Speed 5689.00 samples/sec Loss 9.2699 LearningRate 0.0805 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:06,266-Speed 5606.91 samples/sec Loss 9.0967 LearningRate 0.0805 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:08,077-Speed 5657.03 samples/sec Loss 9.2662 LearningRate 0.0805 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:09,916-Speed 5571.46 samples/sec Loss 9.1465 LearningRate 0.0805 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:11,724-Speed 5666.66 samples/sec Loss 9.3204 LearningRate 0.0804 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:13,544-Speed 5627.26 samples/sec Loss 9.5022 LearningRate 0.0804 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:15,370-Speed 5610.85 samples/sec Loss 9.5140 LearningRate 0.0804 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:17,175-Speed 5675.50 samples/sec Loss 9.3734 LearningRate 0.0804 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:18,975-Speed 5697.34 samples/sec Loss 9.3977 LearningRate 0.0804 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:20,813-Speed 5574.06 samples/sec Loss 9.5220 LearningRate 0.0804 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:22,699-Speed 5431.02 samples/sec Loss 9.3947 LearningRate 0.0803 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:11:24,576-Speed 5456.77 samples/sec Loss 9.2618 LearningRate 0.0803 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:26,451-Speed 5463.43 samples/sec Loss 9.1862 LearningRate 0.0803 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:28,325-Speed 5465.13 samples/sec Loss 9.3562 LearningRate 0.0803 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:30,234-Speed 5367.59 samples/sec Loss 9.1975 LearningRate 0.0803 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:32,042-Speed 5667.24 samples/sec Loss 9.4039 LearningRate 0.0803 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:33,868-Speed 5607.16 samples/sec Loss 9.2815 LearningRate 0.0802 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:35,675-Speed 5669.48 samples/sec Loss 9.3999 LearningRate 0.0802 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:37,459-Speed 5742.44 samples/sec Loss 9.3696 LearningRate 0.0802 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:39,287-Speed 5603.49 samples/sec Loss 9.2750 LearningRate 0.0802 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:41,094-Speed 5671.26 samples/sec Loss 9.1223 LearningRate 0.0802 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:42,886-Speed 5714.56 samples/sec Loss 9.2844 LearningRate 0.0802 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:44,682-Speed 5704.57 samples/sec Loss 9.3571 LearningRate 0.0802 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:46,473-Speed 5719.87 samples/sec Loss 9.2590 
LearningRate 0.0801 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:11:48,263-Speed 5722.84 samples/sec Loss 9.3360 LearningRate 0.0801 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:50,109-Speed 5551.23 samples/sec Loss 9.4474 LearningRate 0.0801 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:51,904-Speed 5705.32 samples/sec Loss 9.4279 LearningRate 0.0801 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:53,709-Speed 5675.03 samples/sec Loss 9.5476 LearningRate 0.0801 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:55,551-Speed 5562.13 samples/sec Loss 9.4805 LearningRate 0.0801 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:57,387-Speed 5579.85 samples/sec Loss 9.3332 LearningRate 0.0800 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:11:59,223-Speed 5579.55 samples/sec Loss 9.3122 LearningRate 0.0800 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:12:01,061-Speed 5570.94 samples/sec Loss 9.2695 LearningRate 0.0800 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:12:28,012-[lfw][12000]XNorm: 22.465708 Training: 2022-04-27 02:12:28,013-[lfw][12000]Accuracy-Flip: 0.99583+-0.00344 Training: 2022-04-27 02:12:28,013-[lfw][12000]Accuracy-Highest: 0.99583 Training: 2022-04-27 02:12:58,917-[cfp_fp][12000]XNorm: 19.763358 Training: 2022-04-27 02:12:58,918-[cfp_fp][12000]Accuracy-Flip: 0.91529+-0.01357 Training: 2022-04-27 02:12:58,919-[cfp_fp][12000]Accuracy-Highest: 0.91529 Training: 2022-04-27 02:13:25,781-[agedb_30][12000]XNorm: 22.394243 Training: 2022-04-27 02:13:25,781-[agedb_30][12000]Accuracy-Flip: 0.95533+-0.00974 Training: 2022-04-27 
02:13:25,782-[agedb_30][12000]Accuracy-Highest: 0.95567 Training: 2022-04-27 02:13:27,585-Speed 118.35 samples/sec Loss 9.5076 LearningRate 0.0800 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:29,385-Speed 5692.04 samples/sec Loss 9.3575 LearningRate 0.0800 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:31,176-Speed 5720.54 samples/sec Loss 9.3931 LearningRate 0.0800 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:13:32,960-Speed 5742.65 samples/sec Loss 9.3345 LearningRate 0.0799 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:34,771-Speed 5658.50 samples/sec Loss 9.3602 LearningRate 0.0799 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:36,591-Speed 5629.37 samples/sec Loss 9.3711 LearningRate 0.0799 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:38,400-Speed 5664.71 samples/sec Loss 9.3314 LearningRate 0.0799 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:40,214-Speed 5645.95 samples/sec Loss 9.3767 LearningRate 0.0799 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:42,022-Speed 5669.45 samples/sec Loss 9.4185 LearningRate 0.0799 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:43,813-Speed 5717.25 samples/sec Loss 9.2434 LearningRate 0.0799 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:45,603-Speed 5724.28 samples/sec Loss 9.4975 LearningRate 0.0798 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:47,421-Speed 5636.64 samples/sec Loss 9.4370 LearningRate 0.0798 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 02:13:49,227-Speed 5673.87 samples/sec Loss 9.3191 LearningRate 0.0798 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:51,017-Speed 5721.97 samples/sec Loss 9.3566 LearningRate 0.0798 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:52,810-Speed 5717.46 samples/sec Loss 9.3580 LearningRate 0.0798 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:54,610-Speed 5692.95 samples/sec Loss 9.4980 LearningRate 0.0798 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:56,407-Speed 5702.37 samples/sec Loss 9.4481 LearningRate 0.0797 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:13:58,201-Speed 5710.67 samples/sec Loss 9.4696 LearningRate 0.0797 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:00,032-Speed 5595.36 samples/sec Loss 9.4665 LearningRate 0.0797 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:01,864-Speed 5594.64 samples/sec Loss 9.3831 LearningRate 0.0797 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:03,674-Speed 5660.99 samples/sec Loss 9.2765 LearningRate 0.0797 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:05,490-Speed 5640.21 samples/sec Loss 9.3375 LearningRate 0.0797 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:07,324-Speed 5588.06 samples/sec Loss 9.4343 LearningRate 0.0796 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:09,134-Speed 5660.31 samples/sec Loss 9.3732 LearningRate 0.0796 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:10,948-Speed 5647.55 
samples/sec Loss 9.5760 LearningRate 0.0796 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:12,778-Speed 5600.50 samples/sec Loss 9.4338 LearningRate 0.0796 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:14,607-Speed 5601.11 samples/sec Loss 9.4199 LearningRate 0.0796 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:16,420-Speed 5652.49 samples/sec Loss 9.2725 LearningRate 0.0796 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:18,225-Speed 5676.90 samples/sec Loss 9.4097 LearningRate 0.0796 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:20,039-Speed 5646.55 samples/sec Loss 9.3327 LearningRate 0.0795 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:21,841-Speed 5684.77 samples/sec Loss 9.1447 LearningRate 0.0795 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:23,644-Speed 5691.64 samples/sec Loss 9.5159 LearningRate 0.0795 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:25,441-Speed 5700.06 samples/sec Loss 9.3298 LearningRate 0.0795 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:27,243-Speed 5686.71 samples/sec Loss 9.4500 LearningRate 0.0795 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:14:29,028-Speed 5740.58 samples/sec Loss 9.3471 LearningRate 0.0795 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:30,818-Speed 5721.92 samples/sec Loss 9.5173 LearningRate 0.0794 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:32,635-Speed 5641.55 samples/sec Loss 9.4131 LearningRate 0.0794 Epoch: 2 
Global Step: 12370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:34,456-Speed 5625.16 samples/sec Loss 9.4754 LearningRate 0.0794 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:36,261-Speed 5676.34 samples/sec Loss 9.3648 LearningRate 0.0794 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:38,047-Speed 5734.67 samples/sec Loss 9.4630 LearningRate 0.0794 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:39,834-Speed 5736.01 samples/sec Loss 9.3522 LearningRate 0.0794 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:41,664-Speed 5595.25 samples/sec Loss 9.2821 LearningRate 0.0793 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:43,454-Speed 5723.61 samples/sec Loss 9.1606 LearningRate 0.0793 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:45,239-Speed 5741.64 samples/sec Loss 9.4430 LearningRate 0.0793 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:47,072-Speed 5588.88 samples/sec Loss 9.3903 LearningRate 0.0793 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:48,862-Speed 5723.39 samples/sec Loss 9.2064 LearningRate 0.0793 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:50,668-Speed 5673.63 samples/sec Loss 9.4083 LearningRate 0.0793 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:52,474-Speed 5671.84 samples/sec Loss 9.4826 LearningRate 0.0793 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:14:54,283-Speed 5665.11 samples/sec Loss 9.2998 LearningRate 0.0792 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-27 02:14:56,080-Speed 5698.33 samples/sec Loss 9.3792 LearningRate 0.0792 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:14:57,886-Speed 5673.24 samples/sec Loss 9.2571 LearningRate 0.0792 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:14:59,698-Speed 5655.43 samples/sec Loss 9.2782 LearningRate 0.0792 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:01,486-Speed 5729.99 samples/sec Loss 9.3029 LearningRate 0.0792 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:03,305-Speed 5630.89 samples/sec Loss 9.4119 LearningRate 0.0792 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:05,154-Speed 5541.00 samples/sec Loss 9.2772 LearningRate 0.0791 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:06,955-Speed 5687.72 samples/sec Loss 9.3303 LearningRate 0.0791 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:08,767-Speed 5653.83 samples/sec Loss 9.2241 LearningRate 0.0791 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:10,567-Speed 5694.36 samples/sec Loss 9.2883 LearningRate 0.0791 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:12,375-Speed 5664.79 samples/sec Loss 9.4877 LearningRate 0.0791 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:14,189-Speed 5649.67 samples/sec Loss 9.3395 LearningRate 0.0791 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:16,003-Speed 5645.89 samples/sec Loss 9.4158 LearningRate 0.0791 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:17,811-Speed 5667.52 
samples/sec Loss 9.3832 LearningRate 0.0790 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:19,636-Speed 5613.35 samples/sec Loss 9.4274 LearningRate 0.0790 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:21,455-Speed 5635.22 samples/sec Loss 9.3548 LearningRate 0.0790 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:23,267-Speed 5653.65 samples/sec Loss 9.1487 LearningRate 0.0790 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:25,085-Speed 5634.44 samples/sec Loss 9.4753 LearningRate 0.0790 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:26,879-Speed 5708.99 samples/sec Loss 9.2618 LearningRate 0.0790 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:28,670-Speed 5721.46 samples/sec Loss 9.2798 LearningRate 0.0789 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:30,464-Speed 5710.73 samples/sec Loss 9.1937 LearningRate 0.0789 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:32,250-Speed 5737.35 samples/sec Loss 9.2828 LearningRate 0.0789 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:34,049-Speed 5692.49 samples/sec Loss 9.1053 LearningRate 0.0789 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:35,831-Speed 5750.61 samples/sec Loss 9.1628 LearningRate 0.0789 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:37,614-Speed 5745.39 samples/sec Loss 9.2784 LearningRate 0.0789 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:39,403-Speed 5728.15 samples/sec Loss 9.4009 LearningRate 0.0788 Epoch: 2 
Global Step: 12740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:41,202-Speed 5691.74 samples/sec Loss 9.2501 LearningRate 0.0788 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:43,009-Speed 5670.09 samples/sec Loss 9.2483 LearningRate 0.0788 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:44,812-Speed 5683.32 samples/sec Loss 9.4020 LearningRate 0.0788 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:46,616-Speed 5678.56 samples/sec Loss 9.2484 LearningRate 0.0788 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:48,417-Speed 5690.43 samples/sec Loss 9.4391 LearningRate 0.0788 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:15:50,224-Speed 5671.48 samples/sec Loss 9.3372 LearningRate 0.0788 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:15:52,001-Speed 5764.96 samples/sec Loss 9.1398 LearningRate 0.0787 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:53,807-Speed 5670.73 samples/sec Loss 9.2626 LearningRate 0.0787 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:55,605-Speed 5700.43 samples/sec Loss 9.3898 LearningRate 0.0787 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:57,427-Speed 5620.85 samples/sec Loss 9.2632 LearningRate 0.0787 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:15:59,225-Speed 5700.07 samples/sec Loss 9.4134 LearningRate 0.0787 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:16:01,027-Speed 5687.02 samples/sec Loss 9.2101 LearningRate 0.0787 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 02:16:02,844-Speed 5636.88 samples/sec Loss 9.4717 LearningRate 0.0786 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:16:04,645-Speed 5688.20 samples/sec Loss 9.2227 LearningRate 0.0786 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:16:06,441-Speed 5703.72 samples/sec Loss 9.3224 LearningRate 0.0786 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:16:08,258-Speed 5640.48 samples/sec Loss 9.3852 LearningRate 0.0786 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:16:10,094-Speed 5580.50 samples/sec Loss 9.4453 LearningRate 0.0786 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:11,884-Speed 5723.07 samples/sec Loss 9.3059 LearningRate 0.0786 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:13,686-Speed 5683.94 samples/sec Loss 9.2878 LearningRate 0.0786 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:15,484-Speed 5699.37 samples/sec Loss 9.4831 LearningRate 0.0785 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:17,309-Speed 5614.46 samples/sec Loss 9.3608 LearningRate 0.0785 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:19,101-Speed 5715.53 samples/sec Loss 9.2854 LearningRate 0.0785 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:20,889-Speed 5730.25 samples/sec Loss 9.1612 LearningRate 0.0785 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:22,679-Speed 5723.87 samples/sec Loss 9.2057 LearningRate 0.0785 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:24,462-Speed 5744.29 samples/sec 
Loss 9.4271 LearningRate 0.0785 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:26,271-Speed 5663.00 samples/sec Loss 9.3324 LearningRate 0.0784 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:28,065-Speed 5710.18 samples/sec Loss 9.3045 LearningRate 0.0784 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:29,876-Speed 5660.49 samples/sec Loss 9.3302 LearningRate 0.0784 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:31,717-Speed 5563.51 samples/sec Loss 9.3896 LearningRate 0.0784 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:33,557-Speed 5567.77 samples/sec Loss 9.2565 LearningRate 0.0784 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:35,359-Speed 5685.98 samples/sec Loss 9.4717 LearningRate 0.0784 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:37,163-Speed 5679.75 samples/sec Loss 9.2247 LearningRate 0.0784 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:38,961-Speed 5697.94 samples/sec Loss 9.2387 LearningRate 0.0783 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:40,822-Speed 5511.04 samples/sec Loss 9.3374 LearningRate 0.0783 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:42,653-Speed 5594.38 samples/sec Loss 9.2270 LearningRate 0.0783 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:44,451-Speed 5697.21 samples/sec Loss 9.4634 LearningRate 0.0783 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:46,250-Speed 5694.02 samples/sec Loss 9.3938 LearningRate 0.0783 Epoch: 2 Global Step: 
13110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:16:48,024-Speed 5775.47 samples/sec Loss 9.3965 LearningRate 0.0783 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:49,839-Speed 5645.20 samples/sec Loss 9.2376 LearningRate 0.0782 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:51,668-Speed 5602.65 samples/sec Loss 9.2878 LearningRate 0.0782 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:53,450-Speed 5748.76 samples/sec Loss 9.2624 LearningRate 0.0782 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:55,292-Speed 5561.25 samples/sec Loss 9.2897 LearningRate 0.0782 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:57,087-Speed 5708.26 samples/sec Loss 9.3799 LearningRate 0.0782 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:16:58,876-Speed 5727.82 samples/sec Loss 9.2809 LearningRate 0.0782 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:00,663-Speed 5736.44 samples/sec Loss 9.4861 LearningRate 0.0781 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:02,472-Speed 5662.59 samples/sec Loss 9.0390 LearningRate 0.0781 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:04,255-Speed 5745.27 samples/sec Loss 9.1706 LearningRate 0.0781 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:06,060-Speed 5676.23 samples/sec Loss 9.3322 LearningRate 0.0781 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:17:07,840-Speed 5755.00 samples/sec Loss 9.3250 LearningRate 0.0781 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 02:17:09,654-Speed 5647.51 samples/sec Loss 9.0229 LearningRate 0.0781 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:11,443-Speed 5727.21 samples/sec Loss 9.3978 LearningRate 0.0781 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:13,255-Speed 5654.38 samples/sec Loss 9.2418 LearningRate 0.0780 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:15,041-Speed 5734.44 samples/sec Loss 9.3671 LearningRate 0.0780 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:16,854-Speed 5651.64 samples/sec Loss 9.3084 LearningRate 0.0780 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:18,661-Speed 5673.09 samples/sec Loss 9.1307 LearningRate 0.0780 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:20,476-Speed 5644.08 samples/sec Loss 9.0579 LearningRate 0.0780 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:22,283-Speed 5669.70 samples/sec Loss 9.2165 LearningRate 0.0780 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:24,103-Speed 5627.38 samples/sec Loss 9.0822 LearningRate 0.0779 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:25,905-Speed 5688.35 samples/sec Loss 9.2273 LearningRate 0.0779 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:27,704-Speed 5694.72 samples/sec Loss 9.1711 LearningRate 0.0779 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:29,512-Speed 5666.64 samples/sec Loss 9.2171 LearningRate 0.0779 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:31,314-Speed 5685.80 
samples/sec Loss 9.2651 LearningRate 0.0779 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:33,122-Speed 5666.24 samples/sec Loss 9.2197 LearningRate 0.0779 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:34,927-Speed 5675.40 samples/sec Loss 9.2391 LearningRate 0.0779 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:36,716-Speed 5728.60 samples/sec Loss 9.0824 LearningRate 0.0778 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:38,540-Speed 5616.25 samples/sec Loss 9.2153 LearningRate 0.0778 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:40,330-Speed 5725.94 samples/sec Loss 9.1652 LearningRate 0.0778 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:42,138-Speed 5666.60 samples/sec Loss 9.0979 LearningRate 0.0778 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:43,931-Speed 5713.59 samples/sec Loss 9.1539 LearningRate 0.0778 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:45,724-Speed 5711.73 samples/sec Loss 9.0078 LearningRate 0.0778 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:47,526-Speed 5686.92 samples/sec Loss 9.3867 LearningRate 0.0777 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:49,325-Speed 5695.61 samples/sec Loss 9.1632 LearningRate 0.0777 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:51,137-Speed 5653.24 samples/sec Loss 9.1073 LearningRate 0.0777 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:52,982-Speed 5552.04 samples/sec Loss 9.3281 LearningRate 0.0777 Epoch: 2 
Global Step: 13480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:54,766-Speed 5744.63 samples/sec Loss 9.1881 LearningRate 0.0777 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:56,567-Speed 5688.65 samples/sec Loss 9.1681 LearningRate 0.0777 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:17:58,388-Speed 5626.12 samples/sec Loss 9.1761 LearningRate 0.0777 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:00,213-Speed 5614.67 samples/sec Loss 9.2880 LearningRate 0.0776 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:02,074-Speed 5505.18 samples/sec Loss 9.1612 LearningRate 0.0776 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:03,929-Speed 5525.23 samples/sec Loss 9.4146 LearningRate 0.0776 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:05,789-Speed 5507.58 samples/sec Loss 9.0333 LearningRate 0.0776 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:07,620-Speed 5596.17 samples/sec Loss 9.1600 LearningRate 0.0776 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:09,413-Speed 5713.05 samples/sec Loss 9.1684 LearningRate 0.0776 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:11,199-Speed 5740.22 samples/sec Loss 9.2245 LearningRate 0.0775 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:12,990-Speed 5719.55 samples/sec Loss 9.3180 LearningRate 0.0775 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:14,782-Speed 5716.82 samples/sec Loss 9.1406 LearningRate 0.0775 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:18:16,582-Speed 5692.50 samples/sec Loss 9.1205 LearningRate 0.0775 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:18,388-Speed 5673.46 samples/sec Loss 9.2289 LearningRate 0.0775 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:20,185-Speed 5698.92 samples/sec Loss 9.2632 LearningRate 0.0775 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:18:21,972-Speed 5735.19 samples/sec Loss 9.1415 LearningRate 0.0774 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:23,776-Speed 5678.68 samples/sec Loss 9.2121 LearningRate 0.0774 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:25,579-Speed 5684.14 samples/sec Loss 9.1706 LearningRate 0.0774 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:27,410-Speed 5594.42 samples/sec Loss 9.2011 LearningRate 0.0774 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:29,206-Speed 5704.44 samples/sec Loss 9.2628 LearningRate 0.0774 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:31,014-Speed 5668.47 samples/sec Loss 9.1812 LearningRate 0.0774 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:32,819-Speed 5675.61 samples/sec Loss 8.9395 LearningRate 0.0774 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:34,608-Speed 5727.79 samples/sec Loss 9.1245 LearningRate 0.0773 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:36,401-Speed 5713.19 samples/sec Loss 9.1050 LearningRate 0.0773 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:38,189-Speed 5729.08 
samples/sec Loss 9.0837 LearningRate 0.0773 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:40,004-Speed 5646.64 samples/sec Loss 9.1898 LearningRate 0.0773 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:41,804-Speed 5691.67 samples/sec Loss 9.1063 LearningRate 0.0773 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:43,593-Speed 5724.46 samples/sec Loss 9.1208 LearningRate 0.0773 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:45,415-Speed 5625.71 samples/sec Loss 9.1145 LearningRate 0.0772 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:47,225-Speed 5660.41 samples/sec Loss 9.0394 LearningRate 0.0772 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:49,047-Speed 5623.97 samples/sec Loss 9.2858 LearningRate 0.0772 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:50,842-Speed 5707.05 samples/sec Loss 9.1917 LearningRate 0.0772 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:52,631-Speed 5726.28 samples/sec Loss 9.1132 LearningRate 0.0772 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:54,427-Speed 5705.48 samples/sec Loss 9.1634 LearningRate 0.0772 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:56,233-Speed 5673.47 samples/sec Loss 9.0479 LearningRate 0.0772 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:18:58,050-Speed 5637.02 samples/sec Loss 9.0993 LearningRate 0.0771 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:18:59,835-Speed 5741.19 samples/sec Loss 9.1241 LearningRate 0.0771 Epoch: 2 
Global Step: 13850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:01,637-Speed 5683.65 samples/sec Loss 9.3036 LearningRate 0.0771 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:03,458-Speed 5625.34 samples/sec Loss 9.0844 LearningRate 0.0771 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:05,287-Speed 5603.89 samples/sec Loss 9.1528 LearningRate 0.0771 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:07,129-Speed 5562.22 samples/sec Loss 9.0164 LearningRate 0.0771 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:08,937-Speed 5666.20 samples/sec Loss 9.1222 LearningRate 0.0770 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:10,738-Speed 5690.45 samples/sec Loss 9.1677 LearningRate 0.0770 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:12,536-Speed 5695.79 samples/sec Loss 9.0835 LearningRate 0.0770 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:14,325-Speed 5727.56 samples/sec Loss 9.2086 LearningRate 0.0770 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:16,117-Speed 5718.03 samples/sec Loss 9.1785 LearningRate 0.0770 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:17,917-Speed 5693.42 samples/sec Loss 9.0037 LearningRate 0.0770 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:19,710-Speed 5714.21 samples/sec Loss 9.0941 LearningRate 0.0770 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:21,540-Speed 5596.76 samples/sec Loss 9.0807 LearningRate 0.0769 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:19:23,328-Speed 5730.25 samples/sec Loss 9.1427 LearningRate 0.0769 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:25,158-Speed 5599.97 samples/sec Loss 9.1293 LearningRate 0.0769 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:26,987-Speed 5599.15 samples/sec Loss 9.2193 LearningRate 0.0769 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:19:53,711-[lfw][14000]XNorm: 23.086993 Training: 2022-04-27 02:19:53,712-[lfw][14000]Accuracy-Flip: 0.99617+-0.00279 Training: 2022-04-27 02:19:53,713-[lfw][14000]Accuracy-Highest: 0.99617 Training: 2022-04-27 02:20:24,771-[cfp_fp][14000]XNorm: 19.993398 Training: 2022-04-27 02:20:24,772-[cfp_fp][14000]Accuracy-Flip: 0.92343+-0.01574 Training: 2022-04-27 02:20:24,773-[cfp_fp][14000]Accuracy-Highest: 0.92343 Training: 2022-04-27 02:20:51,551-[agedb_30][14000]XNorm: 22.814435 Training: 2022-04-27 02:20:51,552-[agedb_30][14000]Accuracy-Flip: 0.95567+-0.00978 Training: 2022-04-27 02:20:51,552-[agedb_30][14000]Accuracy-Highest: 0.95567 Training: 2022-04-27 02:20:53,408-Speed 118.49 samples/sec Loss 9.1931 LearningRate 0.0769 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:20:55,226-Speed 5636.66 samples/sec Loss 9.2419 LearningRate 0.0769 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:20:57,027-Speed 5686.30 samples/sec Loss 9.1542 LearningRate 0.0768 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:20:58,812-Speed 5740.74 samples/sec Loss 8.9942 LearningRate 0.0768 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:00,601-Speed 5728.66 samples/sec Loss 9.1459 LearningRate 0.0768 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:21:02,413-Speed 5653.97 samples/sec Loss 9.1195 LearningRate 0.0768 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:04,240-Speed 5608.74 samples/sec Loss 9.1885 LearningRate 0.0768 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:06,046-Speed 5673.58 samples/sec Loss 8.9563 LearningRate 0.0768 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:07,866-Speed 5628.08 samples/sec Loss 8.9537 LearningRate 0.0768 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:09,697-Speed 5596.38 samples/sec Loss 9.0961 LearningRate 0.0767 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:11,514-Speed 5638.66 samples/sec Loss 8.9109 LearningRate 0.0767 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:13,307-Speed 5713.67 samples/sec Loss 9.1050 LearningRate 0.0767 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:15,137-Speed 5595.62 samples/sec Loss 9.1483 LearningRate 0.0767 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:16,941-Speed 5680.68 samples/sec Loss 9.0742 LearningRate 0.0767 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:18,753-Speed 5653.13 samples/sec Loss 9.1103 LearningRate 0.0767 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:21:20,569-Speed 5644.23 samples/sec Loss 9.0943 LearningRate 0.0766 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:22,363-Speed 5708.18 samples/sec Loss 9.0681 LearningRate 0.0766 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:24,165-Speed 5687.28 samples/sec Loss 9.0846 
LearningRate 0.0766 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:25,965-Speed 5689.90 samples/sec Loss 9.0874 LearningRate 0.0766 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:27,786-Speed 5626.64 samples/sec Loss 9.0367 LearningRate 0.0766 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:29,602-Speed 5643.48 samples/sec Loss 9.2168 LearningRate 0.0766 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:31,413-Speed 5655.98 samples/sec Loss 9.1119 LearningRate 0.0766 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:33,217-Speed 5679.86 samples/sec Loss 9.0661 LearningRate 0.0765 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:35,005-Speed 5728.95 samples/sec Loss 9.0163 LearningRate 0.0765 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:21:36,811-Speed 5671.89 samples/sec Loss 9.0421 LearningRate 0.0765 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:38,608-Speed 5703.13 samples/sec Loss 8.9782 LearningRate 0.0765 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:40,407-Speed 5694.45 samples/sec Loss 9.0659 LearningRate 0.0765 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:42,201-Speed 5711.55 samples/sec Loss 9.0478 LearningRate 0.0765 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:44,109-Speed 5368.71 samples/sec Loss 8.8857 LearningRate 0.0764 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:45,927-Speed 5633.99 samples/sec Loss 9.1045 LearningRate 0.0764 Epoch: 2 Global Step: 14300 Fp16 Grad 
Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:47,739-Speed 5654.98 samples/sec Loss 8.9992 LearningRate 0.0764 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:49,551-Speed 5654.82 samples/sec Loss 9.0477 LearningRate 0.0764 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:51,373-Speed 5619.98 samples/sec Loss 9.1369 LearningRate 0.0764 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:53,180-Speed 5671.73 samples/sec Loss 9.1065 LearningRate 0.0764 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:21:54,986-Speed 5671.68 samples/sec Loss 8.9842 LearningRate 0.0764 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:21:56,797-Speed 5655.53 samples/sec Loss 9.0656 LearningRate 0.0763 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:21:58,598-Speed 5688.39 samples/sec Loss 9.0917 LearningRate 0.0763 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:00,416-Speed 5633.96 samples/sec Loss 9.1244 LearningRate 0.0763 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:02,250-Speed 5585.61 samples/sec Loss 8.9260 LearningRate 0.0763 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:04,047-Speed 5703.30 samples/sec Loss 8.9894 LearningRate 0.0763 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:05,832-Speed 5738.13 samples/sec Loss 8.9637 LearningRate 0.0763 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:07,649-Speed 5637.49 samples/sec Loss 9.0977 LearningRate 0.0762 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:22:09,454-Speed 5676.58 samples/sec Loss 8.9268 LearningRate 0.0762 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:11,263-Speed 5662.47 samples/sec Loss 9.0390 LearningRate 0.0762 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:22:13,064-Speed 5688.00 samples/sec Loss 9.0543 LearningRate 0.0762 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:14,888-Speed 5617.97 samples/sec Loss 9.0279 LearningRate 0.0762 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:16,699-Speed 5657.46 samples/sec Loss 9.0128 LearningRate 0.0762 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:18,484-Speed 5737.78 samples/sec Loss 9.0712 LearningRate 0.0762 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:20,278-Speed 5710.76 samples/sec Loss 8.9404 LearningRate 0.0761 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:22,072-Speed 5709.40 samples/sec Loss 8.9340 LearningRate 0.0761 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:23,877-Speed 5675.83 samples/sec Loss 8.9015 LearningRate 0.0761 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:25,663-Speed 5735.99 samples/sec Loss 9.1146 LearningRate 0.0761 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:27,454-Speed 5719.00 samples/sec Loss 9.1301 LearningRate 0.0761 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:29,302-Speed 5544.16 samples/sec Loss 9.0405 LearningRate 0.0761 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:31,161-Speed 5511.15 samples/sec Loss 8.9402 
LearningRate 0.0760 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:22:33,004-Speed 5558.98 samples/sec Loss 9.0924 LearningRate 0.0760 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:34,873-Speed 5481.55 samples/sec Loss 9.0720 LearningRate 0.0760 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:36,741-Speed 5482.05 samples/sec Loss 9.0840 LearningRate 0.0760 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:38,557-Speed 5640.38 samples/sec Loss 8.9527 LearningRate 0.0760 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:40,371-Speed 5650.85 samples/sec Loss 8.9491 LearningRate 0.0760 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:42,184-Speed 5650.93 samples/sec Loss 8.9197 LearningRate 0.0760 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:43,991-Speed 5667.80 samples/sec Loss 9.2807 LearningRate 0.0759 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:45,794-Speed 5684.79 samples/sec Loss 8.9509 LearningRate 0.0759 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:47,592-Speed 5696.64 samples/sec Loss 8.9750 LearningRate 0.0759 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:49,408-Speed 5641.42 samples/sec Loss 9.1653 LearningRate 0.0759 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:51,209-Speed 5690.26 samples/sec Loss 8.8854 LearningRate 0.0759 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:22:53,003-Speed 5709.86 samples/sec Loss 8.9620 LearningRate 0.0759 Epoch: 2 Global Step: 14670 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:54,808-Speed 5674.22 samples/sec Loss 8.9320 LearningRate 0.0758 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:56,673-Speed 5492.20 samples/sec Loss 9.0549 LearningRate 0.0758 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:22:58,470-Speed 5700.46 samples/sec Loss 9.0493 LearningRate 0.0758 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:00,265-Speed 5714.22 samples/sec Loss 8.9852 LearningRate 0.0758 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:02,057-Speed 5714.41 samples/sec Loss 8.9567 LearningRate 0.0758 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:03,888-Speed 5597.10 samples/sec Loss 8.8972 LearningRate 0.0758 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:05,694-Speed 5671.60 samples/sec Loss 8.9602 LearningRate 0.0758 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:07,499-Speed 5675.51 samples/sec Loss 9.0774 LearningRate 0.0757 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:09,289-Speed 5724.11 samples/sec Loss 8.8663 LearningRate 0.0757 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:11,085-Speed 5706.56 samples/sec Loss 8.8297 LearningRate 0.0757 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:12,885-Speed 5688.64 samples/sec Loss 9.1735 LearningRate 0.0757 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:14,701-Speed 5640.10 samples/sec Loss 8.9926 LearningRate 0.0757 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:23:16,510-Speed 5667.41 samples/sec Loss 8.9586 LearningRate 0.0757 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:18,303-Speed 5711.57 samples/sec Loss 9.0358 LearningRate 0.0756 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:20,161-Speed 5514.13 samples/sec Loss 8.9689 LearningRate 0.0756 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:21,983-Speed 5623.63 samples/sec Loss 8.9104 LearningRate 0.0756 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:23,797-Speed 5653.50 samples/sec Loss 8.9449 LearningRate 0.0756 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:25,623-Speed 5610.72 samples/sec Loss 9.0092 LearningRate 0.0756 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:27,422-Speed 5691.87 samples/sec Loss 8.9656 LearningRate 0.0756 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:29,228-Speed 5673.85 samples/sec Loss 9.0277 LearningRate 0.0756 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:31,030-Speed 5685.46 samples/sec Loss 9.1175 LearningRate 0.0755 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:32,820-Speed 5722.44 samples/sec Loss 8.9195 LearningRate 0.0755 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:34,625-Speed 5678.80 samples/sec Loss 8.9811 LearningRate 0.0755 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:36,487-Speed 5502.84 samples/sec Loss 8.7294 LearningRate 0.0755 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:38,296-Speed 5662.58 samples/sec Loss 8.9314 
LearningRate 0.0755 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:40,088-Speed 5716.25 samples/sec Loss 9.1658 LearningRate 0.0755 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:41,898-Speed 5661.44 samples/sec Loss 8.9766 LearningRate 0.0755 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:43,699-Speed 5688.82 samples/sec Loss 8.9697 LearningRate 0.0754 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:45,497-Speed 5697.73 samples/sec Loss 8.8811 LearningRate 0.0754 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:47,311-Speed 5646.40 samples/sec Loss 9.2026 LearningRate 0.0754 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:49,142-Speed 5596.70 samples/sec Loss 8.8922 LearningRate 0.0754 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:23:50,953-Speed 5654.40 samples/sec Loss 8.8725 LearningRate 0.0754 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:52,752-Speed 5696.68 samples/sec Loss 8.8033 LearningRate 0.0754 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:54,552-Speed 5690.58 samples/sec Loss 9.0445 LearningRate 0.0753 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:56,358-Speed 5673.74 samples/sec Loss 9.0373 LearningRate 0.0753 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:58,200-Speed 5561.54 samples/sec Loss 8.9763 LearningRate 0.0753 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:23:59,992-Speed 5716.92 samples/sec Loss 8.8499 LearningRate 0.0753 Epoch: 2 Global Step: 15040 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:01,789-Speed 5702.72 samples/sec Loss 9.1001 LearningRate 0.0753 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:03,609-Speed 5628.08 samples/sec Loss 9.0090 LearningRate 0.0753 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:05,435-Speed 5611.41 samples/sec Loss 8.9696 LearningRate 0.0753 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:07,237-Speed 5682.72 samples/sec Loss 8.9585 LearningRate 0.0752 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:09,040-Speed 5683.87 samples/sec Loss 8.9354 LearningRate 0.0752 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:10,847-Speed 5669.15 samples/sec Loss 8.7362 LearningRate 0.0752 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:12,651-Speed 5678.00 samples/sec Loss 8.8719 LearningRate 0.0752 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:14,440-Speed 5729.87 samples/sec Loss 8.9343 LearningRate 0.0752 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:16,246-Speed 5671.44 samples/sec Loss 9.0024 LearningRate 0.0752 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:18,046-Speed 5692.40 samples/sec Loss 8.8692 LearningRate 0.0751 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:19,834-Speed 5729.90 samples/sec Loss 9.0175 LearningRate 0.0751 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:21,617-Speed 5744.14 samples/sec Loss 9.0044 LearningRate 0.0751 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:24:23,421-Speed 5679.70 samples/sec Loss 8.9130 LearningRate 0.0751 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:25,206-Speed 5741.26 samples/sec Loss 8.8206 LearningRate 0.0751 Epoch: 2 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:26,987-Speed 5752.03 samples/sec Loss 9.1444 LearningRate 0.0751 Epoch: 2 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:28,786-Speed 5695.15 samples/sec Loss 8.9889 LearningRate 0.0751 Epoch: 2 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:30,584-Speed 5696.91 samples/sec Loss 8.9567 LearningRate 0.0750 Epoch: 2 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:32,413-Speed 5602.27 samples/sec Loss 8.9057 LearningRate 0.0750 Epoch: 2 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:34,200-Speed 5730.56 samples/sec Loss 8.8341 LearningRate 0.0750 Epoch: 2 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:36,011-Speed 5658.78 samples/sec Loss 8.9357 LearningRate 0.0750 Epoch: 2 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:37,841-Speed 5598.82 samples/sec Loss 9.0280 LearningRate 0.0750 Epoch: 2 Global Step: 15250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:39,650-Speed 5664.18 samples/sec Loss 8.8969 LearningRate 0.0750 Epoch: 2 Global Step: 15260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:41,449-Speed 5694.81 samples/sec Loss 8.8034 LearningRate 0.0749 Epoch: 2 Global Step: 15270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:43,240-Speed 5720.05 samples/sec Loss 9.0282 LearningRate 0.0749 Epoch: 2 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:45,063-Speed 5619.82 samples/sec Loss 8.9256 LearningRate 
0.0749 Epoch: 2 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:24:46,913-Speed 5539.76 samples/sec Loss 8.8917 LearningRate 0.0749 Epoch: 2 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:48,710-Speed 5698.22 samples/sec Loss 8.8706 LearningRate 0.0749 Epoch: 2 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:50,510-Speed 5695.44 samples/sec Loss 8.9793 LearningRate 0.0749 Epoch: 2 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:52,306-Speed 5701.84 samples/sec Loss 8.8632 LearningRate 0.0749 Epoch: 2 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:54,110-Speed 5679.59 samples/sec Loss 8.9912 LearningRate 0.0748 Epoch: 2 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:55,919-Speed 5665.63 samples/sec Loss 8.8514 LearningRate 0.0748 Epoch: 2 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:57,715-Speed 5701.95 samples/sec Loss 8.7462 LearningRate 0.0748 Epoch: 2 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:24:59,521-Speed 5673.78 samples/sec Loss 9.0112 LearningRate 0.0748 Epoch: 2 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:01,331-Speed 5659.96 samples/sec Loss 8.9611 LearningRate 0.0748 Epoch: 2 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:03,138-Speed 5670.23 samples/sec Loss 8.8375 LearningRate 0.0748 Epoch: 2 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:04,933-Speed 5707.20 samples/sec Loss 8.9329 LearningRate 0.0747 Epoch: 2 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:06,727-Speed 5710.43 samples/sec Loss 8.9362 LearningRate 0.0747 Epoch: 2 Global Step: 15410 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-27 02:25:08,525-Speed 5696.61 samples/sec Loss 8.9252 LearningRate 0.0747 Epoch: 2 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:10,343-Speed 5637.35 samples/sec Loss 8.7280 LearningRate 0.0747 Epoch: 2 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:12,149-Speed 5673.92 samples/sec Loss 8.8540 LearningRate 0.0747 Epoch: 2 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:13,951-Speed 5685.44 samples/sec Loss 8.7979 LearningRate 0.0747 Epoch: 2 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:15,753-Speed 5685.23 samples/sec Loss 8.8931 LearningRate 0.0747 Epoch: 2 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:17,553-Speed 5691.69 samples/sec Loss 8.9926 LearningRate 0.0746 Epoch: 2 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:19,363-Speed 5659.84 samples/sec Loss 8.7627 LearningRate 0.0746 Epoch: 2 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:21,182-Speed 5634.01 samples/sec Loss 8.9060 LearningRate 0.0746 Epoch: 2 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:22,967-Speed 5740.65 samples/sec Loss 9.0690 LearningRate 0.0746 Epoch: 2 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:24,759-Speed 5716.83 samples/sec Loss 8.9283 LearningRate 0.0746 Epoch: 2 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:26,557-Speed 5697.25 samples/sec Loss 9.0108 LearningRate 0.0746 Epoch: 2 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:28,371-Speed 5646.83 samples/sec Loss 8.8189 LearningRate 0.0746 Epoch: 2 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:25:30,172-Speed 5690.39 samples/sec Loss 8.8061 LearningRate 0.0745 Epoch: 2 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:31,991-Speed 5632.58 samples/sec Loss 9.0280 LearningRate 0.0745 Epoch: 2 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:33,788-Speed 5700.75 samples/sec Loss 8.9203 LearningRate 0.0745 Epoch: 2 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:35,591-Speed 5681.13 samples/sec Loss 8.8206 LearningRate 0.0745 Epoch: 2 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:37,398-Speed 5673.18 samples/sec Loss 8.8432 LearningRate 0.0745 Epoch: 2 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:39,201-Speed 5679.75 samples/sec Loss 8.8831 LearningRate 0.0745 Epoch: 2 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:41,050-Speed 5543.76 samples/sec Loss 8.9159 LearningRate 0.0744 Epoch: 2 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:42,911-Speed 5509.12 samples/sec Loss 8.9244 LearningRate 0.0744 Epoch: 2 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:44,771-Speed 5505.29 samples/sec Loss 8.7896 LearningRate 0.0744 Epoch: 2 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:46,616-Speed 5558.06 samples/sec Loss 8.9032 LearningRate 0.0744 Epoch: 2 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:48,417-Speed 5687.42 samples/sec Loss 8.7593 LearningRate 0.0744 Epoch: 2 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:50,222-Speed 5676.66 samples/sec Loss 8.6884 LearningRate 0.0744 Epoch: 2 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:52,021-Speed 5695.54 samples/sec Loss 8.7331 
LearningRate 0.0744 Epoch: 2 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:53,838-Speed 5640.34 samples/sec Loss 8.8274 LearningRate 0.0743 Epoch: 2 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:55,638-Speed 5692.87 samples/sec Loss 8.8343 LearningRate 0.0743 Epoch: 2 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:57,430-Speed 5719.37 samples/sec Loss 8.9197 LearningRate 0.0743 Epoch: 2 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:25:59,239-Speed 5660.55 samples/sec Loss 8.9241 LearningRate 0.0743 Epoch: 2 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:26:01,033-Speed 5713.17 samples/sec Loss 8.8476 LearningRate 0.0743 Epoch: 2 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:02,839-Speed 5674.95 samples/sec Loss 8.9150 LearningRate 0.0743 Epoch: 2 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:04,646-Speed 5672.17 samples/sec Loss 8.7551 LearningRate 0.0742 Epoch: 2 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:06,478-Speed 5594.48 samples/sec Loss 8.7329 LearningRate 0.0742 Epoch: 2 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:08,307-Speed 5599.98 samples/sec Loss 8.8935 LearningRate 0.0742 Epoch: 2 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:10,099-Speed 5718.36 samples/sec Loss 8.8623 LearningRate 0.0742 Epoch: 2 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:11,897-Speed 5695.06 samples/sec Loss 8.7388 LearningRate 0.0742 Epoch: 2 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:13,701-Speed 5678.67 samples/sec Loss 8.8885 LearningRate 0.0742 Epoch: 2 Global Step: 15780 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:15,506-Speed 5677.32 samples/sec Loss 8.6891 LearningRate 0.0742 Epoch: 2 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:17,311-Speed 5678.26 samples/sec Loss 8.7881 LearningRate 0.0741 Epoch: 2 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:19,094-Speed 5745.98 samples/sec Loss 8.8302 LearningRate 0.0741 Epoch: 2 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:20,896-Speed 5682.57 samples/sec Loss 8.8551 LearningRate 0.0741 Epoch: 2 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:22,693-Speed 5702.78 samples/sec Loss 8.8676 LearningRate 0.0741 Epoch: 2 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:24,483-Speed 5724.83 samples/sec Loss 8.9331 LearningRate 0.0741 Epoch: 2 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:26,278-Speed 5706.79 samples/sec Loss 8.8677 LearningRate 0.0741 Epoch: 2 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:28,087-Speed 5663.36 samples/sec Loss 8.8749 LearningRate 0.0741 Epoch: 2 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:29,896-Speed 5661.34 samples/sec Loss 8.8466 LearningRate 0.0740 Epoch: 2 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:31,695-Speed 5696.35 samples/sec Loss 8.9505 LearningRate 0.0740 Epoch: 2 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:33,500-Speed 5675.83 samples/sec Loss 8.8544 LearningRate 0.0740 Epoch: 2 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:35,324-Speed 5615.29 samples/sec Loss 8.8109 LearningRate 0.0740 Epoch: 2 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:26:37,118-Speed 5713.09 samples/sec Loss 8.7582 LearningRate 0.0740 Epoch: 2 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:38,971-Speed 5526.85 samples/sec Loss 8.5775 LearningRate 0.0740 Epoch: 2 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:40,779-Speed 5668.63 samples/sec Loss 8.6625 LearningRate 0.0739 Epoch: 2 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:42,594-Speed 5643.47 samples/sec Loss 8.6943 LearningRate 0.0739 Epoch: 2 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:44,403-Speed 5664.81 samples/sec Loss 8.7998 LearningRate 0.0739 Epoch: 2 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:46,200-Speed 5699.62 samples/sec Loss 8.7113 LearningRate 0.0739 Epoch: 2 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:48,009-Speed 5665.17 samples/sec Loss 8.7996 LearningRate 0.0739 Epoch: 2 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:49,810-Speed 5688.01 samples/sec Loss 8.7586 LearningRate 0.0739 Epoch: 2 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:51,604-Speed 5714.31 samples/sec Loss 8.8093 LearningRate 0.0739 Epoch: 2 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:26:53,400-Speed 5703.64 samples/sec Loss 8.7692 LearningRate 0.0738 Epoch: 2 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:27:20,430-[lfw][16000]XNorm: 21.846058 Training: 2022-04-27 02:27:20,431-[lfw][16000]Accuracy-Flip: 0.99500+-0.00342 Training: 2022-04-27 02:27:20,432-[lfw][16000]Accuracy-Highest: 0.99617 Training: 2022-04-27 02:27:51,431-[cfp_fp][16000]XNorm: 18.664564 Training: 2022-04-27 02:27:51,432-[cfp_fp][16000]Accuracy-Flip: 0.92200+-0.01685 Training: 2022-04-27 
02:27:51,433-[cfp_fp][16000]Accuracy-Highest: 0.92343 Training: 2022-04-27 02:28:17,981-[agedb_30][16000]XNorm: 21.657405 Training: 2022-04-27 02:28:17,981-[agedb_30][16000]Accuracy-Flip: 0.96383+-0.00997 Training: 2022-04-27 02:28:17,982-[agedb_30][16000]Accuracy-Highest: 0.96383 Training: 2022-04-27 02:28:19,783-Speed 118.54 samples/sec Loss 8.8321 LearningRate 0.0738 Epoch: 2 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:21,596-Speed 5652.78 samples/sec Loss 8.9230 LearningRate 0.0738 Epoch: 2 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:23,383-Speed 5733.09 samples/sec Loss 8.8022 LearningRate 0.0738 Epoch: 2 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:25,185-Speed 5686.92 samples/sec Loss 8.7705 LearningRate 0.0738 Epoch: 2 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:27,024-Speed 5570.84 samples/sec Loss 8.6551 LearningRate 0.0738 Epoch: 2 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:28,878-Speed 5525.37 samples/sec Loss 8.6625 LearningRate 0.0737 Epoch: 2 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:30,697-Speed 5634.43 samples/sec Loss 8.5044 LearningRate 0.0737 Epoch: 2 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:32,531-Speed 5585.52 samples/sec Loss 8.7866 LearningRate 0.0737 Epoch: 2 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:34,329-Speed 5696.70 samples/sec Loss 8.7948 LearningRate 0.0737 Epoch: 2 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:36,119-Speed 5722.82 samples/sec Loss 8.9348 LearningRate 0.0737 Epoch: 2 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:37,917-Speed 5699.16 samples/sec Loss 8.8102 LearningRate 0.0737 Epoch: 
2 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:39,719-Speed 5684.91 samples/sec Loss 8.7353 LearningRate 0.0737 Epoch: 2 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:41,520-Speed 5690.57 samples/sec Loss 8.8066 LearningRate 0.0736 Epoch: 2 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:43,315-Speed 5706.69 samples/sec Loss 8.5475 LearningRate 0.0736 Epoch: 2 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:28:45,119-Speed 5678.48 samples/sec Loss 8.8323 LearningRate 0.0736 Epoch: 2 Global Step: 16150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:46,929-Speed 5661.38 samples/sec Loss 8.8680 LearningRate 0.0736 Epoch: 2 Global Step: 16160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:48,751-Speed 5622.05 samples/sec Loss 8.8023 LearningRate 0.0736 Epoch: 2 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:50,555-Speed 5680.82 samples/sec Loss 8.7706 LearningRate 0.0736 Epoch: 2 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:52,371-Speed 5639.58 samples/sec Loss 8.8292 LearningRate 0.0736 Epoch: 2 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:54,168-Speed 5703.83 samples/sec Loss 8.8139 LearningRate 0.0735 Epoch: 2 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:55,993-Speed 5611.17 samples/sec Loss 8.8175 LearningRate 0.0735 Epoch: 2 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:57,801-Speed 5670.37 samples/sec Loss 8.6322 LearningRate 0.0735 Epoch: 2 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:28:59,622-Speed 5624.00 samples/sec Loss 8.6983 LearningRate 0.0735 Epoch: 2 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-27 02:29:01,467-Speed 5553.81 samples/sec Loss 8.7208 LearningRate 0.0735 Epoch: 2 Global Step: 16240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:29:03,277-Speed 5661.34 samples/sec Loss 8.7706 LearningRate 0.0735 Epoch: 2 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:05,089-Speed 5652.33 samples/sec Loss 8.7293 LearningRate 0.0734 Epoch: 2 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:06,901-Speed 5656.85 samples/sec Loss 8.6218 LearningRate 0.0734 Epoch: 2 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:08,695-Speed 5710.09 samples/sec Loss 8.6388 LearningRate 0.0734 Epoch: 2 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:10,519-Speed 5617.80 samples/sec Loss 8.8273 LearningRate 0.0734 Epoch: 2 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:12,315-Speed 5701.22 samples/sec Loss 8.7567 LearningRate 0.0734 Epoch: 2 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:14,107-Speed 5719.97 samples/sec Loss 8.6831 LearningRate 0.0734 Epoch: 2 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:15,909-Speed 5684.72 samples/sec Loss 8.7244 LearningRate 0.0734 Epoch: 2 Global Step: 16320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:17,751-Speed 5562.32 samples/sec Loss 8.7272 LearningRate 0.0733 Epoch: 2 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:19,585-Speed 5586.51 samples/sec Loss 8.8356 LearningRate 0.0733 Epoch: 2 Global Step: 16340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:21,375-Speed 5722.58 samples/sec Loss 8.8273 LearningRate 0.0733 Epoch: 2 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:23,258-Speed 5442.07 
samples/sec Loss 8.8067 LearningRate 0.0733 Epoch: 2 Global Step: 16360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:25,061-Speed 5681.87 samples/sec Loss 8.8323 LearningRate 0.0733 Epoch: 2 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:26,884-Speed 5618.65 samples/sec Loss 8.7895 LearningRate 0.0733 Epoch: 2 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:28,684-Speed 5691.25 samples/sec Loss 8.6869 LearningRate 0.0733 Epoch: 2 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:30,507-Speed 5622.61 samples/sec Loss 8.7197 LearningRate 0.0732 Epoch: 2 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:32,319-Speed 5652.24 samples/sec Loss 8.5760 LearningRate 0.0732 Epoch: 2 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:34,112-Speed 5712.26 samples/sec Loss 8.8136 LearningRate 0.0732 Epoch: 2 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:35,900-Speed 5729.17 samples/sec Loss 8.5345 LearningRate 0.0732 Epoch: 2 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:37,715-Speed 5646.31 samples/sec Loss 8.7178 LearningRate 0.0732 Epoch: 2 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:39,503-Speed 5729.06 samples/sec Loss 8.5687 LearningRate 0.0732 Epoch: 2 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:41,332-Speed 5600.54 samples/sec Loss 8.8896 LearningRate 0.0731 Epoch: 2 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:43,141-Speed 5664.07 samples/sec Loss 8.7462 LearningRate 0.0731 Epoch: 2 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:44,941-Speed 5693.70 samples/sec Loss 8.5461 LearningRate 0.0731 Epoch: 2 
Global Step: 16480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:46,759-Speed 5635.54 samples/sec Loss 8.8619 LearningRate 0.0731 Epoch: 2 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:48,561-Speed 5685.42 samples/sec Loss 8.6666 LearningRate 0.0731 Epoch: 2 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:50,367-Speed 5672.39 samples/sec Loss 8.7250 LearningRate 0.0731 Epoch: 2 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:52,194-Speed 5609.64 samples/sec Loss 8.6230 LearningRate 0.0731 Epoch: 2 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:53,989-Speed 5705.28 samples/sec Loss 8.6906 LearningRate 0.0730 Epoch: 2 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:55,807-Speed 5641.17 samples/sec Loss 8.7636 LearningRate 0.0730 Epoch: 2 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:57,607-Speed 5690.68 samples/sec Loss 8.7569 LearningRate 0.0730 Epoch: 2 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:29:59,411-Speed 5680.09 samples/sec Loss 8.6869 LearningRate 0.0730 Epoch: 2 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:01,204-Speed 5714.24 samples/sec Loss 8.5304 LearningRate 0.0730 Epoch: 2 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:03,024-Speed 5626.68 samples/sec Loss 8.7424 LearningRate 0.0730 Epoch: 2 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:04,823-Speed 5696.78 samples/sec Loss 8.6832 LearningRate 0.0730 Epoch: 2 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:06,629-Speed 5673.35 samples/sec Loss 8.6407 LearningRate 0.0729 Epoch: 2 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-27 02:30:08,453-Speed 5618.68 samples/sec Loss 8.6473 LearningRate 0.0729 Epoch: 2 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:10,266-Speed 5649.67 samples/sec Loss 8.5312 LearningRate 0.0729 Epoch: 2 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:12,105-Speed 5570.59 samples/sec Loss 8.6958 LearningRate 0.0729 Epoch: 2 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:13,912-Speed 5671.96 samples/sec Loss 8.5922 LearningRate 0.0729 Epoch: 2 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:15,707-Speed 5705.36 samples/sec Loss 8.8744 LearningRate 0.0729 Epoch: 2 Global Step: 16650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:30:17,538-Speed 5597.40 samples/sec Loss 8.6529 LearningRate 0.0728 Epoch: 2 Global Step: 16660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:30:19,331-Speed 5712.47 samples/sec Loss 8.6671 LearningRate 0.0728 Epoch: 2 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:21,145-Speed 5651.25 samples/sec Loss 8.7929 LearningRate 0.0728 Epoch: 2 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:22,951-Speed 5670.42 samples/sec Loss 8.7745 LearningRate 0.0728 Epoch: 2 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:24,750-Speed 5695.38 samples/sec Loss 8.6930 LearningRate 0.0728 Epoch: 2 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:26,595-Speed 5552.12 samples/sec Loss 8.7618 LearningRate 0.0728 Epoch: 2 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:28,440-Speed 5555.29 samples/sec Loss 8.7139 LearningRate 0.0728 Epoch: 2 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:30,239-Speed 5693.93 
samples/sec Loss 8.7709 LearningRate 0.0727 Epoch: 2 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:32,048-Speed 5664.83 samples/sec Loss 8.7664 LearningRate 0.0727 Epoch: 2 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:33,873-Speed 5613.50 samples/sec Loss 8.7121 LearningRate 0.0727 Epoch: 2 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:35,687-Speed 5646.87 samples/sec Loss 8.6044 LearningRate 0.0727 Epoch: 2 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:37,520-Speed 5590.57 samples/sec Loss 8.5538 LearningRate 0.0727 Epoch: 2 Global Step: 16770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:39,364-Speed 5556.28 samples/sec Loss 8.6801 LearningRate 0.0727 Epoch: 2 Global Step: 16780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:41,198-Speed 5584.43 samples/sec Loss 8.6808 LearningRate 0.0727 Epoch: 2 Global Step: 16790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:30:43,030-Speed 5592.83 samples/sec Loss 8.7454 LearningRate 0.0726 Epoch: 2 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:44,891-Speed 5504.97 samples/sec Loss 8.5544 LearningRate 0.0726 Epoch: 2 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:46,713-Speed 5624.53 samples/sec Loss 8.4573 LearningRate 0.0726 Epoch: 2 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:48,536-Speed 5621.41 samples/sec Loss 8.6803 LearningRate 0.0726 Epoch: 2 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:50,353-Speed 5638.55 samples/sec Loss 8.5615 LearningRate 0.0726 Epoch: 2 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:52,190-Speed 5575.58 samples/sec Loss 8.6580 LearningRate 0.0726 Epoch: 2 Global 
Step: 16850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:54,015-Speed 5615.53 samples/sec Loss 8.6083 LearningRate 0.0725 Epoch: 2 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:55,807-Speed 5716.86 samples/sec Loss 8.7597 LearningRate 0.0725 Epoch: 2 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:57,608-Speed 5688.90 samples/sec Loss 8.6918 LearningRate 0.0725 Epoch: 2 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:30:59,412-Speed 5680.96 samples/sec Loss 8.8556 LearningRate 0.0725 Epoch: 2 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:01,218-Speed 5671.29 samples/sec Loss 8.5750 LearningRate 0.0725 Epoch: 2 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:03,016-Speed 5703.31 samples/sec Loss 8.5139 LearningRate 0.0725 Epoch: 2 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:04,807-Speed 5720.05 samples/sec Loss 8.5183 LearningRate 0.0725 Epoch: 2 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:06,613-Speed 5672.30 samples/sec Loss 8.7017 LearningRate 0.0724 Epoch: 2 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:08,450-Speed 5577.28 samples/sec Loss 8.6892 LearningRate 0.0724 Epoch: 2 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:10,277-Speed 5608.87 samples/sec Loss 8.6581 LearningRate 0.0724 Epoch: 2 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:12,097-Speed 5630.23 samples/sec Loss 8.6513 LearningRate 0.0724 Epoch: 2 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:13,906-Speed 5662.56 samples/sec Loss 8.6960 LearningRate 0.0724 Epoch: 2 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-27 02:31:15,710-Speed 5679.26 samples/sec Loss 8.5887 LearningRate 0.0724 Epoch: 2 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:17,513-Speed 5682.10 samples/sec Loss 8.6307 LearningRate 0.0724 Epoch: 2 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:19,341-Speed 5605.96 samples/sec Loss 8.6001 LearningRate 0.0723 Epoch: 2 Global Step: 17000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:31:21,128-Speed 5733.83 samples/sec Loss 8.6960 LearningRate 0.0723 Epoch: 2 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:22,965-Speed 5577.86 samples/sec Loss 8.5624 LearningRate 0.0723 Epoch: 2 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:24,791-Speed 5609.99 samples/sec Loss 8.4896 LearningRate 0.0723 Epoch: 2 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:26,591-Speed 5691.12 samples/sec Loss 8.5987 LearningRate 0.0723 Epoch: 2 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:28,499-Speed 5372.00 samples/sec Loss 8.8981 LearningRate 0.0723 Epoch: 2 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:40,164-Speed 877.93 samples/sec Loss 8.5458 LearningRate 0.0722 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:42,202-Speed 5027.19 samples/sec Loss 7.9754 LearningRate 0.0722 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:44,023-Speed 5628.05 samples/sec Loss 7.8362 LearningRate 0.0722 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:45,847-Speed 5619.11 samples/sec Loss 8.0257 LearningRate 0.0722 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:47,656-Speed 5662.45 samples/sec 
Loss 7.9474 LearningRate 0.0722 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:49,601-Speed 5267.12 samples/sec Loss 7.8727 LearningRate 0.0722 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:51,532-Speed 5308.59 samples/sec Loss 7.9359 LearningRate 0.0722 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:53,343-Speed 5657.10 samples/sec Loss 7.8595 LearningRate 0.0721 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:55,147-Speed 5679.26 samples/sec Loss 8.1134 LearningRate 0.0721 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:31:56,940-Speed 5714.04 samples/sec Loss 8.1087 LearningRate 0.0721 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:31:58,744-Speed 5679.22 samples/sec Loss 8.2271 LearningRate 0.0721 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:00,562-Speed 5636.18 samples/sec Loss 8.1711 LearningRate 0.0721 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:02,393-Speed 5595.79 samples/sec Loss 7.9409 LearningRate 0.0721 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:04,203-Speed 5659.79 samples/sec Loss 8.1410 LearningRate 0.0721 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:06,004-Speed 5687.02 samples/sec Loss 8.0676 LearningRate 0.0720 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:07,820-Speed 5640.41 samples/sec Loss 8.1410 LearningRate 0.0720 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:09,632-Speed 5655.12 samples/sec Loss 8.1718 LearningRate 0.0720 Epoch: 3 Global Step: 17220 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:11,424-Speed 5717.83 samples/sec Loss 8.1878 LearningRate 0.0720 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:13,278-Speed 5524.94 samples/sec Loss 8.1929 LearningRate 0.0720 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:15,082-Speed 5679.79 samples/sec Loss 8.1476 LearningRate 0.0720 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:16,871-Speed 5724.94 samples/sec Loss 8.2591 LearningRate 0.0719 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:18,685-Speed 5649.39 samples/sec Loss 8.3397 LearningRate 0.0719 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:20,487-Speed 5685.87 samples/sec Loss 8.2492 LearningRate 0.0719 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:22,310-Speed 5619.70 samples/sec Loss 8.2297 LearningRate 0.0719 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:24,116-Speed 5672.65 samples/sec Loss 8.2746 LearningRate 0.0719 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:25,914-Speed 5698.41 samples/sec Loss 8.2327 LearningRate 0.0719 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:27,733-Speed 5633.42 samples/sec Loss 8.3393 LearningRate 0.0719 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:29,554-Speed 5627.36 samples/sec Loss 8.1096 LearningRate 0.0718 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:31,382-Speed 5603.61 samples/sec Loss 8.1590 LearningRate 0.0718 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:32:33,191-Speed 5664.45 samples/sec Loss 8.3881 LearningRate 0.0718 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:34,995-Speed 5679.57 samples/sec Loss 8.1392 LearningRate 0.0718 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:36,833-Speed 5573.69 samples/sec Loss 8.3892 LearningRate 0.0718 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:38,665-Speed 5591.88 samples/sec Loss 8.3568 LearningRate 0.0718 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:40,469-Speed 5679.94 samples/sec Loss 8.1845 LearningRate 0.0718 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:42,314-Speed 5554.69 samples/sec Loss 8.2525 LearningRate 0.0717 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:44,160-Speed 5548.55 samples/sec Loss 8.2585 LearningRate 0.0717 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:45,978-Speed 5635.00 samples/sec Loss 8.3618 LearningRate 0.0717 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:47,790-Speed 5655.14 samples/sec Loss 8.3242 LearningRate 0.0717 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:49,625-Speed 5580.30 samples/sec Loss 8.2664 LearningRate 0.0717 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:32:51,447-Speed 5625.29 samples/sec Loss 8.3507 LearningRate 0.0717 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:53,275-Speed 5606.04 samples/sec Loss 8.4257 LearningRate 0.0717 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:55,085-Speed 5658.67 samples/sec Loss 8.4090 
LearningRate 0.0716 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:56,906-Speed 5627.62 samples/sec Loss 8.3149 LearningRate 0.0716 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:32:58,718-Speed 5657.11 samples/sec Loss 8.2471 LearningRate 0.0716 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:00,527-Speed 5663.52 samples/sec Loss 8.2851 LearningRate 0.0716 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:02,362-Speed 5584.19 samples/sec Loss 8.2576 LearningRate 0.0716 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:04,186-Speed 5616.60 samples/sec Loss 8.1898 LearningRate 0.0716 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:05,994-Speed 5664.59 samples/sec Loss 8.1712 LearningRate 0.0715 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:07,797-Speed 5683.71 samples/sec Loss 8.1893 LearningRate 0.0715 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:09,593-Speed 5702.87 samples/sec Loss 8.4064 LearningRate 0.0715 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:11,406-Speed 5650.53 samples/sec Loss 8.3539 LearningRate 0.0715 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:13,217-Speed 5657.62 samples/sec Loss 8.3946 LearningRate 0.0715 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:15,027-Speed 5661.73 samples/sec Loss 8.2741 LearningRate 0.0715 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:16,881-Speed 5524.56 samples/sec Loss 8.4603 LearningRate 0.0715 Epoch: 3 Global Step: 17590 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:18,684-Speed 5681.63 samples/sec Loss 8.3465 LearningRate 0.0714 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:20,478-Speed 5711.90 samples/sec Loss 8.4804 LearningRate 0.0714 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:22,294-Speed 5640.76 samples/sec Loss 8.3364 LearningRate 0.0714 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:24,110-Speed 5643.11 samples/sec Loss 8.3313 LearningRate 0.0714 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:25,954-Speed 5554.61 samples/sec Loss 8.3908 LearningRate 0.0714 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:27,775-Speed 5649.19 samples/sec Loss 8.2949 LearningRate 0.0714 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:33:29,591-Speed 5646.06 samples/sec Loss 8.3365 LearningRate 0.0714 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:31,395-Speed 5676.47 samples/sec Loss 8.2960 LearningRate 0.0713 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:33,187-Speed 5718.24 samples/sec Loss 8.4783 LearningRate 0.0713 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:34,993-Speed 5672.63 samples/sec Loss 8.5359 LearningRate 0.0713 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:36,783-Speed 5723.48 samples/sec Loss 8.3215 LearningRate 0.0713 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:38,574-Speed 5719.39 samples/sec Loss 8.3830 LearningRate 0.0713 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:33:40,381-Speed 5670.07 samples/sec Loss 8.3611 LearningRate 0.0713 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:42,270-Speed 5425.41 samples/sec Loss 8.3514 LearningRate 0.0712 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:44,093-Speed 5618.67 samples/sec Loss 8.5520 LearningRate 0.0712 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:45,898-Speed 5674.33 samples/sec Loss 8.2699 LearningRate 0.0712 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:47,680-Speed 5749.78 samples/sec Loss 8.5283 LearningRate 0.0712 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:49,502-Speed 5624.22 samples/sec Loss 8.4532 LearningRate 0.0712 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:51,302-Speed 5690.74 samples/sec Loss 8.3108 LearningRate 0.0712 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:53,110-Speed 5667.50 samples/sec Loss 8.4022 LearningRate 0.0712 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:54,929-Speed 5631.37 samples/sec Loss 8.4019 LearningRate 0.0711 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:56,737-Speed 5667.35 samples/sec Loss 8.3025 LearningRate 0.0711 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:33:58,555-Speed 5635.27 samples/sec Loss 8.4353 LearningRate 0.0711 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:00,399-Speed 5577.09 samples/sec Loss 8.4107 LearningRate 0.0711 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:02,216-Speed 5637.08 samples/sec Loss 
8.4657 LearningRate 0.0711 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:04,037-Speed 5626.06 samples/sec Loss 8.3889 LearningRate 0.0711 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:05,820-Speed 5746.32 samples/sec Loss 8.5415 LearningRate 0.0711 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:07,641-Speed 5625.66 samples/sec Loss 8.3466 LearningRate 0.0710 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:09,484-Speed 5558.24 samples/sec Loss 8.3410 LearningRate 0.0710 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:11,284-Speed 5695.05 samples/sec Loss 8.4210 LearningRate 0.0710 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:13,090-Speed 5671.05 samples/sec Loss 8.4518 LearningRate 0.0710 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:34:14,905-Speed 5645.19 samples/sec Loss 8.5334 LearningRate 0.0710 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:34:16,728-Speed 5619.69 samples/sec Loss 8.4335 LearningRate 0.0710 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:34:18,578-Speed 5537.87 samples/sec Loss 8.5289 LearningRate 0.0710 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:34:20,390-Speed 5654.97 samples/sec Loss 8.5761 LearningRate 0.0709 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:34:22,227-Speed 5575.77 samples/sec Loss 8.4816 LearningRate 0.0709 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:34:24,052-Speed 5613.91 samples/sec Loss 8.4410 LearningRate 0.0709 Epoch: 3 Global Step: 17960 Fp16 
Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 02:34:25,906-Speed 5526.80 samples/sec Loss 8.2965 LearningRate 0.0709 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 02:34:27,693-Speed 5731.42 samples/sec Loss 8.5643 LearningRate 0.0709 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 02:34:29,513-Speed 5629.67 samples/sec Loss 8.4499 LearningRate 0.0709 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 02:34:31,331-Speed 5637.67 samples/sec Loss 8.5323 LearningRate 0.0708 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-04-27 02:34:57,837-[lfw][18000]XNorm: 22.767255
Training: 2022-04-27 02:34:57,838-[lfw][18000]Accuracy-Flip: 0.99567+-0.00281
Training: 2022-04-27 02:34:57,839-[lfw][18000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:35:28,673-[cfp_fp][18000]XNorm: 19.326344
Training: 2022-04-27 02:35:28,674-[cfp_fp][18000]Accuracy-Flip: 0.92057+-0.01528
Training: 2022-04-27 02:35:28,675-[cfp_fp][18000]Accuracy-Highest: 0.92343
Training: 2022-04-27 02:35:55,172-[agedb_30][18000]XNorm: 22.428629
Training: 2022-04-27 02:35:55,173-[agedb_30][18000]Accuracy-Flip: 0.96367+-0.00826
Training: 2022-04-27 02:35:55,173-[agedb_30][18000]Accuracy-Highest: 0.96383
Training: 2022-04-27 02:35:56,985-Speed 119.55 samples/sec Loss 8.5343 LearningRate 0.0708 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:35:58,797-Speed 5653.55 samples/sec Loss 8.2422 LearningRate 0.0708 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:36:00,588-Speed 5721.25 samples/sec Loss 8.3996 LearningRate 0.0708 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:36:02,416-Speed 5605.78 samples/sec Loss 8.4968 LearningRate 0.0708 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 6 hours Training:
2022-04-27 02:36:04,239-Speed 5620.66 samples/sec Loss 8.3137 LearningRate 0.0708 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:06,031-Speed 5717.76 samples/sec Loss 8.4336 LearningRate 0.0708 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:07,881-Speed 5536.69 samples/sec Loss 8.4393 LearningRate 0.0707 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:09,701-Speed 5631.30 samples/sec Loss 8.4146 LearningRate 0.0707 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:11,512-Speed 5655.90 samples/sec Loss 8.5153 LearningRate 0.0707 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:13,319-Speed 5669.39 samples/sec Loss 8.3470 LearningRate 0.0707 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:15,123-Speed 5680.11 samples/sec Loss 8.4384 LearningRate 0.0707 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:16,916-Speed 5715.05 samples/sec Loss 8.4874 LearningRate 0.0707 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:18,706-Speed 5722.12 samples/sec Loss 8.4564 LearningRate 0.0707 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:20,506-Speed 5692.06 samples/sec Loss 8.2547 LearningRate 0.0706 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:22,286-Speed 5754.92 samples/sec Loss 8.4558 LearningRate 0.0706 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:24,088-Speed 5685.60 samples/sec Loss 8.3630 LearningRate 0.0706 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:25,921-Speed 5591.06 samples/sec Loss 8.3808 
LearningRate 0.0706 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:27,711-Speed 5722.93 samples/sec Loss 8.3710 LearningRate 0.0706 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:29,515-Speed 5678.55 samples/sec Loss 8.4228 LearningRate 0.0706 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:31,309-Speed 5710.30 samples/sec Loss 8.4983 LearningRate 0.0706 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:33,118-Speed 5664.38 samples/sec Loss 8.3403 LearningRate 0.0705 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:34,906-Speed 5731.45 samples/sec Loss 8.3803 LearningRate 0.0705 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:36,693-Speed 5730.52 samples/sec Loss 8.5510 LearningRate 0.0705 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:38,513-Speed 5630.06 samples/sec Loss 8.2846 LearningRate 0.0705 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:40,335-Speed 5622.66 samples/sec Loss 8.5901 LearningRate 0.0705 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:42,140-Speed 5674.76 samples/sec Loss 8.3483 LearningRate 0.0705 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:43,993-Speed 5528.69 samples/sec Loss 8.3887 LearningRate 0.0704 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:45,782-Speed 5726.92 samples/sec Loss 8.3605 LearningRate 0.0704 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:47,579-Speed 5701.85 samples/sec Loss 8.4801 LearningRate 0.0704 Epoch: 3 Global Step: 18290 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:49,405-Speed 5609.62 samples/sec Loss 8.4864 LearningRate 0.0704 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:51,217-Speed 5654.81 samples/sec Loss 8.2974 LearningRate 0.0704 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:53,025-Speed 5665.25 samples/sec Loss 8.2172 LearningRate 0.0704 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:54,845-Speed 5629.71 samples/sec Loss 8.4339 LearningRate 0.0704 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:36:56,644-Speed 5695.60 samples/sec Loss 8.4340 LearningRate 0.0703 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:36:58,427-Speed 5745.29 samples/sec Loss 8.3267 LearningRate 0.0703 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:00,237-Speed 5660.82 samples/sec Loss 8.5275 LearningRate 0.0703 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:02,046-Speed 5662.57 samples/sec Loss 8.6164 LearningRate 0.0703 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:03,839-Speed 5714.48 samples/sec Loss 8.3215 LearningRate 0.0703 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:05,664-Speed 5612.40 samples/sec Loss 8.4056 LearningRate 0.0703 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:07,502-Speed 5573.06 samples/sec Loss 8.2994 LearningRate 0.0703 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:09,310-Speed 5665.62 samples/sec Loss 8.4662 LearningRate 0.0702 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:37:11,114-Speed 5680.60 samples/sec Loss 8.3911 LearningRate 0.0702 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:12,903-Speed 5726.93 samples/sec Loss 8.4963 LearningRate 0.0702 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:14,712-Speed 5662.68 samples/sec Loss 8.3246 LearningRate 0.0702 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:16,500-Speed 5729.30 samples/sec Loss 8.4406 LearningRate 0.0702 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:18,296-Speed 5702.97 samples/sec Loss 8.5177 LearningRate 0.0702 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:20,103-Speed 5668.69 samples/sec Loss 8.3149 LearningRate 0.0702 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:21,926-Speed 5619.76 samples/sec Loss 8.6009 LearningRate 0.0701 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:23,758-Speed 5591.59 samples/sec Loss 8.4070 LearningRate 0.0701 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:25,597-Speed 5572.51 samples/sec Loss 8.4287 LearningRate 0.0701 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:27,445-Speed 5544.07 samples/sec Loss 8.4747 LearningRate 0.0701 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:29,247-Speed 5686.71 samples/sec Loss 8.5390 LearningRate 0.0701 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:31,065-Speed 5633.73 samples/sec Loss 8.4454 LearningRate 0.0701 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:32,877-Speed 5653.18 samples/sec Loss 8.4813 
LearningRate 0.0701 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:34,691-Speed 5646.88 samples/sec Loss 8.4360 LearningRate 0.0700 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:36,493-Speed 5685.04 samples/sec Loss 8.4790 LearningRate 0.0700 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:38,309-Speed 5640.95 samples/sec Loss 8.4045 LearningRate 0.0700 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:40,160-Speed 5535.78 samples/sec Loss 8.4718 LearningRate 0.0700 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:42,010-Speed 5536.54 samples/sec Loss 8.4224 LearningRate 0.0700 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:37:43,805-Speed 5707.99 samples/sec Loss 8.2933 LearningRate 0.0700 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:45,618-Speed 5649.30 samples/sec Loss 8.3618 LearningRate 0.0699 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:47,420-Speed 5687.11 samples/sec Loss 8.3943 LearningRate 0.0699 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:49,229-Speed 5661.28 samples/sec Loss 8.5983 LearningRate 0.0699 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:51,056-Speed 5608.70 samples/sec Loss 8.4922 LearningRate 0.0699 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:52,847-Speed 5721.26 samples/sec Loss 8.4613 LearningRate 0.0699 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:54,678-Speed 5595.36 samples/sec Loss 8.4857 LearningRate 0.0699 Epoch: 3 Global Step: 18660 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:56,491-Speed 5650.42 samples/sec Loss 8.3915 LearningRate 0.0699 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:37:58,312-Speed 5624.43 samples/sec Loss 8.1290 LearningRate 0.0698 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:00,133-Speed 5628.34 samples/sec Loss 8.4954 LearningRate 0.0698 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:01,986-Speed 5527.69 samples/sec Loss 8.3874 LearningRate 0.0698 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:03,788-Speed 5685.54 samples/sec Loss 8.2651 LearningRate 0.0698 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:05,617-Speed 5601.46 samples/sec Loss 8.3409 LearningRate 0.0698 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:07,417-Speed 5695.48 samples/sec Loss 8.3706 LearningRate 0.0698 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:09,211-Speed 5711.01 samples/sec Loss 8.3220 LearningRate 0.0698 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:11,018-Speed 5667.32 samples/sec Loss 8.3992 LearningRate 0.0697 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:12,854-Speed 5581.66 samples/sec Loss 8.3869 LearningRate 0.0697 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:14,676-Speed 5622.99 samples/sec Loss 8.3897 LearningRate 0.0697 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:16,498-Speed 5621.19 samples/sec Loss 8.4682 LearningRate 0.0697 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:38:18,334-Speed 5580.38 samples/sec Loss 8.2223 LearningRate 0.0697 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:20,168-Speed 5587.54 samples/sec Loss 8.3587 LearningRate 0.0697 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:38:21,994-Speed 5611.92 samples/sec Loss 8.3946 LearningRate 0.0697 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:23,799-Speed 5677.53 samples/sec Loss 8.3474 LearningRate 0.0696 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:25,637-Speed 5572.71 samples/sec Loss 8.3207 LearningRate 0.0696 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:27,455-Speed 5633.25 samples/sec Loss 8.3161 LearningRate 0.0696 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:29,265-Speed 5661.59 samples/sec Loss 8.2828 LearningRate 0.0696 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:31,063-Speed 5699.18 samples/sec Loss 8.3774 LearningRate 0.0696 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:32,874-Speed 5657.34 samples/sec Loss 8.4149 LearningRate 0.0696 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:34,681-Speed 5668.78 samples/sec Loss 8.3748 LearningRate 0.0696 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:36,477-Speed 5704.80 samples/sec Loss 8.3782 LearningRate 0.0695 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:38,288-Speed 5657.85 samples/sec Loss 8.3291 LearningRate 0.0695 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:40,068-Speed 5754.61 samples/sec Loss 8.4196 
LearningRate 0.0695 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:41,873-Speed 5683.77 samples/sec Loss 8.3016 LearningRate 0.0695 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:43,677-Speed 5678.95 samples/sec Loss 8.3767 LearningRate 0.0695 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:45,461-Speed 5743.79 samples/sec Loss 8.3087 LearningRate 0.0695 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:47,267-Speed 5674.34 samples/sec Loss 8.3489 LearningRate 0.0694 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:49,059-Speed 5717.11 samples/sec Loss 8.3205 LearningRate 0.0694 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:50,856-Speed 5698.44 samples/sec Loss 8.3590 LearningRate 0.0694 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:52,677-Speed 5628.43 samples/sec Loss 8.2597 LearningRate 0.0694 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:38:54,484-Speed 5670.38 samples/sec Loss 8.3710 LearningRate 0.0694 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:38:56,271-Speed 5729.64 samples/sec Loss 8.3074 LearningRate 0.0694 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:38:58,056-Speed 5741.39 samples/sec Loss 8.4247 LearningRate 0.0694 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:38:59,878-Speed 5624.51 samples/sec Loss 8.3125 LearningRate 0.0693 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:01,686-Speed 5665.41 samples/sec Loss 8.2646 LearningRate 0.0693 Epoch: 3 Global Step: 19030 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:03,479-Speed 5712.99 samples/sec Loss 8.3093 LearningRate 0.0693 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:05,285-Speed 5674.44 samples/sec Loss 8.1821 LearningRate 0.0693 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:07,099-Speed 5645.09 samples/sec Loss 8.4694 LearningRate 0.0693 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:08,900-Speed 5687.58 samples/sec Loss 8.4165 LearningRate 0.0693 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:10,698-Speed 5698.57 samples/sec Loss 8.3309 LearningRate 0.0693 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:12,496-Speed 5700.70 samples/sec Loss 8.1671 LearningRate 0.0692 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:14,304-Speed 5664.36 samples/sec Loss 8.2899 LearningRate 0.0692 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:16,099-Speed 5707.72 samples/sec Loss 8.1756 LearningRate 0.0692 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:17,892-Speed 5714.10 samples/sec Loss 8.3599 LearningRate 0.0692 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:19,681-Speed 5726.47 samples/sec Loss 8.3337 LearningRate 0.0692 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:21,475-Speed 5710.64 samples/sec Loss 8.3660 LearningRate 0.0692 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:23,306-Speed 5593.16 samples/sec Loss 8.2427 LearningRate 0.0692 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:39:25,121-Speed 5644.02 samples/sec Loss 8.4933 LearningRate 0.0691 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:26,915-Speed 5711.57 samples/sec Loss 8.3020 LearningRate 0.0691 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:28,718-Speed 5682.30 samples/sec Loss 8.5337 LearningRate 0.0691 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:30,518-Speed 5690.19 samples/sec Loss 8.1706 LearningRate 0.0691 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:32,299-Speed 5751.66 samples/sec Loss 8.1813 LearningRate 0.0691 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:34,133-Speed 5586.95 samples/sec Loss 8.4280 LearningRate 0.0691 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:35,931-Speed 5697.30 samples/sec Loss 8.4220 LearningRate 0.0691 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:39:37,752-Speed 5622.53 samples/sec Loss 8.4001 LearningRate 0.0690 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:39,566-Speed 5648.97 samples/sec Loss 8.3895 LearningRate 0.0690 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:41,356-Speed 5721.79 samples/sec Loss 8.3197 LearningRate 0.0690 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:43,170-Speed 5648.97 samples/sec Loss 8.3171 LearningRate 0.0690 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:44,956-Speed 5733.88 samples/sec Loss 8.2030 LearningRate 0.0690 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:46,760-Speed 5679.95 samples/sec Loss 8.1889 
LearningRate 0.0690 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:48,604-Speed 5555.52 samples/sec Loss 8.4377 LearningRate 0.0690 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:50,464-Speed 5506.39 samples/sec Loss 8.2884 LearningRate 0.0689 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:52,277-Speed 5650.53 samples/sec Loss 8.2151 LearningRate 0.0689 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:54,083-Speed 5674.86 samples/sec Loss 8.2398 LearningRate 0.0689 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:55,886-Speed 5678.93 samples/sec Loss 8.2367 LearningRate 0.0689 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:57,751-Speed 5493.25 samples/sec Loss 8.2599 LearningRate 0.0689 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:39:59,611-Speed 5510.10 samples/sec Loss 8.4112 LearningRate 0.0689 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:01,415-Speed 5678.52 samples/sec Loss 8.4052 LearningRate 0.0688 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:03,248-Speed 5589.85 samples/sec Loss 8.3125 LearningRate 0.0688 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:05,102-Speed 5528.31 samples/sec Loss 8.2686 LearningRate 0.0688 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:06,954-Speed 5529.13 samples/sec Loss 8.4337 LearningRate 0.0688 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:08,776-Speed 5624.75 samples/sec Loss 8.3287 LearningRate 0.0688 Epoch: 3 Global Step: 19400 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:10,580-Speed 5676.87 samples/sec Loss 8.2711 LearningRate 0.0688 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:12,376-Speed 5705.94 samples/sec Loss 8.3854 LearningRate 0.0688 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:14,173-Speed 5700.25 samples/sec Loss 8.4877 LearningRate 0.0687 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:15,970-Speed 5699.71 samples/sec Loss 8.2535 LearningRate 0.0687 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:17,773-Speed 5689.54 samples/sec Loss 8.3547 LearningRate 0.0687 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:19,575-Speed 5684.34 samples/sec Loss 8.1746 LearningRate 0.0687 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:21,400-Speed 5614.48 samples/sec Loss 8.2451 LearningRate 0.0687 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:23,196-Speed 5702.41 samples/sec Loss 8.2338 LearningRate 0.0687 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:25,007-Speed 5660.23 samples/sec Loss 8.1274 LearningRate 0.0687 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:26,812-Speed 5674.22 samples/sec Loss 8.3413 LearningRate 0.0686 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:28,613-Speed 5687.52 samples/sec Loss 8.4609 LearningRate 0.0686 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:30,405-Speed 5717.19 samples/sec Loss 8.2033 LearningRate 0.0686 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:40:32,236-Speed 5595.20 samples/sec Loss 8.3195 LearningRate 0.0686 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:40:34,040-Speed 5680.37 samples/sec Loss 8.1194 LearningRate 0.0686 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:35,846-Speed 5671.50 samples/sec Loss 8.2669 LearningRate 0.0686 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:37,679-Speed 5589.92 samples/sec Loss 8.2865 LearningRate 0.0686 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:39,494-Speed 5645.45 samples/sec Loss 8.2417 LearningRate 0.0685 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:41,297-Speed 5682.78 samples/sec Loss 8.2778 LearningRate 0.0685 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:43,121-Speed 5618.00 samples/sec Loss 8.2448 LearningRate 0.0685 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:44,929-Speed 5665.08 samples/sec Loss 8.3075 LearningRate 0.0685 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:46,740-Speed 5657.03 samples/sec Loss 8.4955 LearningRate 0.0685 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:48,539-Speed 5695.49 samples/sec Loss 8.2020 LearningRate 0.0685 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:50,356-Speed 5637.21 samples/sec Loss 8.2637 LearningRate 0.0685 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:52,158-Speed 5686.78 samples/sec Loss 8.4194 LearningRate 0.0684 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:40:53,950-Speed 5718.40 samples/sec Loss 8.3417 
LearningRate 0.0684 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:55,797-Speed 5545.19 samples/sec Loss 8.4313 LearningRate 0.0684 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:57,616-Speed 5635.18 samples/sec Loss 8.1371 LearningRate 0.0684 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:40:59,445-Speed 5602.59 samples/sec Loss 8.3830 LearningRate 0.0684 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:01,283-Speed 5573.92 samples/sec Loss 8.2642 LearningRate 0.0684 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:03,082-Speed 5693.86 samples/sec Loss 8.1585 LearningRate 0.0684 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:04,918-Speed 5580.75 samples/sec Loss 8.3060 LearningRate 0.0683 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:06,711-Speed 5711.71 samples/sec Loss 8.1798 LearningRate 0.0683 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:08,528-Speed 5639.30 samples/sec Loss 8.3373 LearningRate 0.0683 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:10,336-Speed 5667.43 samples/sec Loss 8.2076 LearningRate 0.0683 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:12,143-Speed 5670.11 samples/sec Loss 8.2276 LearningRate 0.0683 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:13,982-Speed 5569.20 samples/sec Loss 8.2306 LearningRate 0.0683 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:15,806-Speed 5617.68 samples/sec Loss 8.1713 LearningRate 0.0683 Epoch: 3 Global Step: 19770 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:17,616-Speed 5660.76 samples/sec Loss 8.4062 LearningRate 0.0682 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:19,461-Speed 5550.84 samples/sec Loss 8.2956 LearningRate 0.0682 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:21,266-Speed 5674.66 samples/sec Loss 8.3398 LearningRate 0.0682 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:23,090-Speed 5617.83 samples/sec Loss 8.4015 LearningRate 0.0682 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:24,880-Speed 5724.90 samples/sec Loss 8.2963 LearningRate 0.0682 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:26,701-Speed 5625.81 samples/sec Loss 8.1786 LearningRate 0.0682 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:28,516-Speed 5644.79 samples/sec Loss 8.4416 LearningRate 0.0682 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:30,324-Speed 5667.03 samples/sec Loss 8.1953 LearningRate 0.0681 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:32,144-Speed 5628.59 samples/sec Loss 8.2384 LearningRate 0.0681 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:34,017-Speed 5468.95 samples/sec Loss 8.2379 LearningRate 0.0681 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:35,812-Speed 5710.42 samples/sec Loss 8.2654 LearningRate 0.0681 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:37,638-Speed 5610.06 samples/sec Loss 8.3618 LearningRate 0.0681 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:41:39,443-Speed 5674.46 samples/sec Loss 8.2822 LearningRate 0.0681 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:41,251-Speed 5666.34 samples/sec Loss 8.0920 LearningRate 0.0680 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:41:43,096-Speed 5552.99 samples/sec Loss 8.2786 LearningRate 0.0680 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:44,930-Speed 5586.41 samples/sec Loss 8.1576 LearningRate 0.0680 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:46,773-Speed 5558.93 samples/sec Loss 8.0952 LearningRate 0.0680 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:48,611-Speed 5574.95 samples/sec Loss 8.2131 LearningRate 0.0680 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:50,449-Speed 5571.40 samples/sec Loss 8.1865 LearningRate 0.0680 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:52,264-Speed 5644.91 samples/sec Loss 8.2419 LearningRate 0.0680 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:54,070-Speed 5673.64 samples/sec Loss 8.2091 LearningRate 0.0679 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:55,861-Speed 5720.25 samples/sec Loss 8.2650 LearningRate 0.0679 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:41:57,655-Speed 5712.02 samples/sec Loss 8.3071 LearningRate 0.0679 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:42:24,579-[lfw][20000]XNorm: 23.215628 Training: 2022-04-27 02:42:24,580-[lfw][20000]Accuracy-Flip: 0.99500+-0.00387 Training: 2022-04-27 02:42:24,581-[lfw][20000]Accuracy-Highest: 0.99617 Training: 2022-04-27 
02:42:55,768-[cfp_fp][20000]XNorm: 20.358291 Training: 2022-04-27 02:42:55,769-[cfp_fp][20000]Accuracy-Flip: 0.91500+-0.01674 Training: 2022-04-27 02:42:55,770-[cfp_fp][20000]Accuracy-Highest: 0.92343 Training: 2022-04-27 02:43:22,730-[agedb_30][20000]XNorm: 23.245018 Training: 2022-04-27 02:43:22,731-[agedb_30][20000]Accuracy-Flip: 0.96867+-0.01082 Training: 2022-04-27 02:43:22,732-[agedb_30][20000]Accuracy-Highest: 0.96867 Training: 2022-04-27 02:43:24,573-Speed 117.81 samples/sec Loss 8.2992 LearningRate 0.0679 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:26,368-Speed 5707.58 samples/sec Loss 8.1662 LearningRate 0.0679 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:28,174-Speed 5671.21 samples/sec Loss 8.1514 LearningRate 0.0679 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:29,997-Speed 5623.00 samples/sec Loss 8.1965 LearningRate 0.0679 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:31,836-Speed 5574.94 samples/sec Loss 8.3157 LearningRate 0.0678 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:33,640-Speed 5680.30 samples/sec Loss 8.2701 LearningRate 0.0678 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:35,448-Speed 5666.41 samples/sec Loss 8.2319 LearningRate 0.0678 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:37,247-Speed 5693.21 samples/sec Loss 8.0376 LearningRate 0.0678 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:39,029-Speed 5749.61 samples/sec Loss 8.1342 LearningRate 0.0678 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:40,861-Speed 5592.42 samples/sec Loss 8.1907 LearningRate 0.0678 Epoch: 3 Global Step: 
20100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:42,677-Speed 5641.85 samples/sec Loss 8.1161 LearningRate 0.0678 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:44,489-Speed 5655.27 samples/sec Loss 8.2633 LearningRate 0.0677 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:46,303-Speed 5648.35 samples/sec Loss 8.3329 LearningRate 0.0677 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:48,111-Speed 5664.92 samples/sec Loss 8.1445 LearningRate 0.0677 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:43:49,910-Speed 5696.59 samples/sec Loss 8.1232 LearningRate 0.0677 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:51,760-Speed 5536.35 samples/sec Loss 8.2375 LearningRate 0.0677 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:53,579-Speed 5634.19 samples/sec Loss 8.2744 LearningRate 0.0677 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:55,416-Speed 5577.58 samples/sec Loss 8.2870 LearningRate 0.0677 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:57,220-Speed 5679.11 samples/sec Loss 8.3100 LearningRate 0.0676 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:43:59,039-Speed 5632.62 samples/sec Loss 8.3571 LearningRate 0.0676 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:00,837-Speed 5697.11 samples/sec Loss 8.1697 LearningRate 0.0676 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:02,658-Speed 5625.67 samples/sec Loss 8.2149 LearningRate 0.0676 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:44:04,493-Speed 5585.09 samples/sec Loss 8.2279 LearningRate 0.0676 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:06,309-Speed 5638.85 samples/sec Loss 8.3107 LearningRate 0.0676 Epoch: 3 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:08,135-Speed 5613.51 samples/sec Loss 8.2085 LearningRate 0.0676 Epoch: 3 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:09,973-Speed 5571.95 samples/sec Loss 8.1169 LearningRate 0.0675 Epoch: 3 Global Step: 20260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:11,784-Speed 5657.95 samples/sec Loss 7.9985 LearningRate 0.0675 Epoch: 3 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:13,590-Speed 5675.23 samples/sec Loss 8.2000 LearningRate 0.0675 Epoch: 3 Global Step: 20280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:15,400-Speed 5661.17 samples/sec Loss 8.1246 LearningRate 0.0675 Epoch: 3 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:17,206-Speed 5671.17 samples/sec Loss 8.1141 LearningRate 0.0675 Epoch: 3 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:19,016-Speed 5658.76 samples/sec Loss 8.1518 LearningRate 0.0675 Epoch: 3 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:20,865-Speed 5542.07 samples/sec Loss 8.0869 LearningRate 0.0675 Epoch: 3 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:22,687-Speed 5624.56 samples/sec Loss 8.0263 LearningRate 0.0674 Epoch: 3 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:24,505-Speed 5636.30 samples/sec Loss 8.1434 LearningRate 0.0674 Epoch: 3 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:26,315-Speed 5657.89 samples/sec Loss 
8.2738 LearningRate 0.0674 Epoch: 3 Global Step: 20350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:44:28,102-Speed 5735.94 samples/sec Loss 8.1815 LearningRate 0.0674 Epoch: 3 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:29,910-Speed 5665.04 samples/sec Loss 8.0499 LearningRate 0.0674 Epoch: 3 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:31,717-Speed 5669.25 samples/sec Loss 8.0866 LearningRate 0.0674 Epoch: 3 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:33,510-Speed 5714.04 samples/sec Loss 8.1325 LearningRate 0.0674 Epoch: 3 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:35,303-Speed 5713.12 samples/sec Loss 8.0841 LearningRate 0.0673 Epoch: 3 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:37,102-Speed 5697.52 samples/sec Loss 8.2527 LearningRate 0.0673 Epoch: 3 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:38,899-Speed 5698.29 samples/sec Loss 7.9853 LearningRate 0.0673 Epoch: 3 Global Step: 20420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:40,709-Speed 5660.80 samples/sec Loss 8.3250 LearningRate 0.0673 Epoch: 3 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:42,505-Speed 5704.50 samples/sec Loss 8.2201 LearningRate 0.0673 Epoch: 3 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:44,303-Speed 5698.55 samples/sec Loss 8.3665 LearningRate 0.0673 Epoch: 3 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:46,087-Speed 5755.45 samples/sec Loss 8.1032 LearningRate 0.0673 Epoch: 3 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:47,906-Speed 5633.54 samples/sec Loss 8.1148 LearningRate 0.0672 Epoch: 3 Global Step: 20470 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:49,719-Speed 5651.27 samples/sec Loss 8.1558 LearningRate 0.0672 Epoch: 3 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:44:51,550-Speed 5594.90 samples/sec Loss 8.1487 LearningRate 0.0672 Epoch: 3 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:44:53,365-Speed 5646.72 samples/sec Loss 7.9561 LearningRate 0.0672 Epoch: 3 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:44:55,167-Speed 5684.52 samples/sec Loss 8.2604 LearningRate 0.0672 Epoch: 3 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:44:56,997-Speed 5600.18 samples/sec Loss 8.1160 LearningRate 0.0672 Epoch: 3 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:44:58,807-Speed 5658.09 samples/sec Loss 8.1598 LearningRate 0.0672 Epoch: 3 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:00,618-Speed 5659.00 samples/sec Loss 8.1716 LearningRate 0.0671 Epoch: 3 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:02,440-Speed 5622.25 samples/sec Loss 8.1622 LearningRate 0.0671 Epoch: 3 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:04,245-Speed 5676.33 samples/sec Loss 8.2538 LearningRate 0.0671 Epoch: 3 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:06,045-Speed 5689.73 samples/sec Loss 8.0664 LearningRate 0.0671 Epoch: 3 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:07,861-Speed 5641.17 samples/sec Loss 8.2742 LearningRate 0.0671 Epoch: 3 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:09,694-Speed 5590.30 samples/sec Loss 8.0579 LearningRate 0.0671 Epoch: 3 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:45:11,507-Speed 5651.84 samples/sec Loss 8.2000 LearningRate 0.0671 Epoch: 3 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:13,293-Speed 5740.20 samples/sec Loss 8.2802 LearningRate 0.0670 Epoch: 3 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:15,088-Speed 5707.89 samples/sec Loss 8.2470 LearningRate 0.0670 Epoch: 3 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:16,882-Speed 5716.56 samples/sec Loss 8.2000 LearningRate 0.0670 Epoch: 3 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:18,700-Speed 5634.30 samples/sec Loss 8.3014 LearningRate 0.0670 Epoch: 3 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:20,515-Speed 5643.77 samples/sec Loss 8.3213 LearningRate 0.0670 Epoch: 3 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:22,328-Speed 5653.58 samples/sec Loss 8.3755 LearningRate 0.0670 Epoch: 3 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:24,129-Speed 5687.80 samples/sec Loss 8.2022 LearningRate 0.0670 Epoch: 3 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:25,928-Speed 5694.21 samples/sec Loss 8.3416 LearningRate 0.0669 Epoch: 3 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:27,752-Speed 5617.76 samples/sec Loss 8.2834 LearningRate 0.0669 Epoch: 3 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:29,557-Speed 5677.05 samples/sec Loss 8.1810 LearningRate 0.0669 Epoch: 3 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:31,374-Speed 5637.69 samples/sec Loss 8.1870 LearningRate 0.0669 Epoch: 3 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:33,183-Speed 5663.14 samples/sec Loss 8.1230 LearningRate 
0.0669 Epoch: 3 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:45:34,995-Speed 5653.93 samples/sec Loss 8.0577 LearningRate 0.0669 Epoch: 3 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:36,799-Speed 5676.86 samples/sec Loss 8.2390 LearningRate 0.0669 Epoch: 3 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:38,609-Speed 5659.12 samples/sec Loss 8.2185 LearningRate 0.0668 Epoch: 3 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:40,411-Speed 5684.99 samples/sec Loss 8.2117 LearningRate 0.0668 Epoch: 3 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:42,236-Speed 5613.08 samples/sec Loss 8.0580 LearningRate 0.0668 Epoch: 3 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:44,050-Speed 5649.17 samples/sec Loss 8.1904 LearningRate 0.0668 Epoch: 3 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:45,865-Speed 5644.00 samples/sec Loss 8.1065 LearningRate 0.0668 Epoch: 3 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:47,670-Speed 5675.63 samples/sec Loss 8.0697 LearningRate 0.0668 Epoch: 3 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:49,480-Speed 5661.36 samples/sec Loss 8.2632 LearningRate 0.0667 Epoch: 3 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:51,293-Speed 5647.73 samples/sec Loss 8.1320 LearningRate 0.0667 Epoch: 3 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:53,087-Speed 5716.25 samples/sec Loss 8.2437 LearningRate 0.0667 Epoch: 3 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:54,925-Speed 5573.57 samples/sec Loss 8.2515 LearningRate 0.0667 Epoch: 3 Global Step: 20840 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-27 02:45:56,729-Speed 5678.06 samples/sec Loss 8.1505 LearningRate 0.0667 Epoch: 3 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:45:58,525-Speed 5703.59 samples/sec Loss 8.1271 LearningRate 0.0667 Epoch: 3 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:00,337-Speed 5656.75 samples/sec Loss 8.1901 LearningRate 0.0667 Epoch: 3 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:02,158-Speed 5627.40 samples/sec Loss 8.0976 LearningRate 0.0666 Epoch: 3 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:03,986-Speed 5605.78 samples/sec Loss 8.2696 LearningRate 0.0666 Epoch: 3 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:05,798-Speed 5653.28 samples/sec Loss 8.1088 LearningRate 0.0666 Epoch: 3 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:07,620-Speed 5622.69 samples/sec Loss 8.0000 LearningRate 0.0666 Epoch: 3 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:09,436-Speed 5643.32 samples/sec Loss 8.2603 LearningRate 0.0666 Epoch: 3 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:11,304-Speed 5484.14 samples/sec Loss 8.2375 LearningRate 0.0666 Epoch: 3 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:13,112-Speed 5668.22 samples/sec Loss 8.0563 LearningRate 0.0666 Epoch: 3 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:14,933-Speed 5625.39 samples/sec Loss 8.1153 LearningRate 0.0665 Epoch: 3 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:16,740-Speed 5670.86 samples/sec Loss 8.0762 LearningRate 0.0665 Epoch: 3 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:46:18,551-Speed 5655.30 samples/sec Loss 8.0670 LearningRate 0.0665 Epoch: 3 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:20,352-Speed 5689.06 samples/sec Loss 8.0007 LearningRate 0.0665 Epoch: 3 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:22,153-Speed 5689.11 samples/sec Loss 8.1004 LearningRate 0.0665 Epoch: 3 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:23,999-Speed 5550.35 samples/sec Loss 8.2068 LearningRate 0.0665 Epoch: 3 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:25,792-Speed 5712.94 samples/sec Loss 8.0037 LearningRate 0.0665 Epoch: 3 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:27,615-Speed 5619.62 samples/sec Loss 7.9615 LearningRate 0.0664 Epoch: 3 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:29,427-Speed 5657.19 samples/sec Loss 7.9987 LearningRate 0.0664 Epoch: 3 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:31,226-Speed 5694.69 samples/sec Loss 8.1091 LearningRate 0.0664 Epoch: 3 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:33,042-Speed 5640.67 samples/sec Loss 8.0157 LearningRate 0.0664 Epoch: 3 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:34,858-Speed 5640.43 samples/sec Loss 8.0764 LearningRate 0.0664 Epoch: 3 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:36,666-Speed 5667.02 samples/sec Loss 8.1555 LearningRate 0.0664 Epoch: 3 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:38,477-Speed 5657.76 samples/sec Loss 8.0777 LearningRate 0.0664 Epoch: 3 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:40,300-Speed 5620.33 samples/sec Loss 8.0800 LearningRate 
0.0663 Epoch: 3 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:42,113-Speed 5649.63 samples/sec Loss 8.1122 LearningRate 0.0663 Epoch: 3 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:46:43,895-Speed 5748.85 samples/sec Loss 8.2092 LearningRate 0.0663 Epoch: 3 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:45,736-Speed 5566.15 samples/sec Loss 8.1133 LearningRate 0.0663 Epoch: 3 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:47,542-Speed 5671.46 samples/sec Loss 8.3125 LearningRate 0.0663 Epoch: 3 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:49,341-Speed 5696.43 samples/sec Loss 8.2305 LearningRate 0.0663 Epoch: 3 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:51,154-Speed 5649.29 samples/sec Loss 8.0413 LearningRate 0.0663 Epoch: 3 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:52,960-Speed 5672.43 samples/sec Loss 8.2204 LearningRate 0.0662 Epoch: 3 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:54,781-Speed 5626.70 samples/sec Loss 8.0946 LearningRate 0.0662 Epoch: 3 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:56,592-Speed 5658.14 samples/sec Loss 8.1181 LearningRate 0.0662 Epoch: 3 Global Step: 21180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:46:58,398-Speed 5672.18 samples/sec Loss 7.9500 LearningRate 0.0662 Epoch: 3 Global Step: 21190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:00,215-Speed 5641.03 samples/sec Loss 8.0347 LearningRate 0.0662 Epoch: 3 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:02,026-Speed 5658.09 samples/sec Loss 8.1101 LearningRate 0.0662 Epoch: 3 Global Step: 21210 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-27 02:47:03,859-Speed 5585.96 samples/sec Loss 8.0024 LearningRate 0.0662 Epoch: 3 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:05,665-Speed 5674.23 samples/sec Loss 8.0278 LearningRate 0.0661 Epoch: 3 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:07,459-Speed 5711.71 samples/sec Loss 8.1477 LearningRate 0.0661 Epoch: 3 Global Step: 21240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:09,255-Speed 5702.71 samples/sec Loss 8.0722 LearningRate 0.0661 Epoch: 3 Global Step: 21250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:11,069-Speed 5646.66 samples/sec Loss 8.1638 LearningRate 0.0661 Epoch: 3 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:12,865-Speed 5707.78 samples/sec Loss 8.1328 LearningRate 0.0661 Epoch: 3 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:14,672-Speed 5670.13 samples/sec Loss 7.9412 LearningRate 0.0661 Epoch: 3 Global Step: 21280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:16,497-Speed 5612.60 samples/sec Loss 8.1682 LearningRate 0.0661 Epoch: 3 Global Step: 21290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:18,307-Speed 5658.50 samples/sec Loss 8.3039 LearningRate 0.0660 Epoch: 3 Global Step: 21300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:20,105-Speed 5698.78 samples/sec Loss 7.9448 LearningRate 0.0660 Epoch: 3 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:21,923-Speed 5635.25 samples/sec Loss 8.1144 LearningRate 0.0660 Epoch: 3 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:47:23,740-Speed 5637.72 samples/sec Loss 8.1045 LearningRate 0.0660 Epoch: 3 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:47:25,524-Speed 5743.99 samples/sec Loss 7.9938 LearningRate 0.0660 Epoch: 3 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:27,323-Speed 5696.87 samples/sec Loss 8.1015 LearningRate 0.0660 Epoch: 3 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:29,154-Speed 5595.70 samples/sec Loss 8.0468 LearningRate 0.0660 Epoch: 3 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:30,956-Speed 5685.43 samples/sec Loss 7.8915 LearningRate 0.0659 Epoch: 3 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:32,745-Speed 5724.79 samples/sec Loss 8.0263 LearningRate 0.0659 Epoch: 3 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:34,560-Speed 5643.15 samples/sec Loss 8.2251 LearningRate 0.0659 Epoch: 3 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:36,403-Speed 5558.56 samples/sec Loss 8.2240 LearningRate 0.0659 Epoch: 3 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:38,228-Speed 5613.72 samples/sec Loss 8.0505 LearningRate 0.0659 Epoch: 3 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:40,039-Speed 5655.67 samples/sec Loss 8.1125 LearningRate 0.0659 Epoch: 3 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:41,866-Speed 5607.08 samples/sec Loss 8.0706 LearningRate 0.0659 Epoch: 3 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:43,659-Speed 5714.74 samples/sec Loss 8.1256 LearningRate 0.0658 Epoch: 3 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:45,465-Speed 5671.77 samples/sec Loss 8.1125 LearningRate 0.0658 Epoch: 3 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:47,287-Speed 5624.07 samples/sec Loss 8.1584 LearningRate 
0.0658 Epoch: 3 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:49,134-Speed 5546.00 samples/sec Loss 8.1597 LearningRate 0.0658 Epoch: 3 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:50,964-Speed 5596.98 samples/sec Loss 8.1169 LearningRate 0.0658 Epoch: 3 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:52,810-Speed 5552.18 samples/sec Loss 8.1378 LearningRate 0.0658 Epoch: 3 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:54,621-Speed 5655.68 samples/sec Loss 8.1384 LearningRate 0.0658 Epoch: 3 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:56,450-Speed 5599.55 samples/sec Loss 8.0643 LearningRate 0.0657 Epoch: 3 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:47:58,291-Speed 5566.14 samples/sec Loss 8.0340 LearningRate 0.0657 Epoch: 3 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:48:00,108-Speed 5636.08 samples/sec Loss 8.2064 LearningRate 0.0657 Epoch: 3 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:48:01,914-Speed 5674.50 samples/sec Loss 8.0848 LearningRate 0.0657 Epoch: 3 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:03,719-Speed 5673.19 samples/sec Loss 8.1031 LearningRate 0.0657 Epoch: 3 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:05,535-Speed 5643.40 samples/sec Loss 8.1332 LearningRate 0.0657 Epoch: 3 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:07,343-Speed 5666.19 samples/sec Loss 7.9857 LearningRate 0.0657 Epoch: 3 Global Step: 21570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:09,158-Speed 5642.08 samples/sec Loss 8.0518 LearningRate 0.0656 Epoch: 3 Global Step: 21580 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-27 02:48:10,970-Speed 5652.31 samples/sec Loss 8.1051 LearningRate 0.0656 Epoch: 3 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:12,782-Speed 5653.86 samples/sec Loss 8.0938 LearningRate 0.0656 Epoch: 3 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:14,593-Speed 5656.26 samples/sec Loss 7.9399 LearningRate 0.0656 Epoch: 3 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:16,425-Speed 5592.83 samples/sec Loss 8.0820 LearningRate 0.0656 Epoch: 3 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:18,260-Speed 5580.49 samples/sec Loss 8.0086 LearningRate 0.0656 Epoch: 3 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:20,082-Speed 5623.15 samples/sec Loss 7.9960 LearningRate 0.0656 Epoch: 3 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:21,893-Speed 5658.08 samples/sec Loss 8.0889 LearningRate 0.0655 Epoch: 3 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:23,739-Speed 5549.54 samples/sec Loss 7.9211 LearningRate 0.0655 Epoch: 3 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:25,555-Speed 5643.29 samples/sec Loss 7.8642 LearningRate 0.0655 Epoch: 3 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:27,416-Speed 5504.78 samples/sec Loss 7.9165 LearningRate 0.0655 Epoch: 3 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:29,239-Speed 5617.43 samples/sec Loss 8.1743 LearningRate 0.0655 Epoch: 3 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:31,103-Speed 5498.64 samples/sec Loss 7.9192 LearningRate 0.0655 Epoch: 3 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:48:32,965-Speed 5501.31 samples/sec Loss 8.1476 LearningRate 0.0655 Epoch: 3 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:34,787-Speed 5624.37 samples/sec Loss 7.9673 LearningRate 0.0654 Epoch: 3 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:36,602-Speed 5643.33 samples/sec Loss 8.0389 LearningRate 0.0654 Epoch: 3 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:38,402-Speed 5694.71 samples/sec Loss 7.9996 LearningRate 0.0654 Epoch: 3 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:40,204-Speed 5688.01 samples/sec Loss 7.9005 LearningRate 0.0654 Epoch: 3 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:42,037-Speed 5588.16 samples/sec Loss 8.0956 LearningRate 0.0654 Epoch: 3 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:43,876-Speed 5571.27 samples/sec Loss 8.2515 LearningRate 0.0654 Epoch: 3 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:46,034-Speed 4747.55 samples/sec Loss 8.0400 LearningRate 0.0654 Epoch: 3 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:47,861-Speed 5608.54 samples/sec Loss 8.1060 LearningRate 0.0653 Epoch: 3 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:49,710-Speed 5540.06 samples/sec Loss 8.0499 LearningRate 0.0653 Epoch: 3 Global Step: 21800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:51,566-Speed 5521.97 samples/sec Loss 8.1543 LearningRate 0.0653 Epoch: 3 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:53,382-Speed 5642.44 samples/sec Loss 7.9753 LearningRate 0.0653 Epoch: 3 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:55,216-Speed 5584.58 samples/sec Loss 8.0087 
LearningRate 0.0653 Epoch: 3 Global Step: 21830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:48:57,022-Speed 5672.86 samples/sec Loss 7.9673 LearningRate 0.0653 Epoch: 3 Global Step: 21840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:48:58,853-Speed 5594.66 samples/sec Loss 8.0376 LearningRate 0.0653 Epoch: 3 Global Step: 21850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:00,695-Speed 5562.44 samples/sec Loss 7.9752 LearningRate 0.0652 Epoch: 3 Global Step: 21860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:02,527-Speed 5592.31 samples/sec Loss 7.9977 LearningRate 0.0652 Epoch: 3 Global Step: 21870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:04,326-Speed 5695.42 samples/sec Loss 8.0035 LearningRate 0.0652 Epoch: 3 Global Step: 21880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:06,162-Speed 5580.46 samples/sec Loss 7.9215 LearningRate 0.0652 Epoch: 3 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:07,984-Speed 5623.04 samples/sec Loss 7.9236 LearningRate 0.0652 Epoch: 3 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:09,823-Speed 5569.40 samples/sec Loss 7.9155 LearningRate 0.0652 Epoch: 3 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:11,644-Speed 5624.92 samples/sec Loss 8.0729 LearningRate 0.0652 Epoch: 3 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:13,524-Speed 5452.15 samples/sec Loss 7.9754 LearningRate 0.0652 Epoch: 3 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:15,425-Speed 5387.60 samples/sec Loss 8.0360 LearningRate 0.0651 Epoch: 3 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:49:17,273-Speed 5545.09 samples/sec Loss 7.9624 LearningRate 0.0651 Epoch: 3 Global Step: 21950 Fp16 
Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:19,145-Speed 5473.66 samples/sec Loss 7.8507 LearningRate 0.0651 Epoch: 3 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:20,970-Speed 5611.59 samples/sec Loss 7.9425 LearningRate 0.0651 Epoch: 3 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:22,839-Speed 5482.92 samples/sec Loss 8.0383 LearningRate 0.0651 Epoch: 3 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:24,725-Speed 5429.59 samples/sec Loss 8.1059 LearningRate 0.0651 Epoch: 3 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:26,559-Speed 5587.08 samples/sec Loss 8.1370 LearningRate 0.0651 Epoch: 3 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:49:53,492-[lfw][22000]XNorm: 21.071302
Training: 2022-04-27 02:49:53,492-[lfw][22000]Accuracy-Flip: 0.99583+-0.00318
Training: 2022-04-27 02:49:53,493-[lfw][22000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:50:24,580-[cfp_fp][22000]XNorm: 18.253711
Training: 2022-04-27 02:50:24,581-[cfp_fp][22000]Accuracy-Flip: 0.93171+-0.01137
Training: 2022-04-27 02:50:24,582-[cfp_fp][22000]Accuracy-Highest: 0.93171
Training: 2022-04-27 02:50:51,403-[agedb_30][22000]XNorm: 20.980046
Training: 2022-04-27 02:50:51,404-[agedb_30][22000]Accuracy-Flip: 0.96750+-0.00672
Training: 2022-04-27 02:50:51,404-[agedb_30][22000]Accuracy-Highest: 0.96867
Training: 2022-04-27 02:50:53,209-Speed 118.18 samples/sec Loss 7.9351 LearningRate 0.0650 Epoch: 3 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:50:55,012-Speed 5684.63 samples/sec Loss 8.0260 LearningRate 0.0650 Epoch: 3 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:50:56,850-Speed 5573.02 samples/sec Loss 7.9518 LearningRate 0.0650 Epoch: 3 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-27 02:50:58,665-Speed 5643.93 samples/sec Loss 8.0046 LearningRate 0.0650 Epoch: 3 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:00,464-Speed 5700.02 samples/sec Loss 7.8917 LearningRate 0.0650 Epoch: 3 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:02,326-Speed 5504.12 samples/sec Loss 8.1336 LearningRate 0.0650 Epoch: 3 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:04,141-Speed 5641.70 samples/sec Loss 7.9443 LearningRate 0.0650 Epoch: 3 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:05,945-Speed 5678.50 samples/sec Loss 7.8540 LearningRate 0.0649 Epoch: 3 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:07,752-Speed 5670.41 samples/sec Loss 8.0192 LearningRate 0.0649 Epoch: 3 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:09,555-Speed 5679.77 samples/sec Loss 7.8545 LearningRate 0.0649 Epoch: 3 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:11,341-Speed 5737.14 samples/sec Loss 8.0439 LearningRate 0.0649 Epoch: 3 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:13,130-Speed 5725.65 samples/sec Loss 7.9050 LearningRate 0.0649 Epoch: 3 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:14,930-Speed 5695.03 samples/sec Loss 8.0292 LearningRate 0.0649 Epoch: 3 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:16,727-Speed 5698.33 samples/sec Loss 8.0381 LearningRate 0.0649 Epoch: 3 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:18,507-Speed 5755.83 samples/sec Loss 7.9307 LearningRate 0.0648 Epoch: 3 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:20,306-Speed 5695.98 
samples/sec Loss 7.8725 LearningRate 0.0648 Epoch: 3 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:22,094-Speed 5727.52 samples/sec Loss 7.8733 LearningRate 0.0648 Epoch: 3 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:23,888-Speed 5712.51 samples/sec Loss 8.1184 LearningRate 0.0648 Epoch: 3 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:25,685-Speed 5699.83 samples/sec Loss 8.0337 LearningRate 0.0648 Epoch: 3 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:27,488-Speed 5682.69 samples/sec Loss 7.9390 LearningRate 0.0648 Epoch: 3 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:29,281-Speed 5711.66 samples/sec Loss 7.9424 LearningRate 0.0648 Epoch: 3 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:31,109-Speed 5607.52 samples/sec Loss 7.9794 LearningRate 0.0647 Epoch: 3 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:32,916-Speed 5669.76 samples/sec Loss 7.8850 LearningRate 0.0647 Epoch: 3 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:34,731-Speed 5643.53 samples/sec Loss 7.8545 LearningRate 0.0647 Epoch: 3 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:36,572-Speed 5565.08 samples/sec Loss 7.9874 LearningRate 0.0647 Epoch: 3 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:38,406-Speed 5583.75 samples/sec Loss 8.0187 LearningRate 0.0647 Epoch: 3 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:40,215-Speed 5664.34 samples/sec Loss 7.9523 LearningRate 0.0647 Epoch: 3 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:42,024-Speed 5660.71 samples/sec Loss 7.8800 LearningRate 0.0647 Epoch: 3 Global 
Step: 22280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:51:43,818-Speed 5714.26 samples/sec Loss 7.9657 LearningRate 0.0646 Epoch: 3 Global Step: 22290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:45,627-Speed 5662.01 samples/sec Loss 8.0390 LearningRate 0.0646 Epoch: 3 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:47,436-Speed 5661.13 samples/sec Loss 7.8227 LearningRate 0.0646 Epoch: 3 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:49,254-Speed 5637.93 samples/sec Loss 7.8656 LearningRate 0.0646 Epoch: 3 Global Step: 22320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:51,070-Speed 5639.22 samples/sec Loss 7.9969 LearningRate 0.0646 Epoch: 3 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:52,889-Speed 5632.06 samples/sec Loss 8.0989 LearningRate 0.0646 Epoch: 3 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:54,706-Speed 5639.98 samples/sec Loss 8.1021 LearningRate 0.0646 Epoch: 3 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:56,512-Speed 5670.83 samples/sec Loss 8.0888 LearningRate 0.0645 Epoch: 3 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:51:58,371-Speed 5511.77 samples/sec Loss 8.0878 LearningRate 0.0645 Epoch: 3 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:00,185-Speed 5645.65 samples/sec Loss 7.8903 LearningRate 0.0645 Epoch: 3 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:01,999-Speed 5647.26 samples/sec Loss 7.9637 LearningRate 0.0645 Epoch: 3 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:03,804-Speed 5675.12 samples/sec Loss 7.9979 LearningRate 0.0645 Epoch: 3 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-27 02:52:05,597-Speed 5714.89 samples/sec Loss 7.9487 LearningRate 0.0645 Epoch: 3 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:07,418-Speed 5625.10 samples/sec Loss 8.0887 LearningRate 0.0645 Epoch: 3 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:09,207-Speed 5725.09 samples/sec Loss 7.9272 LearningRate 0.0644 Epoch: 3 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:11,026-Speed 5631.69 samples/sec Loss 7.8922 LearningRate 0.0644 Epoch: 3 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:12,855-Speed 5603.53 samples/sec Loss 8.0858 LearningRate 0.0644 Epoch: 3 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:14,676-Speed 5623.64 samples/sec Loss 8.0021 LearningRate 0.0644 Epoch: 3 Global Step: 22460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:16,485-Speed 5663.15 samples/sec Loss 7.9994 LearningRate 0.0644 Epoch: 3 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:18,284-Speed 5693.25 samples/sec Loss 7.9966 LearningRate 0.0644 Epoch: 3 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:20,089-Speed 5676.60 samples/sec Loss 7.9319 LearningRate 0.0644 Epoch: 3 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:21,910-Speed 5625.10 samples/sec Loss 7.8838 LearningRate 0.0643 Epoch: 3 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:23,709-Speed 5699.81 samples/sec Loss 8.0393 LearningRate 0.0643 Epoch: 3 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:25,522-Speed 5650.08 samples/sec Loss 7.9606 LearningRate 0.0643 Epoch: 3 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:27,346-Speed 5613.81 samples/sec Loss 
7.8463 LearningRate 0.0643 Epoch: 3 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:29,168-Speed 5624.48 samples/sec Loss 7.9099 LearningRate 0.0643 Epoch: 3 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:30,975-Speed 5671.43 samples/sec Loss 7.9384 LearningRate 0.0643 Epoch: 3 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:32,788-Speed 5649.03 samples/sec Loss 7.7676 LearningRate 0.0643 Epoch: 3 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:34,593-Speed 5677.72 samples/sec Loss 7.8974 LearningRate 0.0642 Epoch: 3 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:52:36,420-Speed 5605.91 samples/sec Loss 7.9799 LearningRate 0.0642 Epoch: 3 Global Step: 22580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:38,226-Speed 5672.74 samples/sec Loss 7.8224 LearningRate 0.0642 Epoch: 3 Global Step: 22590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:40,028-Speed 5684.40 samples/sec Loss 7.9801 LearningRate 0.0642 Epoch: 3 Global Step: 22600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:41,839-Speed 5656.22 samples/sec Loss 7.9200 LearningRate 0.0642 Epoch: 3 Global Step: 22610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:43,646-Speed 5671.28 samples/sec Loss 8.0383 LearningRate 0.0642 Epoch: 3 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:45,463-Speed 5636.50 samples/sec Loss 8.0295 LearningRate 0.0642 Epoch: 3 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:47,262-Speed 5694.33 samples/sec Loss 7.8503 LearningRate 0.0641 Epoch: 3 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:49,085-Speed 5619.15 samples/sec Loss 7.8247 LearningRate 0.0641 Epoch: 3 Global Step: 22650 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:50,889-Speed 5679.20 samples/sec Loss 7.9700 LearningRate 0.0641 Epoch: 3 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:52,703-Speed 5646.26 samples/sec Loss 7.9294 LearningRate 0.0641 Epoch: 3 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:54,550-Speed 5547.74 samples/sec Loss 8.0789 LearningRate 0.0641 Epoch: 3 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:56,365-Speed 5645.32 samples/sec Loss 8.0149 LearningRate 0.0641 Epoch: 3 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:58,180-Speed 5646.12 samples/sec Loss 7.9287 LearningRate 0.0641 Epoch: 3 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:52:59,995-Speed 5644.42 samples/sec Loss 7.9199 LearningRate 0.0640 Epoch: 3 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:01,813-Speed 5633.63 samples/sec Loss 7.9284 LearningRate 0.0640 Epoch: 3 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:03,653-Speed 5570.35 samples/sec Loss 8.0621 LearningRate 0.0640 Epoch: 3 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:05,514-Speed 5503.26 samples/sec Loss 7.9205 LearningRate 0.0640 Epoch: 3 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:19,352-Speed 740.05 samples/sec Loss 7.5249 LearningRate 0.0640 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:21,381-Speed 5051.12 samples/sec Loss 7.3209 LearningRate 0.0640 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:23,200-Speed 5631.33 samples/sec Loss 7.3136 LearningRate 0.0640 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:53:25,032-Speed 5595.83 samples/sec Loss 7.2668 LearningRate 0.0639 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:26,878-Speed 5550.06 samples/sec Loss 7.1854 LearningRate 0.0639 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:28,709-Speed 5592.27 samples/sec Loss 7.1962 LearningRate 0.0639 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:30,550-Speed 5568.78 samples/sec Loss 7.2275 LearningRate 0.0639 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:32,387-Speed 5577.11 samples/sec Loss 7.2536 LearningRate 0.0639 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:34,246-Speed 5510.70 samples/sec Loss 7.1927 LearningRate 0.0639 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:36,061-Speed 5645.20 samples/sec Loss 7.3272 LearningRate 0.0639 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:37,887-Speed 5612.83 samples/sec Loss 7.4441 LearningRate 0.0639 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:39,698-Speed 5658.24 samples/sec Loss 7.3787 LearningRate 0.0638 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:41,528-Speed 5597.56 samples/sec Loss 7.4082 LearningRate 0.0638 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:43,358-Speed 5597.80 samples/sec Loss 7.2974 LearningRate 0.0638 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:45,197-Speed 5572.23 samples/sec Loss 7.5060 LearningRate 0.0638 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:47,062-Speed 5493.23 samples/sec Loss 7.6514 
LearningRate 0.0638 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:48,940-Speed 5457.35 samples/sec Loss 7.3299 LearningRate 0.0638 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:50,780-Speed 5568.54 samples/sec Loss 7.4678 LearningRate 0.0638 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:52,623-Speed 5557.85 samples/sec Loss 7.3510 LearningRate 0.0637 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:54,521-Speed 5398.01 samples/sec Loss 7.4806 LearningRate 0.0637 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:53:56,354-Speed 5590.86 samples/sec Loss 7.5025 LearningRate 0.0637 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:53:58,189-Speed 5581.60 samples/sec Loss 7.2343 LearningRate 0.0637 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:00,009-Speed 5629.65 samples/sec Loss 7.5409 LearningRate 0.0637 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:01,870-Speed 5505.15 samples/sec Loss 7.5068 LearningRate 0.0637 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:03,717-Speed 5546.64 samples/sec Loss 7.5921 LearningRate 0.0637 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:05,539-Speed 5620.61 samples/sec Loss 7.4046 LearningRate 0.0636 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:07,388-Speed 5541.63 samples/sec Loss 7.4683 LearningRate 0.0636 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:09,264-Speed 5461.71 samples/sec Loss 7.4562 LearningRate 0.0636 Epoch: 4 Global Step: 23020 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:11,077-Speed 5649.88 samples/sec Loss 7.4856 LearningRate 0.0636 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:12,889-Speed 5653.27 samples/sec Loss 7.3914 LearningRate 0.0636 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:14,724-Speed 5581.49 samples/sec Loss 7.5212 LearningRate 0.0636 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:16,551-Speed 5610.68 samples/sec Loss 7.4898 LearningRate 0.0636 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:18,391-Speed 5566.80 samples/sec Loss 7.6000 LearningRate 0.0635 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:20,226-Speed 5582.78 samples/sec Loss 7.5457 LearningRate 0.0635 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:22,061-Speed 5584.12 samples/sec Loss 7.6638 LearningRate 0.0635 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:23,870-Speed 5662.87 samples/sec Loss 7.4534 LearningRate 0.0635 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:25,672-Speed 5685.69 samples/sec Loss 7.5348 LearningRate 0.0635 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:27,497-Speed 5614.78 samples/sec Loss 7.3965 LearningRate 0.0635 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:29,323-Speed 5609.10 samples/sec Loss 7.5741 LearningRate 0.0635 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:31,159-Speed 5581.35 samples/sec Loss 7.5774 LearningRate 0.0634 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
02:54:33,000-Speed 5564.12 samples/sec Loss 7.5949 LearningRate 0.0634 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:54:34,801-Speed 5688.04 samples/sec Loss 7.7766 LearningRate 0.0634 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:36,622-Speed 5625.83 samples/sec Loss 7.7392 LearningRate 0.0634 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:38,453-Speed 5594.66 samples/sec Loss 7.5544 LearningRate 0.0634 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:40,310-Speed 5517.79 samples/sec Loss 7.6138 LearningRate 0.0634 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:42,148-Speed 5572.34 samples/sec Loss 7.5442 LearningRate 0.0634 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:43,953-Speed 5678.53 samples/sec Loss 7.6814 LearningRate 0.0633 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:45,781-Speed 5603.12 samples/sec Loss 7.6475 LearningRate 0.0633 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:47,611-Speed 5596.93 samples/sec Loss 7.7440 LearningRate 0.0633 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:49,425-Speed 5648.03 samples/sec Loss 7.6875 LearningRate 0.0633 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:51,255-Speed 5600.12 samples/sec Loss 7.6415 LearningRate 0.0633 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:53,078-Speed 5619.52 samples/sec Loss 7.5112 LearningRate 0.0633 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:54,930-Speed 5530.64 samples/sec Loss 7.8367 
LearningRate 0.0633 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:56,777-Speed 5548.04 samples/sec Loss 7.7091 LearningRate 0.0632 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:54:58,636-Speed 5509.47 samples/sec Loss 7.5885 LearningRate 0.0632 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:00,433-Speed 5700.52 samples/sec Loss 7.7671 LearningRate 0.0632 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:02,258-Speed 5614.05 samples/sec Loss 7.4829 LearningRate 0.0632 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:04,717-Speed 4166.22 samples/sec Loss 7.6115 LearningRate 0.0632 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:07,765-Speed 3360.65 samples/sec Loss 7.5067 LearningRate 0.0632 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:09,579-Speed 5646.72 samples/sec Loss 7.7567 LearningRate 0.0632 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:11,404-Speed 5614.96 samples/sec Loss 7.5218 LearningRate 0.0632 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:13,230-Speed 5609.96 samples/sec Loss 7.6623 LearningRate 0.0631 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:15,076-Speed 5548.46 samples/sec Loss 7.7503 LearningRate 0.0631 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:16,898-Speed 5624.10 samples/sec Loss 7.7925 LearningRate 0.0631 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:18,735-Speed 5574.20 samples/sec Loss 7.6312 LearningRate 0.0631 Epoch: 4 Global Step: 23390 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:20,548-Speed 5651.56 samples/sec Loss 7.6544 LearningRate 0.0631 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:22,372-Speed 5616.34 samples/sec Loss 7.7428 LearningRate 0.0631 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:24,232-Speed 5507.60 samples/sec Loss 7.7061 LearningRate 0.0631 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:26,058-Speed 5608.92 samples/sec Loss 7.8538 LearningRate 0.0630 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:27,877-Speed 5633.20 samples/sec Loss 7.6620 LearningRate 0.0630 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:29,682-Speed 5673.46 samples/sec Loss 7.7861 LearningRate 0.0630 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:31,490-Speed 5666.69 samples/sec Loss 7.6819 LearningRate 0.0630 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 02:55:33,283-Speed 5712.80 samples/sec Loss 7.7271 LearningRate 0.0630 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:35,097-Speed 5645.63 samples/sec Loss 7.6772 LearningRate 0.0630 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:36,885-Speed 5731.24 samples/sec Loss 7.5202 LearningRate 0.0630 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:38,693-Speed 5664.58 samples/sec Loss 7.6712 LearningRate 0.0629 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:40,511-Speed 5635.42 samples/sec Loss 7.7445 LearningRate 0.0629 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:55:42,318-Speed 5667.46 samples/sec Loss 7.7636 LearningRate 0.0629 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:44,162-Speed 5555.15 samples/sec Loss 7.7690 LearningRate 0.0629 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:45,958-Speed 5704.47 samples/sec Loss 7.6437 LearningRate 0.0629 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:47,765-Speed 5668.27 samples/sec Loss 7.6694 LearningRate 0.0629 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:49,572-Speed 5671.34 samples/sec Loss 7.6387 LearningRate 0.0629 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:51,368-Speed 5702.01 samples/sec Loss 7.7664 LearningRate 0.0628 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:53,183-Speed 5646.28 samples/sec Loss 7.7092 LearningRate 0.0628 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:55:55,049-Speed 5489.57 samples/sec Loss 7.7161 LearningRate 0.0628 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:56,909-Speed 5505.68 samples/sec Loss 7.7351 LearningRate 0.0628 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:55:58,716-Speed 5667.59 samples/sec Loss 7.6702 LearningRate 0.0628 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:00,538-Speed 5624.19 samples/sec Loss 7.4544 LearningRate 0.0628 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:02,347-Speed 5661.63 samples/sec Loss 7.6721 LearningRate 0.0628 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:04,158-Speed 5657.47 samples/sec Loss 7.8310 
LearningRate 0.0627 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:05,947-Speed 5726.16 samples/sec Loss 7.5993 LearningRate 0.0627 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:07,766-Speed 5630.73 samples/sec Loss 7.7535 LearningRate 0.0627 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:09,667-Speed 5389.02 samples/sec Loss 7.7219 LearningRate 0.0627 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:11,560-Speed 5409.25 samples/sec Loss 7.7019 LearningRate 0.0627 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:13,417-Speed 5516.91 samples/sec Loss 7.7814 LearningRate 0.0627 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:15,263-Speed 5548.22 samples/sec Loss 7.8136 LearningRate 0.0627 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:17,082-Speed 5632.60 samples/sec Loss 7.5812 LearningRate 0.0626 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:18,887-Speed 5674.84 samples/sec Loss 7.7387 LearningRate 0.0626 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:20,683-Speed 5703.81 samples/sec Loss 7.7149 LearningRate 0.0626 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:22,478-Speed 5706.77 samples/sec Loss 7.7502 LearningRate 0.0626 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:24,276-Speed 5697.38 samples/sec Loss 7.6423 LearningRate 0.0626 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:26,103-Speed 5607.94 samples/sec Loss 7.7069 LearningRate 0.0626 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-04-27 02:56:27,934-Speed 5595.76 samples/sec Loss 7.7510 LearningRate 0.0626 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:29,741-Speed 5666.45 samples/sec Loss 7.6559 LearningRate 0.0626 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:31,548-Speed 5671.23 samples/sec Loss 7.6402 LearningRate 0.0625 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:33,359-Speed 5657.45 samples/sec Loss 7.7106 LearningRate 0.0625 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:35,161-Speed 5684.83 samples/sec Loss 7.6920 LearningRate 0.0625 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:36,959-Speed 5694.21 samples/sec Loss 7.6883 LearningRate 0.0625 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:56:38,751-Speed 5717.06 samples/sec Loss 7.8850 LearningRate 0.0625 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:40,564-Speed 5649.59 samples/sec Loss 7.6343 LearningRate 0.0625 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:42,378-Speed 5648.34 samples/sec Loss 7.7641 LearningRate 0.0625 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:44,167-Speed 5727.13 samples/sec Loss 7.8742 LearningRate 0.0624 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:45,991-Speed 5614.02 samples/sec Loss 7.7918 LearningRate 0.0624 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:47,826-Speed 5582.97 samples/sec Loss 7.9118 LearningRate 0.0624 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:49,621-Speed 
5707.38 samples/sec Loss 7.6187 LearningRate 0.0624 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:51,456-Speed 5582.93 samples/sec Loss 7.7609 LearningRate 0.0624 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:53,268-Speed 5652.32 samples/sec Loss 7.7500 LearningRate 0.0624 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:55,095-Speed 5607.63 samples/sec Loss 7.6922 LearningRate 0.0624 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:56:56,912-Speed 5637.26 samples/sec Loss 7.6300 LearningRate 0.0623 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:56:58,736-Speed 5616.33 samples/sec Loss 7.8516 LearningRate 0.0623 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:00,565-Speed 5601.92 samples/sec Loss 7.7495 LearningRate 0.0623 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:02,397-Speed 5590.63 samples/sec Loss 7.7268 LearningRate 0.0623 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:04,238-Speed 5565.56 samples/sec Loss 7.9841 LearningRate 0.0623 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:06,043-Speed 5673.65 samples/sec Loss 7.7810 LearningRate 0.0623 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:07,848-Speed 5675.01 samples/sec Loss 7.7160 LearningRate 0.0623 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:09,652-Speed 5678.85 samples/sec Loss 7.6570 LearningRate 0.0622 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:57:36,688-[lfw][24000]XNorm: 21.676750 Training: 2022-04-27 
02:57:36,689-[lfw][24000]Accuracy-Flip: 0.99717+-0.00248 Training: 2022-04-27 02:57:36,689-[lfw][24000]Accuracy-Highest: 0.99717 Training: 2022-04-27 02:58:07,709-[cfp_fp][24000]XNorm: 18.930162 Training: 2022-04-27 02:58:07,710-[cfp_fp][24000]Accuracy-Flip: 0.91771+-0.01542 Training: 2022-04-27 02:58:07,710-[cfp_fp][24000]Accuracy-Highest: 0.93171 Training: 2022-04-27 02:58:34,644-[agedb_30][24000]XNorm: 21.409065 Training: 2022-04-27 02:58:34,645-[agedb_30][24000]Accuracy-Flip: 0.96500+-0.00719 Training: 2022-04-27 02:58:34,645-[agedb_30][24000]Accuracy-Highest: 0.96867 Training: 2022-04-27 02:58:36,481-Speed 117.93 samples/sec Loss 7.8664 LearningRate 0.0622 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:38,313-Speed 5593.09 samples/sec Loss 7.7716 LearningRate 0.0622 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:40,119-Speed 5673.42 samples/sec Loss 7.6766 LearningRate 0.0622 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:41,940-Speed 5630.24 samples/sec Loss 7.7115 LearningRate 0.0622 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:43,756-Speed 5642.62 samples/sec Loss 7.6369 LearningRate 0.0622 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:45,578-Speed 5623.43 samples/sec Loss 7.7752 LearningRate 0.0622 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:47,487-Speed 5365.70 samples/sec Loss 7.6285 LearningRate 0.0621 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:58:49,318-Speed 5597.00 samples/sec Loss 7.5898 LearningRate 0.0621 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:58:51,135-Speed 5639.23 samples/sec Loss 7.6724 LearningRate 0.0621 Epoch: 4 Global Step: 24090 Fp16 
Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:58:52,959-Speed 5617.73 samples/sec Loss 7.6074 LearningRate 0.0621 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:58:54,776-Speed 5639.23 samples/sec Loss 7.7066 LearningRate 0.0621 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:58:56,604-Speed 5605.59 samples/sec Loss 7.7920 LearningRate 0.0621 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:58:58,423-Speed 5633.91 samples/sec Loss 7.7175 LearningRate 0.0621 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:00,242-Speed 5631.20 samples/sec Loss 7.7119 LearningRate 0.0621 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:02,046-Speed 5679.40 samples/sec Loss 7.6719 LearningRate 0.0620 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:03,896-Speed 5538.45 samples/sec Loss 7.7670 LearningRate 0.0620 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:05,739-Speed 5560.12 samples/sec Loss 7.7064 LearningRate 0.0620 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:07,537-Speed 5694.48 samples/sec Loss 7.7959 LearningRate 0.0620 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 02:59:09,336-Speed 5693.94 samples/sec Loss 7.8070 LearningRate 0.0620 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:11,177-Speed 5566.71 samples/sec Loss 7.6657 LearningRate 0.0620 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:13,009-Speed 5592.03 samples/sec Loss 7.7591 LearningRate 0.0620 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
02:59:14,813-Speed 5676.84 samples/sec Loss 7.7381 LearningRate 0.0619 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:16,612-Speed 5695.29 samples/sec Loss 7.6300 LearningRate 0.0619 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:18,430-Speed 5637.85 samples/sec Loss 7.6954 LearningRate 0.0619 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:20,246-Speed 5638.80 samples/sec Loss 7.7343 LearningRate 0.0619 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:22,057-Speed 5656.39 samples/sec Loss 7.7049 LearningRate 0.0619 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:23,896-Speed 5572.97 samples/sec Loss 7.8325 LearningRate 0.0619 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:25,742-Speed 5549.66 samples/sec Loss 7.8294 LearningRate 0.0619 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:27,588-Speed 5548.61 samples/sec Loss 7.8781 LearningRate 0.0618 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:29,406-Speed 5634.49 samples/sec Loss 7.8842 LearningRate 0.0618 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:31,218-Speed 5655.09 samples/sec Loss 7.9734 LearningRate 0.0618 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:33,045-Speed 5608.39 samples/sec Loss 7.5573 LearningRate 0.0618 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:34,859-Speed 5647.98 samples/sec Loss 7.7421 LearningRate 0.0618 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:36,680-Speed 5628.23 samples/sec Loss 7.8184 LearningRate 
0.0618 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:38,489-Speed 5660.66 samples/sec Loss 7.7621 LearningRate 0.0618 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:40,322-Speed 5589.65 samples/sec Loss 7.5460 LearningRate 0.0617 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:42,160-Speed 5573.68 samples/sec Loss 7.7171 LearningRate 0.0617 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:43,984-Speed 5618.94 samples/sec Loss 7.8530 LearningRate 0.0617 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:45,856-Speed 5470.29 samples/sec Loss 7.7175 LearningRate 0.0617 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:47,709-Speed 5532.60 samples/sec Loss 7.6457 LearningRate 0.0617 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:49,542-Speed 5589.94 samples/sec Loss 7.6020 LearningRate 0.0617 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 02:59:51,387-Speed 5552.20 samples/sec Loss 7.7485 LearningRate 0.0617 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:53,240-Speed 5528.23 samples/sec Loss 7.7299 LearningRate 0.0616 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:55,080-Speed 5566.34 samples/sec Loss 7.6280 LearningRate 0.0616 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:56,916-Speed 5581.57 samples/sec Loss 7.8419 LearningRate 0.0616 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 02:59:58,759-Speed 5558.23 samples/sec Loss 7.9439 LearningRate 0.0616 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-27 03:00:00,649-Speed 5419.11 samples/sec Loss 7.7532 LearningRate 0.0616 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:02,447-Speed 5701.70 samples/sec Loss 7.6581 LearningRate 0.0616 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:04,244-Speed 5699.50 samples/sec Loss 7.7090 LearningRate 0.0616 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:06,088-Speed 5557.21 samples/sec Loss 7.7860 LearningRate 0.0616 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:07,910-Speed 5623.17 samples/sec Loss 7.5960 LearningRate 0.0615 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:09,736-Speed 5612.61 samples/sec Loss 7.7831 LearningRate 0.0615 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:11,571-Speed 5584.17 samples/sec Loss 7.7343 LearningRate 0.0615 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:13,416-Speed 5551.26 samples/sec Loss 7.5478 LearningRate 0.0615 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:15,212-Speed 5706.62 samples/sec Loss 7.8149 LearningRate 0.0615 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:17,035-Speed 5618.78 samples/sec Loss 7.6812 LearningRate 0.0615 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:18,864-Speed 5604.25 samples/sec Loss 7.7415 LearningRate 0.0615 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:20,686-Speed 5624.75 samples/sec Loss 7.7679 LearningRate 0.0614 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 
03:00:22,495-Speed 5663.30 samples/sec Loss 7.7594 LearningRate 0.0614 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:24,305-Speed 5659.65 samples/sec Loss 7.6640 LearningRate 0.0614 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:26,162-Speed 5519.77 samples/sec Loss 7.6087 LearningRate 0.0614 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:27,992-Speed 5596.01 samples/sec Loss 7.7552 LearningRate 0.0614 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-27 03:00:29,817-Speed 5614.88 samples/sec Loss 7.6255 LearningRate 0.0614 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:31,623-Speed 5675.67 samples/sec Loss 7.7525 LearningRate 0.0614 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:33,476-Speed 5528.01 samples/sec Loss 7.6820 LearningRate 0.0613 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:35,313-Speed 5577.88 samples/sec Loss 7.7106 LearningRate 0.0613 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:37,142-Speed 5601.94 samples/sec Loss 7.7971 LearningRate 0.0613 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:38,947-Speed 5676.61 samples/sec Loss 7.7541 LearningRate 0.0613 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:40,764-Speed 5637.05 samples/sec Loss 7.6826 LearningRate 0.0613 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:42,586-Speed 5623.09 samples/sec Loss 7.8081 LearningRate 0.0613 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:44,396-Speed 5663.97 samples/sec Loss 7.6036 
LearningRate 0.0613 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:46,242-Speed 5548.42 samples/sec Loss 7.5258 LearningRate 0.0613 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:48,046-Speed 5678.73 samples/sec Loss 7.7383 LearningRate 0.0612 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:49,861-Speed 5645.42 samples/sec Loss 7.7751 LearningRate 0.0612 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:51,682-Speed 5628.04 samples/sec Loss 7.7593 LearningRate 0.0612 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:53,498-Speed 5642.08 samples/sec Loss 7.7196 LearningRate 0.0612 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:55,346-Speed 5543.39 samples/sec Loss 7.5974 LearningRate 0.0612 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:57,183-Speed 5577.11 samples/sec Loss 7.6746 LearningRate 0.0612 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:00:59,046-Speed 5499.76 samples/sec Loss 7.7521 LearningRate 0.0612 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:00,894-Speed 5545.10 samples/sec Loss 7.6514 LearningRate 0.0611 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:02,737-Speed 5558.50 samples/sec Loss 7.8039 LearningRate 0.0611 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:04,558-Speed 5626.12 samples/sec Loss 7.6934 LearningRate 0.0611 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:06,370-Speed 5655.22 samples/sec Loss 7.7004 LearningRate 0.0611 Epoch: 4 Global Step: 24830 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:08,170-Speed 5691.91 samples/sec Loss 7.4782 LearningRate 0.0611 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:09,987-Speed 5639.13 samples/sec Loss 7.6161 LearningRate 0.0611 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:11,819-Speed 5592.11 samples/sec Loss 7.6921 LearningRate 0.0611 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:13,635-Speed 5640.89 samples/sec Loss 7.6741 LearningRate 0.0610 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:15,454-Speed 5630.78 samples/sec Loss 7.7715 LearningRate 0.0610 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:17,281-Speed 5609.65 samples/sec Loss 7.6374 LearningRate 0.0610 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:19,090-Speed 5661.51 samples/sec Loss 7.5960 LearningRate 0.0610 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:20,906-Speed 5640.93 samples/sec Loss 7.6567 LearningRate 0.0610 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:22,780-Speed 5466.24 samples/sec Loss 7.7478 LearningRate 0.0610 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:24,589-Speed 5664.04 samples/sec Loss 7.6507 LearningRate 0.0610 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:26,404-Speed 5646.17 samples/sec Loss 7.5411 LearningRate 0.0609 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:28,226-Speed 5622.23 samples/sec Loss 7.4116 LearningRate 0.0609 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 
03:01:30,056-Speed 5599.71 samples/sec Loss 7.5987 LearningRate 0.0609 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:31,880-Speed 5616.09 samples/sec Loss 7.6434 LearningRate 0.0609 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:33,728-Speed 5544.58 samples/sec Loss 7.6448 LearningRate 0.0609 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:35,554-Speed 5610.32 samples/sec Loss 7.7745 LearningRate 0.0609 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:37,382-Speed 5606.19 samples/sec Loss 7.5859 LearningRate 0.0609 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:39,177-Speed 5709.91 samples/sec Loss 7.6309 LearningRate 0.0609 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:41,003-Speed 5609.81 samples/sec Loss 7.7126 LearningRate 0.0608 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:42,834-Speed 5595.10 samples/sec Loss 7.6660 LearningRate 0.0608 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:44,677-Speed 5560.29 samples/sec Loss 7.6698 LearningRate 0.0608 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:46,523-Speed 5549.09 samples/sec Loss 7.6033 LearningRate 0.0608 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:48,357-Speed 5588.12 samples/sec Loss 7.5868 LearningRate 0.0608 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:50,204-Speed 5545.95 samples/sec Loss 7.4834 LearningRate 0.0608 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:52,043-Speed 5574.34 samples/sec Loss 7.6490 LearningRate 
0.0608 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:53,888-Speed 5555.89 samples/sec Loss 7.5660 LearningRate 0.0607 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:55,701-Speed 5650.17 samples/sec Loss 7.6039 LearningRate 0.0607 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 03:01:57,540-Speed 5571.84 samples/sec Loss 7.5998 LearningRate 0.0607 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:01:59,374-Speed 5586.89 samples/sec Loss 7.5911 LearningRate 0.0607 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:01,197-Speed 5619.99 samples/sec Loss 7.6219 LearningRate 0.0607 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:03,016-Speed 5630.66 samples/sec Loss 7.4803 LearningRate 0.0607 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:04,857-Speed 5565.95 samples/sec Loss 7.6448 LearningRate 0.0607 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:06,685-Speed 5606.77 samples/sec Loss 7.7681 LearningRate 0.0606 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:08,610-Speed 5319.60 samples/sec Loss 7.6076 LearningRate 0.0606 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:10,465-Speed 5524.27 samples/sec Loss 7.7116 LearningRate 0.0606 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:12,267-Speed 5687.14 samples/sec Loss 7.6688 LearningRate 0.0606 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-27 03:02:14,098-Speed 5595.15 samples/sec Loss 7.5260 LearningRate 0.0606 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-27 03:02:15,924-Speed 5608.22 samples/sec Loss 7.6094 LearningRate 0.0606 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:17,754-Speed 5603.51 samples/sec Loss 7.6586 LearningRate 0.0606 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:19,606-Speed 5531.12 samples/sec Loss 7.7449 LearningRate 0.0606 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:21,451-Speed 5555.44 samples/sec Loss 7.6191 LearningRate 0.0605 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:23,282-Speed 5595.13 samples/sec Loss 7.5734 LearningRate 0.0605 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:25,113-Speed 5598.11 samples/sec Loss 7.6181 LearningRate 0.0605 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:26,923-Speed 5657.40 samples/sec Loss 7.7646 LearningRate 0.0605 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:28,737-Speed 5648.61 samples/sec Loss 7.7077 LearningRate 0.0605 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:30,557-Speed 5630.47 samples/sec Loss 7.6269 LearningRate 0.0605 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:32,368-Speed 5655.49 samples/sec Loss 7.6341 LearningRate 0.0605 Epoch: 4 Global Step: 25300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:34,191-Speed 5620.47 samples/sec Loss 7.6377 LearningRate 0.0604 Epoch: 4 Global Step: 25310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-27 03:02:35,990-Speed 5696.35 samples/sec Loss 7.5690 LearningRate 0.0604 Epoch: 4 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 
03:02:37,796-Speed 5674.14 samples/sec Loss 7.6376 LearningRate 0.0604 Epoch: 4 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:39,625-Speed 5600.03 samples/sec Loss 7.4727 LearningRate 0.0604 Epoch: 4 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:41,465-Speed 5570.33 samples/sec Loss 7.6730 LearningRate 0.0604 Epoch: 4 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:43,288-Speed 5619.44 samples/sec Loss 7.5661 LearningRate 0.0604 Epoch: 4 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:45,109-Speed 5626.03 samples/sec Loss 7.6029 LearningRate 0.0604 Epoch: 4 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:46,939-Speed 5599.38 samples/sec Loss 7.5467 LearningRate 0.0603 Epoch: 4 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:48,771-Speed 5591.49 samples/sec Loss 7.6860 LearningRate 0.0603 Epoch: 4 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:50,594-Speed 5620.98 samples/sec Loss 7.5348 LearningRate 0.0603 Epoch: 4 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:52,410-Speed 5642.49 samples/sec Loss 7.6429 LearningRate 0.0603 Epoch: 4 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:54,227-Speed 5638.44 samples/sec Loss 7.5688 LearningRate 0.0603 Epoch: 4 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:56,055-Speed 5604.41 samples/sec Loss 7.7292 LearningRate 0.0603 Epoch: 4 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:02:57,870-Speed 5644.84 samples/sec Loss 7.6623 LearningRate 0.0603 Epoch: 4 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:02:59,688-Speed 5633.62 samples/sec Loss 7.5293 LearningRate 
0.0602 Epoch: 4 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:01,503-Speed 5646.05 samples/sec Loss 7.8081 LearningRate 0.0602 Epoch: 4 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:03,320-Speed 5638.45 samples/sec Loss 7.5905 LearningRate 0.0602 Epoch: 4 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:05,147-Speed 5606.09 samples/sec Loss 7.5892 LearningRate 0.0602 Epoch: 4 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:06,947-Speed 5692.70 samples/sec Loss 7.5376 LearningRate 0.0602 Epoch: 4 Global Step: 25490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:08,755-Speed 5665.94 samples/sec Loss 7.5003 LearningRate 0.0602 Epoch: 4 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:10,569-Speed 5648.52 samples/sec Loss 7.6345 LearningRate 0.0602 Epoch: 4 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:12,384-Speed 5645.42 samples/sec Loss 7.3594 LearningRate 0.0602 Epoch: 4 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:14,208-Speed 5618.15 samples/sec Loss 7.6498 LearningRate 0.0601 Epoch: 4 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:16,034-Speed 5609.14 samples/sec Loss 7.6497 LearningRate 0.0601 Epoch: 4 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:17,839-Speed 5678.17 samples/sec Loss 7.3792 LearningRate 0.0601 Epoch: 4 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:19,645-Speed 5669.91 samples/sec Loss 7.5467 LearningRate 0.0601 Epoch: 4 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:21,468-Speed 5621.00 samples/sec Loss 7.5262 LearningRate 0.0601 Epoch: 4 Global Step: 25570 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 03:03:23,269-Speed 5687.77 samples/sec Loss 7.4038 LearningRate 0.0601 Epoch: 4 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:25,086-Speed 5639.40 samples/sec Loss 7.7853 LearningRate 0.0601 Epoch: 4 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:26,938-Speed 5531.99 samples/sec Loss 7.4604 LearningRate 0.0600 Epoch: 4 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:28,761-Speed 5621.01 samples/sec Loss 7.5828 LearningRate 0.0600 Epoch: 4 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:30,603-Speed 5562.65 samples/sec Loss 7.5581 LearningRate 0.0600 Epoch: 4 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:32,428-Speed 5615.29 samples/sec Loss 7.6412 LearningRate 0.0600 Epoch: 4 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:34,254-Speed 5610.40 samples/sec Loss 7.6135 LearningRate 0.0600 Epoch: 4 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:36,080-Speed 5610.70 samples/sec Loss 7.4934 LearningRate 0.0600 Epoch: 4 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:37,942-Speed 5506.28 samples/sec Loss 7.6994 LearningRate 0.0600 Epoch: 4 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:39,782-Speed 5568.69 samples/sec Loss 7.5816 LearningRate 0.0599 Epoch: 4 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:41,620-Speed 5575.39 samples/sec Loss 7.6217 LearningRate 0.0599 Epoch: 4 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:43,434-Speed 5648.09 samples/sec Loss 7.5723 LearningRate 0.0599 Epoch: 4 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:45,245-Speed 
5656.75 samples/sec Loss 7.6127 LearningRate 0.0599 Epoch: 4 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:47,064-Speed 5632.01 samples/sec Loss 7.5242 LearningRate 0.0599 Epoch: 4 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:48,894-Speed 5599.36 samples/sec Loss 7.4321 LearningRate 0.0599 Epoch: 4 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:03:50,717-Speed 5619.13 samples/sec Loss 7.4753 LearningRate 0.0599 Epoch: 4 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:52,544-Speed 5607.91 samples/sec Loss 7.6162 LearningRate 0.0599 Epoch: 4 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:54,389-Speed 5553.93 samples/sec Loss 7.7093 LearningRate 0.0598 Epoch: 4 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:56,221-Speed 5593.67 samples/sec Loss 7.6064 LearningRate 0.0598 Epoch: 4 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:58,044-Speed 5620.57 samples/sec Loss 7.5427 LearningRate 0.0598 Epoch: 4 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:03:59,919-Speed 5462.93 samples/sec Loss 7.6420 LearningRate 0.0598 Epoch: 4 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:01,754-Speed 5584.49 samples/sec Loss 7.3746 LearningRate 0.0598 Epoch: 4 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:03,564-Speed 5659.21 samples/sec Loss 7.5874 LearningRate 0.0598 Epoch: 4 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:05,404-Speed 5571.67 samples/sec Loss 7.6924 LearningRate 0.0598 Epoch: 4 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:07,231-Speed 5606.92 samples/sec Loss 7.5805 LearningRate 0.0597 Epoch: 
4 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:09,049-Speed 5637.42 samples/sec Loss 7.5825 LearningRate 0.0597 Epoch: 4 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:10,925-Speed 5459.88 samples/sec Loss 7.6509 LearningRate 0.0597 Epoch: 4 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:12,741-Speed 5642.78 samples/sec Loss 7.4272 LearningRate 0.0597 Epoch: 4 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:14,583-Speed 5562.45 samples/sec Loss 7.6485 LearningRate 0.0597 Epoch: 4 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:04:16,394-Speed 5658.33 samples/sec Loss 7.5414 LearningRate 0.0597 Epoch: 4 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:18,251-Speed 5517.64 samples/sec Loss 7.8024 LearningRate 0.0597 Epoch: 4 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:20,089-Speed 5571.01 samples/sec Loss 7.6642 LearningRate 0.0597 Epoch: 4 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:21,913-Speed 5618.44 samples/sec Loss 7.4870 LearningRate 0.0596 Epoch: 4 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:23,727-Speed 5648.99 samples/sec Loss 7.5225 LearningRate 0.0596 Epoch: 4 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:25,540-Speed 5651.47 samples/sec Loss 7.5214 LearningRate 0.0596 Epoch: 4 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:27,373-Speed 5589.67 samples/sec Loss 7.6541 LearningRate 0.0596 Epoch: 4 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:04:29,193-Speed 5627.41 samples/sec Loss 7.7836 LearningRate 0.0596 Epoch: 4 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:04:31,041-Speed 5543.36 samples/sec Loss 7.4596 LearningRate 0.0596 Epoch: 4 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:04:32,859-Speed 5638.38 samples/sec Loss 7.6369 LearningRate 0.0596 Epoch: 4 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:04:34,684-Speed 5612.70 samples/sec Loss 7.6861 LearningRate 0.0595 Epoch: 4 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:04:36,505-Speed 5626.40 samples/sec Loss 7.5118 LearningRate 0.0595 Epoch: 4 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:04:38,322-Speed 5638.16 samples/sec Loss 7.6172 LearningRate 0.0595 Epoch: 4 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:04:40,159-Speed 5577.14 samples/sec Loss 7.6716 LearningRate 0.0595 Epoch: 4 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:05:06,958-[lfw][26000]XNorm: 22.816512
Training: 2022-04-27 03:05:06,959-[lfw][26000]Accuracy-Flip: 0.99567+-0.00309
Training: 2022-04-27 03:05:06,959-[lfw][26000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:05:37,734-[cfp_fp][26000]XNorm: 19.423009
Training: 2022-04-27 03:05:37,735-[cfp_fp][26000]Accuracy-Flip: 0.93243+-0.00895
Training: 2022-04-27 03:05:37,736-[cfp_fp][26000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:06:04,345-[agedb_30][26000]XNorm: 22.459086
Training: 2022-04-27 03:06:04,346-[agedb_30][26000]Accuracy-Flip: 0.96433+-0.01153
Training: 2022-04-27 03:06:04,347-[agedb_30][26000]Accuracy-Highest: 0.96867
Training: 2022-04-27 03:06:06,193-Speed 119.03 samples/sec Loss 7.6100 LearningRate 0.0595 Epoch: 4 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:06:08,006-Speed 5649.13 samples/sec Loss 7.5094 LearningRate 0.0595 Epoch: 4 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:06:09,807-Speed 
5688.71 samples/sec Loss 7.6237 LearningRate 0.0595 Epoch: 4 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:06:11,644-Speed 5576.58 samples/sec Loss 7.6584 LearningRate 0.0594 Epoch: 4 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:06:13,476-Speed 5593.92 samples/sec Loss 7.6225 LearningRate 0.0594 Epoch: 4 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:15,308-Speed 5590.04 samples/sec Loss 7.3726 LearningRate 0.0594 Epoch: 4 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:17,168-Speed 5511.49 samples/sec Loss 7.2887 LearningRate 0.0594 Epoch: 4 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:18,982-Speed 5647.87 samples/sec Loss 7.5112 LearningRate 0.0594 Epoch: 4 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:20,806-Speed 5616.70 samples/sec Loss 7.5437 LearningRate 0.0594 Epoch: 4 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:22,643-Speed 5575.65 samples/sec Loss 7.4948 LearningRate 0.0594 Epoch: 4 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:24,486-Speed 5559.53 samples/sec Loss 7.6638 LearningRate 0.0594 Epoch: 4 Global Step: 26110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:26,309-Speed 5620.66 samples/sec Loss 7.5961 LearningRate 0.0593 Epoch: 4 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:28,161-Speed 5532.44 samples/sec Loss 7.6997 LearningRate 0.0593 Epoch: 4 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:30,025-Speed 5498.51 samples/sec Loss 7.5325 LearningRate 0.0593 Epoch: 4 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:31,887-Speed 5501.73 samples/sec Loss 7.5366 LearningRate 0.0593 Epoch: 4 
Global Step: 26150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:33,690-Speed 5683.59 samples/sec Loss 7.6005 LearningRate 0.0593 Epoch: 4 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:35,522-Speed 5593.41 samples/sec Loss 7.4902 LearningRate 0.0593 Epoch: 4 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:37,347-Speed 5611.76 samples/sec Loss 7.5336 LearningRate 0.0593 Epoch: 4 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:39,206-Speed 5512.75 samples/sec Loss 7.5784 LearningRate 0.0592 Epoch: 4 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:41,028-Speed 5623.63 samples/sec Loss 7.4574 LearningRate 0.0592 Epoch: 4 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:42,851-Speed 5622.46 samples/sec Loss 7.5249 LearningRate 0.0592 Epoch: 4 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:44,674-Speed 5618.74 samples/sec Loss 7.6516 LearningRate 0.0592 Epoch: 4 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:46,493-Speed 5633.10 samples/sec Loss 7.4979 LearningRate 0.0592 Epoch: 4 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:48,307-Speed 5649.46 samples/sec Loss 7.4090 LearningRate 0.0592 Epoch: 4 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:50,138-Speed 5595.08 samples/sec Loss 7.5088 LearningRate 0.0592 Epoch: 4 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:06:51,963-Speed 5611.22 samples/sec Loss 7.6106 LearningRate 0.0591 Epoch: 4 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:53,809-Speed 5551.19 samples/sec Loss 7.4561 LearningRate 0.0591 Epoch: 4 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:06:55,671-Speed 5501.46 samples/sec Loss 7.3993 LearningRate 0.0591 Epoch: 4 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:57,553-Speed 5444.38 samples/sec Loss 7.4558 LearningRate 0.0591 Epoch: 4 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:06:59,394-Speed 5565.18 samples/sec Loss 7.4651 LearningRate 0.0591 Epoch: 4 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:01,225-Speed 5593.91 samples/sec Loss 7.3137 LearningRate 0.0591 Epoch: 4 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:03,044-Speed 5632.23 samples/sec Loss 7.3899 LearningRate 0.0591 Epoch: 4 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:04,884-Speed 5569.46 samples/sec Loss 7.5606 LearningRate 0.0591 Epoch: 4 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:06,694-Speed 5658.52 samples/sec Loss 7.7378 LearningRate 0.0590 Epoch: 4 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:08,544-Speed 5540.92 samples/sec Loss 7.2797 LearningRate 0.0590 Epoch: 4 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:10,355-Speed 5658.29 samples/sec Loss 7.3221 LearningRate 0.0590 Epoch: 4 Global Step: 26360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:12,234-Speed 5452.06 samples/sec Loss 7.6032 LearningRate 0.0590 Epoch: 4 Global Step: 26370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:14,144-Speed 5362.29 samples/sec Loss 7.4280 LearningRate 0.0590 Epoch: 4 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:16,062-Speed 5341.48 samples/sec Loss 7.4761 LearningRate 0.0590 Epoch: 4 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:17,874-Speed 5655.95 samples/sec 
Loss 7.4894 LearningRate 0.0590 Epoch: 4 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:19,698-Speed 5617.46 samples/sec Loss 7.3503 LearningRate 0.0589 Epoch: 4 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:21,495-Speed 5702.85 samples/sec Loss 7.6038 LearningRate 0.0589 Epoch: 4 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:23,304-Speed 5662.26 samples/sec Loss 7.6072 LearningRate 0.0589 Epoch: 4 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:25,124-Speed 5628.47 samples/sec Loss 7.5043 LearningRate 0.0589 Epoch: 4 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:26,954-Speed 5600.62 samples/sec Loss 7.5197 LearningRate 0.0589 Epoch: 4 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:28,758-Speed 5676.78 samples/sec Loss 7.4256 LearningRate 0.0589 Epoch: 4 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:30,575-Speed 5639.19 samples/sec Loss 7.5595 LearningRate 0.0589 Epoch: 4 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:32,458-Speed 5440.50 samples/sec Loss 7.4647 LearningRate 0.0589 Epoch: 4 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:34,291-Speed 5589.33 samples/sec Loss 7.4791 LearningRate 0.0588 Epoch: 4 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:36,111-Speed 5628.92 samples/sec Loss 7.2853 LearningRate 0.0588 Epoch: 4 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:37,915-Speed 5679.19 samples/sec Loss 7.4500 LearningRate 0.0588 Epoch: 4 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:39,720-Speed 5677.80 samples/sec Loss 7.4931 LearningRate 0.0588 Epoch: 4 Global Step: 26520 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:41,557-Speed 5576.87 samples/sec Loss 7.3707 LearningRate 0.0588 Epoch: 4 Global Step: 26530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:43,383-Speed 5611.12 samples/sec Loss 7.3179 LearningRate 0.0588 Epoch: 4 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:45,179-Speed 5704.85 samples/sec Loss 7.3906 LearningRate 0.0588 Epoch: 4 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:47,012-Speed 5585.67 samples/sec Loss 7.3777 LearningRate 0.0587 Epoch: 4 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:48,846-Speed 5587.26 samples/sec Loss 7.6497 LearningRate 0.0587 Epoch: 4 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:50,685-Speed 5570.50 samples/sec Loss 7.6592 LearningRate 0.0587 Epoch: 4 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:52,521-Speed 5581.59 samples/sec Loss 7.5139 LearningRate 0.0587 Epoch: 4 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:07:54,346-Speed 5615.96 samples/sec Loss 7.4854 LearningRate 0.0587 Epoch: 4 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:56,208-Speed 5503.77 samples/sec Loss 7.5613 LearningRate 0.0587 Epoch: 4 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:58,066-Speed 5514.47 samples/sec Loss 7.5383 LearningRate 0.0587 Epoch: 4 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:07:59,904-Speed 5574.14 samples/sec Loss 7.5293 LearningRate 0.0586 Epoch: 4 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:01,744-Speed 5566.40 samples/sec Loss 7.4910 LearningRate 0.0586 Epoch: 4 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:08:03,561-Speed 5638.40 samples/sec Loss 7.5642 LearningRate 0.0586 Epoch: 4 Global Step: 26650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:05,381-Speed 5630.77 samples/sec Loss 7.4485 LearningRate 0.0586 Epoch: 4 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:07,225-Speed 5556.70 samples/sec Loss 7.4892 LearningRate 0.0586 Epoch: 4 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:09,066-Speed 5564.61 samples/sec Loss 7.6369 LearningRate 0.0586 Epoch: 4 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:10,889-Speed 5621.76 samples/sec Loss 7.5659 LearningRate 0.0586 Epoch: 4 Global Step: 26690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:12,706-Speed 5638.21 samples/sec Loss 7.4668 LearningRate 0.0586 Epoch: 4 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:14,559-Speed 5530.54 samples/sec Loss 7.4243 LearningRate 0.0585 Epoch: 4 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:16,381-Speed 5621.68 samples/sec Loss 7.5662 LearningRate 0.0585 Epoch: 4 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:18,196-Speed 5645.07 samples/sec Loss 7.4799 LearningRate 0.0585 Epoch: 4 Global Step: 26730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:20,023-Speed 5606.79 samples/sec Loss 7.4997 LearningRate 0.0585 Epoch: 4 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:21,832-Speed 5664.91 samples/sec Loss 7.4734 LearningRate 0.0585 Epoch: 4 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:23,642-Speed 5661.14 samples/sec Loss 7.5154 LearningRate 0.0585 Epoch: 4 Global Step: 26760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:25,457-Speed 5646.96 samples/sec Loss 7.3706 
LearningRate 0.0585 Epoch: 4 Global Step: 26770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:27,305-Speed 5544.14 samples/sec Loss 7.3896 LearningRate 0.0584 Epoch: 4 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:29,126-Speed 5626.64 samples/sec Loss 7.4787 LearningRate 0.0584 Epoch: 4 Global Step: 26790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:30,958-Speed 5593.01 samples/sec Loss 7.4730 LearningRate 0.0584 Epoch: 4 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:32,797-Speed 5570.05 samples/sec Loss 7.5850 LearningRate 0.0584 Epoch: 4 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:34,636-Speed 5572.99 samples/sec Loss 7.6456 LearningRate 0.0584 Epoch: 4 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:36,461-Speed 5611.50 samples/sec Loss 7.4809 LearningRate 0.0584 Epoch: 4 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:38,286-Speed 5618.04 samples/sec Loss 7.4380 LearningRate 0.0584 Epoch: 4 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:40,136-Speed 5537.24 samples/sec Loss 7.5574 LearningRate 0.0584 Epoch: 4 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:41,952-Speed 5641.40 samples/sec Loss 7.3891 LearningRate 0.0583 Epoch: 4 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:43,789-Speed 5578.09 samples/sec Loss 7.5329 LearningRate 0.0583 Epoch: 4 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:45,615-Speed 5609.57 samples/sec Loss 7.5733 LearningRate 0.0583 Epoch: 4 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:47,448-Speed 5592.76 samples/sec Loss 7.3661 LearningRate 0.0583 Epoch: 4 Global Step: 26890 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-27 03:08:49,269-Speed 5626.97 samples/sec Loss 7.6115 LearningRate 0.0583 Epoch: 4 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:08:51,104-Speed 5584.25 samples/sec Loss 7.5283 LearningRate 0.0583 Epoch: 4 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:52,961-Speed 5516.98 samples/sec Loss 7.4988 LearningRate 0.0583 Epoch: 4 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:54,812-Speed 5533.20 samples/sec Loss 7.5508 LearningRate 0.0582 Epoch: 4 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:56,632-Speed 5630.54 samples/sec Loss 7.4693 LearningRate 0.0582 Epoch: 4 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:08:58,454-Speed 5623.62 samples/sec Loss 7.3584 LearningRate 0.0582 Epoch: 4 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:00,303-Speed 5540.54 samples/sec Loss 7.6089 LearningRate 0.0582 Epoch: 4 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:02,150-Speed 5547.27 samples/sec Loss 7.6868 LearningRate 0.0582 Epoch: 4 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:03,983-Speed 5587.24 samples/sec Loss 7.4050 LearningRate 0.0582 Epoch: 4 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:05,859-Speed 5462.56 samples/sec Loss 7.4566 LearningRate 0.0582 Epoch: 4 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:07,672-Speed 5652.60 samples/sec Loss 7.5537 LearningRate 0.0582 Epoch: 4 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:09,490-Speed 5635.65 samples/sec Loss 7.4922 LearningRate 0.0581 Epoch: 4 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:11,313-Speed 
5620.58 samples/sec Loss 7.3012 LearningRate 0.0581 Epoch: 4 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:13,126-Speed 5650.16 samples/sec Loss 7.4469 LearningRate 0.0581 Epoch: 4 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:14,971-Speed 5554.77 samples/sec Loss 7.5241 LearningRate 0.0581 Epoch: 4 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:16,805-Speed 5585.69 samples/sec Loss 7.4218 LearningRate 0.0581 Epoch: 4 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:18,652-Speed 5544.90 samples/sec Loss 7.4175 LearningRate 0.0581 Epoch: 4 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:20,503-Speed 5537.37 samples/sec Loss 7.4119 LearningRate 0.0581 Epoch: 4 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:22,331-Speed 5602.44 samples/sec Loss 7.4391 LearningRate 0.0580 Epoch: 4 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:24,139-Speed 5665.68 samples/sec Loss 7.3268 LearningRate 0.0580 Epoch: 4 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:25,963-Speed 5619.99 samples/sec Loss 7.3658 LearningRate 0.0580 Epoch: 4 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:27,777-Speed 5648.61 samples/sec Loss 7.3733 LearningRate 0.0580 Epoch: 4 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:29,609-Speed 5592.00 samples/sec Loss 7.4578 LearningRate 0.0580 Epoch: 4 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:31,454-Speed 5552.10 samples/sec Loss 7.4586 LearningRate 0.0580 Epoch: 4 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:33,286-Speed 5592.21 samples/sec Loss 7.3163 LearningRate 0.0580 
Epoch: 4 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:35,121-Speed 5584.02 samples/sec Loss 7.4987 LearningRate 0.0580 Epoch: 4 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:36,977-Speed 5520.62 samples/sec Loss 7.4345 LearningRate 0.0579 Epoch: 4 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:38,790-Speed 5651.94 samples/sec Loss 7.2839 LearningRate 0.0579 Epoch: 4 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:40,652-Speed 5540.08 samples/sec Loss 7.5560 LearningRate 0.0579 Epoch: 4 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:42,479-Speed 5606.47 samples/sec Loss 7.3716 LearningRate 0.0579 Epoch: 4 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:44,291-Speed 5654.50 samples/sec Loss 7.4368 LearningRate 0.0579 Epoch: 4 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:46,116-Speed 5614.49 samples/sec Loss 7.3859 LearningRate 0.0579 Epoch: 4 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:47,923-Speed 5670.84 samples/sec Loss 7.4070 LearningRate 0.0579 Epoch: 4 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:09:49,741-Speed 5634.82 samples/sec Loss 7.5735 LearningRate 0.0578 Epoch: 4 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:51,554-Speed 5650.42 samples/sec Loss 7.4800 LearningRate 0.0578 Epoch: 4 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:53,373-Speed 5632.48 samples/sec Loss 7.2415 LearningRate 0.0578 Epoch: 4 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:55,175-Speed 5687.90 samples/sec Loss 7.2886 LearningRate 0.0578 Epoch: 4 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-27 03:09:57,039-Speed 5494.68 samples/sec Loss 7.4089 LearningRate 0.0578 Epoch: 4 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:09:58,850-Speed 5656.60 samples/sec Loss 7.2998 LearningRate 0.0578 Epoch: 4 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:00,691-Speed 5565.99 samples/sec Loss 7.5345 LearningRate 0.0578 Epoch: 4 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:02,525-Speed 5586.08 samples/sec Loss 7.5694 LearningRate 0.0578 Epoch: 4 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:04,352-Speed 5609.03 samples/sec Loss 7.5508 LearningRate 0.0577 Epoch: 4 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:06,173-Speed 5628.50 samples/sec Loss 7.3567 LearningRate 0.0577 Epoch: 4 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:08,006-Speed 5588.05 samples/sec Loss 7.4764 LearningRate 0.0577 Epoch: 4 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:09,837-Speed 5596.35 samples/sec Loss 7.2985 LearningRate 0.0577 Epoch: 4 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:11,648-Speed 5655.58 samples/sec Loss 7.4298 LearningRate 0.0577 Epoch: 4 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:13,507-Speed 5512.07 samples/sec Loss 7.3093 LearningRate 0.0577 Epoch: 4 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:15,314-Speed 5669.69 samples/sec Loss 7.2227 LearningRate 0.0577 Epoch: 4 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:17,141-Speed 5607.13 samples/sec Loss 7.3490 LearningRate 0.0576 Epoch: 4 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:18,957-Speed 5641.88 
samples/sec Loss 7.5673 LearningRate 0.0576 Epoch: 4 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:20,771-Speed 5647.18 samples/sec Loss 7.3757 LearningRate 0.0576 Epoch: 4 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:22,615-Speed 5558.67 samples/sec Loss 7.4542 LearningRate 0.0576 Epoch: 4 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:24,443-Speed 5602.34 samples/sec Loss 7.4320 LearningRate 0.0576 Epoch: 4 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:26,268-Speed 5615.65 samples/sec Loss 7.4241 LearningRate 0.0576 Epoch: 4 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:28,093-Speed 5611.53 samples/sec Loss 7.4217 LearningRate 0.0576 Epoch: 4 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:29,913-Speed 5631.76 samples/sec Loss 7.3259 LearningRate 0.0576 Epoch: 4 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:31,726-Speed 5650.27 samples/sec Loss 7.4468 LearningRate 0.0575 Epoch: 4 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:33,557-Speed 5594.56 samples/sec Loss 7.3928 LearningRate 0.0575 Epoch: 4 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:35,380-Speed 5620.70 samples/sec Loss 7.4591 LearningRate 0.0575 Epoch: 4 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:37,197-Speed 5638.48 samples/sec Loss 7.3472 LearningRate 0.0575 Epoch: 4 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:39,019-Speed 5623.62 samples/sec Loss 7.3275 LearningRate 0.0575 Epoch: 4 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:40,831-Speed 5651.90 samples/sec Loss 7.4120 LearningRate 0.0575 Epoch: 4 Global 
Step: 27510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:42,660-Speed 5601.70 samples/sec Loss 7.5535 LearningRate 0.0575 Epoch: 4 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:44,497-Speed 5577.07 samples/sec Loss 7.1821 LearningRate 0.0574 Epoch: 4 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:46,331-Speed 5589.08 samples/sec Loss 7.4136 LearningRate 0.0574 Epoch: 4 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:48,151-Speed 5629.63 samples/sec Loss 7.4343 LearningRate 0.0574 Epoch: 4 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:50,043-Speed 5414.41 samples/sec Loss 7.4361 LearningRate 0.0574 Epoch: 4 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:51,862-Speed 5632.91 samples/sec Loss 7.4183 LearningRate 0.0574 Epoch: 4 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:10:53,685-Speed 5618.19 samples/sec Loss 7.4731 LearningRate 0.0574 Epoch: 4 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:55,529-Speed 5558.65 samples/sec Loss 7.3005 LearningRate 0.0574 Epoch: 4 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:57,358-Speed 5599.43 samples/sec Loss 7.4585 LearningRate 0.0574 Epoch: 4 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:10:59,207-Speed 5539.77 samples/sec Loss 7.3687 LearningRate 0.0573 Epoch: 4 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:01,034-Speed 5610.20 samples/sec Loss 7.4099 LearningRate 0.0573 Epoch: 4 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:02,881-Speed 5545.13 samples/sec Loss 7.4186 LearningRate 0.0573 Epoch: 4 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:11:04,717-Speed 5581.66 samples/sec Loss 7.4852 LearningRate 0.0573 Epoch: 4 Global Step: 27640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:06,573-Speed 5519.98 samples/sec Loss 7.3013 LearningRate 0.0573 Epoch: 4 Global Step: 27650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:08,403-Speed 5598.35 samples/sec Loss 7.3970 LearningRate 0.0573 Epoch: 4 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:10,235-Speed 5592.78 samples/sec Loss 7.3299 LearningRate 0.0573 Epoch: 4 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:11:12,080-Speed 5554.39 samples/sec Loss 7.4914 LearningRate 0.0572 Epoch: 4 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:13,904-Speed 5614.55 samples/sec Loss 7.5174 LearningRate 0.0572 Epoch: 4 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:15,754-Speed 5540.92 samples/sec Loss 7.3122 LearningRate 0.0572 Epoch: 4 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:17,591-Speed 5576.10 samples/sec Loss 7.3944 LearningRate 0.0572 Epoch: 4 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:19,433-Speed 5563.61 samples/sec Loss 7.3309 LearningRate 0.0572 Epoch: 4 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:21,250-Speed 5637.76 samples/sec Loss 7.2382 LearningRate 0.0572 Epoch: 4 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:23,063-Speed 5650.50 samples/sec Loss 7.4465 LearningRate 0.0572 Epoch: 4 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:24,900-Speed 5577.86 samples/sec Loss 7.3250 LearningRate 0.0572 Epoch: 4 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:26,714-Speed 5648.33 samples/sec 
Loss 7.3515 LearningRate 0.0571 Epoch: 4 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:28,531-Speed 5636.63 samples/sec Loss 7.2981 LearningRate 0.0571 Epoch: 4 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:30,351-Speed 5630.28 samples/sec Loss 7.2432 LearningRate 0.0571 Epoch: 4 Global Step: 27780 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-27 03:11:32,149-Speed 5694.74 samples/sec Loss 7.3794 LearningRate 0.0571 Epoch: 4 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:33,975-Speed 5614.34 samples/sec Loss 7.4096 LearningRate 0.0571 Epoch: 4 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:35,820-Speed 5552.97 samples/sec Loss 7.2657 LearningRate 0.0571 Epoch: 4 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:37,639-Speed 5631.82 samples/sec Loss 7.2490 LearningRate 0.0571 Epoch: 4 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:39,494-Speed 5522.66 samples/sec Loss 7.3635 LearningRate 0.0570 Epoch: 4 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:41,319-Speed 5613.63 samples/sec Loss 7.3598 LearningRate 0.0570 Epoch: 4 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:43,143-Speed 5616.24 samples/sec Loss 7.3858 LearningRate 0.0570 Epoch: 4 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:44,964-Speed 5628.10 samples/sec Loss 7.3567 LearningRate 0.0570 Epoch: 4 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:46,774-Speed 5660.77 samples/sec Loss 7.4582 LearningRate 0.0570 Epoch: 4 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:48,592-Speed 5632.53 samples/sec Loss 7.2770 LearningRate 0.0570 Epoch: 4 Global Step: 
27880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:50,410-Speed 5636.05 samples/sec Loss 7.3501 LearningRate 0.0570 Epoch: 4 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:52,251-Speed 5565.17 samples/sec Loss 7.3963 LearningRate 0.0570 Epoch: 4 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:54,080-Speed 5602.61 samples/sec Loss 7.3521 LearningRate 0.0569 Epoch: 4 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:55,892-Speed 5655.77 samples/sec Loss 7.1377 LearningRate 0.0569 Epoch: 4 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:57,726-Speed 5585.36 samples/sec Loss 7.2981 LearningRate 0.0569 Epoch: 4 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:11:59,536-Speed 5659.40 samples/sec Loss 7.3879 LearningRate 0.0569 Epoch: 4 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:01,358-Speed 5623.70 samples/sec Loss 7.4082 LearningRate 0.0569 Epoch: 4 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:03,222-Speed 5497.30 samples/sec Loss 7.3907 LearningRate 0.0569 Epoch: 4 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:05,051-Speed 5603.62 samples/sec Loss 7.3384 LearningRate 0.0569 Epoch: 4 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:06,874-Speed 5620.04 samples/sec Loss 7.3258 LearningRate 0.0568 Epoch: 4 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:08,708-Speed 5587.44 samples/sec Loss 7.5364 LearningRate 0.0568 Epoch: 4 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:12:10,552-Speed 5553.29 samples/sec Loss 7.4023 LearningRate 0.0568 Epoch: 4 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-27 03:12:37,586-[lfw][28000]XNorm: 21.047332
Training: 2022-04-27 03:12:37,586-[lfw][28000]Accuracy-Flip: 0.99667+-0.00279
Training: 2022-04-27 03:12:37,587-[lfw][28000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:13:08,558-[cfp_fp][28000]XNorm: 18.570384
Training: 2022-04-27 03:13:08,558-[cfp_fp][28000]Accuracy-Flip: 0.92343+-0.01185
Training: 2022-04-27 03:13:08,559-[cfp_fp][28000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:13:35,348-[agedb_30][28000]XNorm: 20.969596
Training: 2022-04-27 03:13:35,349-[agedb_30][28000]Accuracy-Flip: 0.96883+-0.00778
Training: 2022-04-27 03:13:35,350-[agedb_30][28000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:13:37,194-Speed 118.19 samples/sec Loss 7.4707 LearningRate 0.0568 Epoch: 4 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:13:39,037-Speed 5558.65 samples/sec Loss 7.2823 LearningRate 0.0568 Epoch: 4 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:13:40,867-Speed 5599.80 samples/sec Loss 7.3509 LearningRate 0.0568 Epoch: 4 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:13:42,676-Speed 5663.51 samples/sec Loss 7.2962 LearningRate 0.0568 Epoch: 4 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:13:44,482-Speed 5673.83 samples/sec Loss 7.3249 LearningRate 0.0568 Epoch: 4 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:46,341-Speed 5511.76 samples/sec Loss 7.2162 LearningRate 0.0567 Epoch: 4 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:48,162-Speed 5626.64 samples/sec Loss 7.3043 LearningRate 0.0567 Epoch: 4 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:49,978-Speed 5642.31 samples/sec Loss 7.3400 LearningRate 0.0567 Epoch: 4 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:51,799-Speed 
5626.48 samples/sec Loss 7.3729 LearningRate 0.0567 Epoch: 4 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:53,621-Speed 5625.09 samples/sec Loss 7.4800 LearningRate 0.0567 Epoch: 4 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:55,435-Speed 5646.93 samples/sec Loss 7.3901 LearningRate 0.0567 Epoch: 4 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:57,260-Speed 5616.62 samples/sec Loss 7.4090 LearningRate 0.0567 Epoch: 4 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:13:59,095-Speed 5582.61 samples/sec Loss 7.3481 LearningRate 0.0566 Epoch: 4 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:00,920-Speed 5614.71 samples/sec Loss 7.4330 LearningRate 0.0566 Epoch: 4 Global Step: 28140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:02,754-Speed 5584.59 samples/sec Loss 7.2727 LearningRate 0.0566 Epoch: 4 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:04,634-Speed 5450.33 samples/sec Loss 7.1587 LearningRate 0.0566 Epoch: 4 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:06,454-Speed 5630.01 samples/sec Loss 7.1924 LearningRate 0.0566 Epoch: 4 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:08,290-Speed 5580.00 samples/sec Loss 7.4075 LearningRate 0.0566 Epoch: 4 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:10,163-Speed 5471.02 samples/sec Loss 7.2027 LearningRate 0.0566 Epoch: 4 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:12,072-Speed 5366.62 samples/sec Loss 7.3481 LearningRate 0.0566 Epoch: 4 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:13,918-Speed 5550.19 samples/sec Loss 7.4129 LearningRate 0.0565 Epoch: 4 
Global Step: 28210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:15,768-Speed 5537.48 samples/sec Loss 7.2081 LearningRate 0.0565 Epoch: 4 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:17,618-Speed 5540.59 samples/sec Loss 7.4543 LearningRate 0.0565 Epoch: 4 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:19,442-Speed 5614.91 samples/sec Loss 7.4613 LearningRate 0.0565 Epoch: 4 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:21,281-Speed 5575.20 samples/sec Loss 7.3356 LearningRate 0.0565 Epoch: 4 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:23,115-Speed 5589.90 samples/sec Loss 7.2787 LearningRate 0.0565 Epoch: 4 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:24,944-Speed 5599.32 samples/sec Loss 7.1830 LearningRate 0.0565 Epoch: 4 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:26,775-Speed 5595.31 samples/sec Loss 7.2396 LearningRate 0.0564 Epoch: 4 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:28,686-Speed 5360.86 samples/sec Loss 7.2367 LearningRate 0.0564 Epoch: 4 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:30,544-Speed 5514.56 samples/sec Loss 7.2248 LearningRate 0.0564 Epoch: 4 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:32,363-Speed 5631.07 samples/sec Loss 7.2430 LearningRate 0.0564 Epoch: 4 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:34,212-Speed 5542.14 samples/sec Loss 7.3521 LearningRate 0.0564 Epoch: 4 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:36,027-Speed 5647.14 samples/sec Loss 7.4476 LearningRate 0.0564 Epoch: 4 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:14:37,844-Speed 5637.03 samples/sec Loss 7.2961 LearningRate 0.0564 Epoch: 4 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:39,679-Speed 5582.46 samples/sec Loss 7.2690 LearningRate 0.0564 Epoch: 4 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:41,518-Speed 5572.53 samples/sec Loss 7.3434 LearningRate 0.0563 Epoch: 4 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:14:43,337-Speed 5634.58 samples/sec Loss 7.4459 LearningRate 0.0563 Epoch: 4 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:45,173-Speed 5578.01 samples/sec Loss 7.3349 LearningRate 0.0563 Epoch: 4 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:47,032-Speed 5511.43 samples/sec Loss 7.3305 LearningRate 0.0563 Epoch: 4 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:48,859-Speed 5609.00 samples/sec Loss 7.2245 LearningRate 0.0563 Epoch: 4 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:50,711-Speed 5531.25 samples/sec Loss 7.1992 LearningRate 0.0563 Epoch: 4 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:52,671-Speed 5228.36 samples/sec Loss 7.3783 LearningRate 0.0563 Epoch: 4 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:14:54,454-Speed 5746.10 samples/sec Loss 7.3105 LearningRate 0.0563 Epoch: 4 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:10,061-Speed 656.17 samples/sec Loss 6.7975 LearningRate 0.0562 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:11,928-Speed 5489.71 samples/sec Loss 6.6816 LearningRate 0.0562 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:13,753-Speed 5613.05 samples/sec 
Loss 6.8391 LearningRate 0.0562 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:15,606-Speed 5531.03 samples/sec Loss 6.7640 LearningRate 0.0562 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:17,500-Speed 5408.88 samples/sec Loss 6.6999 LearningRate 0.0562 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:19,331-Speed 5596.34 samples/sec Loss 6.6379 LearningRate 0.0562 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:21,179-Speed 5545.36 samples/sec Loss 6.6999 LearningRate 0.0562 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:23,013-Speed 5585.31 samples/sec Loss 6.7053 LearningRate 0.0561 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:24,827-Speed 5649.38 samples/sec Loss 6.7827 LearningRate 0.0561 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:26,665-Speed 5573.10 samples/sec Loss 6.8327 LearningRate 0.0561 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:28,484-Speed 5632.36 samples/sec Loss 6.8206 LearningRate 0.0561 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:30,290-Speed 5673.45 samples/sec Loss 6.7471 LearningRate 0.0561 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:32,106-Speed 5640.36 samples/sec Loss 6.7916 LearningRate 0.0561 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:33,945-Speed 5570.70 samples/sec Loss 6.7803 LearningRate 0.0561 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:35,749-Speed 5677.81 samples/sec Loss 6.7195 LearningRate 0.0561 Epoch: 5 Global Step: 28580 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:37,567-Speed 5634.66 samples/sec Loss 6.7842 LearningRate 0.0560 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:39,394-Speed 5605.46 samples/sec Loss 6.7595 LearningRate 0.0560 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:41,213-Speed 5630.63 samples/sec Loss 6.8438 LearningRate 0.0560 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:43,033-Speed 5628.14 samples/sec Loss 6.6716 LearningRate 0.0560 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:44,849-Speed 5642.13 samples/sec Loss 6.9387 LearningRate 0.0560 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:46,657-Speed 5666.97 samples/sec Loss 6.8810 LearningRate 0.0560 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:48,470-Speed 5648.71 samples/sec Loss 6.8100 LearningRate 0.0560 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:50,287-Speed 5639.94 samples/sec Loss 6.9717 LearningRate 0.0559 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:15:52,096-Speed 5659.51 samples/sec Loss 6.8396 LearningRate 0.0559 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:53,912-Speed 5644.02 samples/sec Loss 6.8145 LearningRate 0.0559 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:55,715-Speed 5680.77 samples/sec Loss 6.8373 LearningRate 0.0559 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:15:57,518-Speed 5679.28 samples/sec Loss 7.0163 LearningRate 0.0559 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:15:59,338-Speed 5629.38 samples/sec Loss 6.9819 LearningRate 0.0559 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:01,160-Speed 5622.65 samples/sec Loss 6.9801 LearningRate 0.0559 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:02,975-Speed 5643.10 samples/sec Loss 6.9049 LearningRate 0.0559 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:04,809-Speed 5585.22 samples/sec Loss 6.8306 LearningRate 0.0558 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:06,626-Speed 5639.04 samples/sec Loss 6.9674 LearningRate 0.0558 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:08,458-Speed 5591.46 samples/sec Loss 6.8519 LearningRate 0.0558 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:10,292-Speed 5584.85 samples/sec Loss 6.9212 LearningRate 0.0558 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:12,119-Speed 5606.44 samples/sec Loss 6.8318 LearningRate 0.0558 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:13,943-Speed 5615.43 samples/sec Loss 6.9979 LearningRate 0.0558 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:15,784-Speed 5563.45 samples/sec Loss 6.9054 LearningRate 0.0558 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:17,616-Speed 5591.78 samples/sec Loss 7.0134 LearningRate 0.0557 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:19,453-Speed 5576.09 samples/sec Loss 7.0146 LearningRate 0.0557 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:21,252-Speed 5693.90 samples/sec Loss 7.1187 
LearningRate 0.0557 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:23,070-Speed 5635.30 samples/sec Loss 6.9082 LearningRate 0.0557 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:24,882-Speed 5653.39 samples/sec Loss 7.1119 LearningRate 0.0557 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:26,705-Speed 5620.12 samples/sec Loss 6.9756 LearningRate 0.0557 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:28,521-Speed 5640.33 samples/sec Loss 6.9693 LearningRate 0.0557 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:30,332-Speed 5656.11 samples/sec Loss 6.9116 LearningRate 0.0557 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:32,150-Speed 5633.67 samples/sec Loss 6.9215 LearningRate 0.0556 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:34,065-Speed 5347.83 samples/sec Loss 6.9220 LearningRate 0.0556 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:35,917-Speed 5532.33 samples/sec Loss 6.9788 LearningRate 0.0556 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:37,797-Speed 5450.41 samples/sec Loss 7.0148 LearningRate 0.0556 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:39,693-Speed 5400.16 samples/sec Loss 7.0386 LearningRate 0.0556 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:41,527-Speed 5584.89 samples/sec Loss 7.2240 LearningRate 0.0556 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:43,380-Speed 5530.97 samples/sec Loss 7.1789 LearningRate 0.0556 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-27 03:16:45,231-Speed 5532.64 samples/sec Loss 6.9906 LearningRate 0.0556 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:47,139-Speed 5368.89 samples/sec Loss 6.9829 LearningRate 0.0555 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:48,964-Speed 5615.13 samples/sec Loss 7.0424 LearningRate 0.0555 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:50,778-Speed 5645.92 samples/sec Loss 6.9556 LearningRate 0.0555 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:16:52,580-Speed 5684.95 samples/sec Loss 7.0257 LearningRate 0.0555 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:54,399-Speed 5629.55 samples/sec Loss 7.0209 LearningRate 0.0555 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:56,215-Speed 5641.93 samples/sec Loss 6.9682 LearningRate 0.0555 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:58,026-Speed 5654.81 samples/sec Loss 7.1572 LearningRate 0.0555 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:16:59,843-Speed 5639.55 samples/sec Loss 7.0298 LearningRate 0.0554 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:01,705-Speed 5500.03 samples/sec Loss 7.1544 LearningRate 0.0554 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:03,534-Speed 5599.62 samples/sec Loss 6.9042 LearningRate 0.0554 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:05,341-Speed 5672.24 samples/sec Loss 7.1493 LearningRate 0.0554 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:17:07,177-Speed 5576.80 samples/sec Loss 6.9621 LearningRate 0.0554 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:09,020-Speed 5557.87 samples/sec Loss 6.8725 LearningRate 0.0554 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:10,841-Speed 5626.62 samples/sec Loss 6.9045 LearningRate 0.0554 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:12,653-Speed 5652.64 samples/sec Loss 7.0957 LearningRate 0.0554 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:14,476-Speed 5617.35 samples/sec Loss 7.0391 LearningRate 0.0553 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:16,290-Speed 5647.48 samples/sec Loss 7.1195 LearningRate 0.0553 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:18,121-Speed 5594.11 samples/sec Loss 7.0124 LearningRate 0.0553 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:19,940-Speed 5633.37 samples/sec Loss 7.1565 LearningRate 0.0553 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:21,763-Speed 5618.45 samples/sec Loss 7.0038 LearningRate 0.0553 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:23,574-Speed 5655.54 samples/sec Loss 7.0391 LearningRate 0.0553 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:25,386-Speed 5655.14 samples/sec Loss 7.0377 LearningRate 0.0553 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:27,193-Speed 5668.72 samples/sec Loss 7.1436 LearningRate 0.0553 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:29,035-Speed 5560.31 samples/sec Loss 7.0498 
LearningRate 0.0552 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:30,868-Speed 5587.61 samples/sec Loss 7.1311 LearningRate 0.0552 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:32,688-Speed 5628.26 samples/sec Loss 7.0901 LearningRate 0.0552 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:34,522-Speed 5585.40 samples/sec Loss 6.9247 LearningRate 0.0552 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:36,344-Speed 5621.60 samples/sec Loss 7.0359 LearningRate 0.0552 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:38,150-Speed 5672.94 samples/sec Loss 6.9563 LearningRate 0.0552 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:39,982-Speed 5591.09 samples/sec Loss 6.9645 LearningRate 0.0552 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:41,803-Speed 5623.30 samples/sec Loss 7.0125 LearningRate 0.0551 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:43,620-Speed 5638.97 samples/sec Loss 7.1929 LearningRate 0.0551 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:17:45,424-Speed 5679.20 samples/sec Loss 6.9678 LearningRate 0.0551 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:47,246-Speed 5620.99 samples/sec Loss 7.0165 LearningRate 0.0551 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:49,059-Speed 5651.90 samples/sec Loss 6.9220 LearningRate 0.0551 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:50,890-Speed 5595.21 samples/sec Loss 7.1501 LearningRate 0.0551 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-27 03:17:52,717-Speed 5607.28 samples/sec Loss 6.9887 LearningRate 0.0551 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:54,536-Speed 5629.94 samples/sec Loss 7.0738 LearningRate 0.0551 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:56,349-Speed 5650.75 samples/sec Loss 7.0926 LearningRate 0.0550 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:58,174-Speed 5612.45 samples/sec Loss 7.1180 LearningRate 0.0550 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:17:59,992-Speed 5632.79 samples/sec Loss 6.9720 LearningRate 0.0550 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:01,905-Speed 5354.98 samples/sec Loss 7.0996 LearningRate 0.0550 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:03,723-Speed 5632.83 samples/sec Loss 7.2517 LearningRate 0.0550 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:05,533-Speed 5661.57 samples/sec Loss 7.1808 LearningRate 0.0550 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:07,347-Speed 5645.16 samples/sec Loss 6.9010 LearningRate 0.0550 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:09,162-Speed 5644.37 samples/sec Loss 6.9558 LearningRate 0.0550 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:10,983-Speed 5626.16 samples/sec Loss 7.0450 LearningRate 0.0549 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:12,829-Speed 5548.73 samples/sec Loss 6.9551 LearningRate 0.0549 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 
03:18:14,667-Speed 5573.95 samples/sec Loss 7.1283 LearningRate 0.0549 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:16,512-Speed 5551.13 samples/sec Loss 6.9609 LearningRate 0.0549 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:18,360-Speed 5543.95 samples/sec Loss 7.0696 LearningRate 0.0549 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:20,177-Speed 5638.24 samples/sec Loss 7.1398 LearningRate 0.0549 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:22,004-Speed 5606.83 samples/sec Loss 7.0291 LearningRate 0.0549 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:23,830-Speed 5608.38 samples/sec Loss 7.1428 LearningRate 0.0548 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:25,656-Speed 5611.11 samples/sec Loss 7.0763 LearningRate 0.0548 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:27,521-Speed 5492.03 samples/sec Loss 7.1051 LearningRate 0.0548 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:29,349-Speed 5604.33 samples/sec Loss 7.0213 LearningRate 0.0548 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:31,171-Speed 5620.59 samples/sec Loss 7.1877 LearningRate 0.0548 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:32,990-Speed 5631.66 samples/sec Loss 7.0383 LearningRate 0.0548 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:18:34,825-Speed 5583.09 samples/sec Loss 7.1368 LearningRate 0.0548 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:36,641-Speed 5639.60 samples/sec Loss 7.1953 LearningRate 
0.0548 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:38,448-Speed 5670.57 samples/sec Loss 7.1764 LearningRate 0.0547 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:40,270-Speed 5620.53 samples/sec Loss 6.9903 LearningRate 0.0547 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:42,102-Speed 5592.48 samples/sec Loss 7.1514 LearningRate 0.0547 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:43,921-Speed 5631.43 samples/sec Loss 7.2875 LearningRate 0.0547 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:45,754-Speed 5586.12 samples/sec Loss 7.0309 LearningRate 0.0547 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:47,571-Speed 5639.45 samples/sec Loss 6.9308 LearningRate 0.0547 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:49,398-Speed 5606.45 samples/sec Loss 7.1501 LearningRate 0.0547 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:51,259-Speed 5503.57 samples/sec Loss 7.1559 LearningRate 0.0547 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:53,127-Speed 5484.51 samples/sec Loss 7.1925 LearningRate 0.0546 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:54,996-Speed 5481.34 samples/sec Loss 7.1468 LearningRate 0.0546 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:56,820-Speed 5615.99 samples/sec Loss 7.1204 LearningRate 0.0546 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:18:58,640-Speed 5627.25 samples/sec Loss 7.1655 LearningRate 0.0546 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-27 03:19:00,459-Speed 5631.25 samples/sec Loss 7.0176 LearningRate 0.0546 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:02,288-Speed 5600.59 samples/sec Loss 7.1473 LearningRate 0.0546 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:04,119-Speed 5594.08 samples/sec Loss 6.9899 LearningRate 0.0546 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:05,951-Speed 5592.64 samples/sec Loss 7.1505 LearningRate 0.0545 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:07,785-Speed 5584.19 samples/sec Loss 7.1803 LearningRate 0.0545 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:09,622-Speed 5576.28 samples/sec Loss 7.0667 LearningRate 0.0545 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:11,460-Speed 5573.70 samples/sec Loss 7.0211 LearningRate 0.0545 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:13,279-Speed 5630.22 samples/sec Loss 7.0714 LearningRate 0.0545 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:15,106-Speed 5607.43 samples/sec Loss 7.2720 LearningRate 0.0545 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:16,921-Speed 5646.34 samples/sec Loss 7.1631 LearningRate 0.0545 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:18,738-Speed 5636.96 samples/sec Loss 7.1058 LearningRate 0.0545 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:20,544-Speed 5670.91 samples/sec Loss 7.0178 LearningRate 0.0544 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:22,356-Speed 
5653.03 samples/sec Loss 7.1004 LearningRate 0.0544 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:24,164-Speed 5664.70 samples/sec Loss 7.0914 LearningRate 0.0544 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:25,990-Speed 5609.33 samples/sec Loss 7.1251 LearningRate 0.0544 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:27,799-Speed 5663.58 samples/sec Loss 7.0588 LearningRate 0.0544 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:29,624-Speed 5612.06 samples/sec Loss 7.0897 LearningRate 0.0544 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:31,438-Speed 5647.21 samples/sec Loss 7.0025 LearningRate 0.0544 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:33,258-Speed 5628.38 samples/sec Loss 7.1049 LearningRate 0.0544 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:35,074-Speed 5641.28 samples/sec Loss 7.1515 LearningRate 0.0543 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:36,889-Speed 5643.51 samples/sec Loss 6.9463 LearningRate 0.0543 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:19:38,703-Speed 5646.28 samples/sec Loss 7.2233 LearningRate 0.0543 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:40,509-Speed 5673.10 samples/sec Loss 7.1366 LearningRate 0.0543 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:42,325-Speed 5640.14 samples/sec Loss 7.0537 LearningRate 0.0543 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:19:44,136-Speed 5657.94 samples/sec Loss 7.0935 LearningRate 0.0543 Epoch: 
5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:46,023-Speed 5428.24 samples/sec Loss 6.9823 LearningRate 0.0543 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:47,880-Speed 5515.40 samples/sec Loss 7.2024 LearningRate 0.0542 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:49,705-Speed 5611.86 samples/sec Loss 7.0772 LearningRate 0.0542 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:51,556-Speed 5534.72 samples/sec Loss 7.0162 LearningRate 0.0542 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:53,373-Speed 5638.91 samples/sec Loss 7.0283 LearningRate 0.0542 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:19:55,210-Speed 5575.83 samples/sec Loss 7.2309 LearningRate 0.0542 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:20:21,450-[lfw][30000]XNorm: 22.914327
Training: 2022-04-27 03:20:21,451-[lfw][30000]Accuracy-Flip: 0.99700+-0.00323
Training: 2022-04-27 03:20:21,452-[lfw][30000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:20:52,786-[cfp_fp][30000]XNorm: 20.399378
Training: 2022-04-27 03:20:52,787-[cfp_fp][30000]Accuracy-Flip: 0.92700+-0.01357
Training: 2022-04-27 03:20:52,788-[cfp_fp][30000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:21:19,699-[agedb_30][30000]XNorm: 22.715611
Training: 2022-04-27 03:21:19,700-[agedb_30][30000]Accuracy-Flip: 0.96767+-0.00967
Training: 2022-04-27 03:21:19,700-[agedb_30][30000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:21:21,522-Speed 118.64 samples/sec Loss 7.0276 LearningRate 0.0542 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:21:23,364-Speed 5561.17 samples/sec Loss 7.1284 LearningRate 0.0542 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 131072
Required: 5 hours Training: 2022-04-27 03:21:25,180-Speed 5641.80 samples/sec Loss 7.0219 LearningRate 0.0542 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:27,016-Speed 5580.89 samples/sec Loss 7.0929 LearningRate 0.0541 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:28,852-Speed 5583.41 samples/sec Loss 7.0706 LearningRate 0.0541 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:30,684-Speed 5590.44 samples/sec Loss 7.1659 LearningRate 0.0541 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:32,509-Speed 5614.68 samples/sec Loss 7.2337 LearningRate 0.0541 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:34,346-Speed 5577.91 samples/sec Loss 7.0110 LearningRate 0.0541 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:36,167-Speed 5626.47 samples/sec Loss 7.2490 LearningRate 0.0541 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:37,999-Speed 5594.92 samples/sec Loss 7.0532 LearningRate 0.0541 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:39,845-Speed 5549.74 samples/sec Loss 7.1994 LearningRate 0.0541 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:21:41,669-Speed 5616.93 samples/sec Loss 7.1213 LearningRate 0.0540 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:43,511-Speed 5561.33 samples/sec Loss 7.0243 LearningRate 0.0540 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:45,360-Speed 5541.86 samples/sec Loss 7.0903 LearningRate 0.0540 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:47,181-Speed 
5628.83 samples/sec Loss 6.9572 LearningRate 0.0540 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:49,038-Speed 5516.92 samples/sec Loss 7.1079 LearningRate 0.0540 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:50,850-Speed 5654.93 samples/sec Loss 7.0744 LearningRate 0.0540 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:52,682-Speed 5591.88 samples/sec Loss 7.2225 LearningRate 0.0540 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:54,539-Speed 5519.56 samples/sec Loss 7.1624 LearningRate 0.0540 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:56,392-Speed 5526.79 samples/sec Loss 7.1332 LearningRate 0.0539 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:21:58,255-Speed 5500.73 samples/sec Loss 6.9912 LearningRate 0.0539 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:00,099-Speed 5557.92 samples/sec Loss 7.2180 LearningRate 0.0539 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:01,913-Speed 5646.62 samples/sec Loss 7.0828 LearningRate 0.0539 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:03,717-Speed 5680.05 samples/sec Loss 6.9989 LearningRate 0.0539 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:05,550-Speed 5590.14 samples/sec Loss 7.1211 LearningRate 0.0539 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:07,383-Speed 5588.35 samples/sec Loss 7.1842 LearningRate 0.0539 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:09,216-Speed 5589.22 samples/sec Loss 7.1908 LearningRate 0.0538 Epoch: 5 
Global Step: 30270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:11,053-Speed 5576.82 samples/sec Loss 7.0320 LearningRate 0.0538 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:12,921-Speed 5484.58 samples/sec Loss 7.0686 LearningRate 0.0538 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:14,762-Speed 5566.67 samples/sec Loss 6.8756 LearningRate 0.0538 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:16,595-Speed 5590.20 samples/sec Loss 7.0820 LearningRate 0.0538 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:18,443-Speed 5547.81 samples/sec Loss 7.1948 LearningRate 0.0538 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:20,254-Speed 5659.63 samples/sec Loss 7.0479 LearningRate 0.0538 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:22,095-Speed 5564.37 samples/sec Loss 7.3191 LearningRate 0.0538 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:23,936-Speed 5566.06 samples/sec Loss 7.1249 LearningRate 0.0537 Epoch: 5 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:25,762-Speed 5607.72 samples/sec Loss 7.1107 LearningRate 0.0537 Epoch: 5 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:27,583-Speed 5627.76 samples/sec Loss 7.0894 LearningRate 0.0537 Epoch: 5 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:29,410-Speed 5608.15 samples/sec Loss 7.1623 LearningRate 0.0537 Epoch: 5 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:31,219-Speed 5662.99 samples/sec Loss 7.0161 LearningRate 0.0537 Epoch: 5 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-27 03:22:33,033-Speed 5647.40 samples/sec Loss 6.9894 LearningRate 0.0537 Epoch: 5 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:34,878-Speed 5554.07 samples/sec Loss 7.1691 LearningRate 0.0537 Epoch: 5 Global Step: 30410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:36,693-Speed 5642.98 samples/sec Loss 7.0671 LearningRate 0.0537 Epoch: 5 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:38,554-Speed 5504.53 samples/sec Loss 7.2064 LearningRate 0.0536 Epoch: 5 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:40,400-Speed 5551.67 samples/sec Loss 7.0762 LearningRate 0.0536 Epoch: 5 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:42,250-Speed 5536.76 samples/sec Loss 6.8777 LearningRate 0.0536 Epoch: 5 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:44,103-Speed 5531.69 samples/sec Loss 7.0047 LearningRate 0.0536 Epoch: 5 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:45,927-Speed 5615.64 samples/sec Loss 7.1888 LearningRate 0.0536 Epoch: 5 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:47,764-Speed 5578.06 samples/sec Loss 7.0417 LearningRate 0.0536 Epoch: 5 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:49,607-Speed 5560.55 samples/sec Loss 6.9701 LearningRate 0.0536 Epoch: 5 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:51,450-Speed 5557.35 samples/sec Loss 7.0498 LearningRate 0.0536 Epoch: 5 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:53,322-Speed 5472.22 samples/sec Loss 7.1043 LearningRate 0.0535 Epoch: 5 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:22:55,162-Speed 5570.26 samples/sec Loss 
7.1436 LearningRate 0.0535 Epoch: 5 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:56,986-Speed 5618.81 samples/sec Loss 7.1158 LearningRate 0.0535 Epoch: 5 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:22:58,827-Speed 5562.25 samples/sec Loss 7.2537 LearningRate 0.0535 Epoch: 5 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:00,666-Speed 5574.45 samples/sec Loss 7.0807 LearningRate 0.0535 Epoch: 5 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:02,483-Speed 5637.00 samples/sec Loss 7.0340 LearningRate 0.0535 Epoch: 5 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:04,339-Speed 5519.26 samples/sec Loss 6.9575 LearningRate 0.0535 Epoch: 5 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:06,167-Speed 5606.38 samples/sec Loss 7.2490 LearningRate 0.0534 Epoch: 5 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:07,990-Speed 5618.93 samples/sec Loss 7.0751 LearningRate 0.0534 Epoch: 5 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:09,831-Speed 5565.96 samples/sec Loss 7.1568 LearningRate 0.0534 Epoch: 5 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:11,657-Speed 5613.87 samples/sec Loss 7.1516 LearningRate 0.0534 Epoch: 5 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:13,522-Speed 5492.53 samples/sec Loss 7.1467 LearningRate 0.0534 Epoch: 5 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:15,353-Speed 5594.30 samples/sec Loss 6.9989 LearningRate 0.0534 Epoch: 5 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:17,175-Speed 5624.65 samples/sec Loss 7.0781 LearningRate 0.0534 Epoch: 5 Global Step: 30640 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:18,999-Speed 5616.16 samples/sec Loss 6.9980 LearningRate 0.0534 Epoch: 5 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:20,828-Speed 5601.40 samples/sec Loss 6.9276 LearningRate 0.0533 Epoch: 5 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:22,683-Speed 5521.92 samples/sec Loss 7.2009 LearningRate 0.0533 Epoch: 5 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:24,523-Speed 5570.99 samples/sec Loss 7.0262 LearningRate 0.0533 Epoch: 5 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:26,363-Speed 5566.64 samples/sec Loss 7.0279 LearningRate 0.0533 Epoch: 5 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:28,200-Speed 5577.68 samples/sec Loss 7.0175 LearningRate 0.0533 Epoch: 5 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:30,064-Speed 5495.43 samples/sec Loss 7.0543 LearningRate 0.0533 Epoch: 5 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:31,908-Speed 5557.65 samples/sec Loss 6.9451 LearningRate 0.0533 Epoch: 5 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:33,752-Speed 5555.49 samples/sec Loss 6.9481 LearningRate 0.0533 Epoch: 5 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:35,577-Speed 5613.16 samples/sec Loss 6.8803 LearningRate 0.0532 Epoch: 5 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:37,420-Speed 5557.39 samples/sec Loss 7.0301 LearningRate 0.0532 Epoch: 5 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:39,251-Speed 5597.12 samples/sec Loss 7.0892 LearningRate 0.0532 Epoch: 5 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:23:41,085-Speed 5586.82 samples/sec Loss 7.2584 LearningRate 0.0532 Epoch: 5 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:42,915-Speed 5597.17 samples/sec Loss 7.0604 LearningRate 0.0532 Epoch: 5 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:44,782-Speed 5485.29 samples/sec Loss 6.9500 LearningRate 0.0532 Epoch: 5 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:46,619-Speed 5576.19 samples/sec Loss 7.0442 LearningRate 0.0532 Epoch: 5 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:48,475-Speed 5522.54 samples/sec Loss 6.9467 LearningRate 0.0532 Epoch: 5 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:50,305-Speed 5596.13 samples/sec Loss 7.1666 LearningRate 0.0531 Epoch: 5 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:52,154-Speed 5540.20 samples/sec Loss 7.0510 LearningRate 0.0531 Epoch: 5 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:23:53,956-Speed 5683.33 samples/sec Loss 6.9659 LearningRate 0.0531 Epoch: 5 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:55,787-Speed 5594.93 samples/sec Loss 7.0849 LearningRate 0.0531 Epoch: 5 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:57,606-Speed 5632.80 samples/sec Loss 7.1561 LearningRate 0.0531 Epoch: 5 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:23:59,432-Speed 5611.92 samples/sec Loss 6.9543 LearningRate 0.0531 Epoch: 5 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:01,244-Speed 5653.72 samples/sec Loss 7.0599 LearningRate 0.0531 Epoch: 5 Global Step: 30880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:03,089-Speed 5551.23 samples/sec Loss 6.9704 
LearningRate 0.0531 Epoch: 5 Global Step: 30890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:04,936-Speed 5548.16 samples/sec Loss 6.8821 LearningRate 0.0530 Epoch: 5 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:06,773-Speed 5576.10 samples/sec Loss 7.1022 LearningRate 0.0530 Epoch: 5 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:08,612-Speed 5571.03 samples/sec Loss 7.0280 LearningRate 0.0530 Epoch: 5 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:10,435-Speed 5619.27 samples/sec Loss 6.9022 LearningRate 0.0530 Epoch: 5 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:12,261-Speed 5609.60 samples/sec Loss 6.9427 LearningRate 0.0530 Epoch: 5 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:24:14,086-Speed 5617.58 samples/sec Loss 7.0299 LearningRate 0.0530 Epoch: 5 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:15,928-Speed 5576.46 samples/sec Loss 7.1415 LearningRate 0.0530 Epoch: 5 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:17,784-Speed 5522.34 samples/sec Loss 7.1041 LearningRate 0.0529 Epoch: 5 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:19,621-Speed 5577.15 samples/sec Loss 6.9608 LearningRate 0.0529 Epoch: 5 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:21,454-Speed 5588.17 samples/sec Loss 7.0351 LearningRate 0.0529 Epoch: 5 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:23,297-Speed 5558.38 samples/sec Loss 6.9661 LearningRate 0.0529 Epoch: 5 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:25,138-Speed 5564.12 samples/sec Loss 6.8005 LearningRate 0.0529 Epoch: 5 Global Step: 31010 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-27 03:24:26,993-Speed 5524.25 samples/sec Loss 6.9095 LearningRate 0.0529 Epoch: 5 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:28,851-Speed 5513.85 samples/sec Loss 7.1393 LearningRate 0.0529 Epoch: 5 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:30,691-Speed 5567.67 samples/sec Loss 7.0804 LearningRate 0.0529 Epoch: 5 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:32,527-Speed 5579.47 samples/sec Loss 7.0347 LearningRate 0.0528 Epoch: 5 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:24:34,383-Speed 5547.67 samples/sec Loss 6.8331 LearningRate 0.0528 Epoch: 5 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:24:36,236-Speed 5527.72 samples/sec Loss 7.1609 LearningRate 0.0528 Epoch: 5 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:24:38,103-Speed 5488.66 samples/sec Loss 7.0225 LearningRate 0.0528 Epoch: 5 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:24:39,932-Speed 5602.51 samples/sec Loss 6.8231 LearningRate 0.0528 Epoch: 5 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:41,783-Speed 5537.82 samples/sec Loss 6.9888 LearningRate 0.0528 Epoch: 5 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:43,604-Speed 5625.43 samples/sec Loss 6.9686 LearningRate 0.0528 Epoch: 5 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:45,438-Speed 5586.22 samples/sec Loss 7.0475 LearningRate 0.0528 Epoch: 5 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:47,269-Speed 5595.23 samples/sec Loss 6.9981 LearningRate 0.0527 Epoch: 5 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:49,145-Speed 
5462.05 samples/sec Loss 6.9163 LearningRate 0.0527 Epoch: 5 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:51,071-Speed 5319.64 samples/sec Loss 6.9897 LearningRate 0.0527 Epoch: 5 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:52,994-Speed 5329.45 samples/sec Loss 7.1452 LearningRate 0.0527 Epoch: 5 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:54,918-Speed 5324.10 samples/sec Loss 7.0468 LearningRate 0.0527 Epoch: 5 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:56,842-Speed 5323.05 samples/sec Loss 7.0923 LearningRate 0.0527 Epoch: 5 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:24:58,768-Speed 5320.46 samples/sec Loss 6.9082 LearningRate 0.0527 Epoch: 5 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:25:00,695-Speed 5316.28 samples/sec Loss 7.0005 LearningRate 0.0527 Epoch: 5 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:25:02,623-Speed 5315.37 samples/sec Loss 7.1752 LearningRate 0.0526 Epoch: 5 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:25:04,484-Speed 5507.81 samples/sec Loss 7.0588 LearningRate 0.0526 Epoch: 5 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:25:06,311-Speed 5605.70 samples/sec Loss 7.1026 LearningRate 0.0526 Epoch: 5 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:08,148-Speed 5576.88 samples/sec Loss 7.0782 LearningRate 0.0526 Epoch: 5 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:09,982-Speed 5590.33 samples/sec Loss 7.0432 LearningRate 0.0526 Epoch: 5 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:11,814-Speed 5591.74 samples/sec Loss 7.0757 LearningRate 0.0526 Epoch: 5 
Global Step: 31260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:13,681-Speed 5488.33 samples/sec Loss 6.9831 LearningRate 0.0526 Epoch: 5 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:15,514-Speed 5589.10 samples/sec Loss 7.2018 LearningRate 0.0526 Epoch: 5 Global Step: 31280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:17,388-Speed 5466.41 samples/sec Loss 7.1737 LearningRate 0.0525 Epoch: 5 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:19,230-Speed 5563.97 samples/sec Loss 7.0344 LearningRate 0.0525 Epoch: 5 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:21,062-Speed 5591.22 samples/sec Loss 7.0551 LearningRate 0.0525 Epoch: 5 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:22,891-Speed 5602.06 samples/sec Loss 6.8964 LearningRate 0.0525 Epoch: 5 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:24,687-Speed 5701.17 samples/sec Loss 6.8683 LearningRate 0.0525 Epoch: 5 Global Step: 31330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:26,525-Speed 5576.61 samples/sec Loss 6.8989 LearningRate 0.0525 Epoch: 5 Global Step: 31340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:28,387-Speed 5501.23 samples/sec Loss 6.9303 LearningRate 0.0525 Epoch: 5 Global Step: 31350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:30,213-Speed 5608.87 samples/sec Loss 6.8236 LearningRate 0.0525 Epoch: 5 Global Step: 31360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:32,032-Speed 5631.42 samples/sec Loss 6.9066 LearningRate 0.0524 Epoch: 5 Global Step: 31370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:33,845-Speed 5652.75 samples/sec Loss 7.0591 LearningRate 0.0524 Epoch: 5 Global Step: 31380 Fp16 Grad Scale: 32768 Required: 5 hours 
Training: 2022-04-27 03:25:35,655-Speed 5659.02 samples/sec Loss 7.1044 LearningRate 0.0524 Epoch: 5 Global Step: 31390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:37,500-Speed 5554.56 samples/sec Loss 6.8619 LearningRate 0.0524 Epoch: 5 Global Step: 31400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:39,353-Speed 5525.58 samples/sec Loss 6.8893 LearningRate 0.0524 Epoch: 5 Global Step: 31410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:41,187-Speed 5588.87 samples/sec Loss 6.8872 LearningRate 0.0524 Epoch: 5 Global Step: 31420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:25:43,014-Speed 5609.12 samples/sec Loss 7.0421 LearningRate 0.0524 Epoch: 5 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:44,878-Speed 5497.51 samples/sec Loss 7.0641 LearningRate 0.0523 Epoch: 5 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:46,726-Speed 5545.23 samples/sec Loss 6.9870 LearningRate 0.0523 Epoch: 5 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:48,554-Speed 5604.85 samples/sec Loss 6.9612 LearningRate 0.0523 Epoch: 5 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:50,384-Speed 5599.50 samples/sec Loss 6.8290 LearningRate 0.0523 Epoch: 5 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:52,206-Speed 5621.31 samples/sec Loss 7.0368 LearningRate 0.0523 Epoch: 5 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:54,052-Speed 5551.30 samples/sec Loss 6.9769 LearningRate 0.0523 Epoch: 5 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:55,917-Speed 5492.98 samples/sec Loss 7.0518 LearningRate 0.0523 Epoch: 5 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:57,768-Speed 5533.89 samples/sec Loss 
6.9059 LearningRate 0.0523 Epoch: 5 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:25:59,629-Speed 5507.89 samples/sec Loss 6.9522 LearningRate 0.0522 Epoch: 5 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:01,490-Speed 5504.72 samples/sec Loss 6.8381 LearningRate 0.0522 Epoch: 5 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:03,339-Speed 5541.89 samples/sec Loss 7.1866 LearningRate 0.0522 Epoch: 5 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:05,188-Speed 5539.77 samples/sec Loss 6.9576 LearningRate 0.0522 Epoch: 5 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:07,021-Speed 5589.62 samples/sec Loss 6.9283 LearningRate 0.0522 Epoch: 5 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:08,854-Speed 5588.28 samples/sec Loss 6.9266 LearningRate 0.0522 Epoch: 5 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:10,669-Speed 5644.21 samples/sec Loss 6.9519 LearningRate 0.0522 Epoch: 5 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:12,482-Speed 5653.92 samples/sec Loss 6.9328 LearningRate 0.0522 Epoch: 5 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:14,310-Speed 5604.69 samples/sec Loss 6.9865 LearningRate 0.0521 Epoch: 5 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:16,162-Speed 5531.02 samples/sec Loss 6.9455 LearningRate 0.0521 Epoch: 5 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:17,995-Speed 5590.74 samples/sec Loss 6.9899 LearningRate 0.0521 Epoch: 5 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:19,825-Speed 5598.44 samples/sec Loss 7.1106 LearningRate 0.0521 Epoch: 5 Global Step: 31630 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:21,653-Speed 5602.44 samples/sec Loss 6.8541 LearningRate 0.0521 Epoch: 5 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:23,478-Speed 5616.12 samples/sec Loss 6.9356 LearningRate 0.0521 Epoch: 5 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:25,314-Speed 5580.57 samples/sec Loss 6.9712 LearningRate 0.0521 Epoch: 5 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:27,170-Speed 5518.07 samples/sec Loss 6.9075 LearningRate 0.0521 Epoch: 5 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:28,989-Speed 5633.24 samples/sec Loss 7.0926 LearningRate 0.0520 Epoch: 5 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:26:30,827-Speed 5574.35 samples/sec Loss 6.9031 LearningRate 0.0520 Epoch: 5 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:32,654-Speed 5608.33 samples/sec Loss 6.8883 LearningRate 0.0520 Epoch: 5 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:34,470-Speed 5642.66 samples/sec Loss 7.0947 LearningRate 0.0520 Epoch: 5 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:36,294-Speed 5617.42 samples/sec Loss 7.0364 LearningRate 0.0520 Epoch: 5 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:38,129-Speed 5582.23 samples/sec Loss 7.0069 LearningRate 0.0520 Epoch: 5 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:39,946-Speed 5638.04 samples/sec Loss 6.8681 LearningRate 0.0520 Epoch: 5 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:41,791-Speed 5553.57 samples/sec Loss 7.0663 LearningRate 0.0520 Epoch: 5 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 
03:26:43,620-Speed 5601.18 samples/sec Loss 6.9268 LearningRate 0.0519 Epoch: 5 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:45,443-Speed 5619.98 samples/sec Loss 7.0019 LearningRate 0.0519 Epoch: 5 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:47,276-Speed 5589.06 samples/sec Loss 6.9366 LearningRate 0.0519 Epoch: 5 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:49,096-Speed 5627.40 samples/sec Loss 6.9106 LearningRate 0.0519 Epoch: 5 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:50,944-Speed 5542.47 samples/sec Loss 6.9494 LearningRate 0.0519 Epoch: 5 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:52,774-Speed 5597.40 samples/sec Loss 7.0066 LearningRate 0.0519 Epoch: 5 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:54,628-Speed 5527.69 samples/sec Loss 6.9736 LearningRate 0.0519 Epoch: 5 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:56,465-Speed 5575.53 samples/sec Loss 7.1330 LearningRate 0.0519 Epoch: 5 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:26:58,321-Speed 5522.61 samples/sec Loss 6.9117 LearningRate 0.0518 Epoch: 5 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:27:00,138-Speed 5637.21 samples/sec Loss 6.9682 LearningRate 0.0518 Epoch: 5 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:01,967-Speed 5602.41 samples/sec Loss 6.7647 LearningRate 0.0518 Epoch: 5 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:03,791-Speed 5616.47 samples/sec Loss 6.9388 LearningRate 0.0518 Epoch: 5 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:05,620-Speed 5600.78 samples/sec Loss 6.9294 
LearningRate 0.0518 Epoch: 5 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:07,459-Speed 5570.41 samples/sec Loss 6.9610 LearningRate 0.0518 Epoch: 5 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:09,304-Speed 5554.13 samples/sec Loss 7.0137 LearningRate 0.0518 Epoch: 5 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:11,177-Speed 5470.36 samples/sec Loss 6.9715 LearningRate 0.0518 Epoch: 5 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:13,066-Speed 5420.97 samples/sec Loss 7.0002 LearningRate 0.0517 Epoch: 5 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:14,948-Speed 5445.88 samples/sec Loss 7.1102 LearningRate 0.0517 Epoch: 5 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:16,780-Speed 5591.50 samples/sec Loss 6.9682 LearningRate 0.0517 Epoch: 5 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:18,625-Speed 5553.32 samples/sec Loss 7.0530 LearningRate 0.0517 Epoch: 5 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:27:20,463-Speed 5572.39 samples/sec Loss 6.9556 LearningRate 0.0517 Epoch: 5 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:27:22,279-Speed 5642.52 samples/sec Loss 6.9600 LearningRate 0.0517 Epoch: 5 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:24,115-Speed 5579.35 samples/sec Loss 6.9893 LearningRate 0.0517 Epoch: 5 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:25,937-Speed 5624.87 samples/sec Loss 7.0177 LearningRate 0.0517 Epoch: 5 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:27:27,813-Speed 5462.06 samples/sec Loss 7.0371 LearningRate 0.0516 Epoch: 5 Global Step: 32000 Fp16 Grad Scale: 
65536 Required: 5 hours
Training: 2022-04-27 03:27:57,109-[lfw][32000]XNorm: 22.176498
Training: 2022-04-27 03:27:57,110-[lfw][32000]Accuracy-Flip: 0.99650+-0.00345
Training: 2022-04-27 03:27:57,110-[lfw][32000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:28:28,314-[cfp_fp][32000]XNorm: 19.174454
Training: 2022-04-27 03:28:28,315-[cfp_fp][32000]Accuracy-Flip: 0.94771+-0.01044
Training: 2022-04-27 03:28:28,316-[cfp_fp][32000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:28:56,109-[agedb_30][32000]XNorm: 21.835893
Training: 2022-04-27 03:28:56,109-[agedb_30][32000]Accuracy-Flip: 0.96683+-0.00953
Training: 2022-04-27 03:28:56,110-[agedb_30][32000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:28:57,962-Speed 113.59 samples/sec Loss 6.9282 LearningRate 0.0516 Epoch: 5 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:28:59,806-Speed 5559.12 samples/sec Loss 6.8021 LearningRate 0.0516 Epoch: 5 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:29:01,629-Speed 5620.44 samples/sec Loss 6.9622 LearningRate 0.0516 Epoch: 5 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:29:03,465-Speed 5581.08 samples/sec Loss 6.9314 LearningRate 0.0516 Epoch: 5 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:29:05,275-Speed 5662.25 samples/sec Loss 6.8034 LearningRate 0.0516 Epoch: 5 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:29:07,105-Speed 5597.17 samples/sec Loss 6.9730 LearningRate 0.0516 Epoch: 5 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-27 03:29:08,940-Speed 5585.01 samples/sec Loss 7.0174 LearningRate 0.0516 Epoch: 5 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27 03:29:10,781-Speed 5565.18 samples/sec Loss 6.9010 LearningRate 0.0515 Epoch: 5 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-27
03:29:12,594-Speed 5653.40 samples/sec Loss 7.0084 LearningRate 0.0515 Epoch: 5 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:14,434-Speed 5566.68 samples/sec Loss 7.0044 LearningRate 0.0515 Epoch: 5 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:16,275-Speed 5566.49 samples/sec Loss 6.7955 LearningRate 0.0515 Epoch: 5 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:18,108-Speed 5588.97 samples/sec Loss 6.7964 LearningRate 0.0515 Epoch: 5 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:19,936-Speed 5606.90 samples/sec Loss 6.9159 LearningRate 0.0515 Epoch: 5 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:21,815-Speed 5450.33 samples/sec Loss 7.0787 LearningRate 0.0515 Epoch: 5 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:23,656-Speed 5567.54 samples/sec Loss 6.8884 LearningRate 0.0515 Epoch: 5 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:25,492-Speed 5577.89 samples/sec Loss 7.0705 LearningRate 0.0514 Epoch: 5 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:27,390-Speed 5400.43 samples/sec Loss 7.1282 LearningRate 0.0514 Epoch: 5 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:29,220-Speed 5599.78 samples/sec Loss 6.9864 LearningRate 0.0514 Epoch: 5 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:31,065-Speed 5552.90 samples/sec Loss 7.0177 LearningRate 0.0514 Epoch: 5 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:32,896-Speed 5593.88 samples/sec Loss 6.9977 LearningRate 0.0514 Epoch: 5 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:34,725-Speed 5605.40 samples/sec Loss 7.1232 LearningRate 
0.0514 Epoch: 5 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:36,585-Speed 5506.85 samples/sec Loss 6.9478 LearningRate 0.0514 Epoch: 5 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:38,409-Speed 5618.50 samples/sec Loss 7.0855 LearningRate 0.0513 Epoch: 5 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:40,238-Speed 5601.23 samples/sec Loss 6.8555 LearningRate 0.0513 Epoch: 5 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:42,067-Speed 5601.36 samples/sec Loss 6.9620 LearningRate 0.0513 Epoch: 5 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:43,880-Speed 5652.47 samples/sec Loss 6.9331 LearningRate 0.0513 Epoch: 5 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:45,728-Speed 5544.32 samples/sec Loss 6.9586 LearningRate 0.0513 Epoch: 5 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:47,578-Speed 5539.55 samples/sec Loss 6.9261 LearningRate 0.0513 Epoch: 5 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:49,394-Speed 5640.83 samples/sec Loss 6.9663 LearningRate 0.0513 Epoch: 5 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:29:51,228-Speed 5584.22 samples/sec Loss 6.9369 LearningRate 0.0513 Epoch: 5 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:53,062-Speed 5587.22 samples/sec Loss 6.8958 LearningRate 0.0512 Epoch: 5 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:54,890-Speed 5604.36 samples/sec Loss 7.0311 LearningRate 0.0512 Epoch: 5 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:29:56,716-Speed 5612.91 samples/sec Loss 6.9680 LearningRate 0.0512 Epoch: 5 Global Step: 32330 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 03:29:58,567-Speed 5535.56 samples/sec Loss 6.9349 LearningRate 0.0512 Epoch: 5 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:00,413-Speed 5548.79 samples/sec Loss 6.9415 LearningRate 0.0512 Epoch: 5 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:02,304-Speed 5418.92 samples/sec Loss 6.8846 LearningRate 0.0512 Epoch: 5 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:04,184-Speed 5450.04 samples/sec Loss 7.0452 LearningRate 0.0512 Epoch: 5 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:06,022-Speed 5574.32 samples/sec Loss 6.7893 LearningRate 0.0512 Epoch: 5 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:07,850-Speed 5605.12 samples/sec Loss 6.9299 LearningRate 0.0511 Epoch: 5 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:09,721-Speed 5475.83 samples/sec Loss 6.9581 LearningRate 0.0511 Epoch: 5 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:11,568-Speed 5545.47 samples/sec Loss 7.0490 LearningRate 0.0511 Epoch: 5 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:13,422-Speed 5527.76 samples/sec Loss 7.0552 LearningRate 0.0511 Epoch: 5 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:15,243-Speed 5624.28 samples/sec Loss 6.9958 LearningRate 0.0511 Epoch: 5 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:17,075-Speed 5593.50 samples/sec Loss 6.7806 LearningRate 0.0511 Epoch: 5 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:18,920-Speed 5556.23 samples/sec Loss 6.8027 LearningRate 0.0511 Epoch: 5 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:20,734-Speed 5646.85 
samples/sec Loss 6.9429 LearningRate 0.0511 Epoch: 5 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:22,558-Speed 5617.44 samples/sec Loss 6.8233 LearningRate 0.0510 Epoch: 5 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:24,386-Speed 5605.32 samples/sec Loss 6.9214 LearningRate 0.0510 Epoch: 5 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:26,216-Speed 5597.04 samples/sec Loss 6.8502 LearningRate 0.0510 Epoch: 5 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:28,038-Speed 5626.60 samples/sec Loss 6.8860 LearningRate 0.0510 Epoch: 5 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:29,868-Speed 5598.05 samples/sec Loss 7.0676 LearningRate 0.0510 Epoch: 5 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:31,708-Speed 5566.98 samples/sec Loss 7.0472 LearningRate 0.0510 Epoch: 5 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:33,549-Speed 5565.73 samples/sec Loss 6.8661 LearningRate 0.0510 Epoch: 5 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:35,367-Speed 5635.48 samples/sec Loss 6.8273 LearningRate 0.0510 Epoch: 5 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:37,199-Speed 5594.69 samples/sec Loss 6.7623 LearningRate 0.0509 Epoch: 5 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:30:39,019-Speed 5630.50 samples/sec Loss 6.9342 LearningRate 0.0509 Epoch: 5 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:40,859-Speed 5566.21 samples/sec Loss 6.8998 LearningRate 0.0509 Epoch: 5 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:42,705-Speed 5552.86 samples/sec Loss 6.9057 LearningRate 0.0509 Epoch: 5 Global 
Step: 32580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:44,556-Speed 5533.48 samples/sec Loss 6.9541 LearningRate 0.0509 Epoch: 5 Global Step: 32590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:46,420-Speed 5498.00 samples/sec Loss 6.7134 LearningRate 0.0509 Epoch: 5 Global Step: 32600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:48,251-Speed 5595.59 samples/sec Loss 6.9505 LearningRate 0.0509 Epoch: 5 Global Step: 32610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:50,060-Speed 5662.86 samples/sec Loss 7.0860 LearningRate 0.0509 Epoch: 5 Global Step: 32620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:51,904-Speed 5557.96 samples/sec Loss 6.9327 LearningRate 0.0508 Epoch: 5 Global Step: 32630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:53,734-Speed 5597.05 samples/sec Loss 6.8654 LearningRate 0.0508 Epoch: 5 Global Step: 32640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:55,550-Speed 5641.99 samples/sec Loss 6.9587 LearningRate 0.0508 Epoch: 5 Global Step: 32650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:57,372-Speed 5624.30 samples/sec Loss 6.8148 LearningRate 0.0508 Epoch: 5 Global Step: 32660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:30:59,185-Speed 5650.87 samples/sec Loss 6.9513 LearningRate 0.0508 Epoch: 5 Global Step: 32670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:01,004-Speed 5632.82 samples/sec Loss 6.9781 LearningRate 0.0508 Epoch: 5 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:02,833-Speed 5601.69 samples/sec Loss 6.8264 LearningRate 0.0508 Epoch: 5 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:04,658-Speed 5614.67 samples/sec Loss 7.0216 LearningRate 0.0508 Epoch: 5 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-27 03:31:06,494-Speed 5578.97 samples/sec Loss 7.0430 LearningRate 0.0507 Epoch: 5 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:08,320-Speed 5613.27 samples/sec Loss 6.7838 LearningRate 0.0507 Epoch: 5 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:10,173-Speed 5526.28 samples/sec Loss 6.9564 LearningRate 0.0507 Epoch: 5 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:12,002-Speed 5602.83 samples/sec Loss 6.8672 LearningRate 0.0507 Epoch: 5 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:13,843-Speed 5564.67 samples/sec Loss 6.9400 LearningRate 0.0507 Epoch: 5 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:15,684-Speed 5567.22 samples/sec Loss 6.8286 LearningRate 0.0507 Epoch: 5 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:17,501-Speed 5638.82 samples/sec Loss 6.8689 LearningRate 0.0507 Epoch: 5 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:19,334-Speed 5590.30 samples/sec Loss 6.9951 LearningRate 0.0507 Epoch: 5 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:21,165-Speed 5597.33 samples/sec Loss 6.8005 LearningRate 0.0506 Epoch: 5 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:22,999-Speed 5584.98 samples/sec Loss 6.8706 LearningRate 0.0506 Epoch: 5 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:24,848-Speed 5542.46 samples/sec Loss 6.8445 LearningRate 0.0506 Epoch: 5 Global Step: 32810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:26,699-Speed 5536.38 samples/sec Loss 6.8758 LearningRate 0.0506 Epoch: 5 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:28,530-Speed 5595.74 samples/sec Loss 6.8243 
LearningRate 0.0506 Epoch: 5 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:30,360-Speed 5597.01 samples/sec Loss 6.8574 LearningRate 0.0506 Epoch: 5 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:32,177-Speed 5642.69 samples/sec Loss 6.8706 LearningRate 0.0506 Epoch: 5 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:33,999-Speed 5622.23 samples/sec Loss 6.9326 LearningRate 0.0506 Epoch: 5 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:35,832-Speed 5590.61 samples/sec Loss 6.8412 LearningRate 0.0505 Epoch: 5 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:37,667-Speed 5582.13 samples/sec Loss 6.7665 LearningRate 0.0505 Epoch: 5 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:39,502-Speed 5584.82 samples/sec Loss 6.9803 LearningRate 0.0505 Epoch: 5 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:41,334-Speed 5591.57 samples/sec Loss 6.9593 LearningRate 0.0505 Epoch: 5 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:43,187-Speed 5529.76 samples/sec Loss 6.8236 LearningRate 0.0505 Epoch: 5 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:45,011-Speed 5614.96 samples/sec Loss 6.9051 LearningRate 0.0505 Epoch: 5 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:46,838-Speed 5609.46 samples/sec Loss 6.8946 LearningRate 0.0505 Epoch: 5 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:48,664-Speed 5612.34 samples/sec Loss 6.8989 LearningRate 0.0505 Epoch: 5 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:31:50,497-Speed 5590.06 samples/sec Loss 6.7450 LearningRate 0.0504 Epoch: 5 Global Step: 32950 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-27 03:31:52,328-Speed 5594.02 samples/sec Loss 7.0546 LearningRate 0.0504 Epoch: 5 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:54,162-Speed 5587.47 samples/sec Loss 6.8494 LearningRate 0.0504 Epoch: 5 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:55,986-Speed 5614.52 samples/sec Loss 6.9051 LearningRate 0.0504 Epoch: 5 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:57,825-Speed 5574.80 samples/sec Loss 7.0230 LearningRate 0.0504 Epoch: 5 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:31:59,688-Speed 5499.47 samples/sec Loss 6.7334 LearningRate 0.0504 Epoch: 5 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:01,520-Speed 5590.83 samples/sec Loss 6.7985 LearningRate 0.0504 Epoch: 5 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:03,343-Speed 5621.83 samples/sec Loss 6.8676 LearningRate 0.0504 Epoch: 5 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:05,166-Speed 5619.51 samples/sec Loss 6.8652 LearningRate 0.0503 Epoch: 5 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:06,980-Speed 5650.75 samples/sec Loss 6.8374 LearningRate 0.0503 Epoch: 5 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:08,821-Speed 5561.25 samples/sec Loss 6.8425 LearningRate 0.0503 Epoch: 5 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:10,654-Speed 5592.17 samples/sec Loss 6.8955 LearningRate 0.0503 Epoch: 5 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:12,490-Speed 5577.06 samples/sec Loss 6.8684 LearningRate 0.0503 Epoch: 5 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:14,341-Speed 
5534.68 samples/sec Loss 6.8826 LearningRate 0.0503 Epoch: 5 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:16,172-Speed 5597.39 samples/sec Loss 7.0700 LearningRate 0.0503 Epoch: 5 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:18,005-Speed 5589.07 samples/sec Loss 6.7896 LearningRate 0.0503 Epoch: 5 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:19,839-Speed 5586.80 samples/sec Loss 7.0187 LearningRate 0.0502 Epoch: 5 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:21,669-Speed 5598.37 samples/sec Loss 6.6778 LearningRate 0.0502 Epoch: 5 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:23,524-Speed 5522.86 samples/sec Loss 6.8910 LearningRate 0.0502 Epoch: 5 Global Step: 33130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:25,375-Speed 5536.14 samples/sec Loss 6.7918 LearningRate 0.0502 Epoch: 5 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:27,202-Speed 5607.05 samples/sec Loss 6.8280 LearningRate 0.0502 Epoch: 5 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:29,070-Speed 5486.43 samples/sec Loss 6.8364 LearningRate 0.0502 Epoch: 5 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:30,930-Speed 5510.49 samples/sec Loss 6.8739 LearningRate 0.0502 Epoch: 5 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:32,763-Speed 5589.13 samples/sec Loss 6.7009 LearningRate 0.0502 Epoch: 5 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:34,600-Speed 5577.63 samples/sec Loss 6.8691 LearningRate 0.0501 Epoch: 5 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:36,433-Speed 5589.96 samples/sec Loss 6.9075 LearningRate 0.0501 Epoch: 5 
Global Step: 33200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:38,327-Speed 5408.61 samples/sec Loss 6.8142 LearningRate 0.0501 Epoch: 5 Global Step: 33210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:40,160-Speed 5589.56 samples/sec Loss 6.9315 LearningRate 0.0501 Epoch: 5 Global Step: 33220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:41,982-Speed 5624.48 samples/sec Loss 6.8491 LearningRate 0.0501 Epoch: 5 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:32:43,810-Speed 5604.07 samples/sec Loss 6.7781 LearningRate 0.0501 Epoch: 5 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:45,679-Speed 5479.48 samples/sec Loss 6.7029 LearningRate 0.0501 Epoch: 5 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:47,564-Speed 5437.40 samples/sec Loss 6.9643 LearningRate 0.0501 Epoch: 5 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:49,406-Speed 5561.26 samples/sec Loss 6.7780 LearningRate 0.0500 Epoch: 5 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:51,243-Speed 5578.22 samples/sec Loss 6.9005 LearningRate 0.0500 Epoch: 5 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:53,103-Speed 5507.41 samples/sec Loss 6.6649 LearningRate 0.0500 Epoch: 5 Global Step: 33290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:54,944-Speed 5563.33 samples/sec Loss 6.8121 LearningRate 0.0500 Epoch: 5 Global Step: 33300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:56,775-Speed 5594.23 samples/sec Loss 6.7685 LearningRate 0.0500 Epoch: 5 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:32:58,599-Speed 5617.42 samples/sec Loss 7.0180 LearningRate 0.0500 Epoch: 5 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:33:00,456-Speed 5517.10 samples/sec Loss 6.7561 LearningRate 0.0500 Epoch: 5 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:02,280-Speed 5619.08 samples/sec Loss 6.8326 LearningRate 0.0500 Epoch: 5 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:04,113-Speed 5586.90 samples/sec Loss 6.8750 LearningRate 0.0499 Epoch: 5 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:05,980-Speed 5490.42 samples/sec Loss 6.7364 LearningRate 0.0499 Epoch: 5 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:07,803-Speed 5619.60 samples/sec Loss 6.8270 LearningRate 0.0499 Epoch: 5 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:09,638-Speed 5584.66 samples/sec Loss 6.9194 LearningRate 0.0499 Epoch: 5 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:11,454-Speed 5641.78 samples/sec Loss 6.7643 LearningRate 0.0499 Epoch: 5 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:13,269-Speed 5645.91 samples/sec Loss 6.8597 LearningRate 0.0499 Epoch: 5 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:15,090-Speed 5626.13 samples/sec Loss 6.8121 LearningRate 0.0499 Epoch: 5 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:16,928-Speed 5573.37 samples/sec Loss 6.8481 LearningRate 0.0499 Epoch: 5 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:18,747-Speed 5632.57 samples/sec Loss 6.8592 LearningRate 0.0498 Epoch: 5 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:20,552-Speed 5676.33 samples/sec Loss 6.8001 LearningRate 0.0498 Epoch: 5 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:22,373-Speed 5625.80 
samples/sec Loss 6.9341 LearningRate 0.0498 Epoch: 5 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:24,200-Speed 5606.64 samples/sec Loss 6.8500 LearningRate 0.0498 Epoch: 5 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:26,024-Speed 5618.49 samples/sec Loss 6.7992 LearningRate 0.0498 Epoch: 5 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:27,854-Speed 5597.52 samples/sec Loss 6.8568 LearningRate 0.0498 Epoch: 5 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:29,673-Speed 5633.37 samples/sec Loss 6.8172 LearningRate 0.0498 Epoch: 5 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:31,493-Speed 5630.17 samples/sec Loss 6.9347 LearningRate 0.0498 Epoch: 5 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:33,310-Speed 5638.96 samples/sec Loss 6.9162 LearningRate 0.0497 Epoch: 5 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:35,140-Speed 5597.25 samples/sec Loss 6.8513 LearningRate 0.0497 Epoch: 5 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:36,971-Speed 5596.84 samples/sec Loss 6.7930 LearningRate 0.0497 Epoch: 5 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:38,782-Speed 5658.33 samples/sec Loss 7.0006 LearningRate 0.0497 Epoch: 5 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:40,609-Speed 5605.14 samples/sec Loss 6.8002 LearningRate 0.0497 Epoch: 5 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:33:42,468-Speed 5514.43 samples/sec Loss 6.9148 LearningRate 0.0497 Epoch: 5 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:44,301-Speed 5591.52 samples/sec Loss 6.6762 LearningRate 0.0497 Epoch: 5 Global Step: 
33570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:46,131-Speed 5595.03 samples/sec Loss 6.9138 LearningRate 0.0497 Epoch: 5 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:47,956-Speed 5615.20 samples/sec Loss 6.8668 LearningRate 0.0496 Epoch: 5 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:49,796-Speed 5568.36 samples/sec Loss 6.7486 LearningRate 0.0496 Epoch: 5 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:51,627-Speed 5596.57 samples/sec Loss 6.7421 LearningRate 0.0496 Epoch: 5 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:53,461-Speed 5586.77 samples/sec Loss 6.7165 LearningRate 0.0496 Epoch: 5 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:55,297-Speed 5580.69 samples/sec Loss 6.7471 LearningRate 0.0496 Epoch: 5 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:57,120-Speed 5617.81 samples/sec Loss 6.9063 LearningRate 0.0496 Epoch: 5 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:33:58,949-Speed 5603.24 samples/sec Loss 6.9459 LearningRate 0.0496 Epoch: 5 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:34:00,782-Speed 5587.62 samples/sec Loss 6.8556 LearningRate 0.0496 Epoch: 5 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:34:02,621-Speed 5572.55 samples/sec Loss 6.7695 LearningRate 0.0496 Epoch: 5 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:34:04,446-Speed 5612.54 samples/sec Loss 6.8822 LearningRate 0.0495 Epoch: 5 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:34:06,271-Speed 5613.30 samples/sec Loss 6.7921 LearningRate 0.0495 Epoch: 5 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-27 03:34:08,082-Speed 5659.06 samples/sec Loss 6.8228 LearningRate 0.0495 Epoch: 5 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:09,911-Speed 5600.46 samples/sec Loss 6.7327 LearningRate 0.0495 Epoch: 5 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:11,761-Speed 5539.81 samples/sec Loss 6.7076 LearningRate 0.0495 Epoch: 5 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:13,599-Speed 5573.74 samples/sec Loss 6.8318 LearningRate 0.0495 Epoch: 5 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:15,442-Speed 5558.22 samples/sec Loss 6.8340 LearningRate 0.0495 Epoch: 5 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:17,282-Speed 5569.95 samples/sec Loss 6.6421 LearningRate 0.0495 Epoch: 5 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:19,115-Speed 5588.25 samples/sec Loss 6.6788 LearningRate 0.0494 Epoch: 5 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:20,949-Speed 5586.59 samples/sec Loss 6.6766 LearningRate 0.0494 Epoch: 5 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:22,795-Speed 5552.57 samples/sec Loss 6.7142 LearningRate 0.0494 Epoch: 5 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:24,608-Speed 5648.57 samples/sec Loss 6.8183 LearningRate 0.0494 Epoch: 5 Global Step: 33790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:26,445-Speed 5579.80 samples/sec Loss 6.9685 LearningRate 0.0494 Epoch: 5 Global Step: 33800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:28,299-Speed 5526.64 samples/sec Loss 6.8192 LearningRate 0.0494 Epoch: 5 Global Step: 33810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:30,138-Speed 5570.16 samples/sec Loss 
6.7602 LearningRate 0.0494 Epoch: 5 Global Step: 33820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:31,990-Speed 5531.71 samples/sec Loss 6.8460 LearningRate 0.0494 Epoch: 5 Global Step: 33830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:33,830-Speed 5566.34 samples/sec Loss 6.7463 LearningRate 0.0493 Epoch: 5 Global Step: 33840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:35,692-Speed 5503.69 samples/sec Loss 6.7605 LearningRate 0.0493 Epoch: 5 Global Step: 33850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:37,534-Speed 5561.86 samples/sec Loss 6.6743 LearningRate 0.0493 Epoch: 5 Global Step: 33860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:39,356-Speed 5623.74 samples/sec Loss 6.7215 LearningRate 0.0493 Epoch: 5 Global Step: 33870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:41,171-Speed 5643.40 samples/sec Loss 6.8630 LearningRate 0.0493 Epoch: 5 Global Step: 33880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:34:42,989-Speed 5634.73 samples/sec Loss 6.9127 LearningRate 0.0493 Epoch: 5 Global Step: 33890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:44,833-Speed 5555.80 samples/sec Loss 6.8289 LearningRate 0.0493 Epoch: 5 Global Step: 33900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:46,739-Speed 5376.63 samples/sec Loss 6.8299 LearningRate 0.0493 Epoch: 5 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:48,587-Speed 5546.58 samples/sec Loss 6.9689 LearningRate 0.0492 Epoch: 5 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:50,431-Speed 5555.20 samples/sec Loss 6.8437 LearningRate 0.0492 Epoch: 5 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:52,339-Speed 5372.30 samples/sec Loss 6.8548 LearningRate 0.0492 Epoch: 5 Global Step: 33940 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:54,201-Speed 5501.68 samples/sec Loss 6.8533 LearningRate 0.0492 Epoch: 5 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:56,046-Speed 5551.61 samples/sec Loss 6.7817 LearningRate 0.0492 Epoch: 5 Global Step: 33960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:57,883-Speed 5579.44 samples/sec Loss 6.7234 LearningRate 0.0492 Epoch: 5 Global Step: 33970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:34:59,731-Speed 5544.62 samples/sec Loss 6.8584 LearningRate 0.0492 Epoch: 5 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:35:01,569-Speed 5571.91 samples/sec Loss 6.9021 LearningRate 0.0492 Epoch: 5 Global Step: 33990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:35:03,384-Speed 5645.57 samples/sec Loss 6.8291 LearningRate 0.0491 Epoch: 5 Global Step: 34000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:35:29,941-[lfw][34000]XNorm: 22.483673 Training: 2022-04-27 03:35:29,942-[lfw][34000]Accuracy-Flip: 0.99650+-0.00293 Training: 2022-04-27 03:35:29,942-[lfw][34000]Accuracy-Highest: 0.99717 Training: 2022-04-27 03:36:02,641-[cfp_fp][34000]XNorm: 19.802441 Training: 2022-04-27 03:36:02,642-[cfp_fp][34000]Accuracy-Flip: 0.93657+-0.01173 Training: 2022-04-27 03:36:02,643-[cfp_fp][34000]Accuracy-Highest: 0.94771 Training: 2022-04-27 03:36:29,690-[agedb_30][34000]XNorm: 22.284478 Training: 2022-04-27 03:36:29,691-[agedb_30][34000]Accuracy-Flip: 0.97283+-0.00699 Training: 2022-04-27 03:36:29,692-[agedb_30][34000]Accuracy-Highest: 0.97283 Training: 2022-04-27 03:36:31,526-Speed 116.18 samples/sec Loss 6.7492 LearningRate 0.0491 Epoch: 5 Global Step: 34010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:33,341-Speed 5644.88 samples/sec Loss 6.8411 LearningRate 0.0491 Epoch: 5 Global Step: 34020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 03:36:35,159-Speed 5634.47 samples/sec Loss 6.7198 LearningRate 0.0491 Epoch: 5 Global Step: 34030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:36,977-Speed 5633.20 samples/sec Loss 6.7161 LearningRate 0.0491 Epoch: 5 Global Step: 34040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:38,803-Speed 5609.72 samples/sec Loss 6.8204 LearningRate 0.0491 Epoch: 5 Global Step: 34050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:40,606-Speed 5681.74 samples/sec Loss 6.7373 LearningRate 0.0491 Epoch: 5 Global Step: 34060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:42,410-Speed 5679.01 samples/sec Loss 6.8463 LearningRate 0.0491 Epoch: 5 Global Step: 34070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:44,216-Speed 5670.16 samples/sec Loss 6.8484 LearningRate 0.0490 Epoch: 5 Global Step: 34080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:36:46,030-Speed 5648.82 samples/sec Loss 6.8152 LearningRate 0.0490 Epoch: 5 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:36:47,852-Speed 5621.85 samples/sec Loss 6.7350 LearningRate 0.0490 Epoch: 5 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:36:49,793-Speed 5277.77 samples/sec Loss 6.8782 LearningRate 0.0490 Epoch: 5 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:01,347-Speed 886.38 samples/sec Loss 6.4894 LearningRate 0.0490 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:03,194-Speed 5546.73 samples/sec Loss 6.1559 LearningRate 0.0490 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:05,029-Speed 5583.40 samples/sec Loss 6.0390 LearningRate 0.0490 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:06,853-Speed 5615.44 samples/sec Loss 6.0863 
LearningRate 0.0490 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:08,690-Speed 5576.75 samples/sec Loss 6.0015 LearningRate 0.0489 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:10,599-Speed 5365.81 samples/sec Loss 6.1044 LearningRate 0.0489 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:12,481-Speed 5442.68 samples/sec Loss 6.0553 LearningRate 0.0489 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:14,315-Speed 5587.73 samples/sec Loss 5.9625 LearningRate 0.0489 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:16,156-Speed 5564.67 samples/sec Loss 6.2596 LearningRate 0.0489 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:18,016-Speed 5506.34 samples/sec Loss 6.1802 LearningRate 0.0489 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:19,840-Speed 5614.63 samples/sec Loss 6.2307 LearningRate 0.0489 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:21,676-Speed 5581.21 samples/sec Loss 6.2906 LearningRate 0.0489 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:23,481-Speed 5674.67 samples/sec Loss 6.1947 LearningRate 0.0488 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:25,305-Speed 5617.51 samples/sec Loss 6.1797 LearningRate 0.0488 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:27,130-Speed 5610.87 samples/sec Loss 6.3523 LearningRate 0.0488 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:28,968-Speed 5572.36 samples/sec Loss 6.1893 LearningRate 0.0488 Epoch: 6 Global Step: 34270 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:30,796-Speed 5605.74 samples/sec Loss 6.2743 LearningRate 0.0488 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:32,632-Speed 5577.64 samples/sec Loss 6.2138 LearningRate 0.0488 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:34,462-Speed 5598.30 samples/sec Loss 6.2440 LearningRate 0.0488 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:36,286-Speed 5616.68 samples/sec Loss 6.3037 LearningRate 0.0488 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:38,100-Speed 5646.88 samples/sec Loss 6.3098 LearningRate 0.0487 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:39,925-Speed 5614.03 samples/sec Loss 6.4361 LearningRate 0.0487 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:41,737-Speed 5653.00 samples/sec Loss 6.1162 LearningRate 0.0487 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:43,588-Speed 5531.89 samples/sec Loss 6.3093 LearningRate 0.0487 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:45,419-Speed 5596.18 samples/sec Loss 6.3212 LearningRate 0.0487 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:47,246-Speed 5605.03 samples/sec Loss 6.4354 LearningRate 0.0487 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:49,069-Speed 5620.02 samples/sec Loss 6.4452 LearningRate 0.0487 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:37:50,878-Speed 5661.23 samples/sec Loss 6.3245 LearningRate 0.0487 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:37:52,694-Speed 5642.04 samples/sec Loss 6.2928 LearningRate 0.0487 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:54,544-Speed 5536.32 samples/sec Loss 6.2530 LearningRate 0.0486 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:56,375-Speed 5594.54 samples/sec Loss 6.1274 LearningRate 0.0486 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:37:58,216-Speed 5565.04 samples/sec Loss 6.2420 LearningRate 0.0486 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:00,059-Speed 5557.75 samples/sec Loss 6.3257 LearningRate 0.0486 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:01,877-Speed 5633.32 samples/sec Loss 6.4135 LearningRate 0.0486 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:03,717-Speed 5569.95 samples/sec Loss 6.4822 LearningRate 0.0486 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:05,549-Speed 5593.24 samples/sec Loss 6.3507 LearningRate 0.0486 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:07,388-Speed 5567.71 samples/sec Loss 6.4280 LearningRate 0.0486 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:09,256-Speed 5483.68 samples/sec Loss 6.4476 LearningRate 0.0485 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:11,073-Speed 5636.97 samples/sec Loss 6.1843 LearningRate 0.0485 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:12,910-Speed 5577.03 samples/sec Loss 6.5307 LearningRate 0.0485 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:14,736-Speed 5611.05 samples/sec Loss 6.3640 LearningRate 
0.0485 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:16,547-Speed 5655.73 samples/sec Loss 6.3510 LearningRate 0.0485 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:18,359-Speed 5653.30 samples/sec Loss 6.3315 LearningRate 0.0485 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:20,169-Speed 5657.60 samples/sec Loss 6.4602 LearningRate 0.0485 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:21,996-Speed 5607.42 samples/sec Loss 6.4617 LearningRate 0.0485 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:23,806-Speed 5658.67 samples/sec Loss 6.4788 LearningRate 0.0484 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:25,634-Speed 5605.89 samples/sec Loss 6.4374 LearningRate 0.0484 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:27,462-Speed 5603.84 samples/sec Loss 6.4418 LearningRate 0.0484 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:29,296-Speed 5583.74 samples/sec Loss 6.3658 LearningRate 0.0484 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:31,125-Speed 5601.00 samples/sec Loss 6.4876 LearningRate 0.0484 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:32,946-Speed 5624.72 samples/sec Loss 6.4655 LearningRate 0.0484 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:34,789-Speed 5558.82 samples/sec Loss 6.4368 LearningRate 0.0484 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:36,652-Speed 5496.43 samples/sec Loss 6.4029 LearningRate 0.0484 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-27 03:38:38,497-Speed 5552.43 samples/sec Loss 6.4240 LearningRate 0.0483 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:40,321-Speed 5615.87 samples/sec Loss 6.3845 LearningRate 0.0483 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:42,140-Speed 5634.41 samples/sec Loss 6.3687 LearningRate 0.0483 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:43,975-Speed 5580.64 samples/sec Loss 6.3826 LearningRate 0.0483 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:45,787-Speed 5652.96 samples/sec Loss 6.4136 LearningRate 0.0483 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:38:47,595-Speed 5664.30 samples/sec Loss 6.5750 LearningRate 0.0483 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:49,402-Speed 5669.10 samples/sec Loss 6.5251 LearningRate 0.0483 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:51,216-Speed 5647.49 samples/sec Loss 6.4009 LearningRate 0.0483 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:53,048-Speed 5590.37 samples/sec Loss 6.5055 LearningRate 0.0482 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:54,856-Speed 5667.76 samples/sec Loss 6.4670 LearningRate 0.0482 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:56,670-Speed 5646.32 samples/sec Loss 6.5284 LearningRate 0.0482 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:38:58,499-Speed 5601.89 samples/sec Loss 6.6088 LearningRate 0.0482 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:00,324-Speed 
5610.65 samples/sec Loss 6.4711 LearningRate 0.0482 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:02,149-Speed 5613.22 samples/sec Loss 6.4958 LearningRate 0.0482 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:03,989-Speed 5567.41 samples/sec Loss 6.5891 LearningRate 0.0482 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:05,830-Speed 5563.05 samples/sec Loss 6.5160 LearningRate 0.0482 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:07,661-Speed 5594.87 samples/sec Loss 6.3696 LearningRate 0.0481 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:09,502-Speed 5564.52 samples/sec Loss 6.3975 LearningRate 0.0481 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:11,319-Speed 5636.99 samples/sec Loss 6.4317 LearningRate 0.0481 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:13,123-Speed 5678.81 samples/sec Loss 6.5024 LearningRate 0.0481 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:14,973-Speed 5537.73 samples/sec Loss 6.5991 LearningRate 0.0481 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:16,828-Speed 5521.27 samples/sec Loss 6.4814 LearningRate 0.0481 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:18,661-Speed 5588.59 samples/sec Loss 6.4915 LearningRate 0.0481 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:20,485-Speed 5614.97 samples/sec Loss 6.4811 LearningRate 0.0481 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:22,306-Speed 5626.25 samples/sec Loss 6.4980 LearningRate 0.0481 Epoch: 6 
Global Step: 34890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:24,148-Speed 5559.82 samples/sec Loss 6.5308 LearningRate 0.0480 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:25,982-Speed 5587.11 samples/sec Loss 6.3774 LearningRate 0.0480 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:27,870-Speed 5424.33 samples/sec Loss 6.4585 LearningRate 0.0480 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:29,688-Speed 5636.30 samples/sec Loss 6.5420 LearningRate 0.0480 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:31,517-Speed 5598.62 samples/sec Loss 6.5410 LearningRate 0.0480 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:33,347-Speed 5597.78 samples/sec Loss 6.6002 LearningRate 0.0480 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:35,176-Speed 5599.79 samples/sec Loss 6.5385 LearningRate 0.0480 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:37,011-Speed 5581.88 samples/sec Loss 6.3504 LearningRate 0.0480 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:38,873-Speed 5502.76 samples/sec Loss 6.5527 LearningRate 0.0479 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:39:40,701-Speed 5602.22 samples/sec Loss 6.5065 LearningRate 0.0479 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:42,523-Speed 5623.38 samples/sec Loss 6.6983 LearningRate 0.0479 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:44,371-Speed 5542.85 samples/sec Loss 6.4088 LearningRate 0.0479 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:39:46,199-Speed 5603.77 samples/sec Loss 6.5921 LearningRate 0.0479 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:48,064-Speed 5493.08 samples/sec Loss 6.5274 LearningRate 0.0479 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:49,913-Speed 5539.45 samples/sec Loss 6.4777 LearningRate 0.0479 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:51,753-Speed 5566.28 samples/sec Loss 6.4413 LearningRate 0.0479 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:53,567-Speed 5649.57 samples/sec Loss 6.4045 LearningRate 0.0478 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:55,398-Speed 5594.42 samples/sec Loss 6.5234 LearningRate 0.0478 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:57,226-Speed 5602.80 samples/sec Loss 6.5330 LearningRate 0.0478 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:39:59,070-Speed 5555.23 samples/sec Loss 6.5793 LearningRate 0.0478 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:00,947-Speed 5456.56 samples/sec Loss 6.5085 LearningRate 0.0478 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:02,788-Speed 5563.35 samples/sec Loss 6.5589 LearningRate 0.0478 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:04,609-Speed 5626.83 samples/sec Loss 6.6815 LearningRate 0.0478 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:06,422-Speed 5648.41 samples/sec Loss 6.5992 LearningRate 0.0478 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:08,233-Speed 5657.84 samples/sec 
Loss 6.7560 LearningRate 0.0477 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:10,060-Speed 5605.52 samples/sec Loss 6.5507 LearningRate 0.0477 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:11,889-Speed 5602.17 samples/sec Loss 6.5181 LearningRate 0.0477 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:13,695-Speed 5672.79 samples/sec Loss 6.5549 LearningRate 0.0477 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:15,515-Speed 5625.43 samples/sec Loss 6.4595 LearningRate 0.0477 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:17,342-Speed 5607.80 samples/sec Loss 6.5683 LearningRate 0.0477 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:19,174-Speed 5589.76 samples/sec Loss 6.6923 LearningRate 0.0477 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:21,005-Speed 5595.96 samples/sec Loss 6.5646 LearningRate 0.0477 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:22,836-Speed 5593.02 samples/sec Loss 6.5331 LearningRate 0.0477 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:24,646-Speed 5660.42 samples/sec Loss 6.4931 LearningRate 0.0476 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:26,462-Speed 5647.52 samples/sec Loss 6.5954 LearningRate 0.0476 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:28,349-Speed 5429.48 samples/sec Loss 6.4812 LearningRate 0.0476 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:30,167-Speed 5635.20 samples/sec Loss 6.6733 LearningRate 0.0476 Epoch: 6 Global Step: 35260 Fp16 
Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:32,008-Speed 5564.14 samples/sec Loss 6.6848 LearningRate 0.0476 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:33,835-Speed 5605.07 samples/sec Loss 6.5990 LearningRate 0.0476 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:35,675-Speed 5569.48 samples/sec Loss 6.5688 LearningRate 0.0476 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:37,524-Speed 5537.71 samples/sec Loss 6.5689 LearningRate 0.0476 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:39,343-Speed 5633.65 samples/sec Loss 6.6518 LearningRate 0.0475 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:41,168-Speed 5611.93 samples/sec Loss 6.6230 LearningRate 0.0475 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:42,988-Speed 5628.03 samples/sec Loss 6.6662 LearningRate 0.0475 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:44,794-Speed 5672.55 samples/sec Loss 6.5212 LearningRate 0.0475 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:46,624-Speed 5596.78 samples/sec Loss 6.5641 LearningRate 0.0475 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:48,496-Speed 5472.98 samples/sec Loss 6.6997 LearningRate 0.0475 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:50,342-Speed 5549.65 samples/sec Loss 6.5065 LearningRate 0.0475 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:52,151-Speed 5662.97 samples/sec Loss 6.4523 LearningRate 0.0475 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:40:53,957-Speed 5669.88 samples/sec Loss 6.5980 LearningRate 0.0474 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:55,768-Speed 5656.94 samples/sec Loss 6.5176 LearningRate 0.0474 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:40:57,608-Speed 5565.78 samples/sec Loss 6.4054 LearningRate 0.0474 Epoch: 6 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:40:59,427-Speed 5632.10 samples/sec Loss 6.5557 LearningRate 0.0474 Epoch: 6 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:01,262-Speed 5581.17 samples/sec Loss 6.6215 LearningRate 0.0474 Epoch: 6 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:03,098-Speed 5581.03 samples/sec Loss 6.4090 LearningRate 0.0474 Epoch: 6 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:04,936-Speed 5573.11 samples/sec Loss 6.5505 LearningRate 0.0474 Epoch: 6 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:06,746-Speed 5659.42 samples/sec Loss 6.5016 LearningRate 0.0474 Epoch: 6 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:08,616-Speed 5476.97 samples/sec Loss 6.4950 LearningRate 0.0473 Epoch: 6 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:10,500-Speed 5438.58 samples/sec Loss 6.7036 LearningRate 0.0473 Epoch: 6 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:12,351-Speed 5534.19 samples/sec Loss 6.7606 LearningRate 0.0473 Epoch: 6 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:14,178-Speed 5607.59 samples/sec Loss 6.5501 LearningRate 0.0473 Epoch: 6 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:15,995-Speed 5638.48 samples/sec Loss 6.5877 
LearningRate 0.0473 Epoch: 6 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:17,817-Speed 5620.99 samples/sec Loss 6.4771 LearningRate 0.0473 Epoch: 6 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:19,664-Speed 5544.71 samples/sec Loss 6.5216 LearningRate 0.0473 Epoch: 6 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:21,493-Speed 5600.99 samples/sec Loss 6.4211 LearningRate 0.0473 Epoch: 6 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:23,320-Speed 5606.07 samples/sec Loss 6.4389 LearningRate 0.0473 Epoch: 6 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:25,151-Speed 5594.45 samples/sec Loss 6.5139 LearningRate 0.0472 Epoch: 6 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:26,973-Speed 5622.84 samples/sec Loss 6.5182 LearningRate 0.0472 Epoch: 6 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:28,796-Speed 5619.67 samples/sec Loss 6.5145 LearningRate 0.0472 Epoch: 6 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:30,616-Speed 5626.85 samples/sec Loss 6.6959 LearningRate 0.0472 Epoch: 6 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:32,434-Speed 5635.64 samples/sec Loss 6.5013 LearningRate 0.0472 Epoch: 6 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:34,261-Speed 5607.32 samples/sec Loss 6.4675 LearningRate 0.0472 Epoch: 6 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:36,090-Speed 5601.60 samples/sec Loss 6.7066 LearningRate 0.0472 Epoch: 6 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:37,944-Speed 5522.76 samples/sec Loss 6.5435 LearningRate 0.0472 Epoch: 6 Global Step: 35630 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:39,761-Speed 5637.83 samples/sec Loss 6.6644 LearningRate 0.0471 Epoch: 6 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:41,590-Speed 5601.51 samples/sec Loss 6.6375 LearningRate 0.0471 Epoch: 6 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:43,435-Speed 5551.85 samples/sec Loss 6.5817 LearningRate 0.0471 Epoch: 6 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:45,261-Speed 5610.81 samples/sec Loss 6.4348 LearningRate 0.0471 Epoch: 6 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:47,103-Speed 5561.04 samples/sec Loss 6.6479 LearningRate 0.0471 Epoch: 6 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:48,925-Speed 5620.95 samples/sec Loss 6.6010 LearningRate 0.0471 Epoch: 6 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:50,765-Speed 5566.53 samples/sec Loss 6.5837 LearningRate 0.0471 Epoch: 6 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:41:52,630-Speed 5493.22 samples/sec Loss 6.4914 LearningRate 0.0471 Epoch: 6 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:54,467-Speed 5573.93 samples/sec Loss 6.5258 LearningRate 0.0470 Epoch: 6 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:56,285-Speed 5638.33 samples/sec Loss 6.6621 LearningRate 0.0470 Epoch: 6 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:41:58,096-Speed 5655.32 samples/sec Loss 6.6568 LearningRate 0.0470 Epoch: 6 Global Step: 35740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:41:59,963-Speed 5485.41 samples/sec Loss 6.5325 LearningRate 0.0470 Epoch: 6 Global Step: 35750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
03:42:01,817-Speed 5525.45 samples/sec Loss 6.5172 LearningRate 0.0470 Epoch: 6 Global Step: 35760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:03,659-Speed 5560.98 samples/sec Loss 6.6355 LearningRate 0.0470 Epoch: 6 Global Step: 35770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:05,504-Speed 5551.01 samples/sec Loss 6.4889 LearningRate 0.0470 Epoch: 6 Global Step: 35780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:07,335-Speed 5595.64 samples/sec Loss 6.5656 LearningRate 0.0470 Epoch: 6 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:09,145-Speed 5659.16 samples/sec Loss 6.5973 LearningRate 0.0469 Epoch: 6 Global Step: 35800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:10,990-Speed 5553.05 samples/sec Loss 6.4658 LearningRate 0.0469 Epoch: 6 Global Step: 35810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:12,817-Speed 5607.48 samples/sec Loss 6.6014 LearningRate 0.0469 Epoch: 6 Global Step: 35820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:14,662-Speed 5548.88 samples/sec Loss 6.6006 LearningRate 0.0469 Epoch: 6 Global Step: 35830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:42:16,500-Speed 5575.03 samples/sec Loss 6.5402 LearningRate 0.0469 Epoch: 6 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:18,318-Speed 5633.92 samples/sec Loss 6.6960 LearningRate 0.0469 Epoch: 6 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:20,127-Speed 5661.76 samples/sec Loss 6.6731 LearningRate 0.0469 Epoch: 6 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:21,970-Speed 5559.30 samples/sec Loss 6.6629 LearningRate 0.0469 Epoch: 6 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:23,800-Speed 5596.64 samples/sec Loss 6.5688 LearningRate 
0.0469 Epoch: 6 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:25,634-Speed 5587.49 samples/sec Loss 6.5692 LearningRate 0.0468 Epoch: 6 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:27,463-Speed 5598.89 samples/sec Loss 6.4965 LearningRate 0.0468 Epoch: 6 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:29,282-Speed 5631.44 samples/sec Loss 6.6280 LearningRate 0.0468 Epoch: 6 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:31,115-Speed 5589.41 samples/sec Loss 6.4840 LearningRate 0.0468 Epoch: 6 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:32,940-Speed 5610.42 samples/sec Loss 6.5628 LearningRate 0.0468 Epoch: 6 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:34,768-Speed 5603.30 samples/sec Loss 6.4613 LearningRate 0.0468 Epoch: 6 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:42:36,594-Speed 5611.13 samples/sec Loss 6.4791 LearningRate 0.0468 Epoch: 6 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:38,410-Speed 5639.52 samples/sec Loss 6.6042 LearningRate 0.0468 Epoch: 6 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:40,231-Speed 5626.74 samples/sec Loss 6.4516 LearningRate 0.0467 Epoch: 6 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:42,055-Speed 5615.17 samples/sec Loss 6.6322 LearningRate 0.0467 Epoch: 6 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:43,872-Speed 5637.98 samples/sec Loss 6.5452 LearningRate 0.0467 Epoch: 6 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:42:45,687-Speed 5644.24 samples/sec Loss 6.5135 LearningRate 0.0467 Epoch: 6 Global Step: 36000 Fp16 Grad Scale: 65536 
Required: 5 hours
Training: 2022-04-27 03:43:12,079-[lfw][36000]XNorm: 23.653991
Training: 2022-04-27 03:43:12,080-[lfw][36000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 03:43:12,080-[lfw][36000]Accuracy-Highest: 0.99750
Training: 2022-04-27 03:43:43,233-[cfp_fp][36000]XNorm: 20.816380
Training: 2022-04-27 03:43:43,233-[cfp_fp][36000]Accuracy-Flip: 0.94343+-0.01184
Training: 2022-04-27 03:43:43,234-[cfp_fp][36000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:44:10,487-[agedb_30][36000]XNorm: 23.144299
Training: 2022-04-27 03:44:10,487-[agedb_30][36000]Accuracy-Flip: 0.97300+-0.00819
Training: 2022-04-27 03:44:10,488-[agedb_30][36000]Accuracy-Highest: 0.97300
Training: 2022-04-27 03:44:12,341-Speed 118.17 samples/sec Loss 6.5886 LearningRate 0.0467 Epoch: 6 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:14,152-Speed 5656.17 samples/sec Loss 6.5730 LearningRate 0.0467 Epoch: 6 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:15,965-Speed 5649.34 samples/sec Loss 6.6172 LearningRate 0.0467 Epoch: 6 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:17,784-Speed 5630.34 samples/sec Loss 6.5202 LearningRate 0.0467 Epoch: 6 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:19,607-Speed 5619.95 samples/sec Loss 6.5089 LearningRate 0.0466 Epoch: 6 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:21,460-Speed 5528.49 samples/sec Loss 6.5691 LearningRate 0.0466 Epoch: 6 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:23,330-Speed 5476.63 samples/sec Loss 6.6441 LearningRate 0.0466 Epoch: 6 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:25,165-Speed 5581.69 samples/sec Loss 6.6350 LearningRate 0.0466 Epoch: 6 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27
03:44:26,992-Speed 5607.82 samples/sec Loss 6.5897 LearningRate 0.0466 Epoch: 6 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:28,865-Speed 5469.07 samples/sec Loss 6.5851 LearningRate 0.0466 Epoch: 6 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:30,680-Speed 5643.46 samples/sec Loss 6.6289 LearningRate 0.0466 Epoch: 6 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:32,510-Speed 5595.79 samples/sec Loss 6.5168 LearningRate 0.0466 Epoch: 6 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:34,369-Speed 5510.78 samples/sec Loss 6.3880 LearningRate 0.0466 Epoch: 6 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:36,193-Speed 5617.88 samples/sec Loss 6.5977 LearningRate 0.0465 Epoch: 6 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:38,019-Speed 5609.98 samples/sec Loss 6.5288 LearningRate 0.0465 Epoch: 6 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:39,836-Speed 5635.86 samples/sec Loss 6.4717 LearningRate 0.0465 Epoch: 6 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:41,657-Speed 5625.46 samples/sec Loss 6.5489 LearningRate 0.0465 Epoch: 6 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:43,499-Speed 5562.87 samples/sec Loss 6.5035 LearningRate 0.0465 Epoch: 6 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:45,331-Speed 5591.45 samples/sec Loss 6.6465 LearningRate 0.0465 Epoch: 6 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:47,166-Speed 5581.92 samples/sec Loss 6.4984 LearningRate 0.0465 Epoch: 6 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:44:48,994-Speed 5603.09 samples/sec Loss 6.6365 LearningRate 
0.0465 Epoch: 6 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:50,828-Speed 5584.89 samples/sec Loss 6.6207 LearningRate 0.0464 Epoch: 6 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:52,653-Speed 5611.37 samples/sec Loss 6.5407 LearningRate 0.0464 Epoch: 6 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:54,498-Speed 5552.72 samples/sec Loss 6.5059 LearningRate 0.0464 Epoch: 6 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:56,342-Speed 5555.13 samples/sec Loss 6.6058 LearningRate 0.0464 Epoch: 6 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:44:58,169-Speed 5606.77 samples/sec Loss 6.6051 LearningRate 0.0464 Epoch: 6 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:00,003-Speed 5583.81 samples/sec Loss 6.5113 LearningRate 0.0464 Epoch: 6 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:01,865-Speed 5501.40 samples/sec Loss 6.6343 LearningRate 0.0464 Epoch: 6 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:03,678-Speed 5650.16 samples/sec Loss 6.6235 LearningRate 0.0464 Epoch: 6 Global Step: 36290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:05,506-Speed 5606.86 samples/sec Loss 6.5576 LearningRate 0.0463 Epoch: 6 Global Step: 36300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:07,335-Speed 5599.19 samples/sec Loss 6.5380 LearningRate 0.0463 Epoch: 6 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:09,178-Speed 5557.95 samples/sec Loss 6.4941 LearningRate 0.0463 Epoch: 6 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:11,002-Speed 5616.05 samples/sec Loss 6.3698 LearningRate 0.0463 Epoch: 6 Global Step: 36330 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-27 03:45:12,829-Speed 5605.72 samples/sec Loss 6.5910 LearningRate 0.0463 Epoch: 6 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:14,677-Speed 5544.26 samples/sec Loss 6.4117 LearningRate 0.0463 Epoch: 6 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:16,492-Speed 5644.25 samples/sec Loss 6.5291 LearningRate 0.0463 Epoch: 6 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:18,313-Speed 5625.42 samples/sec Loss 6.4151 LearningRate 0.0463 Epoch: 6 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:20,134-Speed 5622.24 samples/sec Loss 6.6263 LearningRate 0.0463 Epoch: 6 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:22,004-Speed 5480.14 samples/sec Loss 6.6366 LearningRate 0.0462 Epoch: 6 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:23,851-Speed 5544.44 samples/sec Loss 6.5622 LearningRate 0.0462 Epoch: 6 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:25,684-Speed 5588.34 samples/sec Loss 6.4805 LearningRate 0.0462 Epoch: 6 Global Step: 36410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:27,507-Speed 5621.20 samples/sec Loss 6.6353 LearningRate 0.0462 Epoch: 6 Global Step: 36420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:29,339-Speed 5591.08 samples/sec Loss 6.4909 LearningRate 0.0462 Epoch: 6 Global Step: 36430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:31,183-Speed 5554.16 samples/sec Loss 6.4801 LearningRate 0.0462 Epoch: 6 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:33,012-Speed 5600.54 samples/sec Loss 6.5970 LearningRate 0.0462 Epoch: 6 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:34,851-Speed 
5571.41 samples/sec Loss 6.5838 LearningRate 0.0462 Epoch: 6 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:36,670-Speed 5631.23 samples/sec Loss 6.4596 LearningRate 0.0461 Epoch: 6 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:38,511-Speed 5563.37 samples/sec Loss 6.6109 LearningRate 0.0461 Epoch: 6 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:40,337-Speed 5608.95 samples/sec Loss 6.4629 LearningRate 0.0461 Epoch: 6 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:42,180-Speed 5557.23 samples/sec Loss 6.4909 LearningRate 0.0461 Epoch: 6 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:44,012-Speed 5591.68 samples/sec Loss 6.3863 LearningRate 0.0461 Epoch: 6 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:45,835-Speed 5618.75 samples/sec Loss 6.5199 LearningRate 0.0461 Epoch: 6 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:47,666-Speed 5595.47 samples/sec Loss 6.3617 LearningRate 0.0461 Epoch: 6 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:49,508-Speed 5561.58 samples/sec Loss 6.4440 LearningRate 0.0461 Epoch: 6 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:51,330-Speed 5620.59 samples/sec Loss 6.5995 LearningRate 0.0460 Epoch: 6 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:53,158-Speed 5605.99 samples/sec Loss 6.6441 LearningRate 0.0460 Epoch: 6 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:54,971-Speed 5647.30 samples/sec Loss 6.5197 LearningRate 0.0460 Epoch: 6 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:45:56,809-Speed 5574.09 samples/sec Loss 6.5477 LearningRate 0.0460 
Epoch: 6 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:45:58,640-Speed 5594.69 samples/sec Loss 6.5079 LearningRate 0.0460 Epoch: 6 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:00,455-Speed 5644.24 samples/sec Loss 6.3136 LearningRate 0.0460 Epoch: 6 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:02,288-Speed 5587.89 samples/sec Loss 6.4631 LearningRate 0.0460 Epoch: 6 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:04,116-Speed 5604.81 samples/sec Loss 6.4440 LearningRate 0.0460 Epoch: 6 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:05,959-Speed 5555.97 samples/sec Loss 6.7014 LearningRate 0.0460 Epoch: 6 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:07,807-Speed 5543.24 samples/sec Loss 6.6273 LearningRate 0.0459 Epoch: 6 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:09,618-Speed 5656.65 samples/sec Loss 6.5480 LearningRate 0.0459 Epoch: 6 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:11,452-Speed 5584.91 samples/sec Loss 6.5233 LearningRate 0.0459 Epoch: 6 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:13,277-Speed 5615.70 samples/sec Loss 6.6093 LearningRate 0.0459 Epoch: 6 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:15,103-Speed 5607.17 samples/sec Loss 6.5521 LearningRate 0.0459 Epoch: 6 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:16,948-Speed 5552.28 samples/sec Loss 6.4564 LearningRate 0.0459 Epoch: 6 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:18,758-Speed 5659.63 samples/sec Loss 6.3977 LearningRate 0.0459 Epoch: 6 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-27 03:46:20,585-Speed 5606.61 samples/sec Loss 6.6120 LearningRate 0.0459 Epoch: 6 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:22,406-Speed 5625.91 samples/sec Loss 6.5594 LearningRate 0.0458 Epoch: 6 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:24,214-Speed 5665.86 samples/sec Loss 6.6830 LearningRate 0.0458 Epoch: 6 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:26,058-Speed 5554.86 samples/sec Loss 6.4413 LearningRate 0.0458 Epoch: 6 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:27,879-Speed 5625.16 samples/sec Loss 6.4393 LearningRate 0.0458 Epoch: 6 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:29,706-Speed 5606.85 samples/sec Loss 6.4788 LearningRate 0.0458 Epoch: 6 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:31,533-Speed 5606.44 samples/sec Loss 6.5856 LearningRate 0.0458 Epoch: 6 Global Step: 36770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:33,354-Speed 5622.70 samples/sec Loss 6.6041 LearningRate 0.0458 Epoch: 6 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:35,211-Speed 5517.96 samples/sec Loss 6.4935 LearningRate 0.0458 Epoch: 6 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:37,037-Speed 5610.44 samples/sec Loss 6.4914 LearningRate 0.0458 Epoch: 6 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:38,861-Speed 5616.95 samples/sec Loss 6.5157 LearningRate 0.0457 Epoch: 6 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:40,695-Speed 5582.43 samples/sec Loss 6.7175 LearningRate 0.0457 Epoch: 6 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:42,505-Speed 5659.51 samples/sec 
Loss 6.4202 LearningRate 0.0457 Epoch: 6 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:44,325-Speed 5630.72 samples/sec Loss 6.4089 LearningRate 0.0457 Epoch: 6 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:46,151-Speed 5609.29 samples/sec Loss 6.5959 LearningRate 0.0457 Epoch: 6 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:47,983-Speed 5590.40 samples/sec Loss 6.4787 LearningRate 0.0457 Epoch: 6 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:49,829-Speed 5548.56 samples/sec Loss 6.5325 LearningRate 0.0457 Epoch: 6 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:51,658-Speed 5600.96 samples/sec Loss 6.6423 LearningRate 0.0457 Epoch: 6 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:53,466-Speed 5663.39 samples/sec Loss 6.4827 LearningRate 0.0456 Epoch: 6 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:55,313-Speed 5548.16 samples/sec Loss 6.5111 LearningRate 0.0456 Epoch: 6 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:46:57,173-Speed 5525.77 samples/sec Loss 6.5920 LearningRate 0.0456 Epoch: 6 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:46:59,012-Speed 5572.12 samples/sec Loss 6.4969 LearningRate 0.0456 Epoch: 6 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:00,851-Speed 5567.05 samples/sec Loss 6.5566 LearningRate 0.0456 Epoch: 6 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:02,677-Speed 5612.42 samples/sec Loss 6.6293 LearningRate 0.0456 Epoch: 6 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:04,493-Speed 5638.72 samples/sec Loss 6.6390 LearningRate 0.0456 Epoch: 6 Global Step: 36950 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:06,314-Speed 5629.66 samples/sec Loss 6.5844 LearningRate 0.0456 Epoch: 6 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:08,140-Speed 5607.34 samples/sec Loss 6.4365 LearningRate 0.0455 Epoch: 6 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:09,980-Speed 5567.39 samples/sec Loss 6.5220 LearningRate 0.0455 Epoch: 6 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:11,801-Speed 5625.19 samples/sec Loss 6.4592 LearningRate 0.0455 Epoch: 6 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:13,615-Speed 5647.20 samples/sec Loss 6.4974 LearningRate 0.0455 Epoch: 6 Global Step: 37000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:15,438-Speed 5618.48 samples/sec Loss 6.4390 LearningRate 0.0455 Epoch: 6 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:17,264-Speed 5612.45 samples/sec Loss 6.5119 LearningRate 0.0455 Epoch: 6 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:19,096-Speed 5589.09 samples/sec Loss 6.4186 LearningRate 0.0455 Epoch: 6 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:20,910-Speed 5647.63 samples/sec Loss 6.4075 LearningRate 0.0455 Epoch: 6 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:22,730-Speed 5629.13 samples/sec Loss 6.6003 LearningRate 0.0455 Epoch: 6 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:24,561-Speed 5592.06 samples/sec Loss 6.4504 LearningRate 0.0454 Epoch: 6 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:26,398-Speed 5579.01 samples/sec Loss 6.3352 LearningRate 0.0454 Epoch: 6 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-27 03:47:28,246-Speed 5542.05 samples/sec Loss 6.5175 LearningRate 0.0454 Epoch: 6 Global Step: 37080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:30,072-Speed 5609.95 samples/sec Loss 6.4335 LearningRate 0.0454 Epoch: 6 Global Step: 37090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:31,918-Speed 5547.98 samples/sec Loss 6.5258 LearningRate 0.0454 Epoch: 6 Global Step: 37100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:33,838-Speed 5336.08 samples/sec Loss 6.4288 LearningRate 0.0454 Epoch: 6 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:35,691-Speed 5527.70 samples/sec Loss 6.5406 LearningRate 0.0454 Epoch: 6 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:37,524-Speed 5589.01 samples/sec Loss 6.5807 LearningRate 0.0454 Epoch: 6 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:39,340-Speed 5641.56 samples/sec Loss 6.5153 LearningRate 0.0453 Epoch: 6 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:41,182-Speed 5561.03 samples/sec Loss 6.3377 LearningRate 0.0453 Epoch: 6 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:43,034-Speed 5530.82 samples/sec Loss 6.2921 LearningRate 0.0453 Epoch: 6 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:44,864-Speed 5596.84 samples/sec Loss 6.4807 LearningRate 0.0453 Epoch: 6 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:47:46,683-Speed 5629.53 samples/sec Loss 6.4759 LearningRate 0.0453 Epoch: 6 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:48,522-Speed 5569.79 samples/sec Loss 6.5042 LearningRate 0.0453 Epoch: 6 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:50,367-Speed 5553.67 samples/sec Loss 6.5465 
LearningRate 0.0453 Epoch: 6 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:52,209-Speed 5558.98 samples/sec Loss 6.5724 LearningRate 0.0453 Epoch: 6 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:54,035-Speed 5609.88 samples/sec Loss 6.3878 LearningRate 0.0453 Epoch: 6 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:55,884-Speed 5541.22 samples/sec Loss 6.4686 LearningRate 0.0452 Epoch: 6 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:57,716-Speed 5590.35 samples/sec Loss 6.5000 LearningRate 0.0452 Epoch: 6 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:47:59,639-Speed 5328.63 samples/sec Loss 6.2997 LearningRate 0.0452 Epoch: 6 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:01,545-Speed 5376.47 samples/sec Loss 6.5390 LearningRate 0.0452 Epoch: 6 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:03,369-Speed 5613.34 samples/sec Loss 6.5601 LearningRate 0.0452 Epoch: 6 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:05,198-Speed 5602.47 samples/sec Loss 6.3128 LearningRate 0.0452 Epoch: 6 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:07,035-Speed 5574.35 samples/sec Loss 6.4750 LearningRate 0.0452 Epoch: 6 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:08,866-Speed 5596.65 samples/sec Loss 6.4118 LearningRate 0.0452 Epoch: 6 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:10,703-Speed 5575.97 samples/sec Loss 6.4714 LearningRate 0.0451 Epoch: 6 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:12,540-Speed 5575.71 samples/sec Loss 6.4080 LearningRate 0.0451 Epoch: 6 Global Step: 37320 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:14,351-Speed 5656.94 samples/sec Loss 6.4928 LearningRate 0.0451 Epoch: 6 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:16,174-Speed 5616.82 samples/sec Loss 6.3153 LearningRate 0.0451 Epoch: 6 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:18,020-Speed 5550.55 samples/sec Loss 6.4845 LearningRate 0.0451 Epoch: 6 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:19,844-Speed 5614.33 samples/sec Loss 6.5187 LearningRate 0.0451 Epoch: 6 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:21,701-Speed 5516.99 samples/sec Loss 6.5255 LearningRate 0.0451 Epoch: 6 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:23,522-Speed 5625.12 samples/sec Loss 6.5482 LearningRate 0.0451 Epoch: 6 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:25,386-Speed 5495.77 samples/sec Loss 6.5704 LearningRate 0.0451 Epoch: 6 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:27,217-Speed 5593.04 samples/sec Loss 6.5406 LearningRate 0.0450 Epoch: 6 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:48:29,035-Speed 5634.89 samples/sec Loss 6.4863 LearningRate 0.0450 Epoch: 6 Global Step: 37410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:30,860-Speed 5614.17 samples/sec Loss 6.2746 LearningRate 0.0450 Epoch: 6 Global Step: 37420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:32,731-Speed 5473.63 samples/sec Loss 6.4325 LearningRate 0.0450 Epoch: 6 Global Step: 37430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:34,571-Speed 5568.29 samples/sec Loss 6.3930 LearningRate 0.0450 Epoch: 6 Global Step: 37440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
03:48:36,427-Speed 5519.18 samples/sec Loss 6.5507 LearningRate 0.0450 Epoch: 6 Global Step: 37450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:38,264-Speed 5576.72 samples/sec Loss 6.4823 LearningRate 0.0450 Epoch: 6 Global Step: 37460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:40,089-Speed 5612.72 samples/sec Loss 6.5775 LearningRate 0.0450 Epoch: 6 Global Step: 37470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:41,912-Speed 5618.14 samples/sec Loss 6.4971 LearningRate 0.0449 Epoch: 6 Global Step: 37480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:43,740-Speed 5603.15 samples/sec Loss 6.5206 LearningRate 0.0449 Epoch: 6 Global Step: 37490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:45,571-Speed 5595.40 samples/sec Loss 6.2341 LearningRate 0.0449 Epoch: 6 Global Step: 37500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 03:48:47,387-Speed 5641.12 samples/sec Loss 6.4971 LearningRate 0.0449 Epoch: 6 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:49,213-Speed 5609.01 samples/sec Loss 6.3353 LearningRate 0.0449 Epoch: 6 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:51,041-Speed 5602.89 samples/sec Loss 6.3780 LearningRate 0.0449 Epoch: 6 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:52,864-Speed 5619.84 samples/sec Loss 6.5861 LearningRate 0.0449 Epoch: 6 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:54,702-Speed 5571.72 samples/sec Loss 6.4234 LearningRate 0.0449 Epoch: 6 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:56,539-Speed 5576.19 samples/sec Loss 6.6042 LearningRate 0.0449 Epoch: 6 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:48:58,375-Speed 5579.47 samples/sec Loss 6.3781 LearningRate 
0.0448 Epoch: 6 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:00,201-Speed 5609.72 samples/sec Loss 6.4112 LearningRate 0.0448 Epoch: 6 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:02,029-Speed 5603.54 samples/sec Loss 6.5914 LearningRate 0.0448 Epoch: 6 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:03,857-Speed 5604.50 samples/sec Loss 6.5062 LearningRate 0.0448 Epoch: 6 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:05,686-Speed 5600.55 samples/sec Loss 6.5457 LearningRate 0.0448 Epoch: 6 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:07,502-Speed 5641.39 samples/sec Loss 6.4679 LearningRate 0.0448 Epoch: 6 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:09,329-Speed 5609.62 samples/sec Loss 6.5767 LearningRate 0.0448 Epoch: 6 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:11,164-Speed 5581.04 samples/sec Loss 6.4175 LearningRate 0.0448 Epoch: 6 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:13,039-Speed 5463.12 samples/sec Loss 6.3926 LearningRate 0.0447 Epoch: 6 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:14,902-Speed 5499.64 samples/sec Loss 6.3547 LearningRate 0.0447 Epoch: 6 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:16,728-Speed 5607.25 samples/sec Loss 6.5187 LearningRate 0.0447 Epoch: 6 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:18,566-Speed 5574.12 samples/sec Loss 6.3564 LearningRate 0.0447 Epoch: 6 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:20,399-Speed 5588.63 samples/sec Loss 6.4315 LearningRate 0.0447 Epoch: 6 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-04-27 03:49:22,238-Speed 5567.89 samples/sec Loss 6.5221 LearningRate 0.0447 Epoch: 6 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:49:24,059-Speed 5626.47 samples/sec Loss 6.3958 LearningRate 0.0447 Epoch: 6 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:25,894-Speed 5581.24 samples/sec Loss 6.5762 LearningRate 0.0447 Epoch: 6 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:27,731-Speed 5577.93 samples/sec Loss 6.5149 LearningRate 0.0447 Epoch: 6 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:29,560-Speed 5600.46 samples/sec Loss 6.3882 LearningRate 0.0446 Epoch: 6 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:31,373-Speed 5651.03 samples/sec Loss 6.5474 LearningRate 0.0446 Epoch: 6 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:33,206-Speed 5587.79 samples/sec Loss 6.5223 LearningRate 0.0446 Epoch: 6 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:35,031-Speed 5610.86 samples/sec Loss 6.6320 LearningRate 0.0446 Epoch: 6 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:36,858-Speed 5606.84 samples/sec Loss 6.4539 LearningRate 0.0446 Epoch: 6 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:38,687-Speed 5601.71 samples/sec Loss 6.3054 LearningRate 0.0446 Epoch: 6 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:40,531-Speed 5554.02 samples/sec Loss 6.4559 LearningRate 0.0446 Epoch: 6 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:42,351-Speed 5628.93 samples/sec Loss 6.3776 LearningRate 0.0446 Epoch: 6 Global Step: 37810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-27 03:49:44,158-Speed 5670.52 
samples/sec Loss 6.5029 LearningRate 0.0445 Epoch: 6 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:45,980-Speed 5620.62 samples/sec Loss 6.5721 LearningRate 0.0445 Epoch: 6 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:47,797-Speed 5638.68 samples/sec Loss 6.4617 LearningRate 0.0445 Epoch: 6 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:49,645-Speed 5542.24 samples/sec Loss 6.5366 LearningRate 0.0445 Epoch: 6 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:51,483-Speed 5572.84 samples/sec Loss 6.5085 LearningRate 0.0445 Epoch: 6 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:53,312-Speed 5600.68 samples/sec Loss 6.3129 LearningRate 0.0445 Epoch: 6 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:55,151-Speed 5568.78 samples/sec Loss 6.4012 LearningRate 0.0445 Epoch: 6 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:56,972-Speed 5627.33 samples/sec Loss 6.3893 LearningRate 0.0445 Epoch: 6 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:49:58,793-Speed 5624.84 samples/sec Loss 6.5036 LearningRate 0.0445 Epoch: 6 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:00,609-Speed 5640.45 samples/sec Loss 6.5370 LearningRate 0.0444 Epoch: 6 Global Step: 37910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:02,435-Speed 5609.45 samples/sec Loss 6.3124 LearningRate 0.0444 Epoch: 6 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:04,262-Speed 5606.32 samples/sec Loss 6.4628 LearningRate 0.0444 Epoch: 6 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:06,091-Speed 5601.67 samples/sec Loss 6.4595 LearningRate 0.0444 Epoch: 6 
Global Step: 37940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:07,918-Speed 5606.76 samples/sec Loss 6.3599 LearningRate 0.0444 Epoch: 6 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:09,746-Speed 5603.32 samples/sec Loss 6.3656 LearningRate 0.0444 Epoch: 6 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:11,583-Speed 5573.91 samples/sec Loss 6.3933 LearningRate 0.0444 Epoch: 6 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:13,413-Speed 5599.83 samples/sec Loss 6.3271 LearningRate 0.0444 Epoch: 6 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:15,245-Speed 5590.90 samples/sec Loss 6.5019 LearningRate 0.0443 Epoch: 6 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:17,088-Speed 5559.10 samples/sec Loss 6.4761 LearningRate 0.0443 Epoch: 6 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:50:43,634-[lfw][38000]XNorm: 21.222165 Training: 2022-04-27 03:50:43,635-[lfw][38000]Accuracy-Flip: 0.99683+-0.00302 Training: 2022-04-27 03:50:43,635-[lfw][38000]Accuracy-Highest: 0.99750 Training: 2022-04-27 03:51:14,127-[cfp_fp][38000]XNorm: 18.071753 Training: 2022-04-27 03:51:14,128-[cfp_fp][38000]Accuracy-Flip: 0.94043+-0.01036 Training: 2022-04-27 03:51:14,128-[cfp_fp][38000]Accuracy-Highest: 0.94771 Training: 2022-04-27 03:51:40,431-[agedb_30][38000]XNorm: 20.705648 Training: 2022-04-27 03:51:40,431-[agedb_30][38000]Accuracy-Flip: 0.97150+-0.00947 Training: 2022-04-27 03:51:40,432-[agedb_30][38000]Accuracy-Highest: 0.97300 Training: 2022-04-27 03:51:42,284-Speed 120.19 samples/sec Loss 6.4551 LearningRate 0.0443 Epoch: 6 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:51:44,102-Speed 5636.19 samples/sec Loss 6.4517 LearningRate 0.0443 Epoch: 6 Global Step: 38020 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-27 03:51:45,912-Speed 5659.25 samples/sec Loss 6.4428 LearningRate 0.0443 Epoch: 6 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:47,716-Speed 5678.98 samples/sec Loss 6.4753 LearningRate 0.0443 Epoch: 6 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:49,542-Speed 5610.23 samples/sec Loss 6.3447 LearningRate 0.0443 Epoch: 6 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:51,351-Speed 5659.27 samples/sec Loss 6.3783 LearningRate 0.0443 Epoch: 6 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:53,159-Speed 5666.70 samples/sec Loss 6.4396 LearningRate 0.0443 Epoch: 6 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:54,972-Speed 5649.81 samples/sec Loss 6.5568 LearningRate 0.0442 Epoch: 6 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:56,791-Speed 5632.86 samples/sec Loss 6.4915 LearningRate 0.0442 Epoch: 6 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:51:58,610-Speed 5630.67 samples/sec Loss 6.3708 LearningRate 0.0442 Epoch: 6 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:00,425-Speed 5644.08 samples/sec Loss 6.3992 LearningRate 0.0442 Epoch: 6 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:02,233-Speed 5666.17 samples/sec Loss 6.3999 LearningRate 0.0442 Epoch: 6 Global Step: 38120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:04,074-Speed 5561.29 samples/sec Loss 6.4238 LearningRate 0.0442 Epoch: 6 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:05,895-Speed 5626.39 samples/sec Loss 6.4116 LearningRate 0.0442 Epoch: 6 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:07,710-Speed 
5642.85 samples/sec Loss 6.4352 LearningRate 0.0442 Epoch: 6 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:09,539-Speed 5600.87 samples/sec Loss 6.4688 LearningRate 0.0441 Epoch: 6 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:11,361-Speed 5623.50 samples/sec Loss 6.4016 LearningRate 0.0441 Epoch: 6 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:13,178-Speed 5637.85 samples/sec Loss 6.4413 LearningRate 0.0441 Epoch: 6 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:14,994-Speed 5639.97 samples/sec Loss 6.2987 LearningRate 0.0441 Epoch: 6 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:16,809-Speed 5642.65 samples/sec Loss 6.4706 LearningRate 0.0441 Epoch: 6 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:18,656-Speed 5547.03 samples/sec Loss 6.5385 LearningRate 0.0441 Epoch: 6 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:20,496-Speed 5567.81 samples/sec Loss 6.5234 LearningRate 0.0441 Epoch: 6 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:22,327-Speed 5591.88 samples/sec Loss 6.3267 LearningRate 0.0441 Epoch: 6 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:24,161-Speed 5587.32 samples/sec Loss 6.2691 LearningRate 0.0441 Epoch: 6 Global Step: 38240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:25,973-Speed 5654.02 samples/sec Loss 6.5153 LearningRate 0.0440 Epoch: 6 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:27,791-Speed 5632.50 samples/sec Loss 6.3544 LearningRate 0.0440 Epoch: 6 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:29,623-Speed 5592.36 samples/sec Loss 6.4612 LearningRate 0.0440 
Epoch: 6 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:31,453-Speed 5596.91 samples/sec Loss 6.3152 LearningRate 0.0440 Epoch: 6 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:33,279-Speed 5609.50 samples/sec Loss 6.5122 LearningRate 0.0440 Epoch: 6 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:35,099-Speed 5628.75 samples/sec Loss 6.4120 LearningRate 0.0440 Epoch: 6 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:36,932-Speed 5588.40 samples/sec Loss 6.2830 LearningRate 0.0440 Epoch: 6 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:38,756-Speed 5615.26 samples/sec Loss 6.1685 LearningRate 0.0440 Epoch: 6 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:40,557-Speed 5687.53 samples/sec Loss 6.3116 LearningRate 0.0439 Epoch: 6 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:42,393-Speed 5580.88 samples/sec Loss 6.3226 LearningRate 0.0439 Epoch: 6 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:52:44,234-Speed 5563.30 samples/sec Loss 6.3997 LearningRate 0.0439 Epoch: 6 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:46,071-Speed 5575.68 samples/sec Loss 6.3735 LearningRate 0.0439 Epoch: 6 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:47,894-Speed 5619.68 samples/sec Loss 6.3853 LearningRate 0.0439 Epoch: 6 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:49,709-Speed 5644.54 samples/sec Loss 6.4420 LearningRate 0.0439 Epoch: 6 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:51,525-Speed 5637.99 samples/sec Loss 6.2830 LearningRate 0.0439 Epoch: 6 Global Step: 38390 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 03:52:53,353-Speed 5604.81 samples/sec Loss 6.3153 LearningRate 0.0439 Epoch: 6 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:55,168-Speed 5643.75 samples/sec Loss 6.3492 LearningRate 0.0439 Epoch: 6 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:56,995-Speed 5609.11 samples/sec Loss 6.3948 LearningRate 0.0438 Epoch: 6 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:52:58,832-Speed 5574.69 samples/sec Loss 6.3303 LearningRate 0.0438 Epoch: 6 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:00,638-Speed 5671.43 samples/sec Loss 6.4623 LearningRate 0.0438 Epoch: 6 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:02,463-Speed 5614.48 samples/sec Loss 6.4525 LearningRate 0.0438 Epoch: 6 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:04,283-Speed 5627.28 samples/sec Loss 6.3420 LearningRate 0.0438 Epoch: 6 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:06,126-Speed 5558.22 samples/sec Loss 6.2942 LearningRate 0.0438 Epoch: 6 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:07,953-Speed 5605.10 samples/sec Loss 6.3784 LearningRate 0.0438 Epoch: 6 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:09,762-Speed 5662.71 samples/sec Loss 6.4106 LearningRate 0.0438 Epoch: 6 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:11,591-Speed 5602.38 samples/sec Loss 6.3250 LearningRate 0.0438 Epoch: 6 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:13,408-Speed 5635.08 samples/sec Loss 6.5364 LearningRate 0.0437 Epoch: 6 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:15,222-Speed 
5647.49 samples/sec Loss 6.3634 LearningRate 0.0437 Epoch: 6 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:17,045-Speed 5619.98 samples/sec Loss 6.4310 LearningRate 0.0437 Epoch: 6 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:18,863-Speed 5634.87 samples/sec Loss 6.3721 LearningRate 0.0437 Epoch: 6 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:20,695-Speed 5593.40 samples/sec Loss 6.2946 LearningRate 0.0437 Epoch: 6 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:22,519-Speed 5613.32 samples/sec Loss 6.3011 LearningRate 0.0437 Epoch: 6 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:24,356-Speed 5580.53 samples/sec Loss 6.2393 LearningRate 0.0437 Epoch: 6 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:26,198-Speed 5558.34 samples/sec Loss 6.5762 LearningRate 0.0437 Epoch: 6 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:28,029-Speed 5596.92 samples/sec Loss 6.2853 LearningRate 0.0436 Epoch: 6 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:29,856-Speed 5604.24 samples/sec Loss 6.3660 LearningRate 0.0436 Epoch: 6 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:31,666-Speed 5659.37 samples/sec Loss 6.4881 LearningRate 0.0436 Epoch: 6 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:33,487-Speed 5625.16 samples/sec Loss 6.4220 LearningRate 0.0436 Epoch: 6 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:35,309-Speed 5623.06 samples/sec Loss 6.3202 LearningRate 0.0436 Epoch: 6 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:37,128-Speed 5629.89 samples/sec Loss 6.3718 LearningRate 0.0436 Epoch: 6 
Global Step: 38640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:53:38,926-Speed 5698.96 samples/sec Loss 6.5327 LearningRate 0.0436 Epoch: 6 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:40,744-Speed 5634.46 samples/sec Loss 6.2710 LearningRate 0.0436 Epoch: 6 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:42,571-Speed 5605.88 samples/sec Loss 6.3478 LearningRate 0.0436 Epoch: 6 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:44,391-Speed 5628.09 samples/sec Loss 6.2888 LearningRate 0.0435 Epoch: 6 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:46,209-Speed 5637.63 samples/sec Loss 6.4561 LearningRate 0.0435 Epoch: 6 Global Step: 38690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:48,032-Speed 5617.58 samples/sec Loss 6.2375 LearningRate 0.0435 Epoch: 6 Global Step: 38700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:49,848-Speed 5641.64 samples/sec Loss 6.3544 LearningRate 0.0435 Epoch: 6 Global Step: 38710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:51,674-Speed 5609.46 samples/sec Loss 6.3310 LearningRate 0.0435 Epoch: 6 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:53,491-Speed 5637.40 samples/sec Loss 6.3952 LearningRate 0.0435 Epoch: 6 Global Step: 38730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:55,334-Speed 5559.25 samples/sec Loss 6.4519 LearningRate 0.0435 Epoch: 6 Global Step: 38740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:57,155-Speed 5625.56 samples/sec Loss 6.3068 LearningRate 0.0435 Epoch: 6 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:53:58,983-Speed 5601.98 samples/sec Loss 6.2259 LearningRate 0.0434 Epoch: 6 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 03:54:00,795-Speed 5655.22 samples/sec Loss 6.4730 LearningRate 0.0434 Epoch: 6 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:02,612-Speed 5634.56 samples/sec Loss 6.5117 LearningRate 0.0434 Epoch: 6 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:04,433-Speed 5628.29 samples/sec Loss 6.4401 LearningRate 0.0434 Epoch: 6 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:06,246-Speed 5648.38 samples/sec Loss 6.4722 LearningRate 0.0434 Epoch: 6 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:08,082-Speed 5580.95 samples/sec Loss 6.3986 LearningRate 0.0434 Epoch: 6 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:09,903-Speed 5622.54 samples/sec Loss 6.3373 LearningRate 0.0434 Epoch: 6 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:11,721-Speed 5634.52 samples/sec Loss 6.3460 LearningRate 0.0434 Epoch: 6 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:13,556-Speed 5582.27 samples/sec Loss 6.2923 LearningRate 0.0434 Epoch: 6 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:15,383-Speed 5606.76 samples/sec Loss 6.2249 LearningRate 0.0433 Epoch: 6 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:54:17,200-Speed 5637.65 samples/sec Loss 6.3889 LearningRate 0.0433 Epoch: 6 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:19,052-Speed 5530.74 samples/sec Loss 6.2983 LearningRate 0.0433 Epoch: 6 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:20,882-Speed 5598.62 samples/sec Loss 6.3237 LearningRate 0.0433 Epoch: 6 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:22,726-Speed 5554.05 samples/sec Loss 
6.4499 LearningRate 0.0433 Epoch: 6 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:24,562-Speed 5579.70 samples/sec Loss 6.4244 LearningRate 0.0433 Epoch: 6 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:26,413-Speed 5535.07 samples/sec Loss 6.2212 LearningRate 0.0433 Epoch: 6 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:28,245-Speed 5590.59 samples/sec Loss 6.3596 LearningRate 0.0433 Epoch: 6 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:30,072-Speed 5607.92 samples/sec Loss 6.3625 LearningRate 0.0433 Epoch: 6 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:31,896-Speed 5614.90 samples/sec Loss 6.1933 LearningRate 0.0432 Epoch: 6 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:33,725-Speed 5600.49 samples/sec Loss 6.2491 LearningRate 0.0432 Epoch: 6 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:35,543-Speed 5636.20 samples/sec Loss 6.4729 LearningRate 0.0432 Epoch: 6 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:54:37,371-Speed 5602.41 samples/sec Loss 6.4041 LearningRate 0.0432 Epoch: 6 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:54:39,205-Speed 5584.54 samples/sec Loss 6.2740 LearningRate 0.0432 Epoch: 6 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:54:41,039-Speed 5585.48 samples/sec Loss 6.4090 LearningRate 0.0432 Epoch: 6 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:42,865-Speed 5611.26 samples/sec Loss 6.2934 LearningRate 0.0432 Epoch: 6 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:44,705-Speed 5565.87 samples/sec Loss 6.3074 LearningRate 0.0432 Epoch: 6 Global Step: 39010 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:46,541-Speed 5579.35 samples/sec Loss 6.2355 LearningRate 0.0431 Epoch: 6 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:48,361-Speed 5629.49 samples/sec Loss 6.3448 LearningRate 0.0431 Epoch: 6 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:50,193-Speed 5590.94 samples/sec Loss 6.3079 LearningRate 0.0431 Epoch: 6 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:52,008-Speed 5645.47 samples/sec Loss 6.3386 LearningRate 0.0431 Epoch: 6 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:53,827-Speed 5629.50 samples/sec Loss 6.2882 LearningRate 0.0431 Epoch: 6 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:55,657-Speed 5599.12 samples/sec Loss 6.2671 LearningRate 0.0431 Epoch: 6 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:57,475-Speed 5634.24 samples/sec Loss 6.3614 LearningRate 0.0431 Epoch: 6 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:54:59,288-Speed 5647.84 samples/sec Loss 6.3873 LearningRate 0.0431 Epoch: 6 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:01,106-Speed 5634.26 samples/sec Loss 6.3947 LearningRate 0.0431 Epoch: 6 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:02,921-Speed 5644.98 samples/sec Loss 6.2470 LearningRate 0.0430 Epoch: 6 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:04,743-Speed 5622.12 samples/sec Loss 6.5578 LearningRate 0.0430 Epoch: 6 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:06,572-Speed 5600.84 samples/sec Loss 6.3257 LearningRate 0.0430 Epoch: 6 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:55:08,403-Speed 5594.48 samples/sec Loss 6.2434 LearningRate 0.0430 Epoch: 6 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:10,251-Speed 5542.19 samples/sec Loss 6.2537 LearningRate 0.0430 Epoch: 6 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:12,070-Speed 5632.46 samples/sec Loss 6.4199 LearningRate 0.0430 Epoch: 6 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:13,897-Speed 5605.93 samples/sec Loss 6.2903 LearningRate 0.0430 Epoch: 6 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:15,720-Speed 5620.55 samples/sec Loss 6.3226 LearningRate 0.0430 Epoch: 6 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:17,543-Speed 5619.12 samples/sec Loss 6.3284 LearningRate 0.0430 Epoch: 6 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:19,375-Speed 5590.64 samples/sec Loss 6.4716 LearningRate 0.0429 Epoch: 6 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:21,210-Speed 5580.43 samples/sec Loss 6.3376 LearningRate 0.0429 Epoch: 6 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:23,040-Speed 5601.78 samples/sec Loss 6.3181 LearningRate 0.0429 Epoch: 6 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:24,862-Speed 5620.65 samples/sec Loss 6.4797 LearningRate 0.0429 Epoch: 6 Global Step: 39230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:26,695-Speed 5586.92 samples/sec Loss 6.3337 LearningRate 0.0429 Epoch: 6 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:28,533-Speed 5575.43 samples/sec Loss 6.3301 LearningRate 0.0429 Epoch: 6 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:30,357-Speed 5613.17 samples/sec Loss 6.3516 
LearningRate 0.0429 Epoch: 6 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:32,188-Speed 5596.26 samples/sec Loss 6.2448 LearningRate 0.0429 Epoch: 6 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:34,005-Speed 5637.37 samples/sec Loss 6.3257 LearningRate 0.0428 Epoch: 6 Global Step: 39280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:35,861-Speed 5518.69 samples/sec Loss 6.2782 LearningRate 0.0428 Epoch: 6 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:37,698-Speed 5578.30 samples/sec Loss 6.2597 LearningRate 0.0428 Epoch: 6 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:39,511-Speed 5648.48 samples/sec Loss 6.3147 LearningRate 0.0428 Epoch: 6 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:41,329-Speed 5634.69 samples/sec Loss 6.3121 LearningRate 0.0428 Epoch: 6 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:43,150-Speed 5623.28 samples/sec Loss 6.3828 LearningRate 0.0428 Epoch: 6 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:44,974-Speed 5616.17 samples/sec Loss 6.1764 LearningRate 0.0428 Epoch: 6 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:46,804-Speed 5600.23 samples/sec Loss 6.3604 LearningRate 0.0428 Epoch: 6 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:48,646-Speed 5559.30 samples/sec Loss 6.4487 LearningRate 0.0428 Epoch: 6 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:55:50,480-Speed 5586.90 samples/sec Loss 6.2570 LearningRate 0.0427 Epoch: 6 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:52,320-Speed 5565.68 samples/sec Loss 6.2742 LearningRate 0.0427 Epoch: 6 Global Step: 39380 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-27 03:55:54,147-Speed 5608.04 samples/sec Loss 6.3976 LearningRate 0.0427 Epoch: 6 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:55,998-Speed 5534.17 samples/sec Loss 6.2771 LearningRate 0.0427 Epoch: 6 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:57,826-Speed 5602.11 samples/sec Loss 6.3059 LearningRate 0.0427 Epoch: 6 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:55:59,639-Speed 5650.74 samples/sec Loss 6.3455 LearningRate 0.0427 Epoch: 6 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:01,454-Speed 5645.89 samples/sec Loss 6.3529 LearningRate 0.0427 Epoch: 6 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:03,267-Speed 5647.61 samples/sec Loss 6.1618 LearningRate 0.0427 Epoch: 6 Global Step: 39440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:05,112-Speed 5553.99 samples/sec Loss 6.2038 LearningRate 0.0427 Epoch: 6 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:06,975-Speed 5495.50 samples/sec Loss 6.3129 LearningRate 0.0426 Epoch: 6 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:08,792-Speed 5638.76 samples/sec Loss 6.1712 LearningRate 0.0426 Epoch: 6 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:10,621-Speed 5599.80 samples/sec Loss 6.3842 LearningRate 0.0426 Epoch: 6 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:12,446-Speed 5615.41 samples/sec Loss 6.3737 LearningRate 0.0426 Epoch: 6 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:14,258-Speed 5651.16 samples/sec Loss 6.2840 LearningRate 0.0426 Epoch: 6 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:56:16,088-Speed 5599.57 samples/sec Loss 6.3270 LearningRate 0.0426 Epoch: 6 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:17,925-Speed 5575.88 samples/sec Loss 6.3274 LearningRate 0.0426 Epoch: 6 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:19,744-Speed 5630.93 samples/sec Loss 6.3720 LearningRate 0.0426 Epoch: 6 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:21,561-Speed 5638.60 samples/sec Loss 6.2936 LearningRate 0.0426 Epoch: 6 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:23,382-Speed 5625.24 samples/sec Loss 6.2988 LearningRate 0.0425 Epoch: 6 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:25,194-Speed 5652.79 samples/sec Loss 6.3578 LearningRate 0.0425 Epoch: 6 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:27,013-Speed 5630.05 samples/sec Loss 6.2536 LearningRate 0.0425 Epoch: 6 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:28,843-Speed 5597.48 samples/sec Loss 6.3029 LearningRate 0.0425 Epoch: 6 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:30,670-Speed 5606.25 samples/sec Loss 6.2353 LearningRate 0.0425 Epoch: 6 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:32,498-Speed 5603.01 samples/sec Loss 6.3529 LearningRate 0.0425 Epoch: 6 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:34,340-Speed 5560.77 samples/sec Loss 6.3187 LearningRate 0.0425 Epoch: 6 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:36,164-Speed 5618.00 samples/sec Loss 6.2281 LearningRate 0.0425 Epoch: 6 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:38,008-Speed 5555.89 samples/sec Loss 6.2307 
LearningRate 0.0424 Epoch: 6 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:39,845-Speed 5576.34 samples/sec Loss 6.2491 LearningRate 0.0424 Epoch: 6 Global Step: 39640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:41,656-Speed 5657.77 samples/sec Loss 6.2771 LearningRate 0.0424 Epoch: 6 Global Step: 39650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:43,485-Speed 5600.52 samples/sec Loss 6.3360 LearningRate 0.0424 Epoch: 6 Global Step: 39660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:45,314-Speed 5599.69 samples/sec Loss 6.2785 LearningRate 0.0424 Epoch: 6 Global Step: 39670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:47,147-Speed 5586.89 samples/sec Loss 6.2380 LearningRate 0.0424 Epoch: 6 Global Step: 39680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:49,002-Speed 5521.75 samples/sec Loss 6.3907 LearningRate 0.0424 Epoch: 6 Global Step: 39690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:56:50,826-Speed 5617.36 samples/sec Loss 6.2675 LearningRate 0.0424 Epoch: 6 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:52,629-Speed 5681.55 samples/sec Loss 6.3084 LearningRate 0.0424 Epoch: 6 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:54,454-Speed 5614.34 samples/sec Loss 6.1734 LearningRate 0.0423 Epoch: 6 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:56,264-Speed 5656.93 samples/sec Loss 6.3382 LearningRate 0.0423 Epoch: 6 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:58,074-Speed 5662.61 samples/sec Loss 6.2875 LearningRate 0.0423 Epoch: 6 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:56:59,890-Speed 5641.45 samples/sec Loss 6.3057 LearningRate 0.0423 Epoch: 6 Global Step: 39750 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:01,699-Speed 5662.22 samples/sec Loss 6.3860 LearningRate 0.0423 Epoch: 6 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:03,512-Speed 5647.42 samples/sec Loss 6.1675 LearningRate 0.0423 Epoch: 6 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:05,336-Speed 5616.39 samples/sec Loss 6.1490 LearningRate 0.0423 Epoch: 6 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:07,151-Speed 5645.33 samples/sec Loss 6.2811 LearningRate 0.0423 Epoch: 6 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:09,009-Speed 5511.73 samples/sec Loss 6.2505 LearningRate 0.0423 Epoch: 6 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:20,926-Speed 859.35 samples/sec Loss 5.7272 LearningRate 0.0422 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:22,802-Speed 5460.94 samples/sec Loss 5.6003 LearningRate 0.0422 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:24,670-Speed 5484.44 samples/sec Loss 5.6331 LearningRate 0.0422 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:26,509-Speed 5568.27 samples/sec Loss 5.5240 LearningRate 0.0422 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:28,337-Speed 5603.92 samples/sec Loss 5.6577 LearningRate 0.0422 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:30,499-Speed 4737.62 samples/sec Loss 5.8037 LearningRate 0.0422 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:32,350-Speed 5535.77 samples/sec Loss 5.7585 LearningRate 0.0422 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 
03:57:34,177-Speed 5606.58 samples/sec Loss 5.6469 LearningRate 0.0422 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:36,014-Speed 5576.89 samples/sec Loss 5.8597 LearningRate 0.0421 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:37,843-Speed 5601.31 samples/sec Loss 5.6976 LearningRate 0.0421 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:39,691-Speed 5540.80 samples/sec Loss 5.6961 LearningRate 0.0421 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:57:41,542-Speed 5536.02 samples/sec Loss 5.6634 LearningRate 0.0421 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:43,359-Speed 5639.01 samples/sec Loss 5.6733 LearningRate 0.0421 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:45,225-Speed 5489.03 samples/sec Loss 5.7822 LearningRate 0.0421 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:47,070-Speed 5550.82 samples/sec Loss 5.5437 LearningRate 0.0421 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:48,888-Speed 5636.61 samples/sec Loss 5.6270 LearningRate 0.0421 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:50,706-Speed 5633.86 samples/sec Loss 5.6557 LearningRate 0.0421 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:52,547-Speed 5563.94 samples/sec Loss 5.7182 LearningRate 0.0420 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:54,373-Speed 5611.50 samples/sec Loss 5.9953 LearningRate 0.0420 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:57:56,197-Speed 5615.86 samples/sec Loss 5.7426 
LearningRate 0.0420 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:58:22,205-[lfw][40000]XNorm: 22.794917 Training: 2022-04-27 03:58:22,205-[lfw][40000]Accuracy-Flip: 0.99583+-0.00300 Training: 2022-04-27 03:58:22,206-[lfw][40000]Accuracy-Highest: 0.99750 Training: 2022-04-27 03:58:52,351-[cfp_fp][40000]XNorm: 19.739909 Training: 2022-04-27 03:58:52,351-[cfp_fp][40000]Accuracy-Flip: 0.93886+-0.01240 Training: 2022-04-27 03:58:52,352-[cfp_fp][40000]Accuracy-Highest: 0.94771 Training: 2022-04-27 03:59:18,385-[agedb_30][40000]XNorm: 22.522597 Training: 2022-04-27 03:59:18,385-[agedb_30][40000]Accuracy-Flip: 0.97117+-0.00522 Training: 2022-04-27 03:59:18,385-[agedb_30][40000]Accuracy-Highest: 0.97300 Training: 2022-04-27 03:59:20,220-Speed 121.87 samples/sec Loss 5.7374 LearningRate 0.0420 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:22,017-Speed 5702.18 samples/sec Loss 5.7600 LearningRate 0.0420 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:23,835-Speed 5633.24 samples/sec Loss 5.8910 LearningRate 0.0420 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:25,662-Speed 5608.80 samples/sec Loss 5.9187 LearningRate 0.0420 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:27,480-Speed 5633.67 samples/sec Loss 5.8355 LearningRate 0.0420 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:29,314-Speed 5585.54 samples/sec Loss 5.7933 LearningRate 0.0420 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:31,131-Speed 5638.13 samples/sec Loss 5.5650 LearningRate 0.0419 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:32,957-Speed 5609.49 samples/sec Loss 5.7391 LearningRate 0.0419 Epoch: 7 Global Step: 
40080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:34,796-Speed 5571.22 samples/sec Loss 6.0047 LearningRate 0.0419 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:36,628-Speed 5588.76 samples/sec Loss 5.7968 LearningRate 0.0419 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:38,455-Speed 5606.92 samples/sec Loss 5.6955 LearningRate 0.0419 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:40,273-Speed 5634.97 samples/sec Loss 5.8867 LearningRate 0.0419 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:42,082-Speed 5662.89 samples/sec Loss 5.8195 LearningRate 0.0419 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:43,906-Speed 5615.37 samples/sec Loss 5.9328 LearningRate 0.0419 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:45,727-Speed 5626.75 samples/sec Loss 5.8665 LearningRate 0.0419 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:47,582-Speed 5521.78 samples/sec Loss 5.7694 LearningRate 0.0418 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 03:59:49,418-Speed 5578.66 samples/sec Loss 5.9927 LearningRate 0.0418 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:51,250-Speed 5591.61 samples/sec Loss 5.9488 LearningRate 0.0418 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:53,106-Speed 5517.87 samples/sec Loss 5.8938 LearningRate 0.0418 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:54,923-Speed 5639.36 samples/sec Loss 5.7992 LearningRate 0.0418 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-27 03:59:56,752-Speed 5600.55 samples/sec Loss 5.8906 LearningRate 0.0418 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 03:59:58,585-Speed 5588.40 samples/sec Loss 5.8966 LearningRate 0.0418 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:00,395-Speed 5657.00 samples/sec Loss 5.8452 LearningRate 0.0418 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:02,204-Speed 5663.47 samples/sec Loss 5.8466 LearningRate 0.0418 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:04,011-Speed 5671.60 samples/sec Loss 5.9040 LearningRate 0.0417 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:05,844-Speed 5585.98 samples/sec Loss 5.7977 LearningRate 0.0417 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:07,651-Speed 5668.66 samples/sec Loss 5.9248 LearningRate 0.0417 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:09,486-Speed 5582.86 samples/sec Loss 5.9767 LearningRate 0.0417 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:11,301-Speed 5642.87 samples/sec Loss 5.8902 LearningRate 0.0417 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:13,129-Speed 5603.47 samples/sec Loss 5.8846 LearningRate 0.0417 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:14,945-Speed 5642.53 samples/sec Loss 5.8743 LearningRate 0.0417 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:16,755-Speed 5658.87 samples/sec Loss 5.9951 LearningRate 0.0417 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:18,573-Speed 5634.48 samples/sec Loss 5.8198 
LearningRate 0.0416 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:20,388-Speed 5642.12 samples/sec Loss 6.0333 LearningRate 0.0416 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:22,215-Speed 5612.54 samples/sec Loss 5.8758 LearningRate 0.0416 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:24,053-Speed 5572.52 samples/sec Loss 5.8821 LearningRate 0.0416 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:25,926-Speed 5467.39 samples/sec Loss 6.0120 LearningRate 0.0416 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:27,792-Speed 5491.93 samples/sec Loss 6.1105 LearningRate 0.0416 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:29,626-Speed 5582.59 samples/sec Loss 5.8884 LearningRate 0.0416 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:31,440-Speed 5647.28 samples/sec Loss 5.9589 LearningRate 0.0416 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:33,286-Speed 5549.97 samples/sec Loss 5.9397 LearningRate 0.0416 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:35,110-Speed 5614.96 samples/sec Loss 6.0534 LearningRate 0.0415 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:36,932-Speed 5622.78 samples/sec Loss 5.7971 LearningRate 0.0415 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:38,752-Speed 5627.41 samples/sec Loss 6.0672 LearningRate 0.0415 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:40,587-Speed 5584.32 samples/sec Loss 5.9733 LearningRate 0.0415 Epoch: 7 Global Step: 40450 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:42,417-Speed 5597.36 samples/sec Loss 5.9466 LearningRate 0.0415 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:44,241-Speed 5613.22 samples/sec Loss 6.0344 LearningRate 0.0415 Epoch: 7 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:46,105-Speed 5496.81 samples/sec Loss 5.9919 LearningRate 0.0415 Epoch: 7 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:47,937-Speed 5593.45 samples/sec Loss 6.0547 LearningRate 0.0415 Epoch: 7 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:00:49,795-Speed 5512.74 samples/sec Loss 5.8505 LearningRate 0.0415 Epoch: 7 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:51,685-Speed 5420.04 samples/sec Loss 5.8578 LearningRate 0.0414 Epoch: 7 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:53,505-Speed 5626.81 samples/sec Loss 5.8981 LearningRate 0.0414 Epoch: 7 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:55,347-Speed 5560.75 samples/sec Loss 5.9429 LearningRate 0.0414 Epoch: 7 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:57,197-Speed 5538.02 samples/sec Loss 5.9234 LearningRate 0.0414 Epoch: 7 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:00:59,035-Speed 5572.16 samples/sec Loss 5.9829 LearningRate 0.0414 Epoch: 7 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:00,872-Speed 5575.40 samples/sec Loss 5.9985 LearningRate 0.0414 Epoch: 7 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:02,714-Speed 5560.49 samples/sec Loss 6.1632 LearningRate 0.0414 Epoch: 7 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 
04:01:04,532-Speed 5634.00 samples/sec Loss 5.8989 LearningRate 0.0414 Epoch: 7 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:06,368-Speed 5581.80 samples/sec Loss 6.0403 LearningRate 0.0414 Epoch: 7 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:08,186-Speed 5634.40 samples/sec Loss 5.8996 LearningRate 0.0413 Epoch: 7 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:10,043-Speed 5516.58 samples/sec Loss 6.0297 LearningRate 0.0413 Epoch: 7 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:11,853-Speed 5657.51 samples/sec Loss 6.0319 LearningRate 0.0413 Epoch: 7 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:13,685-Speed 5593.59 samples/sec Loss 6.1131 LearningRate 0.0413 Epoch: 7 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:15,545-Speed 5506.75 samples/sec Loss 6.1633 LearningRate 0.0413 Epoch: 7 Global Step: 40640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:17,383-Speed 5573.69 samples/sec Loss 5.9573 LearningRate 0.0413 Epoch: 7 Global Step: 40650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:19,222-Speed 5569.39 samples/sec Loss 6.0951 LearningRate 0.0413 Epoch: 7 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:21,037-Speed 5644.03 samples/sec Loss 6.0730 LearningRate 0.0413 Epoch: 7 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:22,862-Speed 5611.15 samples/sec Loss 6.1305 LearningRate 0.0413 Epoch: 7 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:24,681-Speed 5632.12 samples/sec Loss 5.9536 LearningRate 0.0412 Epoch: 7 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:26,535-Speed 5525.54 samples/sec Loss 5.9679 LearningRate 
0.0412 Epoch: 7 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:28,372-Speed 5574.44 samples/sec Loss 5.9689 LearningRate 0.0412 Epoch: 7 Global Step: 40710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:30,200-Speed 5603.71 samples/sec Loss 6.1464 LearningRate 0.0412 Epoch: 7 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:32,051-Speed 5535.00 samples/sec Loss 6.1266 LearningRate 0.0412 Epoch: 7 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:33,882-Speed 5594.92 samples/sec Loss 6.0408 LearningRate 0.0412 Epoch: 7 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:35,700-Speed 5633.58 samples/sec Loss 6.0001 LearningRate 0.0412 Epoch: 7 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:37,537-Speed 5578.21 samples/sec Loss 6.0488 LearningRate 0.0412 Epoch: 7 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:39,371-Speed 5584.14 samples/sec Loss 5.9507 LearningRate 0.0412 Epoch: 7 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:01:41,175-Speed 5677.44 samples/sec Loss 6.1184 LearningRate 0.0411 Epoch: 7 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:43,001-Speed 5610.88 samples/sec Loss 6.0774 LearningRate 0.0411 Epoch: 7 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:44,845-Speed 5559.38 samples/sec Loss 6.0225 LearningRate 0.0411 Epoch: 7 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:46,674-Speed 5600.83 samples/sec Loss 5.8966 LearningRate 0.0411 Epoch: 7 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:48,514-Speed 5566.83 samples/sec Loss 6.1604 LearningRate 0.0411 Epoch: 7 Global Step: 40820 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-27 04:01:50,369-Speed 5520.78 samples/sec Loss 6.0833 LearningRate 0.0411 Epoch: 7 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:52,234-Speed 5492.65 samples/sec Loss 6.0428 LearningRate 0.0411 Epoch: 7 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:54,145-Speed 5361.70 samples/sec Loss 6.0688 LearningRate 0.0411 Epoch: 7 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:55,979-Speed 5586.03 samples/sec Loss 6.1262 LearningRate 0.0410 Epoch: 7 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:57,797-Speed 5634.90 samples/sec Loss 6.0797 LearningRate 0.0410 Epoch: 7 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:01:59,643-Speed 5547.77 samples/sec Loss 6.1024 LearningRate 0.0410 Epoch: 7 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:01,462-Speed 5631.58 samples/sec Loss 6.0159 LearningRate 0.0410 Epoch: 7 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:03,271-Speed 5661.43 samples/sec Loss 6.1905 LearningRate 0.0410 Epoch: 7 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:05,121-Speed 5537.19 samples/sec Loss 6.1464 LearningRate 0.0410 Epoch: 7 Global Step: 40910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:06,943-Speed 5623.15 samples/sec Loss 6.0393 LearningRate 0.0410 Epoch: 7 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:08,775-Speed 5591.30 samples/sec Loss 5.9744 LearningRate 0.0410 Epoch: 7 Global Step: 40930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:10,619-Speed 5553.81 samples/sec Loss 6.0879 LearningRate 0.0410 Epoch: 7 Global Step: 40940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:12,432-Speed 5650.00 
samples/sec Loss 5.9193 LearningRate 0.0409 Epoch: 7 Global Step: 40950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:14,300-Speed 5483.75 samples/sec Loss 5.9980 LearningRate 0.0409 Epoch: 7 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:16,138-Speed 5574.33 samples/sec Loss 6.0177 LearningRate 0.0409 Epoch: 7 Global Step: 40970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:17,981-Speed 5557.01 samples/sec Loss 6.0148 LearningRate 0.0409 Epoch: 7 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:19,796-Speed 5645.47 samples/sec Loss 6.0670 LearningRate 0.0409 Epoch: 7 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:21,667-Speed 5474.84 samples/sec Loss 6.0782 LearningRate 0.0409 Epoch: 7 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:23,486-Speed 5630.60 samples/sec Loss 6.0039 LearningRate 0.0409 Epoch: 7 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:25,312-Speed 5609.35 samples/sec Loss 6.0044 LearningRate 0.0409 Epoch: 7 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:27,138-Speed 5609.58 samples/sec Loss 6.1958 LearningRate 0.0409 Epoch: 7 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:28,965-Speed 5606.94 samples/sec Loss 6.0262 LearningRate 0.0408 Epoch: 7 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:30,795-Speed 5598.03 samples/sec Loss 6.0882 LearningRate 0.0408 Epoch: 7 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:32,613-Speed 5632.77 samples/sec Loss 6.0052 LearningRate 0.0408 Epoch: 7 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:34,461-Speed 5543.35 samples/sec Loss 6.0269 LearningRate 0.0408 Epoch: 7 Global 
Step: 41070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:36,342-Speed 5446.17 samples/sec Loss 6.2014 LearningRate 0.0408 Epoch: 7 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:38,204-Speed 5504.19 samples/sec Loss 6.0242 LearningRate 0.0408 Epoch: 7 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:40,025-Speed 5624.30 samples/sec Loss 6.0676 LearningRate 0.0408 Epoch: 7 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:41,854-Speed 5599.80 samples/sec Loss 5.9011 LearningRate 0.0408 Epoch: 7 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:43,676-Speed 5620.84 samples/sec Loss 6.0890 LearningRate 0.0408 Epoch: 7 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:45,508-Speed 5591.20 samples/sec Loss 5.9788 LearningRate 0.0407 Epoch: 7 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:47,338-Speed 5598.30 samples/sec Loss 6.0184 LearningRate 0.0407 Epoch: 7 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:02:49,158-Speed 5628.27 samples/sec Loss 5.9941 LearningRate 0.0407 Epoch: 7 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:50,987-Speed 5601.91 samples/sec Loss 6.1005 LearningRate 0.0407 Epoch: 7 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:52,797-Speed 5657.26 samples/sec Loss 6.0907 LearningRate 0.0407 Epoch: 7 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:54,621-Speed 5615.00 samples/sec Loss 5.9572 LearningRate 0.0407 Epoch: 7 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:02:56,435-Speed 5648.29 samples/sec Loss 6.0079 LearningRate 0.0407 Epoch: 7 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-27 04:02:58,255-Speed 5628.97 samples/sec Loss 5.9664 LearningRate 0.0407 Epoch: 7 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:00,069-Speed 5647.15 samples/sec Loss 5.9427 LearningRate 0.0407 Epoch: 7 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:01,905-Speed 5578.99 samples/sec Loss 6.1523 LearningRate 0.0406 Epoch: 7 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:03,765-Speed 5507.60 samples/sec Loss 5.9890 LearningRate 0.0406 Epoch: 7 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:05,676-Speed 5361.47 samples/sec Loss 6.1253 LearningRate 0.0406 Epoch: 7 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:07,496-Speed 5628.80 samples/sec Loss 6.0597 LearningRate 0.0406 Epoch: 7 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:09,309-Speed 5648.40 samples/sec Loss 6.0662 LearningRate 0.0406 Epoch: 7 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:11,200-Speed 5417.38 samples/sec Loss 6.2670 LearningRate 0.0406 Epoch: 7 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:13,020-Speed 5627.78 samples/sec Loss 6.0883 LearningRate 0.0406 Epoch: 7 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:14,855-Speed 5582.98 samples/sec Loss 6.0995 LearningRate 0.0406 Epoch: 7 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:16,689-Speed 5584.39 samples/sec Loss 6.0607 LearningRate 0.0406 Epoch: 7 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:18,522-Speed 5588.95 samples/sec Loss 5.9383 LearningRate 0.0405 Epoch: 7 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:20,335-Speed 5649.93 samples/sec 
Loss 6.1578 LearningRate 0.0405 Epoch: 7 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:22,146-Speed 5654.95 samples/sec Loss 6.0725 LearningRate 0.0405 Epoch: 7 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:23,959-Speed 5651.45 samples/sec Loss 5.9779 LearningRate 0.0405 Epoch: 7 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:25,787-Speed 5602.89 samples/sec Loss 6.0706 LearningRate 0.0405 Epoch: 7 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:27,619-Speed 5595.48 samples/sec Loss 6.0015 LearningRate 0.0405 Epoch: 7 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:29,443-Speed 5616.14 samples/sec Loss 6.0984 LearningRate 0.0405 Epoch: 7 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:31,258-Speed 5641.90 samples/sec Loss 5.9710 LearningRate 0.0405 Epoch: 7 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:33,087-Speed 5600.59 samples/sec Loss 6.0272 LearningRate 0.0405 Epoch: 7 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:03:34,922-Speed 5581.45 samples/sec Loss 5.9386 LearningRate 0.0404 Epoch: 7 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:36,736-Speed 5648.72 samples/sec Loss 6.0775 LearningRate 0.0404 Epoch: 7 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:38,572-Speed 5577.82 samples/sec Loss 6.0121 LearningRate 0.0404 Epoch: 7 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:40,398-Speed 5610.55 samples/sec Loss 5.8985 LearningRate 0.0404 Epoch: 7 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:42,258-Speed 5508.28 samples/sec Loss 6.0548 LearningRate 0.0404 Epoch: 7 Global Step: 41440 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:44,087-Speed 5598.96 samples/sec Loss 6.0095 LearningRate 0.0404 Epoch: 7 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:45,928-Speed 5563.82 samples/sec Loss 6.1505 LearningRate 0.0404 Epoch: 7 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:47,777-Speed 5541.88 samples/sec Loss 5.9212 LearningRate 0.0404 Epoch: 7 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:49,602-Speed 5611.71 samples/sec Loss 6.0548 LearningRate 0.0404 Epoch: 7 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:51,445-Speed 5558.39 samples/sec Loss 6.0958 LearningRate 0.0403 Epoch: 7 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:53,251-Speed 5672.26 samples/sec Loss 5.9979 LearningRate 0.0403 Epoch: 7 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:55,064-Speed 5647.59 samples/sec Loss 6.0301 LearningRate 0.0403 Epoch: 7 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:56,878-Speed 5648.72 samples/sec Loss 6.0143 LearningRate 0.0403 Epoch: 7 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:03:58,690-Speed 5652.06 samples/sec Loss 6.0581 LearningRate 0.0403 Epoch: 7 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:04:00,533-Speed 5557.00 samples/sec Loss 6.1056 LearningRate 0.0403 Epoch: 7 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:04:02,363-Speed 5599.21 samples/sec Loss 6.0896 LearningRate 0.0403 Epoch: 7 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:04,201-Speed 5573.71 samples/sec Loss 6.0349 LearningRate 0.0403 Epoch: 7 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-27 04:04:06,046-Speed 5550.53 samples/sec Loss 5.9799 LearningRate 0.0403 Epoch: 7 Global Step: 41570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:07,866-Speed 5629.83 samples/sec Loss 6.0423 LearningRate 0.0402 Epoch: 7 Global Step: 41580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:09,690-Speed 5616.55 samples/sec Loss 6.1574 LearningRate 0.0402 Epoch: 7 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:11,512-Speed 5620.39 samples/sec Loss 6.2088 LearningRate 0.0402 Epoch: 7 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:13,383-Speed 5475.60 samples/sec Loss 6.0565 LearningRate 0.0402 Epoch: 7 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:15,203-Speed 5628.85 samples/sec Loss 6.0566 LearningRate 0.0402 Epoch: 7 Global Step: 41620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:17,018-Speed 5643.20 samples/sec Loss 5.9441 LearningRate 0.0402 Epoch: 7 Global Step: 41630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:18,849-Speed 5593.37 samples/sec Loss 5.8315 LearningRate 0.0402 Epoch: 7 Global Step: 41640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:04:20,697-Speed 5543.12 samples/sec Loss 5.9978 LearningRate 0.0402 Epoch: 7 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:04:22,531-Speed 5585.73 samples/sec Loss 6.0218 LearningRate 0.0402 Epoch: 7 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-27 04:04:24,349-Speed 5633.38 samples/sec Loss 6.0673 LearningRate 0.0401 Epoch: 7 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:04:26,154-Speed 5673.94 samples/sec Loss 6.0035 LearningRate 0.0401 Epoch: 7 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:27,966-Speed 5655.37 samples/sec Loss 6.0645 
LearningRate 0.0401 Epoch: 7 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:29,791-Speed 5612.61 samples/sec Loss 6.1100 LearningRate 0.0401 Epoch: 7 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:31,617-Speed 5608.17 samples/sec Loss 5.9683 LearningRate 0.0401 Epoch: 7 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:33,437-Speed 5630.68 samples/sec Loss 5.9942 LearningRate 0.0401 Epoch: 7 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:35,268-Speed 5592.87 samples/sec Loss 6.1931 LearningRate 0.0401 Epoch: 7 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:37,111-Speed 5558.44 samples/sec Loss 6.0725 LearningRate 0.0401 Epoch: 7 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:38,946-Speed 5581.21 samples/sec Loss 6.0976 LearningRate 0.0401 Epoch: 7 Global Step: 41750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:40,796-Speed 5537.15 samples/sec Loss 5.9628 LearningRate 0.0400 Epoch: 7 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:42,623-Speed 5608.04 samples/sec Loss 5.9955 LearningRate 0.0400 Epoch: 7 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:44,466-Speed 5555.73 samples/sec Loss 5.9431 LearningRate 0.0400 Epoch: 7 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:04:46,344-Speed 5455.05 samples/sec Loss 6.1706 LearningRate 0.0400 Epoch: 7 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:04:48,166-Speed 5622.02 samples/sec Loss 6.0119 LearningRate 0.0400 Epoch: 7 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:04:49,994-Speed 5606.00 samples/sec Loss 6.1806 LearningRate 0.0400 Epoch: 7 Global Step: 41810 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-27 04:04:51,845-Speed 5533.93 samples/sec Loss 6.1614 LearningRate 0.0400 Epoch: 7 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:53,716-Speed 5473.55 samples/sec Loss 6.0179 LearningRate 0.0400 Epoch: 7 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:55,536-Speed 5626.70 samples/sec Loss 6.1042 LearningRate 0.0400 Epoch: 7 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:57,365-Speed 5601.83 samples/sec Loss 6.1650 LearningRate 0.0399 Epoch: 7 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:04:59,188-Speed 5618.06 samples/sec Loss 6.0372 LearningRate 0.0399 Epoch: 7 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:05:01,020-Speed 5593.05 samples/sec Loss 6.0150 LearningRate 0.0399 Epoch: 7 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:05:02,851-Speed 5593.12 samples/sec Loss 6.1965 LearningRate 0.0399 Epoch: 7 Global Step: 41880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:05:04,677-Speed 5610.53 samples/sec Loss 6.1249 LearningRate 0.0399 Epoch: 7 Global Step: 41890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:05:06,517-Speed 5565.77 samples/sec Loss 6.0487 LearningRate 0.0399 Epoch: 7 Global Step: 41900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:05:08,328-Speed 5656.36 samples/sec Loss 6.0315 LearningRate 0.0399 Epoch: 7 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:05:10,165-Speed 5576.41 samples/sec Loss 6.1938 LearningRate 0.0399 Epoch: 7 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:05:11,987-Speed 5624.50 samples/sec Loss 6.0596 LearningRate 0.0399 Epoch: 7 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:05:13,865-Speed 
5453.22 samples/sec Loss 6.1408 LearningRate 0.0398 Epoch: 7 Global Step: 41940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:15,719-Speed 5524.96 samples/sec Loss 5.9951 LearningRate 0.0398 Epoch: 7 Global Step: 41950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:17,539-Speed 5628.98 samples/sec Loss 6.0947 LearningRate 0.0398 Epoch: 7 Global Step: 41960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:19,380-Speed 5564.15 samples/sec Loss 6.0018 LearningRate 0.0398 Epoch: 7 Global Step: 41970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:21,280-Speed 5392.74 samples/sec Loss 5.9395 LearningRate 0.0398 Epoch: 7 Global Step: 41980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:23,122-Speed 5558.32 samples/sec Loss 6.1379 LearningRate 0.0398 Epoch: 7 Global Step: 41990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:24,951-Speed 5601.83 samples/sec Loss 6.1014 LearningRate 0.0398 Epoch: 7 Global Step: 42000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:05:51,123-[lfw][42000]XNorm: 20.757936 Training: 2022-04-27 04:05:51,123-[lfw][42000]Accuracy-Flip: 0.99650+-0.00361 Training: 2022-04-27 04:05:51,124-[lfw][42000]Accuracy-Highest: 0.99750 Training: 2022-04-27 04:06:21,569-[cfp_fp][42000]XNorm: 17.877480 Training: 2022-04-27 04:06:21,570-[cfp_fp][42000]Accuracy-Flip: 0.94857+-0.00969 Training: 2022-04-27 04:06:21,570-[cfp_fp][42000]Accuracy-Highest: 0.94857 Training: 2022-04-27 04:06:47,839-[agedb_30][42000]XNorm: 20.327300 Training: 2022-04-27 04:06:47,840-[agedb_30][42000]Accuracy-Flip: 0.97250+-0.00817 Training: 2022-04-27 04:06:47,840-[agedb_30][42000]Accuracy-Highest: 0.97300 Training: 2022-04-27 04:06:49,689-Speed 120.84 samples/sec Loss 6.0675 LearningRate 0.0398 Epoch: 7 Global Step: 42010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 04:06:51,528-Speed 5567.47 samples/sec Loss 5.9514 LearningRate 
0.0398 Epoch: 7 Global Step: 42020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 04:06:53,369-Speed 5566.32 samples/sec Loss 6.0587 LearningRate 0.0397 Epoch: 7 Global Step: 42030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 04:06:55,175-Speed 5670.44 samples/sec Loss 5.9420 LearningRate 0.0397 Epoch: 7 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:06:56,998-Speed 5618.24 samples/sec Loss 6.0608 LearningRate 0.0397 Epoch: 7 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:06:58,819-Speed 5625.62 samples/sec Loss 6.1608 LearningRate 0.0397 Epoch: 7 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:00,636-Speed 5637.05 samples/sec Loss 6.0628 LearningRate 0.0397 Epoch: 7 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:02,451-Speed 5644.49 samples/sec Loss 6.1166 LearningRate 0.0397 Epoch: 7 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:04,272-Speed 5625.01 samples/sec Loss 5.9460 LearningRate 0.0397 Epoch: 7 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:06,096-Speed 5616.52 samples/sec Loss 6.0145 LearningRate 0.0397 Epoch: 7 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:07,920-Speed 5614.86 samples/sec Loss 5.8991 LearningRate 0.0397 Epoch: 7 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:09,728-Speed 5666.09 samples/sec Loss 6.0533 LearningRate 0.0396 Epoch: 7 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:11,563-Speed 5580.09 samples/sec Loss 6.1318 LearningRate 0.0396 Epoch: 7 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 04:07:13,372-Speed 5664.18 samples/sec Loss 6.0870 LearningRate 0.0396 Epoch: 7 Global Step: 42140 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-27 04:07:15,180-Speed 5665.33 samples/sec Loss 6.1027 LearningRate 0.0396 Epoch: 7 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:17,000-Speed 5627.80 samples/sec Loss 5.9908 LearningRate 0.0396 Epoch: 7 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:18,810-Speed 5659.70 samples/sec Loss 6.1487 LearningRate 0.0396 Epoch: 7 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:20,636-Speed 5612.25 samples/sec Loss 6.0081 LearningRate 0.0396 Epoch: 7 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:22,459-Speed 5618.11 samples/sec Loss 5.9818 LearningRate 0.0396 Epoch: 7 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:24,277-Speed 5634.17 samples/sec Loss 6.0464 LearningRate 0.0396 Epoch: 7 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:26,081-Speed 5678.38 samples/sec Loss 5.9498 LearningRate 0.0395 Epoch: 7 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:27,915-Speed 5582.67 samples/sec Loss 5.9604 LearningRate 0.0395 Epoch: 7 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:29,750-Speed 5584.14 samples/sec Loss 6.0431 LearningRate 0.0395 Epoch: 7 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:31,585-Speed 5581.66 samples/sec Loss 6.0639 LearningRate 0.0395 Epoch: 7 Global Step: 42240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:33,417-Speed 5590.17 samples/sec Loss 6.0044 LearningRate 0.0395 Epoch: 7 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:35,240-Speed 5618.93 samples/sec Loss 6.0395 LearningRate 0.0395 Epoch: 7 Global Step: 42260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:37,085-Speed 
5552.58 samples/sec Loss 5.9918 LearningRate 0.0395 Epoch: 7 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:38,920-Speed 5581.41 samples/sec Loss 5.8398 LearningRate 0.0395 Epoch: 7 Global Step: 42280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:40,732-Speed 5656.48 samples/sec Loss 6.1499 LearningRate 0.0395 Epoch: 7 Global Step: 42290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:42,572-Speed 5567.98 samples/sec Loss 6.0022 LearningRate 0.0394 Epoch: 7 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:44,386-Speed 5645.14 samples/sec Loss 6.0112 LearningRate 0.0394 Epoch: 7 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:46,214-Speed 5602.57 samples/sec Loss 6.0159 LearningRate 0.0394 Epoch: 7 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:48,070-Speed 5519.85 samples/sec Loss 6.0557 LearningRate 0.0394 Epoch: 7 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:49,934-Speed 5496.90 samples/sec Loss 6.2047 LearningRate 0.0394 Epoch: 7 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:51,779-Speed 5550.82 samples/sec Loss 6.2143 LearningRate 0.0394 Epoch: 7 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:53,616-Speed 5577.00 samples/sec Loss 6.1040 LearningRate 0.0394 Epoch: 7 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:55,447-Speed 5592.69 samples/sec Loss 5.9667 LearningRate 0.0394 Epoch: 7 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:07:57,278-Speed 5595.64 samples/sec Loss 6.1133 LearningRate 0.0394 Epoch: 7 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:07:59,105-Speed 5605.59 samples/sec Loss 6.1064 LearningRate 0.0393 Epoch: 7 
Global Step: 42390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:00,935-Speed 5596.27 samples/sec Loss 6.0544 LearningRate 0.0393 Epoch: 7 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:02,774-Speed 5570.64 samples/sec Loss 6.2302 LearningRate 0.0393 Epoch: 7 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:04,588-Speed 5648.57 samples/sec Loss 5.9652 LearningRate 0.0393 Epoch: 7 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:06,421-Speed 5587.31 samples/sec Loss 5.9395 LearningRate 0.0393 Epoch: 7 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:08,248-Speed 5606.80 samples/sec Loss 6.0449 LearningRate 0.0393 Epoch: 7 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:10,091-Speed 5557.23 samples/sec Loss 5.8803 LearningRate 0.0393 Epoch: 7 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:11,915-Speed 5615.89 samples/sec Loss 5.9537 LearningRate 0.0393 Epoch: 7 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:13,737-Speed 5623.04 samples/sec Loss 5.9439 LearningRate 0.0393 Epoch: 7 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:15,567-Speed 5598.31 samples/sec Loss 5.8978 LearningRate 0.0392 Epoch: 7 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:08:17,411-Speed 5555.40 samples/sec Loss 6.1573 LearningRate 0.0392 Epoch: 7 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:08:19,216-Speed 5675.04 samples/sec Loss 6.0924 LearningRate 0.0392 Epoch: 7 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:21,023-Speed 5667.93 samples/sec Loss 6.0098 LearningRate 0.0392 Epoch: 7 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 04:08:22,839-Speed 5638.96 samples/sec Loss 5.8245 LearningRate 0.0392 Epoch: 7 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:24,656-Speed 5638.83 samples/sec Loss 5.9067 LearningRate 0.0392 Epoch: 7 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:26,480-Speed 5617.88 samples/sec Loss 6.0154 LearningRate 0.0392 Epoch: 7 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:28,302-Speed 5621.75 samples/sec Loss 6.0263 LearningRate 0.0392 Epoch: 7 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:30,131-Speed 5598.81 samples/sec Loss 5.9906 LearningRate 0.0392 Epoch: 7 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:31,950-Speed 5632.27 samples/sec Loss 5.8645 LearningRate 0.0391 Epoch: 7 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:33,777-Speed 5607.46 samples/sec Loss 6.0228 LearningRate 0.0391 Epoch: 7 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:35,590-Speed 5647.75 samples/sec Loss 6.0242 LearningRate 0.0391 Epoch: 7 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:37,422-Speed 5591.23 samples/sec Loss 5.9631 LearningRate 0.0391 Epoch: 7 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:08:39,237-Speed 5644.68 samples/sec Loss 5.9187 LearningRate 0.0391 Epoch: 7 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:41,049-Speed 5652.05 samples/sec Loss 6.0118 LearningRate 0.0391 Epoch: 7 Global Step: 42620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:42,873-Speed 5615.74 samples/sec Loss 6.0070 LearningRate 0.0391 Epoch: 7 Global Step: 42630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:44,697-Speed 5616.60 samples/sec Loss 
5.9963 LearningRate 0.0391 Epoch: 7 Global Step: 42640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:46,504-Speed 5669.47 samples/sec Loss 6.0936 LearningRate 0.0391 Epoch: 7 Global Step: 42650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:48,317-Speed 5650.15 samples/sec Loss 5.9843 LearningRate 0.0390 Epoch: 7 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:50,130-Speed 5650.82 samples/sec Loss 5.9952 LearningRate 0.0390 Epoch: 7 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:51,953-Speed 5617.87 samples/sec Loss 6.1856 LearningRate 0.0390 Epoch: 7 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:53,785-Speed 5591.13 samples/sec Loss 6.0501 LearningRate 0.0390 Epoch: 7 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:55,626-Speed 5565.01 samples/sec Loss 6.0981 LearningRate 0.0390 Epoch: 7 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:08:57,439-Speed 5649.76 samples/sec Loss 5.9070 LearningRate 0.0390 Epoch: 7 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:08:59,331-Speed 5414.03 samples/sec Loss 6.0570 LearningRate 0.0390 Epoch: 7 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:01,186-Speed 5520.80 samples/sec Loss 5.9809 LearningRate 0.0390 Epoch: 7 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:03,035-Speed 5540.67 samples/sec Loss 5.9338 LearningRate 0.0390 Epoch: 7 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:04,857-Speed 5623.72 samples/sec Loss 6.0755 LearningRate 0.0389 Epoch: 7 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:06,698-Speed 5562.03 samples/sec Loss 6.0340 LearningRate 0.0389 Epoch: 7 Global Step: 42760 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:08,543-Speed 5553.76 samples/sec Loss 5.8374 LearningRate 0.0389 Epoch: 7 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:10,378-Speed 5582.64 samples/sec Loss 5.9960 LearningRate 0.0389 Epoch: 7 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:12,203-Speed 5611.57 samples/sec Loss 5.9989 LearningRate 0.0389 Epoch: 7 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:14,019-Speed 5640.48 samples/sec Loss 6.0499 LearningRate 0.0389 Epoch: 7 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:15,844-Speed 5612.55 samples/sec Loss 6.0980 LearningRate 0.0389 Epoch: 7 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:17,680-Speed 5580.09 samples/sec Loss 5.9747 LearningRate 0.0389 Epoch: 7 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:19,545-Speed 5493.94 samples/sec Loss 5.9173 LearningRate 0.0389 Epoch: 7 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:21,375-Speed 5597.42 samples/sec Loss 6.0163 LearningRate 0.0388 Epoch: 7 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:23,194-Speed 5630.64 samples/sec Loss 5.8985 LearningRate 0.0388 Epoch: 7 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:09:25,034-Speed 5565.50 samples/sec Loss 6.0883 LearningRate 0.0388 Epoch: 7 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:26,880-Speed 5548.92 samples/sec Loss 6.0315 LearningRate 0.0388 Epoch: 7 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:28,718-Speed 5573.24 samples/sec Loss 5.9384 LearningRate 0.0388 Epoch: 7 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:09:30,565-Speed 5546.70 samples/sec Loss 6.1196 LearningRate 0.0388 Epoch: 7 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:32,415-Speed 5536.94 samples/sec Loss 5.9550 LearningRate 0.0388 Epoch: 7 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:34,259-Speed 5556.06 samples/sec Loss 6.0921 LearningRate 0.0388 Epoch: 7 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:36,066-Speed 5667.74 samples/sec Loss 5.9477 LearningRate 0.0388 Epoch: 7 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:37,877-Speed 5657.57 samples/sec Loss 6.0442 LearningRate 0.0387 Epoch: 7 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:39,706-Speed 5598.71 samples/sec Loss 5.9275 LearningRate 0.0387 Epoch: 7 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:41,549-Speed 5559.55 samples/sec Loss 5.9754 LearningRate 0.0387 Epoch: 7 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:43,361-Speed 5651.93 samples/sec Loss 6.0245 LearningRate 0.0387 Epoch: 7 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:45,192-Speed 5594.03 samples/sec Loss 5.9841 LearningRate 0.0387 Epoch: 7 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:47,021-Speed 5600.53 samples/sec Loss 5.9196 LearningRate 0.0387 Epoch: 7 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:48,863-Speed 5561.43 samples/sec Loss 5.9393 LearningRate 0.0387 Epoch: 7 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:50,696-Speed 5588.87 samples/sec Loss 6.1383 LearningRate 0.0387 Epoch: 7 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:52,530-Speed 5586.59 samples/sec Loss 5.9815 LearningRate 
0.0387 Epoch: 7 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:54,347-Speed 5637.18 samples/sec Loss 5.8893 LearningRate 0.0387 Epoch: 7 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:56,181-Speed 5584.84 samples/sec Loss 6.0098 LearningRate 0.0386 Epoch: 7 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:58,005-Speed 5614.58 samples/sec Loss 6.0007 LearningRate 0.0386 Epoch: 7 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:09:59,829-Speed 5617.16 samples/sec Loss 6.0065 LearningRate 0.0386 Epoch: 7 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:01,660-Speed 5592.77 samples/sec Loss 6.0559 LearningRate 0.0386 Epoch: 7 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:03,476-Speed 5643.04 samples/sec Loss 5.9441 LearningRate 0.0386 Epoch: 7 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:05,289-Speed 5649.17 samples/sec Loss 6.0719 LearningRate 0.0386 Epoch: 7 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:07,108-Speed 5631.62 samples/sec Loss 5.9173 LearningRate 0.0386 Epoch: 7 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:08,940-Speed 5589.72 samples/sec Loss 5.9261 LearningRate 0.0386 Epoch: 7 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:10,778-Speed 5573.82 samples/sec Loss 5.9390 LearningRate 0.0386 Epoch: 7 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:12,596-Speed 5633.38 samples/sec Loss 6.0387 LearningRate 0.0385 Epoch: 7 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:14,426-Speed 5600.49 samples/sec Loss 5.8735 LearningRate 0.0385 Epoch: 7 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-27 04:10:16,253-Speed 5605.74 samples/sec Loss 6.1096 LearningRate 0.0385 Epoch: 7 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:18,068-Speed 5641.95 samples/sec Loss 5.8777 LearningRate 0.0385 Epoch: 7 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:19,879-Speed 5659.15 samples/sec Loss 6.0399 LearningRate 0.0385 Epoch: 7 Global Step: 43160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:21,701-Speed 5619.72 samples/sec Loss 5.9900 LearningRate 0.0385 Epoch: 7 Global Step: 43170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:23,521-Speed 5628.18 samples/sec Loss 6.0268 LearningRate 0.0385 Epoch: 7 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:25,346-Speed 5612.41 samples/sec Loss 6.0991 LearningRate 0.0385 Epoch: 7 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:27,174-Speed 5605.13 samples/sec Loss 6.0068 LearningRate 0.0385 Epoch: 7 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:28,991-Speed 5638.14 samples/sec Loss 5.9433 LearningRate 0.0384 Epoch: 7 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:30,806-Speed 5641.35 samples/sec Loss 5.9485 LearningRate 0.0384 Epoch: 7 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:32,614-Speed 5665.91 samples/sec Loss 6.2161 LearningRate 0.0384 Epoch: 7 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:34,426-Speed 5653.26 samples/sec Loss 5.9463 LearningRate 0.0384 Epoch: 7 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:36,253-Speed 5607.37 samples/sec Loss 5.9641 LearningRate 0.0384 Epoch: 7 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:38,063-Speed 5659.51 
samples/sec Loss 5.9675 LearningRate 0.0384 Epoch: 7 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:39,870-Speed 5668.45 samples/sec Loss 6.0665 LearningRate 0.0384 Epoch: 7 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:41,685-Speed 5643.35 samples/sec Loss 6.0149 LearningRate 0.0384 Epoch: 7 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:43,510-Speed 5615.67 samples/sec Loss 6.1294 LearningRate 0.0384 Epoch: 7 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:45,336-Speed 5607.78 samples/sec Loss 5.9115 LearningRate 0.0383 Epoch: 7 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:47,155-Speed 5631.80 samples/sec Loss 6.0486 LearningRate 0.0383 Epoch: 7 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:49,010-Speed 5521.91 samples/sec Loss 5.9592 LearningRate 0.0383 Epoch: 7 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:50,860-Speed 5535.93 samples/sec Loss 5.8977 LearningRate 0.0383 Epoch: 7 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:52,681-Speed 5627.14 samples/sec Loss 5.9895 LearningRate 0.0383 Epoch: 7 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:54,510-Speed 5600.33 samples/sec Loss 5.9322 LearningRate 0.0383 Epoch: 7 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:56,334-Speed 5615.88 samples/sec Loss 5.8827 LearningRate 0.0383 Epoch: 7 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:10:58,146-Speed 5653.52 samples/sec Loss 5.9485 LearningRate 0.0383 Epoch: 7 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:10:59,979-Speed 5586.99 samples/sec Loss 5.8801 LearningRate 0.0383 Epoch: 7 Global Step: 
43380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:01,803-Speed 5617.95 samples/sec Loss 5.9381 LearningRate 0.0382 Epoch: 7 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:03,628-Speed 5611.05 samples/sec Loss 5.9559 LearningRate 0.0382 Epoch: 7 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:05,466-Speed 5572.64 samples/sec Loss 6.0234 LearningRate 0.0382 Epoch: 7 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:07,287-Speed 5626.85 samples/sec Loss 6.0068 LearningRate 0.0382 Epoch: 7 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:09,104-Speed 5638.16 samples/sec Loss 5.9304 LearningRate 0.0382 Epoch: 7 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:10,921-Speed 5635.26 samples/sec Loss 5.8428 LearningRate 0.0382 Epoch: 7 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:12,740-Speed 5633.65 samples/sec Loss 5.9504 LearningRate 0.0382 Epoch: 7 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:14,568-Speed 5601.22 samples/sec Loss 5.8579 LearningRate 0.0382 Epoch: 7 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:16,372-Speed 5679.61 samples/sec Loss 5.8515 LearningRate 0.0382 Epoch: 7 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:18,216-Speed 5553.42 samples/sec Loss 5.8175 LearningRate 0.0382 Epoch: 7 Global Step: 43480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:20,048-Speed 5591.44 samples/sec Loss 5.9127 LearningRate 0.0381 Epoch: 7 Global Step: 43490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:21,864-Speed 5641.78 samples/sec Loss 5.7967 LearningRate 0.0381 Epoch: 7 Global Step: 43500 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-27 04:11:23,676-Speed 5653.29 samples/sec Loss 6.0729 LearningRate 0.0381 Epoch: 7 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:25,503-Speed 5609.19 samples/sec Loss 5.8631 LearningRate 0.0381 Epoch: 7 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:27,332-Speed 5597.58 samples/sec Loss 5.8363 LearningRate 0.0381 Epoch: 7 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:29,161-Speed 5602.53 samples/sec Loss 5.9618 LearningRate 0.0381 Epoch: 7 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:30,988-Speed 5606.19 samples/sec Loss 6.0397 LearningRate 0.0381 Epoch: 7 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:32,832-Speed 5554.94 samples/sec Loss 5.9462 LearningRate 0.0381 Epoch: 7 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:34,636-Speed 5678.45 samples/sec Loss 6.0247 LearningRate 0.0381 Epoch: 7 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:36,456-Speed 5628.17 samples/sec Loss 5.9904 LearningRate 0.0380 Epoch: 7 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:38,275-Speed 5629.75 samples/sec Loss 5.7617 LearningRate 0.0380 Epoch: 7 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:40,101-Speed 5610.44 samples/sec Loss 6.0011 LearningRate 0.0380 Epoch: 7 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:41,952-Speed 5533.08 samples/sec Loss 5.9644 LearningRate 0.0380 Epoch: 7 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:43,768-Speed 5640.47 samples/sec Loss 6.0100 LearningRate 0.0380 Epoch: 7 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:45,602-Speed 5586.55 samples/sec 
Loss 5.8958 LearningRate 0.0380 Epoch: 7 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:47,426-Speed 5617.85 samples/sec Loss 5.8838 LearningRate 0.0380 Epoch: 7 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:49,256-Speed 5595.39 samples/sec Loss 5.9635 LearningRate 0.0380 Epoch: 7 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:51,073-Speed 5638.68 samples/sec Loss 5.9072 LearningRate 0.0380 Epoch: 7 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:52,888-Speed 5645.35 samples/sec Loss 5.8646 LearningRate 0.0379 Epoch: 7 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:11:54,704-Speed 5637.71 samples/sec Loss 5.8987 LearningRate 0.0379 Epoch: 7 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:56,521-Speed 5638.17 samples/sec Loss 5.8513 LearningRate 0.0379 Epoch: 7 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:11:58,360-Speed 5569.50 samples/sec Loss 6.0803 LearningRate 0.0379 Epoch: 7 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:12:00,174-Speed 5648.63 samples/sec Loss 5.9143 LearningRate 0.0379 Epoch: 7 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:01,996-Speed 5620.30 samples/sec Loss 6.0811 LearningRate 0.0379 Epoch: 7 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:03,825-Speed 5601.87 samples/sec Loss 5.9130 LearningRate 0.0379 Epoch: 7 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:05,652-Speed 5605.72 samples/sec Loss 5.8580 LearningRate 0.0379 Epoch: 7 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:07,485-Speed 5588.57 samples/sec Loss 5.8987 LearningRate 0.0379 Epoch: 7 Global Step: 43750 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:09,328-Speed 5560.15 samples/sec Loss 5.8188 LearningRate 0.0378 Epoch: 7 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:11,138-Speed 5657.94 samples/sec Loss 5.9822 LearningRate 0.0378 Epoch: 7 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:12,951-Speed 5650.27 samples/sec Loss 5.9283 LearningRate 0.0378 Epoch: 7 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:14,787-Speed 5579.80 samples/sec Loss 5.9545 LearningRate 0.0378 Epoch: 7 Global Step: 43790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:16,618-Speed 5592.64 samples/sec Loss 6.0129 LearningRate 0.0378 Epoch: 7 Global Step: 43800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:18,440-Speed 5624.41 samples/sec Loss 5.9706 LearningRate 0.0378 Epoch: 7 Global Step: 43810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:20,296-Speed 5519.55 samples/sec Loss 5.8763 LearningRate 0.0378 Epoch: 7 Global Step: 43820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:22,130-Speed 5584.73 samples/sec Loss 5.8831 LearningRate 0.0378 Epoch: 7 Global Step: 43830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:23,952-Speed 5619.44 samples/sec Loss 5.8125 LearningRate 0.0378 Epoch: 7 Global Step: 43840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:25,762-Speed 5662.02 samples/sec Loss 5.8204 LearningRate 0.0377 Epoch: 7 Global Step: 43850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:27,575-Speed 5647.38 samples/sec Loss 5.8972 LearningRate 0.0377 Epoch: 7 Global Step: 43860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:12:29,436-Speed 5504.92 samples/sec Loss 5.9547 LearningRate 0.0377 Epoch: 7 Global Step: 43870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
04:12:31,362-Speed 5319.42 samples/sec Loss 5.9954 LearningRate 0.0377 Epoch: 7 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:33,206-Speed 5553.90 samples/sec Loss 5.9561 LearningRate 0.0377 Epoch: 7 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:35,040-Speed 5585.24 samples/sec Loss 5.8555 LearningRate 0.0377 Epoch: 7 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:36,854-Speed 5647.92 samples/sec Loss 5.8772 LearningRate 0.0377 Epoch: 7 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:38,670-Speed 5641.68 samples/sec Loss 5.9686 LearningRate 0.0377 Epoch: 7 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:40,478-Speed 5663.46 samples/sec Loss 5.8263 LearningRate 0.0377 Epoch: 7 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:42,322-Speed 5555.86 samples/sec Loss 6.0588 LearningRate 0.0377 Epoch: 7 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:44,182-Speed 5508.12 samples/sec Loss 5.9113 LearningRate 0.0376 Epoch: 7 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:46,006-Speed 5614.35 samples/sec Loss 5.8960 LearningRate 0.0376 Epoch: 7 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:47,835-Speed 5600.87 samples/sec Loss 6.0128 LearningRate 0.0376 Epoch: 7 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:12:49,671-Speed 5579.17 samples/sec Loss 5.8608 LearningRate 0.0376 Epoch: 7 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:12:51,490-Speed 5632.56 samples/sec Loss 5.9399 LearningRate 0.0376 Epoch: 7 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:12:53,337-Speed 5546.99 samples/sec Loss 5.9473 LearningRate 
0.0376 Epoch: 7 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:13:27,770-[lfw][44000]XNorm: 20.828653
Training: 2022-04-27 04:13:27,771-[lfw][44000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-27 04:13:27,771-[lfw][44000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:14:07,567-[cfp_fp][44000]XNorm: 17.993965
Training: 2022-04-27 04:14:07,568-[cfp_fp][44000]Accuracy-Flip: 0.94400+-0.01179
Training: 2022-04-27 04:14:07,568-[cfp_fp][44000]Accuracy-Highest: 0.94857
Training: 2022-04-27 04:14:41,925-[agedb_30][44000]XNorm: 20.486939
Training: 2022-04-27 04:14:41,926-[agedb_30][44000]Accuracy-Flip: 0.97383+-0.00879
Training: 2022-04-27 04:14:41,926-[agedb_30][44000]Accuracy-Highest: 0.97383
Training: 2022-04-27 04:14:43,819-Speed 92.68 samples/sec Loss 6.0498 LearningRate 0.0376 Epoch: 7 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:45,657-Speed 5573.30 samples/sec Loss 5.8511 LearningRate 0.0376 Epoch: 7 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:47,482-Speed 5613.26 samples/sec Loss 5.9472 LearningRate 0.0376 Epoch: 7 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:49,348-Speed 5489.78 samples/sec Loss 5.8875 LearningRate 0.0375 Epoch: 7 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:51,188-Speed 5568.14 samples/sec Loss 6.0417 LearningRate 0.0375 Epoch: 7 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:53,015-Speed 5605.44 samples/sec Loss 5.9716 LearningRate 0.0375 Epoch: 7 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:54,865-Speed 5538.65 samples/sec Loss 5.8722 LearningRate 0.0375 Epoch: 7 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-27 04:14:56,689-Speed 5615.33 samples/sec Loss 5.7702 LearningRate 0.0375 Epoch: 7 Global Step: 44080 Fp16
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:14:58,505-Speed 5640.76 samples/sec Loss 5.9450 LearningRate 0.0375 Epoch: 7 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:00,318-Speed 5648.43 samples/sec Loss 5.8539 LearningRate 0.0375 Epoch: 7 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:02,139-Speed 5625.02 samples/sec Loss 5.9175 LearningRate 0.0375 Epoch: 7 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:03,977-Speed 5574.47 samples/sec Loss 5.8906 LearningRate 0.0375 Epoch: 7 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:05,815-Speed 5573.70 samples/sec Loss 6.0543 LearningRate 0.0374 Epoch: 7 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:07,631-Speed 5641.13 samples/sec Loss 5.9434 LearningRate 0.0374 Epoch: 7 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:09,457-Speed 5609.02 samples/sec Loss 5.9453 LearningRate 0.0374 Epoch: 7 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:11,271-Speed 5647.16 samples/sec Loss 5.8686 LearningRate 0.0374 Epoch: 7 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:13,089-Speed 5633.46 samples/sec Loss 5.9440 LearningRate 0.0374 Epoch: 7 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:14,923-Speed 5586.07 samples/sec Loss 6.0327 LearningRate 0.0374 Epoch: 7 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:16,749-Speed 5608.30 samples/sec Loss 5.8424 LearningRate 0.0374 Epoch: 7 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:18,567-Speed 5635.63 samples/sec Loss 5.8469 LearningRate 0.0374 Epoch: 7 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 
04:15:20,375-Speed 5665.18 samples/sec Loss 5.8324 LearningRate 0.0374 Epoch: 7 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:22,209-Speed 5586.57 samples/sec Loss 5.8095 LearningRate 0.0374 Epoch: 7 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:24,029-Speed 5628.66 samples/sec Loss 5.9116 LearningRate 0.0373 Epoch: 7 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:25,857-Speed 5603.45 samples/sec Loss 5.8134 LearningRate 0.0373 Epoch: 7 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:27,667-Speed 5657.85 samples/sec Loss 5.8459 LearningRate 0.0373 Epoch: 7 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:29,496-Speed 5601.97 samples/sec Loss 5.7699 LearningRate 0.0373 Epoch: 7 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:31,306-Speed 5657.64 samples/sec Loss 5.8484 LearningRate 0.0373 Epoch: 7 Global Step: 44270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:33,140-Speed 5584.29 samples/sec Loss 5.9906 LearningRate 0.0373 Epoch: 7 Global Step: 44280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:34,968-Speed 5604.44 samples/sec Loss 5.9824 LearningRate 0.0373 Epoch: 7 Global Step: 44290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:36,806-Speed 5575.03 samples/sec Loss 5.8146 LearningRate 0.0373 Epoch: 7 Global Step: 44300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:38,624-Speed 5633.65 samples/sec Loss 6.0616 LearningRate 0.0373 Epoch: 7 Global Step: 44310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:40,456-Speed 5591.88 samples/sec Loss 5.9623 LearningRate 0.0372 Epoch: 7 Global Step: 44320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:42,269-Speed 5647.91 samples/sec Loss 5.8252 LearningRate 
0.0372 Epoch: 7 Global Step: 44330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:44,081-Speed 5654.37 samples/sec Loss 5.8873 LearningRate 0.0372 Epoch: 7 Global Step: 44340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:15:45,909-Speed 5602.58 samples/sec Loss 6.0367 LearningRate 0.0372 Epoch: 7 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:47,723-Speed 5648.52 samples/sec Loss 5.8790 LearningRate 0.0372 Epoch: 7 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:49,548-Speed 5613.71 samples/sec Loss 5.7878 LearningRate 0.0372 Epoch: 7 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:51,375-Speed 5604.86 samples/sec Loss 5.9572 LearningRate 0.0372 Epoch: 7 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:53,203-Speed 5604.66 samples/sec Loss 5.8586 LearningRate 0.0372 Epoch: 7 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:55,058-Speed 5521.58 samples/sec Loss 5.8204 LearningRate 0.0372 Epoch: 7 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:56,881-Speed 5619.56 samples/sec Loss 5.9429 LearningRate 0.0371 Epoch: 7 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:15:58,719-Speed 5574.24 samples/sec Loss 5.8696 LearningRate 0.0371 Epoch: 7 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:16:00,541-Speed 5622.04 samples/sec Loss 5.8223 LearningRate 0.0371 Epoch: 7 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:02,353-Speed 5652.31 samples/sec Loss 5.8561 LearningRate 0.0371 Epoch: 7 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:04,166-Speed 5648.57 samples/sec Loss 5.9555 LearningRate 0.0371 Epoch: 7 Global Step: 44450 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:16:05,978-Speed 5655.04 samples/sec Loss 5.8462 LearningRate 0.0371 Epoch: 7 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:07,792-Speed 5645.97 samples/sec Loss 5.8160 LearningRate 0.0371 Epoch: 7 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:09,620-Speed 5604.90 samples/sec Loss 5.8702 LearningRate 0.0371 Epoch: 7 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:11,432-Speed 5651.55 samples/sec Loss 5.8843 LearningRate 0.0371 Epoch: 7 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:13,246-Speed 5646.04 samples/sec Loss 5.8167 LearningRate 0.0371 Epoch: 7 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:15,086-Speed 5567.29 samples/sec Loss 5.9198 LearningRate 0.0370 Epoch: 7 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:16,921-Speed 5583.82 samples/sec Loss 5.9462 LearningRate 0.0370 Epoch: 7 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:18,752-Speed 5592.94 samples/sec Loss 6.0317 LearningRate 0.0370 Epoch: 7 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:16:20,567-Speed 5645.04 samples/sec Loss 5.8202 LearningRate 0.0370 Epoch: 7 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:16:22,374-Speed 5668.29 samples/sec Loss 5.9178 LearningRate 0.0370 Epoch: 7 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:24,195-Speed 5624.81 samples/sec Loss 5.9148 LearningRate 0.0370 Epoch: 7 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:26,018-Speed 5620.41 samples/sec Loss 5.8431 LearningRate 0.0370 Epoch: 7 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:27,855-Speed 5574.82 
samples/sec Loss 5.9646 LearningRate 0.0370 Epoch: 7 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:29,680-Speed 5612.76 samples/sec Loss 6.0715 LearningRate 0.0370 Epoch: 7 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:31,498-Speed 5633.27 samples/sec Loss 5.8171 LearningRate 0.0369 Epoch: 7 Global Step: 44600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:33,321-Speed 5618.67 samples/sec Loss 5.8335 LearningRate 0.0369 Epoch: 7 Global Step: 44610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:35,137-Speed 5640.84 samples/sec Loss 5.8036 LearningRate 0.0369 Epoch: 7 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:36,986-Speed 5541.28 samples/sec Loss 5.9038 LearningRate 0.0369 Epoch: 7 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:38,808-Speed 5624.08 samples/sec Loss 5.8257 LearningRate 0.0369 Epoch: 7 Global Step: 44640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:40,633-Speed 5611.47 samples/sec Loss 5.7497 LearningRate 0.0369 Epoch: 7 Global Step: 44650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:42,463-Speed 5598.09 samples/sec Loss 5.8447 LearningRate 0.0369 Epoch: 7 Global Step: 44660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:44,283-Speed 5626.86 samples/sec Loss 6.0079 LearningRate 0.0369 Epoch: 7 Global Step: 44670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:46,101-Speed 5634.60 samples/sec Loss 5.7764 LearningRate 0.0369 Epoch: 7 Global Step: 44680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:47,920-Speed 5632.75 samples/sec Loss 5.9613 LearningRate 0.0368 Epoch: 7 Global Step: 44690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:49,755-Speed 5581.26 samples/sec Loss 5.8697 LearningRate 0.0368 Epoch: 7 Global Step: 
44700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:51,575-Speed 5630.94 samples/sec Loss 5.9244 LearningRate 0.0368 Epoch: 7 Global Step: 44710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:16:53,389-Speed 5643.70 samples/sec Loss 5.8395 LearningRate 0.0368 Epoch: 7 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:55,222-Speed 5590.20 samples/sec Loss 5.8395 LearningRate 0.0368 Epoch: 7 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:57,050-Speed 5603.44 samples/sec Loss 5.8271 LearningRate 0.0368 Epoch: 7 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:16:58,875-Speed 5612.72 samples/sec Loss 5.9083 LearningRate 0.0368 Epoch: 7 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:00,713-Speed 5573.62 samples/sec Loss 5.8525 LearningRate 0.0368 Epoch: 7 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:02,536-Speed 5617.53 samples/sec Loss 5.7933 LearningRate 0.0368 Epoch: 7 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:04,380-Speed 5555.47 samples/sec Loss 5.9556 LearningRate 0.0368 Epoch: 7 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:06,219-Speed 5569.69 samples/sec Loss 5.9300 LearningRate 0.0367 Epoch: 7 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:08,033-Speed 5648.44 samples/sec Loss 5.7883 LearningRate 0.0367 Epoch: 7 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:09,847-Speed 5646.38 samples/sec Loss 5.7561 LearningRate 0.0367 Epoch: 7 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:11,667-Speed 5627.95 samples/sec Loss 5.8909 LearningRate 0.0367 Epoch: 7 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-27 04:17:13,483-Speed 5640.04 samples/sec Loss 5.8363 LearningRate 0.0367 Epoch: 7 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:15,304-Speed 5625.45 samples/sec Loss 5.8293 LearningRate 0.0367 Epoch: 7 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:17,142-Speed 5572.92 samples/sec Loss 5.8387 LearningRate 0.0367 Epoch: 7 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:18,963-Speed 5626.94 samples/sec Loss 5.8674 LearningRate 0.0367 Epoch: 7 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:20,794-Speed 5591.44 samples/sec Loss 5.8703 LearningRate 0.0367 Epoch: 7 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:22,642-Speed 5543.55 samples/sec Loss 5.7820 LearningRate 0.0366 Epoch: 7 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:24,489-Speed 5546.65 samples/sec Loss 5.8538 LearningRate 0.0366 Epoch: 7 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:26,310-Speed 5627.06 samples/sec Loss 5.8444 LearningRate 0.0366 Epoch: 7 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:28,131-Speed 5624.45 samples/sec Loss 5.8681 LearningRate 0.0366 Epoch: 7 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:29,948-Speed 5636.92 samples/sec Loss 5.8161 LearningRate 0.0366 Epoch: 7 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:31,775-Speed 5607.35 samples/sec Loss 5.7870 LearningRate 0.0366 Epoch: 7 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:33,596-Speed 5623.48 samples/sec Loss 5.8849 LearningRate 0.0366 Epoch: 7 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:35,414-Speed 5635.74 samples/sec Loss 5.7771 
LearningRate 0.0366 Epoch: 7 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:37,220-Speed 5670.92 samples/sec Loss 5.8594 LearningRate 0.0366 Epoch: 7 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:39,036-Speed 5640.87 samples/sec Loss 5.7076 LearningRate 0.0365 Epoch: 7 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:40,862-Speed 5609.96 samples/sec Loss 5.8332 LearningRate 0.0365 Epoch: 7 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:42,680-Speed 5634.05 samples/sec Loss 5.9111 LearningRate 0.0365 Epoch: 7 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:44,496-Speed 5642.18 samples/sec Loss 5.8999 LearningRate 0.0365 Epoch: 7 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:46,343-Speed 5543.92 samples/sec Loss 5.7884 LearningRate 0.0365 Epoch: 7 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:48,171-Speed 5606.61 samples/sec Loss 5.7140 LearningRate 0.0365 Epoch: 7 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:17:49,988-Speed 5637.48 samples/sec Loss 5.7658 LearningRate 0.0365 Epoch: 7 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:51,826-Speed 5572.22 samples/sec Loss 5.7549 LearningRate 0.0365 Epoch: 7 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:53,639-Speed 5649.46 samples/sec Loss 5.8755 LearningRate 0.0365 Epoch: 7 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:55,468-Speed 5599.46 samples/sec Loss 5.8407 LearningRate 0.0365 Epoch: 7 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:57,313-Speed 5551.92 samples/sec Loss 5.8255 LearningRate 0.0364 Epoch: 7 Global Step: 45070 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-27 04:17:59,156-Speed 5557.62 samples/sec Loss 6.0091 LearningRate 0.0364 Epoch: 7 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:00,996-Speed 5567.95 samples/sec Loss 5.7971 LearningRate 0.0364 Epoch: 7 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:02,825-Speed 5601.76 samples/sec Loss 5.7229 LearningRate 0.0364 Epoch: 7 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:04,625-Speed 5689.19 samples/sec Loss 5.7572 LearningRate 0.0364 Epoch: 7 Global Step: 45110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:06,437-Speed 5653.51 samples/sec Loss 5.9293 LearningRate 0.0364 Epoch: 7 Global Step: 45120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:08,262-Speed 5614.57 samples/sec Loss 5.9142 LearningRate 0.0364 Epoch: 7 Global Step: 45130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:10,086-Speed 5615.56 samples/sec Loss 5.7685 LearningRate 0.0364 Epoch: 7 Global Step: 45140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:11,935-Speed 5539.74 samples/sec Loss 5.7392 LearningRate 0.0364 Epoch: 7 Global Step: 45150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:13,746-Speed 5655.45 samples/sec Loss 5.8514 LearningRate 0.0363 Epoch: 7 Global Step: 45160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:15,572-Speed 5610.52 samples/sec Loss 5.7802 LearningRate 0.0363 Epoch: 7 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:17,408-Speed 5583.41 samples/sec Loss 5.7723 LearningRate 0.0363 Epoch: 7 Global Step: 45180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:19,232-Speed 5614.64 samples/sec Loss 5.8708 LearningRate 0.0363 Epoch: 7 Global Step: 45190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
04:18:21,056-Speed 5616.38 samples/sec Loss 5.9161 LearningRate 0.0363 Epoch: 7 Global Step: 45200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:18:22,889-Speed 5588.85 samples/sec Loss 5.8212 LearningRate 0.0363 Epoch: 7 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:24,708-Speed 5631.22 samples/sec Loss 5.9087 LearningRate 0.0363 Epoch: 7 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:26,526-Speed 5634.60 samples/sec Loss 5.8696 LearningRate 0.0363 Epoch: 7 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:28,356-Speed 5597.64 samples/sec Loss 5.7791 LearningRate 0.0363 Epoch: 7 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:30,168-Speed 5653.31 samples/sec Loss 5.8795 LearningRate 0.0363 Epoch: 7 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:32,008-Speed 5566.60 samples/sec Loss 5.6959 LearningRate 0.0362 Epoch: 7 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:33,869-Speed 5503.08 samples/sec Loss 5.7634 LearningRate 0.0362 Epoch: 7 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:35,695-Speed 5611.51 samples/sec Loss 5.9410 LearningRate 0.0362 Epoch: 7 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:37,516-Speed 5624.39 samples/sec Loss 5.6471 LearningRate 0.0362 Epoch: 7 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:39,344-Speed 5602.65 samples/sec Loss 5.8964 LearningRate 0.0362 Epoch: 7 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:18:41,187-Speed 5558.08 samples/sec Loss 5.6845 LearningRate 0.0362 Epoch: 7 Global Step: 45310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:43,028-Speed 5563.71 samples/sec Loss 5.8889 LearningRate 
0.0362 Epoch: 7 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:44,853-Speed 5613.87 samples/sec Loss 5.7844 LearningRate 0.0362 Epoch: 7 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:46,701-Speed 5541.33 samples/sec Loss 6.0426 LearningRate 0.0362 Epoch: 7 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:48,566-Speed 5495.67 samples/sec Loss 5.7877 LearningRate 0.0361 Epoch: 7 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:50,405-Speed 5567.90 samples/sec Loss 5.7580 LearningRate 0.0361 Epoch: 7 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:52,225-Speed 5630.67 samples/sec Loss 5.8975 LearningRate 0.0361 Epoch: 7 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:54,052-Speed 5605.04 samples/sec Loss 5.8102 LearningRate 0.0361 Epoch: 7 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:55,886-Speed 5586.19 samples/sec Loss 5.6348 LearningRate 0.0361 Epoch: 7 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:57,726-Speed 5566.00 samples/sec Loss 5.7556 LearningRate 0.0361 Epoch: 7 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:18:59,553-Speed 5608.81 samples/sec Loss 5.8169 LearningRate 0.0361 Epoch: 7 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:01,416-Speed 5497.79 samples/sec Loss 5.7335 LearningRate 0.0361 Epoch: 7 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:03,244-Speed 5603.00 samples/sec Loss 5.6286 LearningRate 0.0361 Epoch: 7 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:05,064-Speed 5627.11 samples/sec Loss 5.7058 LearningRate 0.0361 Epoch: 7 Global Step: 45440 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-27 04:19:06,887-Speed 5619.71 samples/sec Loss 5.8483 LearningRate 0.0360 Epoch: 7 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:08,746-Speed 5510.64 samples/sec Loss 5.8235 LearningRate 0.0360 Epoch: 7 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:10,575-Speed 5599.52 samples/sec Loss 5.8653 LearningRate 0.0360 Epoch: 7 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:12,484-Speed 5367.75 samples/sec Loss 5.8872 LearningRate 0.0360 Epoch: 7 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:24,996-Speed 818.46 samples/sec Loss 5.7818 LearningRate 0.0360 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:26,946-Speed 5255.16 samples/sec Loss 5.1901 LearningRate 0.0360 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:28,803-Speed 5516.27 samples/sec Loss 5.1171 LearningRate 0.0360 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:30,658-Speed 5522.15 samples/sec Loss 5.0973 LearningRate 0.0360 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:32,489-Speed 5596.58 samples/sec Loss 5.2529 LearningRate 0.0360 Epoch: 8 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:34,366-Speed 5459.06 samples/sec Loss 5.1046 LearningRate 0.0359 Epoch: 8 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:36,290-Speed 5324.49 samples/sec Loss 5.1819 LearningRate 0.0359 Epoch: 8 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:38,132-Speed 5562.61 samples/sec Loss 5.3442 LearningRate 0.0359 Epoch: 8 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:19:39,961-Speed 5600.83 samples/sec Loss 5.2023 LearningRate 0.0359 Epoch: 8 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:41,815-Speed 5523.98 samples/sec Loss 5.2866 LearningRate 0.0359 Epoch: 8 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:43,658-Speed 5558.28 samples/sec Loss 5.2538 LearningRate 0.0359 Epoch: 8 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:45,479-Speed 5624.96 samples/sec Loss 5.2855 LearningRate 0.0359 Epoch: 8 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:47,321-Speed 5563.65 samples/sec Loss 5.2895 LearningRate 0.0359 Epoch: 8 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:49,146-Speed 5612.90 samples/sec Loss 5.2222 LearningRate 0.0359 Epoch: 8 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:50,968-Speed 5623.46 samples/sec Loss 5.1628 LearningRate 0.0359 Epoch: 8 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:19:52,809-Speed 5564.31 samples/sec Loss 5.2438 LearningRate 0.0358 Epoch: 8 Global Step: 45640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:54,659-Speed 5537.97 samples/sec Loss 5.3188 LearningRate 0.0358 Epoch: 8 Global Step: 45650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:56,506-Speed 5545.67 samples/sec Loss 5.2753 LearningRate 0.0358 Epoch: 8 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:19:58,329-Speed 5617.98 samples/sec Loss 5.2862 LearningRate 0.0358 Epoch: 8 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:00,176-Speed 5546.45 samples/sec Loss 5.2722 LearningRate 0.0358 Epoch: 8 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:01,999-Speed 5618.23 samples/sec Loss 5.3413 LearningRate 
0.0358 Epoch: 8 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:03,866-Speed 5488.10 samples/sec Loss 5.3714 LearningRate 0.0358 Epoch: 8 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:05,696-Speed 5595.85 samples/sec Loss 5.3873 LearningRate 0.0358 Epoch: 8 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:07,534-Speed 5574.45 samples/sec Loss 5.3796 LearningRate 0.0358 Epoch: 8 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:09,367-Speed 5588.24 samples/sec Loss 5.2448 LearningRate 0.0357 Epoch: 8 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:11,180-Speed 5649.88 samples/sec Loss 5.3197 LearningRate 0.0357 Epoch: 8 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:20:13,053-Speed 5471.74 samples/sec Loss 5.2662 LearningRate 0.0357 Epoch: 8 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:20:14,873-Speed 5627.42 samples/sec Loss 5.1973 LearningRate 0.0357 Epoch: 8 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:20:16,709-Speed 5579.78 samples/sec Loss 5.3239 LearningRate 0.0357 Epoch: 8 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:20:18,533-Speed 5614.55 samples/sec Loss 5.3310 LearningRate 0.0357 Epoch: 8 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:20,366-Speed 5591.62 samples/sec Loss 5.3550 LearningRate 0.0357 Epoch: 8 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:22,201-Speed 5582.40 samples/sec Loss 5.3689 LearningRate 0.0357 Epoch: 8 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:24,016-Speed 5643.91 samples/sec Loss 5.3169 LearningRate 0.0357 Epoch: 8 Global Step: 45810 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 04:20:25,828-Speed 5652.09 samples/sec Loss 5.2163 LearningRate 0.0357 Epoch: 8 Global Step: 45820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:27,689-Speed 5503.24 samples/sec Loss 5.5058 LearningRate 0.0356 Epoch: 8 Global Step: 45830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:29,529-Speed 5566.92 samples/sec Loss 5.3558 LearningRate 0.0356 Epoch: 8 Global Step: 45840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:31,378-Speed 5542.61 samples/sec Loss 5.5584 LearningRate 0.0356 Epoch: 8 Global Step: 45850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:33,197-Speed 5630.86 samples/sec Loss 5.3856 LearningRate 0.0356 Epoch: 8 Global Step: 45860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:35,012-Speed 5644.32 samples/sec Loss 5.3128 LearningRate 0.0356 Epoch: 8 Global Step: 45870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:36,832-Speed 5628.71 samples/sec Loss 5.4592 LearningRate 0.0356 Epoch: 8 Global Step: 45880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:38,654-Speed 5623.04 samples/sec Loss 5.3889 LearningRate 0.0356 Epoch: 8 Global Step: 45890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:40,485-Speed 5594.54 samples/sec Loss 5.3954 LearningRate 0.0356 Epoch: 8 Global Step: 45900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:20:42,302-Speed 5635.87 samples/sec Loss 5.3857 LearningRate 0.0356 Epoch: 8 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:44,126-Speed 5616.92 samples/sec Loss 5.4796 LearningRate 0.0355 Epoch: 8 Global Step: 45920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:45,967-Speed 5563.59 samples/sec Loss 5.4124 LearningRate 0.0355 Epoch: 8 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:47,780-Speed 5649.71 
samples/sec Loss 5.4463 LearningRate 0.0355 Epoch: 8 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:49,612-Speed 5592.52 samples/sec Loss 5.2982 LearningRate 0.0355 Epoch: 8 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:51,458-Speed 5549.15 samples/sec Loss 5.5471 LearningRate 0.0355 Epoch: 8 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:53,300-Speed 5559.23 samples/sec Loss 5.4729 LearningRate 0.0355 Epoch: 8 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:55,151-Speed 5534.62 samples/sec Loss 5.3325 LearningRate 0.0355 Epoch: 8 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:56,979-Speed 5603.51 samples/sec Loss 5.5273 LearningRate 0.0355 Epoch: 8 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:20:58,815-Speed 5581.22 samples/sec Loss 5.3929 LearningRate 0.0355 Epoch: 8 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:21:25,456-[lfw][46000]XNorm: 22.362603 Training: 2022-04-27 04:21:25,457-[lfw][46000]Accuracy-Flip: 0.99650+-0.00241 Training: 2022-04-27 04:21:25,457-[lfw][46000]Accuracy-Highest: 0.99750 Training: 2022-04-27 04:21:56,308-[cfp_fp][46000]XNorm: 19.956942 Training: 2022-04-27 04:21:56,309-[cfp_fp][46000]Accuracy-Flip: 0.94186+-0.01364 Training: 2022-04-27 04:21:56,309-[cfp_fp][46000]Accuracy-Highest: 0.94857 Training: 2022-04-27 04:22:22,921-[agedb_30][46000]XNorm: 21.985087 Training: 2022-04-27 04:22:22,922-[agedb_30][46000]Accuracy-Flip: 0.97267+-0.00932 Training: 2022-04-27 04:22:22,922-[agedb_30][46000]Accuracy-Highest: 0.97383 Training: 2022-04-27 04:22:24,757-Speed 119.15 samples/sec Loss 5.4497 LearningRate 0.0355 Epoch: 8 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:22:26,595-Speed 5573.83 samples/sec Loss 5.5328 LearningRate 0.0354 
Epoch: 8 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:22:28,435-Speed 5566.61 samples/sec Loss 5.2637 LearningRate 0.0354 Epoch: 8 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:22:30,244-Speed 5662.89 samples/sec Loss 5.3575 LearningRate 0.0354 Epoch: 8 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:32,064-Speed 5627.84 samples/sec Loss 5.4480 LearningRate 0.0354 Epoch: 8 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:33,880-Speed 5641.76 samples/sec Loss 5.3987 LearningRate 0.0354 Epoch: 8 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:35,715-Speed 5583.10 samples/sec Loss 5.4468 LearningRate 0.0354 Epoch: 8 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:37,541-Speed 5609.61 samples/sec Loss 5.4649 LearningRate 0.0354 Epoch: 8 Global Step: 46080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:39,365-Speed 5614.62 samples/sec Loss 5.3606 LearningRate 0.0354 Epoch: 8 Global Step: 46090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:41,215-Speed 5537.10 samples/sec Loss 5.4202 LearningRate 0.0354 Epoch: 8 Global Step: 46100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:43,043-Speed 5604.25 samples/sec Loss 5.5141 LearningRate 0.0353 Epoch: 8 Global Step: 46110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:44,873-Speed 5596.74 samples/sec Loss 5.5154 LearningRate 0.0353 Epoch: 8 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:46,731-Speed 5511.25 samples/sec Loss 5.2685 LearningRate 0.0353 Epoch: 8 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:48,600-Speed 5481.65 samples/sec Loss 5.5192 LearningRate 0.0353 Epoch: 8 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-27 04:22:50,453-Speed 5528.57 samples/sec Loss 5.4448 LearningRate 0.0353 Epoch: 8 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:52,295-Speed 5561.36 samples/sec Loss 5.5566 LearningRate 0.0353 Epoch: 8 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:54,126-Speed 5595.38 samples/sec Loss 5.4729 LearningRate 0.0353 Epoch: 8 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:55,943-Speed 5637.93 samples/sec Loss 5.3291 LearningRate 0.0353 Epoch: 8 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:57,762-Speed 5629.91 samples/sec Loss 5.3285 LearningRate 0.0353 Epoch: 8 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:22:59,599-Speed 5577.22 samples/sec Loss 5.4188 LearningRate 0.0353 Epoch: 8 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:01,421-Speed 5621.00 samples/sec Loss 5.4965 LearningRate 0.0352 Epoch: 8 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:03,252-Speed 5595.40 samples/sec Loss 5.5713 LearningRate 0.0352 Epoch: 8 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:05,088-Speed 5578.33 samples/sec Loss 5.4748 LearningRate 0.0352 Epoch: 8 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:06,921-Speed 5586.95 samples/sec Loss 5.5199 LearningRate 0.0352 Epoch: 8 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:08,767-Speed 5551.74 samples/sec Loss 5.4987 LearningRate 0.0352 Epoch: 8 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:10,599-Speed 5590.16 samples/sec Loss 5.5927 LearningRate 0.0352 Epoch: 8 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:12,415-Speed 5639.18 samples/sec 
Loss 5.4023 LearningRate 0.0352 Epoch: 8 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:14,241-Speed 5613.04 samples/sec Loss 5.5274 LearningRate 0.0352 Epoch: 8 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:16,071-Speed 5596.35 samples/sec Loss 5.4650 LearningRate 0.0352 Epoch: 8 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:17,907-Speed 5578.50 samples/sec Loss 5.5532 LearningRate 0.0351 Epoch: 8 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:19,757-Speed 5539.34 samples/sec Loss 5.6655 LearningRate 0.0351 Epoch: 8 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:21,614-Speed 5515.66 samples/sec Loss 5.5127 LearningRate 0.0351 Epoch: 8 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:23,432-Speed 5633.34 samples/sec Loss 5.4288 LearningRate 0.0351 Epoch: 8 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:25,246-Speed 5647.98 samples/sec Loss 5.4737 LearningRate 0.0351 Epoch: 8 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:27,076-Speed 5597.99 samples/sec Loss 5.4957 LearningRate 0.0351 Epoch: 8 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:28,894-Speed 5632.96 samples/sec Loss 5.5193 LearningRate 0.0351 Epoch: 8 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:30,728-Speed 5583.93 samples/sec Loss 5.3735 LearningRate 0.0351 Epoch: 8 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:32,585-Speed 5516.65 samples/sec Loss 5.5721 LearningRate 0.0351 Epoch: 8 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:34,416-Speed 5594.55 samples/sec Loss 5.4481 LearningRate 0.0351 Epoch: 8 Global Step: 46390 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:36,228-Speed 5653.20 samples/sec Loss 5.4428 LearningRate 0.0350 Epoch: 8 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:38,064-Speed 5581.13 samples/sec Loss 5.5374 LearningRate 0.0350 Epoch: 8 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:39,887-Speed 5618.12 samples/sec Loss 5.5650 LearningRate 0.0350 Epoch: 8 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:41,716-Speed 5601.22 samples/sec Loss 5.5791 LearningRate 0.0350 Epoch: 8 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:43,565-Speed 5538.63 samples/sec Loss 5.5533 LearningRate 0.0350 Epoch: 8 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:45,393-Speed 5605.04 samples/sec Loss 5.5148 LearningRate 0.0350 Epoch: 8 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:47,223-Speed 5595.85 samples/sec Loss 5.3773 LearningRate 0.0350 Epoch: 8 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:49,048-Speed 5613.25 samples/sec Loss 5.4353 LearningRate 0.0350 Epoch: 8 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:23:50,876-Speed 5605.47 samples/sec Loss 5.5569 LearningRate 0.0350 Epoch: 8 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:52,702-Speed 5609.55 samples/sec Loss 5.5346 LearningRate 0.0350 Epoch: 8 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:54,552-Speed 5538.25 samples/sec Loss 5.4866 LearningRate 0.0349 Epoch: 8 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:23:56,383-Speed 5594.76 samples/sec Loss 5.4684 LearningRate 0.0349 Epoch: 8 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:23:58,232-Speed 5538.06 samples/sec Loss 5.5406 LearningRate 0.0349 Epoch: 8 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:00,058-Speed 5611.55 samples/sec Loss 5.5033 LearningRate 0.0349 Epoch: 8 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:01,905-Speed 5546.64 samples/sec Loss 5.5486 LearningRate 0.0349 Epoch: 8 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:03,749-Speed 5554.34 samples/sec Loss 5.5817 LearningRate 0.0349 Epoch: 8 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:05,607-Speed 5514.55 samples/sec Loss 5.5279 LearningRate 0.0349 Epoch: 8 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:07,458-Speed 5533.41 samples/sec Loss 5.6591 LearningRate 0.0349 Epoch: 8 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:09,279-Speed 5626.34 samples/sec Loss 5.5026 LearningRate 0.0349 Epoch: 8 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:11,094-Speed 5642.76 samples/sec Loss 5.4339 LearningRate 0.0348 Epoch: 8 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:12,947-Speed 5529.07 samples/sec Loss 5.6427 LearningRate 0.0348 Epoch: 8 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:14,757-Speed 5658.57 samples/sec Loss 5.6636 LearningRate 0.0348 Epoch: 8 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:16,573-Speed 5640.37 samples/sec Loss 5.4558 LearningRate 0.0348 Epoch: 8 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:18,385-Speed 5654.61 samples/sec Loss 5.6032 LearningRate 0.0348 Epoch: 8 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:20,197-Speed 5651.23 samples/sec Loss 5.4744 
LearningRate 0.0348 Epoch: 8 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:21,994-Speed 5701.56 samples/sec Loss 5.4794 LearningRate 0.0348 Epoch: 8 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:23,814-Speed 5628.27 samples/sec Loss 5.5031 LearningRate 0.0348 Epoch: 8 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:25,632-Speed 5632.75 samples/sec Loss 5.4668 LearningRate 0.0348 Epoch: 8 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:27,437-Speed 5678.13 samples/sec Loss 5.5139 LearningRate 0.0348 Epoch: 8 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:29,252-Speed 5644.44 samples/sec Loss 5.4631 LearningRate 0.0347 Epoch: 8 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:31,077-Speed 5612.82 samples/sec Loss 5.5782 LearningRate 0.0347 Epoch: 8 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:32,893-Speed 5638.98 samples/sec Loss 5.3672 LearningRate 0.0347 Epoch: 8 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:34,743-Speed 5538.14 samples/sec Loss 5.5344 LearningRate 0.0347 Epoch: 8 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:36,564-Speed 5624.99 samples/sec Loss 5.6605 LearningRate 0.0347 Epoch: 8 Global Step: 46730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:38,386-Speed 5620.76 samples/sec Loss 5.4896 LearningRate 0.0347 Epoch: 8 Global Step: 46740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:40,191-Speed 5676.13 samples/sec Loss 5.5710 LearningRate 0.0347 Epoch: 8 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:24:42,005-Speed 5645.48 samples/sec Loss 5.5543 LearningRate 0.0347 Epoch: 8 Global Step: 46760 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-27 04:24:43,809-Speed 5677.76 samples/sec Loss 5.5461 LearningRate 0.0347 Epoch: 8 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:45,645-Speed 5579.22 samples/sec Loss 5.6122 LearningRate 0.0346 Epoch: 8 Global Step: 46780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:47,474-Speed 5599.65 samples/sec Loss 5.6293 LearningRate 0.0346 Epoch: 8 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:49,292-Speed 5636.93 samples/sec Loss 5.6160 LearningRate 0.0346 Epoch: 8 Global Step: 46800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:51,111-Speed 5632.03 samples/sec Loss 5.5099 LearningRate 0.0346 Epoch: 8 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:52,959-Speed 5542.09 samples/sec Loss 5.6360 LearningRate 0.0346 Epoch: 8 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:54,789-Speed 5598.12 samples/sec Loss 5.4971 LearningRate 0.0346 Epoch: 8 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:56,613-Speed 5618.38 samples/sec Loss 5.6332 LearningRate 0.0346 Epoch: 8 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:24:58,435-Speed 5620.04 samples/sec Loss 5.5575 LearningRate 0.0346 Epoch: 8 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:00,264-Speed 5601.59 samples/sec Loss 5.7102 LearningRate 0.0346 Epoch: 8 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:02,083-Speed 5629.51 samples/sec Loss 5.5365 LearningRate 0.0346 Epoch: 8 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:03,905-Speed 5621.88 samples/sec Loss 5.6098 LearningRate 0.0345 Epoch: 8 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:05,708-Speed 
5682.62 samples/sec Loss 5.4809 LearningRate 0.0345 Epoch: 8 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:07,520-Speed 5652.81 samples/sec Loss 5.5024 LearningRate 0.0345 Epoch: 8 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:09,334-Speed 5649.45 samples/sec Loss 5.4677 LearningRate 0.0345 Epoch: 8 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:11,176-Speed 5561.06 samples/sec Loss 5.4897 LearningRate 0.0345 Epoch: 8 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:12,994-Speed 5632.52 samples/sec Loss 5.5390 LearningRate 0.0345 Epoch: 8 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:14,825-Speed 5596.42 samples/sec Loss 5.5306 LearningRate 0.0345 Epoch: 8 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:16,651-Speed 5607.08 samples/sec Loss 5.5265 LearningRate 0.0345 Epoch: 8 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:18,474-Speed 5621.08 samples/sec Loss 5.5423 LearningRate 0.0345 Epoch: 8 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:20,295-Speed 5625.18 samples/sec Loss 5.5364 LearningRate 0.0345 Epoch: 8 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:22,118-Speed 5618.84 samples/sec Loss 5.5473 LearningRate 0.0344 Epoch: 8 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:23,950-Speed 5591.49 samples/sec Loss 5.6623 LearningRate 0.0344 Epoch: 8 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:25,769-Speed 5632.19 samples/sec Loss 5.5842 LearningRate 0.0344 Epoch: 8 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:27,601-Speed 5590.87 samples/sec Loss 5.5804 LearningRate 0.0344 Epoch: 8 
Global Step: 47010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:29,406-Speed 5674.45 samples/sec Loss 5.5216 LearningRate 0.0344 Epoch: 8 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:31,251-Speed 5553.74 samples/sec Loss 5.6770 LearningRate 0.0344 Epoch: 8 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:33,059-Speed 5664.59 samples/sec Loss 5.6135 LearningRate 0.0344 Epoch: 8 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:34,872-Speed 5649.94 samples/sec Loss 5.5512 LearningRate 0.0344 Epoch: 8 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:36,709-Speed 5574.81 samples/sec Loss 5.5243 LearningRate 0.0344 Epoch: 8 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:38,541-Speed 5592.27 samples/sec Loss 5.4944 LearningRate 0.0343 Epoch: 8 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:40,371-Speed 5596.85 samples/sec Loss 5.7574 LearningRate 0.0343 Epoch: 8 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:42,259-Speed 5426.11 samples/sec Loss 5.6017 LearningRate 0.0343 Epoch: 8 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:44,106-Speed 5566.40 samples/sec Loss 5.4704 LearningRate 0.0343 Epoch: 8 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:25:45,923-Speed 5638.11 samples/sec Loss 5.7691 LearningRate 0.0343 Epoch: 8 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:47,744-Speed 5623.43 samples/sec Loss 5.5833 LearningRate 0.0343 Epoch: 8 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:49,575-Speed 5595.78 samples/sec Loss 5.5922 LearningRate 0.0343 Epoch: 8 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-27 04:25:51,433-Speed 5514.01 samples/sec Loss 5.5848 LearningRate 0.0343 Epoch: 8 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:53,268-Speed 5583.89 samples/sec Loss 5.4889 LearningRate 0.0343 Epoch: 8 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:55,095-Speed 5606.36 samples/sec Loss 5.5227 LearningRate 0.0343 Epoch: 8 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:56,905-Speed 5658.22 samples/sec Loss 5.5484 LearningRate 0.0342 Epoch: 8 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:25:58,727-Speed 5622.84 samples/sec Loss 5.5598 LearningRate 0.0342 Epoch: 8 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:00,556-Speed 5599.94 samples/sec Loss 5.5663 LearningRate 0.0342 Epoch: 8 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:02,397-Speed 5565.75 samples/sec Loss 5.6503 LearningRate 0.0342 Epoch: 8 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:04,205-Speed 5664.93 samples/sec Loss 5.6400 LearningRate 0.0342 Epoch: 8 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:06,018-Speed 5649.73 samples/sec Loss 5.5251 LearningRate 0.0342 Epoch: 8 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:07,826-Speed 5663.43 samples/sec Loss 5.4954 LearningRate 0.0342 Epoch: 8 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:09,644-Speed 5636.12 samples/sec Loss 5.5533 LearningRate 0.0342 Epoch: 8 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:11,476-Speed 5592.01 samples/sec Loss 5.6368 LearningRate 0.0342 Epoch: 8 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:13,288-Speed 5654.66 
samples/sec Loss 5.5753 LearningRate 0.0342 Epoch: 8 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:15,113-Speed 5611.54 samples/sec Loss 5.4545 LearningRate 0.0341 Epoch: 8 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:16,941-Speed 5603.70 samples/sec Loss 5.5411 LearningRate 0.0341 Epoch: 8 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:18,780-Speed 5571.55 samples/sec Loss 5.6285 LearningRate 0.0341 Epoch: 8 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:20,641-Speed 5503.92 samples/sec Loss 5.6588 LearningRate 0.0341 Epoch: 8 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:22,489-Speed 5541.14 samples/sec Loss 5.5952 LearningRate 0.0341 Epoch: 8 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:24,353-Speed 5496.25 samples/sec Loss 5.7028 LearningRate 0.0341 Epoch: 8 Global Step: 47320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:26,181-Speed 5603.72 samples/sec Loss 5.4926 LearningRate 0.0341 Epoch: 8 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:27,994-Speed 5650.64 samples/sec Loss 5.5316 LearningRate 0.0341 Epoch: 8 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:29,830-Speed 5578.09 samples/sec Loss 5.5862 LearningRate 0.0341 Epoch: 8 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:26:31,668-Speed 5572.76 samples/sec Loss 5.6170 LearningRate 0.0341 Epoch: 8 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:33,497-Speed 5600.50 samples/sec Loss 5.5770 LearningRate 0.0340 Epoch: 8 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:35,327-Speed 5598.73 samples/sec Loss 5.5721 LearningRate 0.0340 Epoch: 8 Global 
Step: 47380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:37,147-Speed 5627.74 samples/sec Loss 5.5947 LearningRate 0.0340 Epoch: 8 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:38,991-Speed 5554.96 samples/sec Loss 5.6553 LearningRate 0.0340 Epoch: 8 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:40,814-Speed 5619.60 samples/sec Loss 5.5635 LearningRate 0.0340 Epoch: 8 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:42,637-Speed 5619.68 samples/sec Loss 5.4849 LearningRate 0.0340 Epoch: 8 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:44,471-Speed 5583.79 samples/sec Loss 5.5552 LearningRate 0.0340 Epoch: 8 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:46,294-Speed 5620.15 samples/sec Loss 5.5668 LearningRate 0.0340 Epoch: 8 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:48,106-Speed 5654.55 samples/sec Loss 5.4981 LearningRate 0.0340 Epoch: 8 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:49,941-Speed 5579.50 samples/sec Loss 5.5472 LearningRate 0.0339 Epoch: 8 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:51,787-Speed 5551.22 samples/sec Loss 5.4967 LearningRate 0.0339 Epoch: 8 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:53,636-Speed 5540.21 samples/sec Loss 5.6605 LearningRate 0.0339 Epoch: 8 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:55,478-Speed 5561.44 samples/sec Loss 5.6548 LearningRate 0.0339 Epoch: 8 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:26:57,314-Speed 5576.48 samples/sec Loss 5.5316 LearningRate 0.0339 Epoch: 8 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 04:26:59,158-Speed 5556.20 samples/sec Loss 5.4829 LearningRate 0.0339 Epoch: 8 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:00,986-Speed 5604.41 samples/sec Loss 5.5599 LearningRate 0.0339 Epoch: 8 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:02,810-Speed 5615.74 samples/sec Loss 5.5989 LearningRate 0.0339 Epoch: 8 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:04,622-Speed 5653.30 samples/sec Loss 5.5033 LearningRate 0.0339 Epoch: 8 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:06,447-Speed 5612.00 samples/sec Loss 5.5505 LearningRate 0.0339 Epoch: 8 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:08,273-Speed 5610.13 samples/sec Loss 5.7667 LearningRate 0.0338 Epoch: 8 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:27:10,097-Speed 5615.72 samples/sec Loss 5.5313 LearningRate 0.0338 Epoch: 8 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:27:11,929-Speed 5591.76 samples/sec Loss 5.4650 LearningRate 0.0338 Epoch: 8 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:27:13,772-Speed 5558.49 samples/sec Loss 5.5742 LearningRate 0.0338 Epoch: 8 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:15,580-Speed 5663.06 samples/sec Loss 5.4913 LearningRate 0.0338 Epoch: 8 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:17,426-Speed 5549.12 samples/sec Loss 5.6075 LearningRate 0.0338 Epoch: 8 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:19,255-Speed 5602.37 samples/sec Loss 5.6310 LearningRate 0.0338 Epoch: 8 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:21,125-Speed 5478.62 samples/sec Loss 5.7949 
LearningRate 0.0338 Epoch: 8 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:22,944-Speed 5632.68 samples/sec Loss 5.6208 LearningRate 0.0338 Epoch: 8 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:24,768-Speed 5614.68 samples/sec Loss 5.6351 LearningRate 0.0338 Epoch: 8 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:26,616-Speed 5544.99 samples/sec Loss 5.5425 LearningRate 0.0337 Epoch: 8 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:28,449-Speed 5586.23 samples/sec Loss 5.5288 LearningRate 0.0337 Epoch: 8 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:30,306-Speed 5517.74 samples/sec Loss 5.5501 LearningRate 0.0337 Epoch: 8 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:32,131-Speed 5612.08 samples/sec Loss 5.5263 LearningRate 0.0337 Epoch: 8 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:27:33,941-Speed 5658.35 samples/sec Loss 5.5850 LearningRate 0.0337 Epoch: 8 Global Step: 47700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:27:35,783-Speed 5560.06 samples/sec Loss 5.7066 LearningRate 0.0337 Epoch: 8 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:37,615-Speed 5591.33 samples/sec Loss 5.4760 LearningRate 0.0337 Epoch: 8 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:39,453-Speed 5575.66 samples/sec Loss 5.5234 LearningRate 0.0337 Epoch: 8 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:41,293-Speed 5568.87 samples/sec Loss 5.5493 LearningRate 0.0337 Epoch: 8 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:43,111-Speed 5634.88 samples/sec Loss 5.6490 LearningRate 0.0337 Epoch: 8 Global Step: 47750 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-27 04:27:44,929-Speed 5633.12 samples/sec Loss 5.5137 LearningRate 0.0336 Epoch: 8 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:46,754-Speed 5612.96 samples/sec Loss 5.5967 LearningRate 0.0336 Epoch: 8 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:48,571-Speed 5637.45 samples/sec Loss 5.6513 LearningRate 0.0336 Epoch: 8 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:50,385-Speed 5647.16 samples/sec Loss 5.5800 LearningRate 0.0336 Epoch: 8 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:52,205-Speed 5627.76 samples/sec Loss 5.5514 LearningRate 0.0336 Epoch: 8 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:54,022-Speed 5638.18 samples/sec Loss 5.5994 LearningRate 0.0336 Epoch: 8 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:55,834-Speed 5652.93 samples/sec Loss 5.5642 LearningRate 0.0336 Epoch: 8 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:57,646-Speed 5655.39 samples/sec Loss 5.5248 LearningRate 0.0336 Epoch: 8 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:27:59,473-Speed 5605.66 samples/sec Loss 5.5797 LearningRate 0.0336 Epoch: 8 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:01,293-Speed 5627.44 samples/sec Loss 5.5993 LearningRate 0.0336 Epoch: 8 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:03,140-Speed 5546.35 samples/sec Loss 5.5517 LearningRate 0.0335 Epoch: 8 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:04,975-Speed 5584.47 samples/sec Loss 5.5540 LearningRate 0.0335 Epoch: 8 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:06,801-Speed 
5608.41 samples/sec Loss 5.5311 LearningRate 0.0335 Epoch: 8 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:08,641-Speed 5566.03 samples/sec Loss 5.5340 LearningRate 0.0335 Epoch: 8 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:10,476-Speed 5583.32 samples/sec Loss 5.4987 LearningRate 0.0335 Epoch: 8 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:12,331-Speed 5522.46 samples/sec Loss 5.5935 LearningRate 0.0335 Epoch: 8 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:28:14,169-Speed 5571.40 samples/sec Loss 5.6319 LearningRate 0.0335 Epoch: 8 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:16,017-Speed 5544.49 samples/sec Loss 5.4488 LearningRate 0.0335 Epoch: 8 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:17,865-Speed 5546.00 samples/sec Loss 5.6977 LearningRate 0.0335 Epoch: 8 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:19,695-Speed 5596.91 samples/sec Loss 5.5002 LearningRate 0.0334 Epoch: 8 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:21,627-Speed 5302.48 samples/sec Loss 5.5280 LearningRate 0.0334 Epoch: 8 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:23,479-Speed 5533.68 samples/sec Loss 5.5771 LearningRate 0.0334 Epoch: 8 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:25,300-Speed 5623.78 samples/sec Loss 5.5862 LearningRate 0.0334 Epoch: 8 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:27,119-Speed 5631.50 samples/sec Loss 5.6587 LearningRate 0.0334 Epoch: 8 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:28,942-Speed 5620.56 samples/sec Loss 5.5933 LearningRate 0.0334 Epoch: 8 
Global Step: 48000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:28:55,258-[lfw][48000]XNorm: 22.802285 Training: 2022-04-27 04:28:55,258-[lfw][48000]Accuracy-Flip: 0.99717+-0.00325 Training: 2022-04-27 04:28:55,259-[lfw][48000]Accuracy-Highest: 0.99750 Training: 2022-04-27 04:29:25,772-[cfp_fp][48000]XNorm: 20.126648 Training: 2022-04-27 04:29:25,773-[cfp_fp][48000]Accuracy-Flip: 0.95257+-0.00958 Training: 2022-04-27 04:29:25,773-[cfp_fp][48000]Accuracy-Highest: 0.95257 Training: 2022-04-27 04:29:52,075-[agedb_30][48000]XNorm: 22.615926 Training: 2022-04-27 04:29:52,076-[agedb_30][48000]Accuracy-Flip: 0.97067+-0.00720 Training: 2022-04-27 04:29:52,076-[agedb_30][48000]Accuracy-Highest: 0.97383 Training: 2022-04-27 04:29:53,941-Speed 120.47 samples/sec Loss 5.5185 LearningRate 0.0334 Epoch: 8 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:29:55,853-Speed 5355.80 samples/sec Loss 5.6398 LearningRate 0.0334 Epoch: 8 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:29:57,777-Speed 5323.74 samples/sec Loss 5.4525 LearningRate 0.0334 Epoch: 8 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:29:59,601-Speed 5615.63 samples/sec Loss 5.3583 LearningRate 0.0334 Epoch: 8 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:01,426-Speed 5614.69 samples/sec Loss 5.4128 LearningRate 0.0333 Epoch: 8 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:03,247-Speed 5623.77 samples/sec Loss 5.5606 LearningRate 0.0333 Epoch: 8 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:05,078-Speed 5594.61 samples/sec Loss 5.6358 LearningRate 0.0333 Epoch: 8 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:06,902-Speed 5616.73 samples/sec Loss 5.5123 LearningRate 0.0333 Epoch: 8 Global Step: 48080 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:30:08,713-Speed 5655.54 samples/sec Loss 5.4213 LearningRate 0.0333 Epoch: 8 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:10,530-Speed 5639.54 samples/sec Loss 5.5933 LearningRate 0.0333 Epoch: 8 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:12,343-Speed 5650.01 samples/sec Loss 5.5383 LearningRate 0.0333 Epoch: 8 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:14,148-Speed 5676.49 samples/sec Loss 5.6284 LearningRate 0.0333 Epoch: 8 Global Step: 48120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:15,949-Speed 5685.56 samples/sec Loss 5.5778 LearningRate 0.0333 Epoch: 8 Global Step: 48130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:17,767-Speed 5633.76 samples/sec Loss 5.5853 LearningRate 0.0333 Epoch: 8 Global Step: 48140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:19,577-Speed 5661.58 samples/sec Loss 5.4499 LearningRate 0.0332 Epoch: 8 Global Step: 48150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:21,391-Speed 5645.91 samples/sec Loss 5.4932 LearningRate 0.0332 Epoch: 8 Global Step: 48160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:23,211-Speed 5627.48 samples/sec Loss 5.4859 LearningRate 0.0332 Epoch: 8 Global Step: 48170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:25,031-Speed 5627.56 samples/sec Loss 5.6976 LearningRate 0.0332 Epoch: 8 Global Step: 48180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:26,844-Speed 5651.48 samples/sec Loss 5.7465 LearningRate 0.0332 Epoch: 8 Global Step: 48190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:28,681-Speed 5575.50 samples/sec Loss 5.5344 LearningRate 0.0332 Epoch: 8 Global Step: 48200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:30,496-Speed 5643.44 
samples/sec Loss 5.6271 LearningRate 0.0332 Epoch: 8 Global Step: 48210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:30:32,347-Speed 5535.53 samples/sec Loss 5.6145 LearningRate 0.0332 Epoch: 8 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:34,188-Speed 5563.24 samples/sec Loss 5.5456 LearningRate 0.0332 Epoch: 8 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:36,002-Speed 5648.54 samples/sec Loss 5.5643 LearningRate 0.0332 Epoch: 8 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:37,820-Speed 5633.58 samples/sec Loss 5.6186 LearningRate 0.0331 Epoch: 8 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:39,633-Speed 5651.87 samples/sec Loss 5.5683 LearningRate 0.0331 Epoch: 8 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:41,441-Speed 5663.00 samples/sec Loss 5.5023 LearningRate 0.0331 Epoch: 8 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:43,261-Speed 5628.93 samples/sec Loss 5.5032 LearningRate 0.0331 Epoch: 8 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:45,129-Speed 5483.27 samples/sec Loss 5.5486 LearningRate 0.0331 Epoch: 8 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:46,950-Speed 5625.11 samples/sec Loss 5.5686 LearningRate 0.0331 Epoch: 8 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:48,816-Speed 5491.31 samples/sec Loss 5.5268 LearningRate 0.0331 Epoch: 8 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:30:50,650-Speed 5583.06 samples/sec Loss 5.5389 LearningRate 0.0331 Epoch: 8 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:52,500-Speed 5538.10 samples/sec Loss 5.5804 LearningRate 0.0331 Epoch: 8 Global Step: 
48330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:54,379-Speed 5453.85 samples/sec Loss 5.4820 LearningRate 0.0331 Epoch: 8 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:56,225-Speed 5550.10 samples/sec Loss 5.5050 LearningRate 0.0330 Epoch: 8 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:58,061-Speed 5576.53 samples/sec Loss 5.4404 LearningRate 0.0330 Epoch: 8 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:30:59,891-Speed 5598.57 samples/sec Loss 5.6067 LearningRate 0.0330 Epoch: 8 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:31:01,720-Speed 5602.32 samples/sec Loss 5.5369 LearningRate 0.0330 Epoch: 8 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:31:03,550-Speed 5596.28 samples/sec Loss 5.6281 LearningRate 0.0330 Epoch: 8 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:05,366-Speed 5641.72 samples/sec Loss 5.5531 LearningRate 0.0330 Epoch: 8 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:07,187-Speed 5624.98 samples/sec Loss 5.5399 LearningRate 0.0330 Epoch: 8 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:09,016-Speed 5600.38 samples/sec Loss 5.4290 LearningRate 0.0330 Epoch: 8 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:10,843-Speed 5606.48 samples/sec Loss 5.5746 LearningRate 0.0330 Epoch: 8 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:12,693-Speed 5537.74 samples/sec Loss 5.5806 LearningRate 0.0330 Epoch: 8 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:14,515-Speed 5622.33 samples/sec Loss 5.5729 LearningRate 0.0329 Epoch: 8 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 04:31:16,375-Speed 5506.64 samples/sec Loss 5.4785 LearningRate 0.0329 Epoch: 8 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:18,193-Speed 5634.25 samples/sec Loss 5.5898 LearningRate 0.0329 Epoch: 8 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:20,020-Speed 5610.04 samples/sec Loss 5.4781 LearningRate 0.0329 Epoch: 8 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:21,856-Speed 5579.36 samples/sec Loss 5.4634 LearningRate 0.0329 Epoch: 8 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:23,675-Speed 5630.90 samples/sec Loss 5.5420 LearningRate 0.0329 Epoch: 8 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:25,528-Speed 5526.17 samples/sec Loss 5.5532 LearningRate 0.0329 Epoch: 8 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:27,366-Speed 5575.63 samples/sec Loss 5.5038 LearningRate 0.0329 Epoch: 8 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:29,213-Speed 5544.34 samples/sec Loss 5.5797 LearningRate 0.0329 Epoch: 8 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:31,038-Speed 5613.55 samples/sec Loss 5.5588 LearningRate 0.0329 Epoch: 8 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:32,879-Speed 5563.25 samples/sec Loss 5.6545 LearningRate 0.0328 Epoch: 8 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:34,713-Speed 5584.72 samples/sec Loss 5.6147 LearningRate 0.0328 Epoch: 8 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:36,524-Speed 5655.94 samples/sec Loss 5.4802 LearningRate 0.0328 Epoch: 8 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:38,358-Speed 5586.71 samples/sec Loss 5.4265 
LearningRate 0.0328 Epoch: 8 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:40,197-Speed 5569.75 samples/sec Loss 5.5030 LearningRate 0.0328 Epoch: 8 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:31:42,044-Speed 5545.78 samples/sec Loss 5.5236 LearningRate 0.0328 Epoch: 8 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:31:43,855-Speed 5657.76 samples/sec Loss 5.4313 LearningRate 0.0328 Epoch: 8 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:45,749-Speed 5406.89 samples/sec Loss 5.5740 LearningRate 0.0328 Epoch: 8 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:47,571-Speed 5622.97 samples/sec Loss 5.5711 LearningRate 0.0328 Epoch: 8 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:49,442-Speed 5475.97 samples/sec Loss 5.5376 LearningRate 0.0328 Epoch: 8 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:51,302-Speed 5506.18 samples/sec Loss 5.4848 LearningRate 0.0327 Epoch: 8 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:53,116-Speed 5648.28 samples/sec Loss 5.5934 LearningRate 0.0327 Epoch: 8 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:54,951-Speed 5582.05 samples/sec Loss 5.4317 LearningRate 0.0327 Epoch: 8 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:56,772-Speed 5624.41 samples/sec Loss 5.5406 LearningRate 0.0327 Epoch: 8 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:31:58,588-Speed 5640.53 samples/sec Loss 5.7195 LearningRate 0.0327 Epoch: 8 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:00,400-Speed 5654.11 samples/sec Loss 5.3294 LearningRate 0.0327 Epoch: 8 Global Step: 48700 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-27 04:32:02,255-Speed 5522.18 samples/sec Loss 5.4688 LearningRate 0.0327 Epoch: 8 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:04,138-Speed 5439.84 samples/sec Loss 5.5345 LearningRate 0.0327 Epoch: 8 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:05,978-Speed 5567.50 samples/sec Loss 5.5032 LearningRate 0.0327 Epoch: 8 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:07,799-Speed 5625.31 samples/sec Loss 5.5261 LearningRate 0.0327 Epoch: 8 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:09,610-Speed 5658.10 samples/sec Loss 5.4409 LearningRate 0.0326 Epoch: 8 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:11,412-Speed 5681.43 samples/sec Loss 5.4688 LearningRate 0.0326 Epoch: 8 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:13,242-Speed 5597.96 samples/sec Loss 5.5307 LearningRate 0.0326 Epoch: 8 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:15,055-Speed 5650.02 samples/sec Loss 5.4860 LearningRate 0.0326 Epoch: 8 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:16,880-Speed 5612.88 samples/sec Loss 5.6236 LearningRate 0.0326 Epoch: 8 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:18,715-Speed 5583.69 samples/sec Loss 5.5217 LearningRate 0.0326 Epoch: 8 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:20,542-Speed 5606.33 samples/sec Loss 5.4913 LearningRate 0.0326 Epoch: 8 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:22,375-Speed 5591.69 samples/sec Loss 5.5447 LearningRate 0.0326 Epoch: 8 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:32:24,222-Speed 5545.35 samples/sec Loss 5.5916 LearningRate 0.0326 Epoch: 8 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:26,063-Speed 5565.94 samples/sec Loss 5.5145 LearningRate 0.0325 Epoch: 8 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:27,923-Speed 5504.81 samples/sec Loss 5.5711 LearningRate 0.0325 Epoch: 8 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:29,842-Speed 5337.53 samples/sec Loss 5.5116 LearningRate 0.0325 Epoch: 8 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:31,688-Speed 5550.80 samples/sec Loss 5.5234 LearningRate 0.0325 Epoch: 8 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:33,524-Speed 5577.59 samples/sec Loss 5.4855 LearningRate 0.0325 Epoch: 8 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:35,351-Speed 5607.58 samples/sec Loss 5.5468 LearningRate 0.0325 Epoch: 8 Global Step: 48890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:37,207-Speed 5518.75 samples/sec Loss 5.4571 LearningRate 0.0325 Epoch: 8 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:39,048-Speed 5563.34 samples/sec Loss 5.6111 LearningRate 0.0325 Epoch: 8 Global Step: 48910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:40,872-Speed 5616.31 samples/sec Loss 5.4768 LearningRate 0.0325 Epoch: 8 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:32:42,677-Speed 5677.68 samples/sec Loss 5.6031 LearningRate 0.0325 Epoch: 8 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:44,490-Speed 5649.01 samples/sec Loss 5.4721 LearningRate 0.0324 Epoch: 8 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:46,323-Speed 5586.82 samples/sec Loss 5.5686 
LearningRate 0.0324 Epoch: 8 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:48,147-Speed 5616.32 samples/sec Loss 5.6390 LearningRate 0.0324 Epoch: 8 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:49,997-Speed 5536.66 samples/sec Loss 5.5276 LearningRate 0.0324 Epoch: 8 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:51,822-Speed 5613.20 samples/sec Loss 5.5709 LearningRate 0.0324 Epoch: 8 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:53,647-Speed 5614.41 samples/sec Loss 5.3897 LearningRate 0.0324 Epoch: 8 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:55,461-Speed 5646.10 samples/sec Loss 5.3903 LearningRate 0.0324 Epoch: 8 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:57,279-Speed 5635.92 samples/sec Loss 5.7091 LearningRate 0.0324 Epoch: 8 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:32:59,114-Speed 5580.53 samples/sec Loss 5.5154 LearningRate 0.0324 Epoch: 8 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:00,929-Speed 5643.20 samples/sec Loss 5.3813 LearningRate 0.0324 Epoch: 8 Global Step: 49030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:02,748-Speed 5633.67 samples/sec Loss 5.3708 LearningRate 0.0323 Epoch: 8 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:04,574-Speed 5608.40 samples/sec Loss 5.4378 LearningRate 0.0323 Epoch: 8 Global Step: 49050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:06,405-Speed 5595.30 samples/sec Loss 5.4136 LearningRate 0.0323 Epoch: 8 Global Step: 49060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:08,213-Speed 5665.49 samples/sec Loss 5.5034 LearningRate 0.0323 Epoch: 8 Global Step: 49070 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:10,011-Speed 5703.97 samples/sec Loss 5.5183 LearningRate 0.0323 Epoch: 8 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:11,844-Speed 5587.56 samples/sec Loss 5.5719 LearningRate 0.0323 Epoch: 8 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:13,662-Speed 5636.26 samples/sec Loss 5.4693 LearningRate 0.0323 Epoch: 8 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:15,484-Speed 5621.71 samples/sec Loss 5.3706 LearningRate 0.0323 Epoch: 8 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:17,293-Speed 5660.92 samples/sec Loss 5.4186 LearningRate 0.0323 Epoch: 8 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:19,110-Speed 5636.97 samples/sec Loss 5.5655 LearningRate 0.0323 Epoch: 8 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:20,927-Speed 5637.57 samples/sec Loss 5.4675 LearningRate 0.0322 Epoch: 8 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:22,743-Speed 5641.29 samples/sec Loss 5.5210 LearningRate 0.0322 Epoch: 8 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:24,567-Speed 5616.60 samples/sec Loss 5.4894 LearningRate 0.0322 Epoch: 8 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:26,399-Speed 5591.71 samples/sec Loss 5.4160 LearningRate 0.0322 Epoch: 8 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:28,301-Speed 5385.83 samples/sec Loss 5.5253 LearningRate 0.0322 Epoch: 8 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:30,120-Speed 5632.34 samples/sec Loss 5.5905 LearningRate 0.0322 Epoch: 8 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 
04:33:31,955-Speed 5583.53 samples/sec Loss 5.5007 LearningRate 0.0322 Epoch: 8 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:33,773-Speed 5634.13 samples/sec Loss 5.5864 LearningRate 0.0322 Epoch: 8 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:35,613-Speed 5566.56 samples/sec Loss 5.5439 LearningRate 0.0322 Epoch: 8 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:37,457-Speed 5553.67 samples/sec Loss 5.3382 LearningRate 0.0322 Epoch: 8 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:39,272-Speed 5643.42 samples/sec Loss 5.5216 LearningRate 0.0321 Epoch: 8 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:41,089-Speed 5638.86 samples/sec Loss 5.5315 LearningRate 0.0321 Epoch: 8 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:42,914-Speed 5612.99 samples/sec Loss 5.5688 LearningRate 0.0321 Epoch: 8 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:44,739-Speed 5612.72 samples/sec Loss 5.5087 LearningRate 0.0321 Epoch: 8 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:33:46,553-Speed 5645.62 samples/sec Loss 5.5548 LearningRate 0.0321 Epoch: 8 Global Step: 49280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:48,385-Speed 5593.28 samples/sec Loss 5.4346 LearningRate 0.0321 Epoch: 8 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:50,216-Speed 5593.10 samples/sec Loss 5.4762 LearningRate 0.0321 Epoch: 8 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:52,060-Speed 5554.69 samples/sec Loss 5.4166 LearningRate 0.0321 Epoch: 8 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:53,887-Speed 5608.86 samples/sec Loss 5.3962 
LearningRate 0.0321 Epoch: 8 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:55,719-Speed 5589.34 samples/sec Loss 5.5963 LearningRate 0.0321 Epoch: 8 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:57,540-Speed 5628.06 samples/sec Loss 5.5485 LearningRate 0.0321 Epoch: 8 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:33:59,354-Speed 5648.44 samples/sec Loss 5.5033 LearningRate 0.0320 Epoch: 8 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:01,173-Speed 5630.87 samples/sec Loss 5.4440 LearningRate 0.0320 Epoch: 8 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:02,996-Speed 5616.84 samples/sec Loss 5.3957 LearningRate 0.0320 Epoch: 8 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:04,835-Speed 5569.81 samples/sec Loss 5.4197 LearningRate 0.0320 Epoch: 8 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:06,651-Speed 5642.02 samples/sec Loss 5.4283 LearningRate 0.0320 Epoch: 8 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:08,483-Speed 5592.88 samples/sec Loss 5.4369 LearningRate 0.0320 Epoch: 8 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:10,315-Speed 5592.15 samples/sec Loss 5.4304 LearningRate 0.0320 Epoch: 8 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:12,145-Speed 5596.94 samples/sec Loss 5.4020 LearningRate 0.0320 Epoch: 8 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:13,982-Speed 5577.21 samples/sec Loss 5.3845 LearningRate 0.0320 Epoch: 8 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:15,811-Speed 5598.04 samples/sec Loss 5.3801 LearningRate 0.0320 Epoch: 8 Global Step: 49440 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:17,668-Speed 5517.77 samples/sec Loss 5.5353 LearningRate 0.0319 Epoch: 8 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:19,506-Speed 5572.93 samples/sec Loss 5.4808 LearningRate 0.0319 Epoch: 8 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:21,328-Speed 5621.29 samples/sec Loss 5.4104 LearningRate 0.0319 Epoch: 8 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:23,137-Speed 5662.33 samples/sec Loss 5.4299 LearningRate 0.0319 Epoch: 8 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:24,943-Speed 5670.43 samples/sec Loss 5.4176 LearningRate 0.0319 Epoch: 8 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:26,771-Speed 5604.96 samples/sec Loss 5.5145 LearningRate 0.0319 Epoch: 8 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:28,599-Speed 5602.84 samples/sec Loss 5.4537 LearningRate 0.0319 Epoch: 8 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:30,426-Speed 5608.31 samples/sec Loss 5.3437 LearningRate 0.0319 Epoch: 8 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:32,278-Speed 5531.29 samples/sec Loss 5.4381 LearningRate 0.0319 Epoch: 8 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:34,105-Speed 5606.60 samples/sec Loss 5.4901 LearningRate 0.0319 Epoch: 8 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:35,979-Speed 5464.75 samples/sec Loss 5.6222 LearningRate 0.0318 Epoch: 8 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:37,820-Speed 5564.77 samples/sec Loss 5.5388 LearningRate 0.0318 Epoch: 8 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:34:39,650-Speed 5599.25 samples/sec Loss 5.3307 LearningRate 0.0318 Epoch: 8 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:41,482-Speed 5590.24 samples/sec Loss 5.4677 LearningRate 0.0318 Epoch: 8 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:43,330-Speed 5544.54 samples/sec Loss 5.5612 LearningRate 0.0318 Epoch: 8 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:45,153-Speed 5619.51 samples/sec Loss 5.5055 LearningRate 0.0318 Epoch: 8 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:34:46,982-Speed 5599.80 samples/sec Loss 5.4930 LearningRate 0.0318 Epoch: 8 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:48,815-Speed 5612.85 samples/sec Loss 5.3628 LearningRate 0.0318 Epoch: 8 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:50,661-Speed 5548.16 samples/sec Loss 5.4220 LearningRate 0.0318 Epoch: 8 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:52,510-Speed 5541.45 samples/sec Loss 5.4133 LearningRate 0.0318 Epoch: 8 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:54,358-Speed 5541.53 samples/sec Loss 5.5002 LearningRate 0.0317 Epoch: 8 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:56,193-Speed 5583.75 samples/sec Loss 5.5488 LearningRate 0.0317 Epoch: 8 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:58,026-Speed 5586.55 samples/sec Loss 5.4351 LearningRate 0.0317 Epoch: 8 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:34:59,853-Speed 5606.46 samples/sec Loss 5.3910 LearningRate 0.0317 Epoch: 8 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:01,720-Speed 5487.83 samples/sec Loss 5.3769 LearningRate 
0.0317 Epoch: 8 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:03,562-Speed 5562.85 samples/sec Loss 5.4538 LearningRate 0.0317 Epoch: 8 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:05,417-Speed 5521.65 samples/sec Loss 5.4421 LearningRate 0.0317 Epoch: 8 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:07,228-Speed 5655.18 samples/sec Loss 5.4350 LearningRate 0.0317 Epoch: 8 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:09,068-Speed 5567.29 samples/sec Loss 5.4920 LearningRate 0.0317 Epoch: 8 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:10,902-Speed 5587.20 samples/sec Loss 5.4707 LearningRate 0.0317 Epoch: 8 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:12,713-Speed 5653.63 samples/sec Loss 5.4167 LearningRate 0.0316 Epoch: 8 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:14,550-Speed 5577.45 samples/sec Loss 5.3494 LearningRate 0.0316 Epoch: 8 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:16,391-Speed 5563.59 samples/sec Loss 5.3736 LearningRate 0.0316 Epoch: 8 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:18,221-Speed 5597.15 samples/sec Loss 5.5987 LearningRate 0.0316 Epoch: 8 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:20,056-Speed 5583.16 samples/sec Loss 5.3538 LearningRate 0.0316 Epoch: 8 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:21,915-Speed 5510.19 samples/sec Loss 5.4978 LearningRate 0.0316 Epoch: 8 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:23,755-Speed 5568.84 samples/sec Loss 5.3892 LearningRate 0.0316 Epoch: 8 Global Step: 49810 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:35:25,585-Speed 5596.28 samples/sec Loss 5.4754 LearningRate 0.0316 Epoch: 8 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:27,419-Speed 5587.26 samples/sec Loss 5.3966 LearningRate 0.0316 Epoch: 8 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:29,257-Speed 5572.13 samples/sec Loss 5.4599 LearningRate 0.0316 Epoch: 8 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:31,086-Speed 5601.18 samples/sec Loss 5.3857 LearningRate 0.0315 Epoch: 8 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:32,895-Speed 5660.46 samples/sec Loss 5.4170 LearningRate 0.0315 Epoch: 8 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:34,720-Speed 5612.57 samples/sec Loss 5.3616 LearningRate 0.0315 Epoch: 8 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:36,559-Speed 5569.92 samples/sec Loss 5.3792 LearningRate 0.0315 Epoch: 8 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:38,374-Speed 5644.48 samples/sec Loss 5.4574 LearningRate 0.0315 Epoch: 8 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:40,215-Speed 5563.96 samples/sec Loss 5.4246 LearningRate 0.0315 Epoch: 8 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:42,037-Speed 5621.13 samples/sec Loss 5.5511 LearningRate 0.0315 Epoch: 8 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:43,868-Speed 5596.77 samples/sec Loss 5.2946 LearningRate 0.0315 Epoch: 8 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:45,715-Speed 5545.96 samples/sec Loss 5.5551 LearningRate 0.0315 Epoch: 8 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:47,575-Speed 5506.90 
samples/sec Loss 5.4364 LearningRate 0.0315 Epoch: 8 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:49,423-Speed 5542.80 samples/sec Loss 5.5040 LearningRate 0.0314 Epoch: 8 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:35:51,247-Speed 5615.56 samples/sec Loss 5.3832 LearningRate 0.0314 Epoch: 8 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:53,061-Speed 5646.38 samples/sec Loss 5.3827 LearningRate 0.0314 Epoch: 8 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:54,886-Speed 5613.59 samples/sec Loss 5.3545 LearningRate 0.0314 Epoch: 8 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:56,708-Speed 5623.45 samples/sec Loss 5.3142 LearningRate 0.0314 Epoch: 8 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:35:58,510-Speed 5684.74 samples/sec Loss 5.4579 LearningRate 0.0314 Epoch: 8 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:36:24,607-[lfw][50000]XNorm: 22.979849 Training: 2022-04-27 04:36:24,607-[lfw][50000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-27 04:36:24,608-[lfw][50000]Accuracy-Highest: 0.99800 Training: 2022-04-27 04:36:54,814-[cfp_fp][50000]XNorm: 19.953626 Training: 2022-04-27 04:36:54,815-[cfp_fp][50000]Accuracy-Flip: 0.94829+-0.01188 Training: 2022-04-27 04:36:54,815-[cfp_fp][50000]Accuracy-Highest: 0.95257 Training: 2022-04-27 04:37:20,844-[agedb_30][50000]XNorm: 22.859662 Training: 2022-04-27 04:37:20,845-[agedb_30][50000]Accuracy-Flip: 0.97217+-0.00820 Training: 2022-04-27 04:37:20,845-[agedb_30][50000]Accuracy-Highest: 0.97383 Training: 2022-04-27 04:37:22,698-Speed 121.63 samples/sec Loss 5.3863 LearningRate 0.0314 Epoch: 8 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:24,511-Speed 5651.59 samples/sec Loss 5.4194 LearningRate 0.0314 
Epoch: 8 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:26,326-Speed 5643.63 samples/sec Loss 5.3486 LearningRate 0.0314 Epoch: 8 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:28,129-Speed 5679.75 samples/sec Loss 5.4279 LearningRate 0.0314 Epoch: 8 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:29,941-Speed 5654.87 samples/sec Loss 5.5469 LearningRate 0.0313 Epoch: 8 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:31,787-Speed 5549.76 samples/sec Loss 5.5559 LearningRate 0.0313 Epoch: 8 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:33,612-Speed 5612.05 samples/sec Loss 5.3708 LearningRate 0.0313 Epoch: 8 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:35,441-Speed 5598.84 samples/sec Loss 5.3564 LearningRate 0.0313 Epoch: 8 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:37,269-Speed 5605.65 samples/sec Loss 5.3995 LearningRate 0.0313 Epoch: 8 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:39,072-Speed 5678.76 samples/sec Loss 5.2726 LearningRate 0.0313 Epoch: 8 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:40,905-Speed 5587.54 samples/sec Loss 5.4320 LearningRate 0.0313 Epoch: 8 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:42,802-Speed 5402.37 samples/sec Loss 5.4034 LearningRate 0.0313 Epoch: 8 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:44,690-Speed 5425.94 samples/sec Loss 5.4454 LearningRate 0.0313 Epoch: 8 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:46,500-Speed 5660.68 samples/sec Loss 5.5323 LearningRate 0.0313 Epoch: 8 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-27 04:37:48,332-Speed 5590.81 samples/sec Loss 5.4445 LearningRate 0.0312 Epoch: 8 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:50,197-Speed 5491.17 samples/sec Loss 5.3892 LearningRate 0.0312 Epoch: 8 Global Step: 50160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:52,021-Speed 5617.96 samples/sec Loss 5.4884 LearningRate 0.0312 Epoch: 8 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:53,860-Speed 5569.11 samples/sec Loss 5.3968 LearningRate 0.0312 Epoch: 8 Global Step: 50180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:55,707-Speed 5546.44 samples/sec Loss 5.4953 LearningRate 0.0312 Epoch: 8 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:37:57,541-Speed 5585.83 samples/sec Loss 5.4336 LearningRate 0.0312 Epoch: 8 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:37:59,362-Speed 5625.18 samples/sec Loss 5.5659 LearningRate 0.0312 Epoch: 8 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:38:01,179-Speed 5636.46 samples/sec Loss 5.4281 LearningRate 0.0312 Epoch: 8 Global Step: 50220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:03,021-Speed 5559.79 samples/sec Loss 5.4386 LearningRate 0.0312 Epoch: 8 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:04,846-Speed 5613.81 samples/sec Loss 5.4537 LearningRate 0.0312 Epoch: 8 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:06,653-Speed 5669.00 samples/sec Loss 5.3575 LearningRate 0.0312 Epoch: 8 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:08,461-Speed 5666.26 samples/sec Loss 5.4696 LearningRate 0.0311 Epoch: 8 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:10,280-Speed 5632.56 samples/sec 
Loss 5.3360 LearningRate 0.0311 Epoch: 8 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:12,108-Speed 5604.00 samples/sec Loss 5.4463 LearningRate 0.0311 Epoch: 8 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:13,920-Speed 5651.16 samples/sec Loss 5.4592 LearningRate 0.0311 Epoch: 8 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:15,770-Speed 5535.99 samples/sec Loss 5.3139 LearningRate 0.0311 Epoch: 8 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:17,594-Speed 5617.98 samples/sec Loss 5.5668 LearningRate 0.0311 Epoch: 8 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:19,411-Speed 5635.73 samples/sec Loss 5.3348 LearningRate 0.0311 Epoch: 8 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:21,227-Speed 5640.99 samples/sec Loss 5.2993 LearningRate 0.0311 Epoch: 8 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:23,056-Speed 5600.64 samples/sec Loss 5.3484 LearningRate 0.0311 Epoch: 8 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:24,889-Speed 5589.11 samples/sec Loss 5.3566 LearningRate 0.0311 Epoch: 8 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:26,727-Speed 5574.37 samples/sec Loss 5.4318 LearningRate 0.0310 Epoch: 8 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:28,574-Speed 5543.62 samples/sec Loss 5.4311 LearningRate 0.0310 Epoch: 8 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:30,422-Speed 5543.87 samples/sec Loss 5.3672 LearningRate 0.0310 Epoch: 8 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:32,269-Speed 5547.89 samples/sec Loss 5.4005 LearningRate 0.0310 Epoch: 8 Global Step: 50390 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:34,112-Speed 5557.88 samples/sec Loss 5.3204 LearningRate 0.0310 Epoch: 8 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:35,927-Speed 5642.49 samples/sec Loss 5.3393 LearningRate 0.0310 Epoch: 8 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:37,749-Speed 5622.21 samples/sec Loss 5.4126 LearningRate 0.0310 Epoch: 8 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:38:39,579-Speed 5597.95 samples/sec Loss 5.3833 LearningRate 0.0310 Epoch: 8 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:38:41,405-Speed 5609.34 samples/sec Loss 5.3308 LearningRate 0.0310 Epoch: 8 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:43,263-Speed 5511.49 samples/sec Loss 5.4923 LearningRate 0.0310 Epoch: 8 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:45,110-Speed 5548.25 samples/sec Loss 5.3890 LearningRate 0.0309 Epoch: 8 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:46,942-Speed 5590.55 samples/sec Loss 5.3680 LearningRate 0.0309 Epoch: 8 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:48,777-Speed 5583.44 samples/sec Loss 5.4017 LearningRate 0.0309 Epoch: 8 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:50,628-Speed 5533.07 samples/sec Loss 5.4052 LearningRate 0.0309 Epoch: 8 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:52,454-Speed 5610.62 samples/sec Loss 5.3680 LearningRate 0.0309 Epoch: 8 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:54,283-Speed 5601.50 samples/sec Loss 5.3596 LearningRate 0.0309 Epoch: 8 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:38:56,096-Speed 5650.18 samples/sec Loss 5.4369 LearningRate 0.0309 Epoch: 8 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:57,906-Speed 5659.18 samples/sec Loss 5.4286 LearningRate 0.0309 Epoch: 8 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:38:59,729-Speed 5619.92 samples/sec Loss 5.5593 LearningRate 0.0309 Epoch: 8 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:01,553-Speed 5614.19 samples/sec Loss 5.4793 LearningRate 0.0309 Epoch: 8 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:03,391-Speed 5576.26 samples/sec Loss 5.3625 LearningRate 0.0308 Epoch: 8 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:05,211-Speed 5626.86 samples/sec Loss 5.2587 LearningRate 0.0308 Epoch: 8 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:07,053-Speed 5562.04 samples/sec Loss 5.4226 LearningRate 0.0308 Epoch: 8 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:08,872-Speed 5629.23 samples/sec Loss 5.2951 LearningRate 0.0308 Epoch: 8 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:10,685-Speed 5651.30 samples/sec Loss 5.3812 LearningRate 0.0308 Epoch: 8 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:12,504-Speed 5630.48 samples/sec Loss 5.3307 LearningRate 0.0308 Epoch: 8 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:14,315-Speed 5655.41 samples/sec Loss 5.5338 LearningRate 0.0308 Epoch: 8 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:16,142-Speed 5608.41 samples/sec Loss 5.2697 LearningRate 0.0308 Epoch: 8 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:17,972-Speed 5596.98 samples/sec Loss 5.4370 LearningRate 
0.0308 Epoch: 8 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:19,787-Speed 5644.66 samples/sec Loss 5.4028 LearningRate 0.0308 Epoch: 8 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:21,596-Speed 5662.06 samples/sec Loss 5.4029 LearningRate 0.0307 Epoch: 8 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:23,414-Speed 5634.04 samples/sec Loss 5.2746 LearningRate 0.0307 Epoch: 8 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:25,246-Speed 5592.25 samples/sec Loss 5.4549 LearningRate 0.0307 Epoch: 8 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:27,069-Speed 5619.42 samples/sec Loss 5.3344 LearningRate 0.0307 Epoch: 8 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:28,912-Speed 5556.52 samples/sec Loss 5.4566 LearningRate 0.0307 Epoch: 8 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:30,733-Speed 5625.76 samples/sec Loss 5.4359 LearningRate 0.0307 Epoch: 8 Global Step: 50710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:32,558-Speed 5613.97 samples/sec Loss 5.4387 LearningRate 0.0307 Epoch: 8 Global Step: 50720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:34,377-Speed 5632.01 samples/sec Loss 5.3326 LearningRate 0.0307 Epoch: 8 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:36,194-Speed 5635.62 samples/sec Loss 5.4226 LearningRate 0.0307 Epoch: 8 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:38,017-Speed 5619.82 samples/sec Loss 5.2701 LearningRate 0.0307 Epoch: 8 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:39,832-Speed 5643.30 samples/sec Loss 5.3001 LearningRate 0.0307 Epoch: 8 Global Step: 50760 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-27 04:39:41,661-Speed 5600.80 samples/sec Loss 5.4062 LearningRate 0.0306 Epoch: 8 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:43,492-Speed 5594.06 samples/sec Loss 5.4216 LearningRate 0.0306 Epoch: 8 Global Step: 50780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:45,330-Speed 5572.83 samples/sec Loss 5.4201 LearningRate 0.0306 Epoch: 8 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:47,164-Speed 5585.34 samples/sec Loss 5.3020 LearningRate 0.0306 Epoch: 8 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:49,010-Speed 5549.78 samples/sec Loss 5.3246 LearningRate 0.0306 Epoch: 8 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:50,844-Speed 5586.48 samples/sec Loss 5.4324 LearningRate 0.0306 Epoch: 8 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:52,680-Speed 5578.73 samples/sec Loss 5.3660 LearningRate 0.0306 Epoch: 8 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:39:54,483-Speed 5680.84 samples/sec Loss 5.4202 LearningRate 0.0306 Epoch: 8 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:56,305-Speed 5622.62 samples/sec Loss 5.3751 LearningRate 0.0306 Epoch: 8 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:58,164-Speed 5509.71 samples/sec Loss 5.2943 LearningRate 0.0306 Epoch: 8 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:39:59,979-Speed 5644.51 samples/sec Loss 5.2993 LearningRate 0.0305 Epoch: 8 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:01,866-Speed 5426.37 samples/sec Loss 5.3432 LearningRate 0.0305 Epoch: 8 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:03,693-Speed 
5607.90 samples/sec Loss 5.3808 LearningRate 0.0305 Epoch: 8 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:05,534-Speed 5564.16 samples/sec Loss 5.3548 LearningRate 0.0305 Epoch: 8 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:07,389-Speed 5522.29 samples/sec Loss 5.5173 LearningRate 0.0305 Epoch: 8 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:09,238-Speed 5539.30 samples/sec Loss 5.4439 LearningRate 0.0305 Epoch: 8 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:11,059-Speed 5625.48 samples/sec Loss 5.2632 LearningRate 0.0305 Epoch: 8 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:12,896-Speed 5576.43 samples/sec Loss 5.2454 LearningRate 0.0305 Epoch: 8 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:14,708-Speed 5651.66 samples/sec Loss 5.3613 LearningRate 0.0305 Epoch: 8 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:16,532-Speed 5620.68 samples/sec Loss 5.4388 LearningRate 0.0305 Epoch: 8 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:18,354-Speed 5621.92 samples/sec Loss 5.3620 LearningRate 0.0304 Epoch: 8 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:20,188-Speed 5585.22 samples/sec Loss 5.4182 LearningRate 0.0304 Epoch: 8 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:22,019-Speed 5591.85 samples/sec Loss 5.2460 LearningRate 0.0304 Epoch: 8 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:23,852-Speed 5589.13 samples/sec Loss 5.3245 LearningRate 0.0304 Epoch: 8 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:25,685-Speed 5590.73 samples/sec Loss 5.1971 LearningRate 0.0304 Epoch: 8 
Global Step: 51010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:27,494-Speed 5661.57 samples/sec Loss 5.3261 LearningRate 0.0304 Epoch: 8 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:29,316-Speed 5622.10 samples/sec Loss 5.2841 LearningRate 0.0304 Epoch: 8 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:31,148-Speed 5591.81 samples/sec Loss 5.1495 LearningRate 0.0304 Epoch: 8 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:32,974-Speed 5609.84 samples/sec Loss 5.2480 LearningRate 0.0304 Epoch: 8 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:34,807-Speed 5588.87 samples/sec Loss 5.4196 LearningRate 0.0304 Epoch: 8 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:36,644-Speed 5574.95 samples/sec Loss 5.4364 LearningRate 0.0304 Epoch: 8 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:38,470-Speed 5609.11 samples/sec Loss 5.3406 LearningRate 0.0303 Epoch: 8 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:40,291-Speed 5627.21 samples/sec Loss 5.3413 LearningRate 0.0303 Epoch: 8 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:40:42,112-Speed 5622.71 samples/sec Loss 5.3687 LearningRate 0.0303 Epoch: 8 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:43,929-Speed 5639.12 samples/sec Loss 5.2841 LearningRate 0.0303 Epoch: 8 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:45,763-Speed 5584.63 samples/sec Loss 5.3601 LearningRate 0.0303 Epoch: 8 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:47,576-Speed 5651.82 samples/sec Loss 5.2811 LearningRate 0.0303 Epoch: 8 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 04:40:49,412-Speed 5580.47 samples/sec Loss 5.3649 LearningRate 0.0303 Epoch: 8 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:51,270-Speed 5511.59 samples/sec Loss 5.2257 LearningRate 0.0303 Epoch: 8 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:53,117-Speed 5546.83 samples/sec Loss 5.3066 LearningRate 0.0303 Epoch: 8 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:40:54,999-Speed 5442.72 samples/sec Loss 5.4113 LearningRate 0.0303 Epoch: 8 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:08,858-Speed 738.93 samples/sec Loss 4.9760 LearningRate 0.0302 Epoch: 9 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:10,695-Speed 5575.84 samples/sec Loss 4.6893 LearningRate 0.0302 Epoch: 9 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:12,566-Speed 5476.39 samples/sec Loss 4.6158 LearningRate 0.0302 Epoch: 9 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:14,413-Speed 5546.99 samples/sec Loss 4.6266 LearningRate 0.0302 Epoch: 9 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:16,240-Speed 5606.17 samples/sec Loss 4.7044 LearningRate 0.0302 Epoch: 9 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:18,059-Speed 5632.35 samples/sec Loss 4.6392 LearningRate 0.0302 Epoch: 9 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:19,884-Speed 5612.98 samples/sec Loss 4.7309 LearningRate 0.0302 Epoch: 9 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:21,709-Speed 5612.83 samples/sec Loss 4.6451 LearningRate 0.0302 Epoch: 9 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:23,525-Speed 5641.45 samples/sec 
Loss 4.7851 LearningRate 0.0302 Epoch: 9 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:25,413-Speed 5423.67 samples/sec Loss 4.8202 LearningRate 0.0302 Epoch: 9 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:41:27,265-Speed 5534.53 samples/sec Loss 4.7141 LearningRate 0.0301 Epoch: 9 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:29,072-Speed 5667.52 samples/sec Loss 4.7347 LearningRate 0.0301 Epoch: 9 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:30,905-Speed 5590.17 samples/sec Loss 4.5634 LearningRate 0.0301 Epoch: 9 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:32,723-Speed 5632.01 samples/sec Loss 4.8200 LearningRate 0.0301 Epoch: 9 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:34,661-Speed 5287.16 samples/sec Loss 4.7941 LearningRate 0.0301 Epoch: 9 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:36,490-Speed 5599.99 samples/sec Loss 4.8365 LearningRate 0.0301 Epoch: 9 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:38,305-Speed 5645.55 samples/sec Loss 4.7886 LearningRate 0.0301 Epoch: 9 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:40,147-Speed 5558.94 samples/sec Loss 4.8056 LearningRate 0.0301 Epoch: 9 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:41:41,981-Speed 5586.25 samples/sec Loss 4.6969 LearningRate 0.0301 Epoch: 9 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:43,835-Speed 5527.45 samples/sec Loss 4.9149 LearningRate 0.0301 Epoch: 9 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:45,726-Speed 5416.17 samples/sec Loss 4.8875 LearningRate 0.0301 Epoch: 9 Global Step: 51380 Fp16 
Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:47,592-Speed 5489.55 samples/sec Loss 4.8636 LearningRate 0.0300 Epoch: 9 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:49,431-Speed 5570.78 samples/sec Loss 4.8001 LearningRate 0.0300 Epoch: 9 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:51,286-Speed 5520.76 samples/sec Loss 4.8095 LearningRate 0.0300 Epoch: 9 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:53,136-Speed 5535.96 samples/sec Loss 4.8137 LearningRate 0.0300 Epoch: 9 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:54,967-Speed 5594.63 samples/sec Loss 4.7680 LearningRate 0.0300 Epoch: 9 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:56,843-Speed 5461.90 samples/sec Loss 4.9782 LearningRate 0.0300 Epoch: 9 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:41:58,677-Speed 5584.17 samples/sec Loss 5.0541 LearningRate 0.0300 Epoch: 9 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:42:00,508-Speed 5596.32 samples/sec Loss 4.8309 LearningRate 0.0300 Epoch: 9 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:02,327-Speed 5629.98 samples/sec Loss 4.9375 LearningRate 0.0300 Epoch: 9 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:04,147-Speed 5630.65 samples/sec Loss 4.8000 LearningRate 0.0300 Epoch: 9 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:05,958-Speed 5654.62 samples/sec Loss 4.8743 LearningRate 0.0299 Epoch: 9 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:07,773-Speed 5643.96 samples/sec Loss 4.7847 LearningRate 0.0299 Epoch: 9 Global Step: 51500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:42:09,608-Speed 5580.90 samples/sec Loss 4.8383 LearningRate 0.0299 Epoch: 9 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:11,417-Speed 5663.49 samples/sec Loss 4.9224 LearningRate 0.0299 Epoch: 9 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:13,239-Speed 5623.78 samples/sec Loss 4.8888 LearningRate 0.0299 Epoch: 9 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:15,082-Speed 5554.94 samples/sec Loss 4.9075 LearningRate 0.0299 Epoch: 9 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:16,904-Speed 5625.76 samples/sec Loss 4.9380 LearningRate 0.0299 Epoch: 9 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:18,722-Speed 5634.06 samples/sec Loss 4.9158 LearningRate 0.0299 Epoch: 9 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:20,542-Speed 5626.08 samples/sec Loss 4.8384 LearningRate 0.0299 Epoch: 9 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:22,381-Speed 5571.59 samples/sec Loss 5.0186 LearningRate 0.0299 Epoch: 9 Global Step: 51580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:24,193-Speed 5653.73 samples/sec Loss 4.7985 LearningRate 0.0298 Epoch: 9 Global Step: 51590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:26,006-Speed 5648.82 samples/sec Loss 4.9126 LearningRate 0.0298 Epoch: 9 Global Step: 51600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:27,853-Speed 5545.89 samples/sec Loss 4.8782 LearningRate 0.0298 Epoch: 9 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:29,682-Speed 5599.93 samples/sec Loss 5.0145 LearningRate 0.0298 Epoch: 9 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:31,611-Speed 5311.17 samples/sec Loss 4.9973 
LearningRate 0.0298 Epoch: 9 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:33,543-Speed 5302.85 samples/sec Loss 4.9620 LearningRate 0.0298 Epoch: 9 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:35,396-Speed 5526.00 samples/sec Loss 5.0548 LearningRate 0.0298 Epoch: 9 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:37,236-Speed 5567.00 samples/sec Loss 5.0616 LearningRate 0.0298 Epoch: 9 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:39,059-Speed 5619.55 samples/sec Loss 4.8961 LearningRate 0.0298 Epoch: 9 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:40,893-Speed 5585.79 samples/sec Loss 4.9679 LearningRate 0.0298 Epoch: 9 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:42,705-Speed 5651.68 samples/sec Loss 4.9739 LearningRate 0.0298 Epoch: 9 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:44,540-Speed 5582.40 samples/sec Loss 5.0947 LearningRate 0.0297 Epoch: 9 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:46,354-Speed 5647.73 samples/sec Loss 5.0775 LearningRate 0.0297 Epoch: 9 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:42:48,181-Speed 5608.60 samples/sec Loss 5.0324 LearningRate 0.0297 Epoch: 9 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:49,992-Speed 5655.59 samples/sec Loss 4.8678 LearningRate 0.0297 Epoch: 9 Global Step: 51730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:51,828-Speed 5580.02 samples/sec Loss 4.9340 LearningRate 0.0297 Epoch: 9 Global Step: 51740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:53,651-Speed 5618.43 samples/sec Loss 4.9854 LearningRate 0.0297 Epoch: 9 Global Step: 51750 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-27 04:42:55,482-Speed 5593.64 samples/sec Loss 4.9198 LearningRate 0.0297 Epoch: 9 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:57,328-Speed 5550.00 samples/sec Loss 5.0475 LearningRate 0.0297 Epoch: 9 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:42:59,160-Speed 5590.00 samples/sec Loss 4.9629 LearningRate 0.0297 Epoch: 9 Global Step: 51780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:00,998-Speed 5572.01 samples/sec Loss 5.0415 LearningRate 0.0297 Epoch: 9 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:02,837-Speed 5571.42 samples/sec Loss 4.8764 LearningRate 0.0296 Epoch: 9 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:04,660-Speed 5619.43 samples/sec Loss 4.9035 LearningRate 0.0296 Epoch: 9 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:06,477-Speed 5637.47 samples/sec Loss 5.0495 LearningRate 0.0296 Epoch: 9 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:08,295-Speed 5635.84 samples/sec Loss 5.0537 LearningRate 0.0296 Epoch: 9 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:10,099-Speed 5677.66 samples/sec Loss 4.9587 LearningRate 0.0296 Epoch: 9 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:11,917-Speed 5635.06 samples/sec Loss 5.0261 LearningRate 0.0296 Epoch: 9 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:13,737-Speed 5625.27 samples/sec Loss 4.9744 LearningRate 0.0296 Epoch: 9 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:15,569-Speed 5591.94 samples/sec Loss 4.9787 LearningRate 0.0296 Epoch: 9 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:43:17,388-Speed 5632.19 samples/sec Loss 4.9050 LearningRate 0.0296 Epoch: 9 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:19,211-Speed 5619.35 samples/sec Loss 4.8759 LearningRate 0.0296 Epoch: 9 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:21,047-Speed 5580.07 samples/sec Loss 4.9484 LearningRate 0.0296 Epoch: 9 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:22,886-Speed 5567.72 samples/sec Loss 5.0528 LearningRate 0.0295 Epoch: 9 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:24,707-Speed 5626.67 samples/sec Loss 5.0756 LearningRate 0.0295 Epoch: 9 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:26,533-Speed 5608.87 samples/sec Loss 5.1066 LearningRate 0.0295 Epoch: 9 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:43:28,357-Speed 5616.71 samples/sec Loss 5.0250 LearningRate 0.0295 Epoch: 9 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:30,195-Speed 5573.91 samples/sec Loss 5.0733 LearningRate 0.0295 Epoch: 9 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:32,045-Speed 5536.33 samples/sec Loss 5.0063 LearningRate 0.0295 Epoch: 9 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:33,887-Speed 5563.05 samples/sec Loss 5.0681 LearningRate 0.0295 Epoch: 9 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:35,719-Speed 5591.40 samples/sec Loss 5.1030 LearningRate 0.0295 Epoch: 9 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:37,562-Speed 5558.34 samples/sec Loss 4.9820 LearningRate 0.0295 Epoch: 9 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:43:39,389-Speed 5604.11 samples/sec Loss 5.0993 
LearningRate 0.0295 Epoch: 9 Global Step: 52000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:44:05,731-[lfw][52000]XNorm: 21.763303 Training: 2022-04-27 04:44:05,731-[lfw][52000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-27 04:44:05,732-[lfw][52000]Accuracy-Highest: 0.99800 Training: 2022-04-27 04:44:36,233-[cfp_fp][52000]XNorm: 19.181753 Training: 2022-04-27 04:44:36,233-[cfp_fp][52000]Accuracy-Flip: 0.94700+-0.01251 Training: 2022-04-27 04:44:36,234-[cfp_fp][52000]Accuracy-Highest: 0.95257 Training: 2022-04-27 04:45:02,538-[agedb_30][52000]XNorm: 21.631263 Training: 2022-04-27 04:45:02,538-[agedb_30][52000]Accuracy-Flip: 0.97483+-0.00962 Training: 2022-04-27 04:45:02,539-[agedb_30][52000]Accuracy-Highest: 0.97483 Training: 2022-04-27 04:45:04,366-Speed 120.50 samples/sec Loss 5.0092 LearningRate 0.0294 Epoch: 9 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:06,164-Speed 5697.45 samples/sec Loss 5.1457 LearningRate 0.0294 Epoch: 9 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:07,957-Speed 5713.16 samples/sec Loss 5.0687 LearningRate 0.0294 Epoch: 9 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:09,767-Speed 5658.56 samples/sec Loss 4.9335 LearningRate 0.0294 Epoch: 9 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:11,576-Speed 5664.72 samples/sec Loss 5.0092 LearningRate 0.0294 Epoch: 9 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:13,389-Speed 5649.54 samples/sec Loss 5.0749 LearningRate 0.0294 Epoch: 9 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:15,221-Speed 5591.00 samples/sec Loss 4.9177 LearningRate 0.0294 Epoch: 9 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:17,029-Speed 5665.87 samples/sec Loss 5.0457 LearningRate 0.0294 Epoch: 9 Global Step: 
52080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:18,830-Speed 5685.04 samples/sec Loss 5.1145 LearningRate 0.0294 Epoch: 9 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:20,648-Speed 5636.67 samples/sec Loss 5.1562 LearningRate 0.0294 Epoch: 9 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:22,458-Speed 5657.32 samples/sec Loss 4.9536 LearningRate 0.0294 Epoch: 9 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:24,276-Speed 5635.24 samples/sec Loss 5.1652 LearningRate 0.0293 Epoch: 9 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:26,105-Speed 5599.45 samples/sec Loss 5.0964 LearningRate 0.0293 Epoch: 9 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:27,921-Speed 5642.08 samples/sec Loss 4.9298 LearningRate 0.0293 Epoch: 9 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:29,753-Speed 5592.30 samples/sec Loss 4.9848 LearningRate 0.0293 Epoch: 9 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:31,564-Speed 5656.32 samples/sec Loss 4.9214 LearningRate 0.0293 Epoch: 9 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:33,386-Speed 5622.12 samples/sec Loss 5.1729 LearningRate 0.0293 Epoch: 9 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:35,213-Speed 5608.75 samples/sec Loss 5.0291 LearningRate 0.0293 Epoch: 9 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:37,013-Speed 5688.30 samples/sec Loss 5.0891 LearningRate 0.0293 Epoch: 9 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:38,825-Speed 5654.95 samples/sec Loss 5.0576 LearningRate 0.0293 Epoch: 9 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 04:45:40,641-Speed 5640.47 samples/sec Loss 4.9970 LearningRate 0.0293 Epoch: 9 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:42,456-Speed 5641.89 samples/sec Loss 4.9697 LearningRate 0.0292 Epoch: 9 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:44,272-Speed 5639.76 samples/sec Loss 4.9659 LearningRate 0.0292 Epoch: 9 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:46,084-Speed 5654.46 samples/sec Loss 5.0322 LearningRate 0.0292 Epoch: 9 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:47,926-Speed 5561.36 samples/sec Loss 5.1146 LearningRate 0.0292 Epoch: 9 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:49,740-Speed 5646.71 samples/sec Loss 5.0940 LearningRate 0.0292 Epoch: 9 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:51,560-Speed 5628.51 samples/sec Loss 5.0823 LearningRate 0.0292 Epoch: 9 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:53,403-Speed 5558.45 samples/sec Loss 5.1748 LearningRate 0.0292 Epoch: 9 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:55,242-Speed 5568.94 samples/sec Loss 5.0227 LearningRate 0.0292 Epoch: 9 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:45:57,072-Speed 5598.81 samples/sec Loss 5.0463 LearningRate 0.0292 Epoch: 9 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:45:58,883-Speed 5654.73 samples/sec Loss 5.1395 LearningRate 0.0292 Epoch: 9 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:00,699-Speed 5641.49 samples/sec Loss 5.1602 LearningRate 0.0292 Epoch: 9 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:02,528-Speed 5601.07 samples/sec Loss 5.0513 
LearningRate 0.0291 Epoch: 9 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:04,345-Speed 5636.40 samples/sec Loss 5.1694 LearningRate 0.0291 Epoch: 9 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:06,162-Speed 5637.72 samples/sec Loss 5.2203 LearningRate 0.0291 Epoch: 9 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:07,982-Speed 5629.85 samples/sec Loss 5.2048 LearningRate 0.0291 Epoch: 9 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:09,824-Speed 5560.22 samples/sec Loss 5.0690 LearningRate 0.0291 Epoch: 9 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:11,653-Speed 5599.36 samples/sec Loss 5.0238 LearningRate 0.0291 Epoch: 9 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:13,484-Speed 5593.44 samples/sec Loss 5.1686 LearningRate 0.0291 Epoch: 9 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:15,329-Speed 5554.71 samples/sec Loss 5.1190 LearningRate 0.0291 Epoch: 9 Global Step: 52400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:17,132-Speed 5681.33 samples/sec Loss 5.1555 LearningRate 0.0291 Epoch: 9 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:18,942-Speed 5658.35 samples/sec Loss 5.0489 LearningRate 0.0291 Epoch: 9 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:20,761-Speed 5632.07 samples/sec Loss 4.9785 LearningRate 0.0290 Epoch: 9 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:22,569-Speed 5664.10 samples/sec Loss 5.1449 LearningRate 0.0290 Epoch: 9 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:24,434-Speed 5493.14 samples/sec Loss 5.1713 LearningRate 0.0290 Epoch: 9 Global Step: 52450 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-27 04:46:26,307-Speed 5468.68 samples/sec Loss 5.0289 LearningRate 0.0290 Epoch: 9 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:28,143-Speed 5579.53 samples/sec Loss 5.1309 LearningRate 0.0290 Epoch: 9 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:29,962-Speed 5631.66 samples/sec Loss 5.1909 LearningRate 0.0290 Epoch: 9 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:31,789-Speed 5608.82 samples/sec Loss 5.0666 LearningRate 0.0290 Epoch: 9 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:33,602-Speed 5646.89 samples/sec Loss 5.1546 LearningRate 0.0290 Epoch: 9 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:35,438-Speed 5581.90 samples/sec Loss 5.0725 LearningRate 0.0290 Epoch: 9 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:37,250-Speed 5653.19 samples/sec Loss 5.0284 LearningRate 0.0290 Epoch: 9 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:39,058-Speed 5665.56 samples/sec Loss 4.9280 LearningRate 0.0290 Epoch: 9 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:40,873-Speed 5644.35 samples/sec Loss 5.0442 LearningRate 0.0289 Epoch: 9 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:42,690-Speed 5637.82 samples/sec Loss 5.0298 LearningRate 0.0289 Epoch: 9 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:44,511-Speed 5622.25 samples/sec Loss 5.1384 LearningRate 0.0289 Epoch: 9 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:46:46,321-Speed 5661.65 samples/sec Loss 5.0679 LearningRate 0.0289 Epoch: 9 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 
04:46:48,126-Speed 5675.37 samples/sec Loss 5.2163 LearningRate 0.0289 Epoch: 9 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:49,935-Speed 5660.09 samples/sec Loss 5.0882 LearningRate 0.0289 Epoch: 9 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:51,744-Speed 5664.78 samples/sec Loss 5.0645 LearningRate 0.0289 Epoch: 9 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:53,552-Speed 5664.25 samples/sec Loss 5.1926 LearningRate 0.0289 Epoch: 9 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:55,369-Speed 5637.05 samples/sec Loss 5.0847 LearningRate 0.0289 Epoch: 9 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:57,201-Speed 5592.37 samples/sec Loss 5.1314 LearningRate 0.0289 Epoch: 9 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:46:59,014-Speed 5649.87 samples/sec Loss 5.1190 LearningRate 0.0288 Epoch: 9 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:00,827-Speed 5651.22 samples/sec Loss 5.0697 LearningRate 0.0288 Epoch: 9 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:02,686-Speed 5510.94 samples/sec Loss 4.9619 LearningRate 0.0288 Epoch: 9 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:04,549-Speed 5497.75 samples/sec Loss 5.2121 LearningRate 0.0288 Epoch: 9 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:06,370-Speed 5624.76 samples/sec Loss 5.0582 LearningRate 0.0288 Epoch: 9 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:08,209-Speed 5569.78 samples/sec Loss 5.1677 LearningRate 0.0288 Epoch: 9 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:10,044-Speed 5581.63 samples/sec Loss 4.9742 LearningRate 
0.0288 Epoch: 9 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:11,873-Speed 5602.57 samples/sec Loss 5.1733 LearningRate 0.0288 Epoch: 9 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:13,699-Speed 5608.89 samples/sec Loss 5.1678 LearningRate 0.0288 Epoch: 9 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:15,512-Speed 5648.41 samples/sec Loss 5.0970 LearningRate 0.0288 Epoch: 9 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:17,334-Speed 5621.92 samples/sec Loss 5.0726 LearningRate 0.0288 Epoch: 9 Global Step: 52740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:19,160-Speed 5611.67 samples/sec Loss 5.1425 LearningRate 0.0287 Epoch: 9 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:20,989-Speed 5600.86 samples/sec Loss 5.0919 LearningRate 0.0287 Epoch: 9 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:22,823-Speed 5586.24 samples/sec Loss 4.9452 LearningRate 0.0287 Epoch: 9 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:24,642-Speed 5630.37 samples/sec Loss 5.1354 LearningRate 0.0287 Epoch: 9 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:26,463-Speed 5625.41 samples/sec Loss 5.1621 LearningRate 0.0287 Epoch: 9 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:28,277-Speed 5647.01 samples/sec Loss 5.0892 LearningRate 0.0287 Epoch: 9 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:30,114-Speed 5575.93 samples/sec Loss 5.1812 LearningRate 0.0287 Epoch: 9 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:31,959-Speed 5552.58 samples/sec Loss 4.9474 LearningRate 0.0287 Epoch: 9 Global Step: 52820 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:47:33,804-Speed 5550.28 samples/sec Loss 5.1006 LearningRate 0.0287 Epoch: 9 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:35,619-Speed 5644.23 samples/sec Loss 5.0856 LearningRate 0.0287 Epoch: 9 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:37,445-Speed 5611.18 samples/sec Loss 4.9992 LearningRate 0.0287 Epoch: 9 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:39,273-Speed 5604.15 samples/sec Loss 5.0943 LearningRate 0.0286 Epoch: 9 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:41,103-Speed 5596.81 samples/sec Loss 5.1577 LearningRate 0.0286 Epoch: 9 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:42,936-Speed 5588.14 samples/sec Loss 5.0549 LearningRate 0.0286 Epoch: 9 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:44,765-Speed 5598.97 samples/sec Loss 4.9228 LearningRate 0.0286 Epoch: 9 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:46,597-Speed 5593.59 samples/sec Loss 5.0503 LearningRate 0.0286 Epoch: 9 Global Step: 52900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:48,444-Speed 5545.99 samples/sec Loss 5.2343 LearningRate 0.0286 Epoch: 9 Global Step: 52910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:50,282-Speed 5573.31 samples/sec Loss 5.1566 LearningRate 0.0286 Epoch: 9 Global Step: 52920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:52,100-Speed 5632.85 samples/sec Loss 5.0876 LearningRate 0.0286 Epoch: 9 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:47:53,906-Speed 5672.14 samples/sec Loss 5.1279 LearningRate 0.0286 Epoch: 9 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:55,729-Speed 
5618.18 samples/sec Loss 5.0760 LearningRate 0.0286 Epoch: 9 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:57,542-Speed 5651.38 samples/sec Loss 5.0620 LearningRate 0.0285 Epoch: 9 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:47:59,359-Speed 5636.79 samples/sec Loss 5.1179 LearningRate 0.0285 Epoch: 9 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:01,170-Speed 5657.61 samples/sec Loss 5.0396 LearningRate 0.0285 Epoch: 9 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:02,991-Speed 5624.64 samples/sec Loss 5.0634 LearningRate 0.0285 Epoch: 9 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:04,809-Speed 5636.13 samples/sec Loss 5.0506 LearningRate 0.0285 Epoch: 9 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:06,622-Speed 5651.18 samples/sec Loss 5.1965 LearningRate 0.0285 Epoch: 9 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:08,442-Speed 5628.28 samples/sec Loss 5.0840 LearningRate 0.0285 Epoch: 9 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:10,276-Speed 5585.37 samples/sec Loss 5.0680 LearningRate 0.0285 Epoch: 9 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:12,091-Speed 5642.45 samples/sec Loss 5.1202 LearningRate 0.0285 Epoch: 9 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:13,916-Speed 5611.25 samples/sec Loss 5.1056 LearningRate 0.0285 Epoch: 9 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:15,743-Speed 5608.20 samples/sec Loss 5.0727 LearningRate 0.0285 Epoch: 9 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:17,558-Speed 5645.01 samples/sec Loss 5.0816 LearningRate 0.0284 Epoch: 9 
Global Step: 53070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:19,397-Speed 5569.19 samples/sec Loss 5.0549 LearningRate 0.0284 Epoch: 9 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:21,243-Speed 5549.38 samples/sec Loss 5.1946 LearningRate 0.0284 Epoch: 9 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:23,067-Speed 5617.03 samples/sec Loss 5.3265 LearningRate 0.0284 Epoch: 9 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:24,885-Speed 5634.30 samples/sec Loss 5.1376 LearningRate 0.0284 Epoch: 9 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:26,706-Speed 5625.31 samples/sec Loss 5.0052 LearningRate 0.0284 Epoch: 9 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:28,552-Speed 5547.68 samples/sec Loss 5.0820 LearningRate 0.0284 Epoch: 9 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:30,365-Speed 5651.53 samples/sec Loss 5.0076 LearningRate 0.0284 Epoch: 9 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:32,176-Speed 5655.63 samples/sec Loss 5.0562 LearningRate 0.0284 Epoch: 9 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:34,020-Speed 5554.85 samples/sec Loss 5.1258 LearningRate 0.0284 Epoch: 9 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:35,829-Speed 5662.88 samples/sec Loss 5.0983 LearningRate 0.0284 Epoch: 9 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:37,656-Speed 5605.19 samples/sec Loss 4.9856 LearningRate 0.0283 Epoch: 9 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:39,465-Speed 5662.80 samples/sec Loss 5.1128 LearningRate 0.0283 Epoch: 9 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 04:48:41,284-Speed 5629.43 samples/sec Loss 5.1023 LearningRate 0.0283 Epoch: 9 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:43,130-Speed 5550.34 samples/sec Loss 5.1156 LearningRate 0.0283 Epoch: 9 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:48:44,940-Speed 5660.25 samples/sec Loss 5.1761 LearningRate 0.0283 Epoch: 9 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:46,785-Speed 5552.04 samples/sec Loss 4.9678 LearningRate 0.0283 Epoch: 9 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:48,600-Speed 5644.21 samples/sec Loss 5.1552 LearningRate 0.0283 Epoch: 9 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:50,414-Speed 5645.06 samples/sec Loss 5.1134 LearningRate 0.0283 Epoch: 9 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:52,231-Speed 5637.65 samples/sec Loss 5.1252 LearningRate 0.0283 Epoch: 9 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:54,059-Speed 5604.25 samples/sec Loss 5.1546 LearningRate 0.0283 Epoch: 9 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:55,881-Speed 5621.82 samples/sec Loss 5.0264 LearningRate 0.0282 Epoch: 9 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:57,692-Speed 5659.04 samples/sec Loss 5.1515 LearningRate 0.0282 Epoch: 9 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:48:59,499-Speed 5667.50 samples/sec Loss 5.0772 LearningRate 0.0282 Epoch: 9 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:01,318-Speed 5630.11 samples/sec Loss 4.9978 LearningRate 0.0282 Epoch: 9 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:03,142-Speed 5616.97 samples/sec Loss 
5.1310 LearningRate 0.0282 Epoch: 9 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:04,960-Speed 5635.03 samples/sec Loss 5.1127 LearningRate 0.0282 Epoch: 9 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:06,759-Speed 5693.34 samples/sec Loss 5.1524 LearningRate 0.0282 Epoch: 9 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:08,575-Speed 5640.67 samples/sec Loss 5.0494 LearningRate 0.0282 Epoch: 9 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:10,404-Speed 5601.43 samples/sec Loss 5.1567 LearningRate 0.0282 Epoch: 9 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:12,214-Speed 5659.66 samples/sec Loss 5.1121 LearningRate 0.0282 Epoch: 9 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:14,039-Speed 5611.31 samples/sec Loss 5.0154 LearningRate 0.0282 Epoch: 9 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:15,864-Speed 5614.15 samples/sec Loss 5.0337 LearningRate 0.0281 Epoch: 9 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:17,708-Speed 5554.63 samples/sec Loss 4.9937 LearningRate 0.0281 Epoch: 9 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:19,538-Speed 5597.40 samples/sec Loss 5.0742 LearningRate 0.0281 Epoch: 9 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:21,361-Speed 5619.74 samples/sec Loss 5.2212 LearningRate 0.0281 Epoch: 9 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:23,219-Speed 5511.96 samples/sec Loss 5.0414 LearningRate 0.0281 Epoch: 9 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:25,039-Speed 5627.15 samples/sec Loss 5.0654 LearningRate 0.0281 Epoch: 9 Global Step: 53440 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:26,852-Speed 5652.42 samples/sec Loss 5.0994 LearningRate 0.0281 Epoch: 9 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:28,661-Speed 5662.44 samples/sec Loss 5.0042 LearningRate 0.0281 Epoch: 9 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:30,471-Speed 5660.74 samples/sec Loss 5.0681 LearningRate 0.0281 Epoch: 9 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:32,296-Speed 5613.17 samples/sec Loss 5.1117 LearningRate 0.0281 Epoch: 9 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:34,110-Speed 5647.73 samples/sec Loss 5.0396 LearningRate 0.0281 Epoch: 9 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:35,935-Speed 5614.63 samples/sec Loss 5.1263 LearningRate 0.0280 Epoch: 9 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:37,746-Speed 5655.16 samples/sec Loss 5.0772 LearningRate 0.0280 Epoch: 9 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:39,586-Speed 5565.00 samples/sec Loss 5.1184 LearningRate 0.0280 Epoch: 9 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:49:41,389-Speed 5682.62 samples/sec Loss 5.0603 LearningRate 0.0280 Epoch: 9 Global Step: 53530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:43,222-Speed 5589.29 samples/sec Loss 5.0951 LearningRate 0.0280 Epoch: 9 Global Step: 53540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:45,061-Speed 5569.36 samples/sec Loss 5.0957 LearningRate 0.0280 Epoch: 9 Global Step: 53550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:46,891-Speed 5595.77 samples/sec Loss 5.1186 LearningRate 0.0280 Epoch: 9 Global Step: 53560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:49:48,729-Speed 5572.87 samples/sec Loss 5.0536 LearningRate 0.0280 Epoch: 9 Global Step: 53570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:50,536-Speed 5669.20 samples/sec Loss 5.0542 LearningRate 0.0280 Epoch: 9 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:52,360-Speed 5617.72 samples/sec Loss 5.0333 LearningRate 0.0280 Epoch: 9 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:54,219-Speed 5510.60 samples/sec Loss 5.1224 LearningRate 0.0279 Epoch: 9 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:56,044-Speed 5611.71 samples/sec Loss 5.1205 LearningRate 0.0279 Epoch: 9 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:57,886-Speed 5560.52 samples/sec Loss 5.0012 LearningRate 0.0279 Epoch: 9 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:49:59,703-Speed 5639.49 samples/sec Loss 4.9405 LearningRate 0.0279 Epoch: 9 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:01,513-Speed 5657.64 samples/sec Loss 5.1168 LearningRate 0.0279 Epoch: 9 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:03,352-Speed 5570.76 samples/sec Loss 5.0722 LearningRate 0.0279 Epoch: 9 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:05,168-Speed 5642.43 samples/sec Loss 5.1643 LearningRate 0.0279 Epoch: 9 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:06,995-Speed 5604.51 samples/sec Loss 5.0410 LearningRate 0.0279 Epoch: 9 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:08,844-Speed 5541.23 samples/sec Loss 5.0322 LearningRate 0.0279 Epoch: 9 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:10,680-Speed 5576.97 samples/sec Loss 5.0763 
LearningRate 0.0279 Epoch: 9 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:12,511-Speed 5595.94 samples/sec Loss 5.1124 LearningRate 0.0279 Epoch: 9 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:14,336-Speed 5612.43 samples/sec Loss 5.0191 LearningRate 0.0278 Epoch: 9 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:16,166-Speed 5599.19 samples/sec Loss 5.1973 LearningRate 0.0278 Epoch: 9 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:18,013-Speed 5547.18 samples/sec Loss 5.1037 LearningRate 0.0278 Epoch: 9 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:19,855-Speed 5559.20 samples/sec Loss 5.1083 LearningRate 0.0278 Epoch: 9 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:21,751-Speed 5401.35 samples/sec Loss 5.0046 LearningRate 0.0278 Epoch: 9 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:23,672-Speed 5333.34 samples/sec Loss 5.1170 LearningRate 0.0278 Epoch: 9 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:25,480-Speed 5665.84 samples/sec Loss 5.0977 LearningRate 0.0278 Epoch: 9 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:27,304-Speed 5615.52 samples/sec Loss 5.1301 LearningRate 0.0278 Epoch: 9 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:29,128-Speed 5617.40 samples/sec Loss 5.0937 LearningRate 0.0278 Epoch: 9 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:50:30,945-Speed 5636.87 samples/sec Loss 5.1777 LearningRate 0.0278 Epoch: 9 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:32,771-Speed 5609.58 samples/sec Loss 5.1288 LearningRate 0.0278 Epoch: 9 Global Step: 53810 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:34,605-Speed 5585.97 samples/sec Loss 5.1661 LearningRate 0.0277 Epoch: 9 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:36,451-Speed 5549.46 samples/sec Loss 5.0236 LearningRate 0.0277 Epoch: 9 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:38,284-Speed 5587.66 samples/sec Loss 5.0381 LearningRate 0.0277 Epoch: 9 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:40,102-Speed 5634.62 samples/sec Loss 5.1354 LearningRate 0.0277 Epoch: 9 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:41,924-Speed 5623.67 samples/sec Loss 5.1715 LearningRate 0.0277 Epoch: 9 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:43,756-Speed 5590.87 samples/sec Loss 5.1743 LearningRate 0.0277 Epoch: 9 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:45,598-Speed 5561.00 samples/sec Loss 4.9768 LearningRate 0.0277 Epoch: 9 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:47,427-Speed 5601.60 samples/sec Loss 5.0507 LearningRate 0.0277 Epoch: 9 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:49,228-Speed 5688.31 samples/sec Loss 5.0959 LearningRate 0.0277 Epoch: 9 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:51,059-Speed 5593.79 samples/sec Loss 5.0438 LearningRate 0.0277 Epoch: 9 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:52,905-Speed 5547.84 samples/sec Loss 5.1850 LearningRate 0.0277 Epoch: 9 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:54,729-Speed 5617.02 samples/sec Loss 5.1074 LearningRate 0.0276 Epoch: 9 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:50:56,561-Speed 5592.26 samples/sec Loss 4.9920 LearningRate 0.0276 Epoch: 9 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:50:58,375-Speed 5646.57 samples/sec Loss 5.1138 LearningRate 0.0276 Epoch: 9 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:51:00,202-Speed 5605.60 samples/sec Loss 5.0019 LearningRate 0.0276 Epoch: 9 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:51:02,033-Speed 5596.14 samples/sec Loss 5.0312 LearningRate 0.0276 Epoch: 9 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:51:03,863-Speed 5596.29 samples/sec Loss 5.0636 LearningRate 0.0276 Epoch: 9 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:51:05,691-Speed 5606.84 samples/sec Loss 5.1317 LearningRate 0.0276 Epoch: 9 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:51:07,577-Speed 5429.17 samples/sec Loss 4.8906 LearningRate 0.0276 Epoch: 9 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:51:33,938-[lfw][54000]XNorm: 21.300929 Training: 2022-04-27 04:51:33,938-[lfw][54000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-27 04:51:33,938-[lfw][54000]Accuracy-Highest: 0.99800 Training: 2022-04-27 04:52:04,485-[cfp_fp][54000]XNorm: 18.890860 Training: 2022-04-27 04:52:04,486-[cfp_fp][54000]Accuracy-Flip: 0.95286+-0.01322 Training: 2022-04-27 04:52:04,486-[cfp_fp][54000]Accuracy-Highest: 0.95286 Training: 2022-04-27 04:52:30,854-[agedb_30][54000]XNorm: 21.154284 Training: 2022-04-27 04:52:30,854-[agedb_30][54000]Accuracy-Flip: 0.97133+-0.00875 Training: 2022-04-27 04:52:30,855-[agedb_30][54000]Accuracy-Highest: 0.97483 Training: 2022-04-27 04:52:32,701-Speed 120.30 samples/sec Loss 4.9527 LearningRate 0.0276 Epoch: 9 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:52:34,529-Speed 5602.61 samples/sec Loss 
5.0382 LearningRate 0.0276 Epoch: 9 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:52:36,333-Speed 5678.49 samples/sec Loss 5.0386 LearningRate 0.0276 Epoch: 9 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:52:38,160-Speed 5606.46 samples/sec Loss 5.0248 LearningRate 0.0275 Epoch: 9 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:52:39,971-Speed 5658.49 samples/sec Loss 5.0890 LearningRate 0.0275 Epoch: 9 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:52:41,799-Speed 5603.64 samples/sec Loss 5.0553 LearningRate 0.0275 Epoch: 9 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:43,641-Speed 5559.62 samples/sec Loss 5.1322 LearningRate 0.0275 Epoch: 9 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:45,486-Speed 5552.64 samples/sec Loss 5.0246 LearningRate 0.0275 Epoch: 9 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:47,316-Speed 5595.56 samples/sec Loss 4.8979 LearningRate 0.0275 Epoch: 9 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:49,169-Speed 5527.87 samples/sec Loss 5.1522 LearningRate 0.0275 Epoch: 9 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:51,007-Speed 5574.38 samples/sec Loss 5.0369 LearningRate 0.0275 Epoch: 9 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:52,827-Speed 5628.12 samples/sec Loss 5.0522 LearningRate 0.0275 Epoch: 9 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:54,654-Speed 5605.02 samples/sec Loss 5.0482 LearningRate 0.0275 Epoch: 9 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:56,472-Speed 5637.08 samples/sec Loss 5.0951 LearningRate 0.0274 Epoch: 9 Global Step: 54140 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:52:58,291-Speed 5630.73 samples/sec Loss 4.9323 LearningRate 0.0274 Epoch: 9 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:00,103-Speed 5652.26 samples/sec Loss 5.2015 LearningRate 0.0274 Epoch: 9 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:01,905-Speed 5685.35 samples/sec Loss 4.9382 LearningRate 0.0274 Epoch: 9 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:03,712-Speed 5667.87 samples/sec Loss 5.0940 LearningRate 0.0274 Epoch: 9 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:05,524-Speed 5655.25 samples/sec Loss 5.0644 LearningRate 0.0274 Epoch: 9 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:07,325-Speed 5687.54 samples/sec Loss 5.0892 LearningRate 0.0274 Epoch: 9 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:09,133-Speed 5663.83 samples/sec Loss 5.0492 LearningRate 0.0274 Epoch: 9 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:10,939-Speed 5673.59 samples/sec Loss 5.0650 LearningRate 0.0274 Epoch: 9 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:12,751-Speed 5653.35 samples/sec Loss 5.0574 LearningRate 0.0274 Epoch: 9 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:14,562-Speed 5656.20 samples/sec Loss 5.1791 LearningRate 0.0274 Epoch: 9 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:16,362-Speed 5690.63 samples/sec Loss 5.0958 LearningRate 0.0273 Epoch: 9 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:18,182-Speed 5626.80 samples/sec Loss 5.0618 LearningRate 0.0273 Epoch: 9 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:53:20,014-Speed 5591.80 samples/sec Loss 5.0705 LearningRate 0.0273 Epoch: 9 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:21,822-Speed 5667.15 samples/sec Loss 5.0563 LearningRate 0.0273 Epoch: 9 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:23,635-Speed 5650.63 samples/sec Loss 4.9740 LearningRate 0.0273 Epoch: 9 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:25,458-Speed 5618.79 samples/sec Loss 5.1676 LearningRate 0.0273 Epoch: 9 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:27,288-Speed 5597.75 samples/sec Loss 5.0754 LearningRate 0.0273 Epoch: 9 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:29,123-Speed 5580.06 samples/sec Loss 5.0372 LearningRate 0.0273 Epoch: 9 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:30,957-Speed 5587.50 samples/sec Loss 5.1768 LearningRate 0.0273 Epoch: 9 Global Step: 54330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:32,777-Speed 5626.30 samples/sec Loss 5.1027 LearningRate 0.0273 Epoch: 9 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:34,593-Speed 5640.54 samples/sec Loss 5.0601 LearningRate 0.0273 Epoch: 9 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:36,431-Speed 5572.48 samples/sec Loss 5.0451 LearningRate 0.0272 Epoch: 9 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:38,241-Speed 5662.03 samples/sec Loss 5.0452 LearningRate 0.0272 Epoch: 9 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:40,058-Speed 5635.53 samples/sec Loss 4.9733 LearningRate 0.0272 Epoch: 9 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:41,870-Speed 5653.92 samples/sec Loss 5.0502 LearningRate 
0.0272 Epoch: 9 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:43,693-Speed 5619.69 samples/sec Loss 5.0088 LearningRate 0.0272 Epoch: 9 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:45,527-Speed 5584.61 samples/sec Loss 5.1270 LearningRate 0.0272 Epoch: 9 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:47,334-Speed 5669.54 samples/sec Loss 4.9376 LearningRate 0.0272 Epoch: 9 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:49,172-Speed 5572.63 samples/sec Loss 5.0278 LearningRate 0.0272 Epoch: 9 Global Step: 54430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:50,997-Speed 5612.79 samples/sec Loss 4.9774 LearningRate 0.0272 Epoch: 9 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:52,809-Speed 5652.37 samples/sec Loss 5.0843 LearningRate 0.0272 Epoch: 9 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:54,629-Speed 5629.33 samples/sec Loss 5.1424 LearningRate 0.0272 Epoch: 9 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:53:56,428-Speed 5695.69 samples/sec Loss 5.1198 LearningRate 0.0271 Epoch: 9 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:53:58,239-Speed 5654.20 samples/sec Loss 5.0919 LearningRate 0.0271 Epoch: 9 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:00,062-Speed 5617.98 samples/sec Loss 5.0556 LearningRate 0.0271 Epoch: 9 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:01,897-Speed 5582.12 samples/sec Loss 5.0672 LearningRate 0.0271 Epoch: 9 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:03,715-Speed 5634.28 samples/sec Loss 5.1473 LearningRate 0.0271 Epoch: 9 Global Step: 54510 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:54:05,646-Speed 5306.07 samples/sec Loss 4.9426 LearningRate 0.0271 Epoch: 9 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:07,500-Speed 5524.79 samples/sec Loss 5.0964 LearningRate 0.0271 Epoch: 9 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:09,332-Speed 5593.86 samples/sec Loss 5.0875 LearningRate 0.0271 Epoch: 9 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:11,162-Speed 5596.01 samples/sec Loss 5.0263 LearningRate 0.0271 Epoch: 9 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:13,002-Speed 5566.77 samples/sec Loss 5.0426 LearningRate 0.0271 Epoch: 9 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:14,815-Speed 5651.70 samples/sec Loss 5.0045 LearningRate 0.0271 Epoch: 9 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:54:16,655-Speed 5565.90 samples/sec Loss 4.9549 LearningRate 0.0270 Epoch: 9 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:54:18,476-Speed 5625.88 samples/sec Loss 5.0515 LearningRate 0.0270 Epoch: 9 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:54:20,287-Speed 5654.04 samples/sec Loss 5.1411 LearningRate 0.0270 Epoch: 9 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:22,091-Speed 5678.00 samples/sec Loss 5.0481 LearningRate 0.0270 Epoch: 9 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:23,899-Speed 5667.83 samples/sec Loss 5.0438 LearningRate 0.0270 Epoch: 9 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:25,724-Speed 5612.26 samples/sec Loss 4.9047 LearningRate 0.0270 Epoch: 9 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:27,551-Speed 
5607.89 samples/sec Loss 4.9989 LearningRate 0.0270 Epoch: 9 Global Step: 54640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:29,369-Speed 5635.64 samples/sec Loss 5.0369 LearningRate 0.0270 Epoch: 9 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:31,193-Speed 5613.19 samples/sec Loss 5.0552 LearningRate 0.0270 Epoch: 9 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:33,027-Speed 5587.03 samples/sec Loss 5.0101 LearningRate 0.0270 Epoch: 9 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:34,836-Speed 5660.71 samples/sec Loss 5.0766 LearningRate 0.0270 Epoch: 9 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:36,677-Speed 5564.42 samples/sec Loss 5.1496 LearningRate 0.0269 Epoch: 9 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:38,503-Speed 5610.23 samples/sec Loss 5.0304 LearningRate 0.0269 Epoch: 9 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:54:40,311-Speed 5664.70 samples/sec Loss 5.0095 LearningRate 0.0269 Epoch: 9 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:42,132-Speed 5626.66 samples/sec Loss 4.9141 LearningRate 0.0269 Epoch: 9 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:43,949-Speed 5636.93 samples/sec Loss 5.0466 LearningRate 0.0269 Epoch: 9 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:45,767-Speed 5636.38 samples/sec Loss 5.0348 LearningRate 0.0269 Epoch: 9 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:47,590-Speed 5619.10 samples/sec Loss 5.1884 LearningRate 0.0269 Epoch: 9 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:49,410-Speed 5628.00 samples/sec Loss 5.0236 LearningRate 0.0269 Epoch: 9 
Global Step: 54760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:51,264-Speed 5526.49 samples/sec Loss 5.1172 LearningRate 0.0269 Epoch: 9 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:53,100-Speed 5579.92 samples/sec Loss 4.9597 LearningRate 0.0269 Epoch: 9 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:54,910-Speed 5658.48 samples/sec Loss 4.9616 LearningRate 0.0269 Epoch: 9 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:56,724-Speed 5646.20 samples/sec Loss 5.0681 LearningRate 0.0268 Epoch: 9 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:54:58,566-Speed 5562.74 samples/sec Loss 4.9412 LearningRate 0.0268 Epoch: 9 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:00,403-Speed 5575.23 samples/sec Loss 5.0188 LearningRate 0.0268 Epoch: 9 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:02,222-Speed 5629.00 samples/sec Loss 5.0215 LearningRate 0.0268 Epoch: 9 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:04,049-Speed 5609.16 samples/sec Loss 4.9864 LearningRate 0.0268 Epoch: 9 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:05,866-Speed 5634.65 samples/sec Loss 4.8349 LearningRate 0.0268 Epoch: 9 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:07,702-Speed 5580.96 samples/sec Loss 4.9735 LearningRate 0.0268 Epoch: 9 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:09,535-Speed 5587.54 samples/sec Loss 5.0478 LearningRate 0.0268 Epoch: 9 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:11,354-Speed 5630.85 samples/sec Loss 5.1395 LearningRate 0.0268 Epoch: 9 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-27 04:55:13,183-Speed 5601.76 samples/sec Loss 4.9253 LearningRate 0.0268 Epoch: 9 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:15,021-Speed 5575.24 samples/sec Loss 5.1409 LearningRate 0.0268 Epoch: 9 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:16,834-Speed 5650.66 samples/sec Loss 5.0561 LearningRate 0.0267 Epoch: 9 Global Step: 54910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-27 04:55:18,646-Speed 5650.67 samples/sec Loss 5.0119 LearningRate 0.0267 Epoch: 9 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:20,479-Speed 5589.93 samples/sec Loss 4.8994 LearningRate 0.0267 Epoch: 9 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:22,305-Speed 5608.82 samples/sec Loss 5.0181 LearningRate 0.0267 Epoch: 9 Global Step: 54940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:24,136-Speed 5596.27 samples/sec Loss 5.0572 LearningRate 0.0267 Epoch: 9 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:25,959-Speed 5618.06 samples/sec Loss 4.9830 LearningRate 0.0267 Epoch: 9 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:27,787-Speed 5603.06 samples/sec Loss 4.9593 LearningRate 0.0267 Epoch: 9 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:29,612-Speed 5612.78 samples/sec Loss 4.9391 LearningRate 0.0267 Epoch: 9 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:31,421-Speed 5665.00 samples/sec Loss 5.0008 LearningRate 0.0267 Epoch: 9 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:33,226-Speed 5673.39 samples/sec Loss 5.0085 LearningRate 0.0267 Epoch: 9 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:35,029-Speed 5683.48 samples/sec 
Loss 5.0207 LearningRate 0.0267 Epoch: 9 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:36,831-Speed 5683.44 samples/sec Loss 4.9566 LearningRate 0.0266 Epoch: 9 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:38,647-Speed 5640.01 samples/sec Loss 4.9525 LearningRate 0.0266 Epoch: 9 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:40,477-Speed 5598.57 samples/sec Loss 5.0464 LearningRate 0.0266 Epoch: 9 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:42,290-Speed 5650.29 samples/sec Loss 4.9619 LearningRate 0.0266 Epoch: 9 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:44,103-Speed 5652.00 samples/sec Loss 4.9858 LearningRate 0.0266 Epoch: 9 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:45,955-Speed 5528.39 samples/sec Loss 5.0500 LearningRate 0.0266 Epoch: 9 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:47,780-Speed 5613.56 samples/sec Loss 5.1072 LearningRate 0.0266 Epoch: 9 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:55:49,586-Speed 5671.94 samples/sec Loss 4.9338 LearningRate 0.0266 Epoch: 9 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:51,414-Speed 5603.04 samples/sec Loss 4.9937 LearningRate 0.0266 Epoch: 9 Global Step: 55100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:53,240-Speed 5609.65 samples/sec Loss 4.9712 LearningRate 0.0266 Epoch: 9 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:55,050-Speed 5658.93 samples/sec Loss 5.0974 LearningRate 0.0266 Epoch: 9 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:56,876-Speed 5609.77 samples/sec Loss 5.0819 LearningRate 0.0265 Epoch: 9 Global Step: 55130 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:55:58,737-Speed 5507.93 samples/sec Loss 4.9985 LearningRate 0.0265 Epoch: 9 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:00,566-Speed 5602.06 samples/sec Loss 5.0435 LearningRate 0.0265 Epoch: 9 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:02,465-Speed 5393.93 samples/sec Loss 5.1102 LearningRate 0.0265 Epoch: 9 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:04,380-Speed 5348.92 samples/sec Loss 5.1723 LearningRate 0.0265 Epoch: 9 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:06,264-Speed 5437.76 samples/sec Loss 4.9853 LearningRate 0.0265 Epoch: 9 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:08,072-Speed 5664.27 samples/sec Loss 5.0595 LearningRate 0.0265 Epoch: 9 Global Step: 55190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:56:09,888-Speed 5641.54 samples/sec Loss 4.8943 LearningRate 0.0265 Epoch: 9 Global Step: 55200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:56:11,698-Speed 5658.19 samples/sec Loss 5.0028 LearningRate 0.0265 Epoch: 9 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:13,522-Speed 5615.11 samples/sec Loss 4.8829 LearningRate 0.0265 Epoch: 9 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:15,375-Speed 5530.33 samples/sec Loss 4.9534 LearningRate 0.0265 Epoch: 9 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:17,220-Speed 5549.40 samples/sec Loss 4.9206 LearningRate 0.0264 Epoch: 9 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:19,059-Speed 5570.14 samples/sec Loss 4.9512 LearningRate 0.0264 Epoch: 9 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:56:20,872-Speed 5652.01 samples/sec Loss 4.8927 LearningRate 0.0264 Epoch: 9 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:22,679-Speed 5669.92 samples/sec Loss 5.0752 LearningRate 0.0264 Epoch: 9 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:24,505-Speed 5610.23 samples/sec Loss 4.8920 LearningRate 0.0264 Epoch: 9 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:26,332-Speed 5605.81 samples/sec Loss 4.8606 LearningRate 0.0264 Epoch: 9 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:28,143-Speed 5654.85 samples/sec Loss 4.9966 LearningRate 0.0264 Epoch: 9 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:29,951-Speed 5667.32 samples/sec Loss 5.0556 LearningRate 0.0264 Epoch: 9 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:56:31,775-Speed 5616.03 samples/sec Loss 4.9389 LearningRate 0.0264 Epoch: 9 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:33,622-Speed 5544.76 samples/sec Loss 5.0154 LearningRate 0.0264 Epoch: 9 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:35,447-Speed 5612.08 samples/sec Loss 5.0592 LearningRate 0.0264 Epoch: 9 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:37,278-Speed 5595.68 samples/sec Loss 5.2379 LearningRate 0.0263 Epoch: 9 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:39,130-Speed 5531.88 samples/sec Loss 5.0255 LearningRate 0.0263 Epoch: 9 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:40,960-Speed 5596.67 samples/sec Loss 5.0489 LearningRate 0.0263 Epoch: 9 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:42,773-Speed 5653.14 samples/sec Loss 4.9828 LearningRate 
0.0263 Epoch: 9 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:44,597-Speed 5614.30 samples/sec Loss 5.1233 LearningRate 0.0263 Epoch: 9 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:46,420-Speed 5620.37 samples/sec Loss 4.9839 LearningRate 0.0263 Epoch: 9 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:48,260-Speed 5566.91 samples/sec Loss 4.9355 LearningRate 0.0263 Epoch: 9 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:50,085-Speed 5610.38 samples/sec Loss 4.9389 LearningRate 0.0263 Epoch: 9 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:56:51,932-Speed 5547.31 samples/sec Loss 5.0198 LearningRate 0.0263 Epoch: 9 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:56:53,759-Speed 5604.64 samples/sec Loss 5.1502 LearningRate 0.0263 Epoch: 9 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:55,607-Speed 5544.22 samples/sec Loss 4.9018 LearningRate 0.0263 Epoch: 9 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:57,419-Speed 5654.30 samples/sec Loss 4.9712 LearningRate 0.0262 Epoch: 9 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:56:59,226-Speed 5666.37 samples/sec Loss 4.9244 LearningRate 0.0262 Epoch: 9 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:01,032-Speed 5672.53 samples/sec Loss 4.9011 LearningRate 0.0262 Epoch: 9 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:02,865-Speed 5590.77 samples/sec Loss 5.0552 LearningRate 0.0262 Epoch: 9 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:04,691-Speed 5607.63 samples/sec Loss 4.9727 LearningRate 0.0262 Epoch: 9 Global Step: 55500 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 04:57:06,509-Speed 5636.00 samples/sec Loss 5.0865 LearningRate 0.0262 Epoch: 9 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:08,319-Speed 5657.60 samples/sec Loss 4.9814 LearningRate 0.0262 Epoch: 9 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:10,144-Speed 5614.34 samples/sec Loss 4.9992 LearningRate 0.0262 Epoch: 9 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:11,972-Speed 5605.41 samples/sec Loss 4.9903 LearningRate 0.0262 Epoch: 9 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:57:13,785-Speed 5647.40 samples/sec Loss 4.9549 LearningRate 0.0262 Epoch: 9 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:57:15,601-Speed 5641.02 samples/sec Loss 4.9930 LearningRate 0.0262 Epoch: 9 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:17,435-Speed 5585.14 samples/sec Loss 4.9580 LearningRate 0.0261 Epoch: 9 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:19,261-Speed 5610.91 samples/sec Loss 4.9839 LearningRate 0.0261 Epoch: 9 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:21,069-Speed 5665.29 samples/sec Loss 5.0221 LearningRate 0.0261 Epoch: 9 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:22,895-Speed 5608.86 samples/sec Loss 4.8866 LearningRate 0.0261 Epoch: 9 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:24,718-Speed 5619.31 samples/sec Loss 4.9860 LearningRate 0.0261 Epoch: 9 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:26,563-Speed 5552.78 samples/sec Loss 5.1098 LearningRate 0.0261 Epoch: 9 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:28,408-Speed 5551.06 
samples/sec Loss 4.9561 LearningRate 0.0261 Epoch: 9 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:30,219-Speed 5658.54 samples/sec Loss 4.8981 LearningRate 0.0261 Epoch: 9 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:32,088-Speed 5478.56 samples/sec Loss 4.9413 LearningRate 0.0261 Epoch: 9 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:33,982-Speed 5410.36 samples/sec Loss 4.9476 LearningRate 0.0261 Epoch: 9 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:35,815-Speed 5588.16 samples/sec Loss 4.9661 LearningRate 0.0261 Epoch: 9 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:37,642-Speed 5604.33 samples/sec Loss 4.8402 LearningRate 0.0260 Epoch: 9 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:39,464-Speed 5624.63 samples/sec Loss 5.0276 LearningRate 0.0260 Epoch: 9 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:41,310-Speed 5548.92 samples/sec Loss 4.8867 LearningRate 0.0260 Epoch: 9 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:43,113-Speed 5680.07 samples/sec Loss 4.9578 LearningRate 0.0260 Epoch: 9 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:44,932-Speed 5631.59 samples/sec Loss 4.9431 LearningRate 0.0260 Epoch: 9 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:46,750-Speed 5636.72 samples/sec Loss 4.9242 LearningRate 0.0260 Epoch: 9 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:48,553-Speed 5679.68 samples/sec Loss 4.7594 LearningRate 0.0260 Epoch: 9 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:50,378-Speed 5613.02 samples/sec Loss 4.9599 LearningRate 0.0260 Epoch: 9 Global Step: 
55750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:57:52,196-Speed 5636.45 samples/sec Loss 4.9067 LearningRate 0.0260 Epoch: 9 Global Step: 55760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:57:54,020-Speed 5613.24 samples/sec Loss 4.9585 LearningRate 0.0260 Epoch: 9 Global Step: 55770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:57:55,859-Speed 5569.94 samples/sec Loss 4.9667 LearningRate 0.0260 Epoch: 9 Global Step: 55780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:57:57,675-Speed 5642.01 samples/sec Loss 5.0270 LearningRate 0.0259 Epoch: 9 Global Step: 55790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:57:59,492-Speed 5636.41 samples/sec Loss 4.9569 LearningRate 0.0259 Epoch: 9 Global Step: 55800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:01,310-Speed 5638.04 samples/sec Loss 4.9924 LearningRate 0.0259 Epoch: 9 Global Step: 55810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:03,132-Speed 5621.55 samples/sec Loss 4.9539 LearningRate 0.0259 Epoch: 9 Global Step: 55820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:04,947-Speed 5641.93 samples/sec Loss 5.0733 LearningRate 0.0259 Epoch: 9 Global Step: 55830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:06,755-Speed 5666.34 samples/sec Loss 5.0154 LearningRate 0.0259 Epoch: 9 Global Step: 55840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:08,560-Speed 5673.78 samples/sec Loss 5.0269 LearningRate 0.0259 Epoch: 9 Global Step: 55850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 04:58:10,378-Speed 5635.59 samples/sec Loss 5.0680 LearningRate 0.0259 Epoch: 9 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:12,204-Speed 5610.45 samples/sec Loss 4.9132 LearningRate 0.0259 Epoch: 9 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
04:58:14,011-Speed 5671.58 samples/sec Loss 4.9303 LearningRate 0.0259 Epoch: 9 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:15,821-Speed 5657.21 samples/sec Loss 5.0600 LearningRate 0.0259 Epoch: 9 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:17,630-Speed 5662.91 samples/sec Loss 5.1196 LearningRate 0.0259 Epoch: 9 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:19,438-Speed 5665.68 samples/sec Loss 4.9221 LearningRate 0.0258 Epoch: 9 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:21,244-Speed 5674.47 samples/sec Loss 4.8672 LearningRate 0.0258 Epoch: 9 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:23,049-Speed 5674.27 samples/sec Loss 5.0149 LearningRate 0.0258 Epoch: 9 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:24,866-Speed 5636.67 samples/sec Loss 4.9178 LearningRate 0.0258 Epoch: 9 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:26,674-Speed 5665.52 samples/sec Loss 5.0580 LearningRate 0.0258 Epoch: 9 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 04:58:28,487-Speed 5651.24 samples/sec Loss 4.9424 LearningRate 0.0258 Epoch: 9 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:58:30,313-Speed 5609.10 samples/sec Loss 4.8960 LearningRate 0.0258 Epoch: 9 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:58:32,120-Speed 5669.05 samples/sec Loss 4.9394 LearningRate 0.0258 Epoch: 9 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:58:33,945-Speed 5613.20 samples/sec Loss 4.8618 LearningRate 0.0258 Epoch: 9 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:58:35,765-Speed 5626.78 samples/sec Loss 4.8149 LearningRate 
0.0258 Epoch: 9 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 04:59:01,952-[lfw][56000]XNorm: 21.273996 Training: 2022-04-27 04:59:01,952-[lfw][56000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-27 04:59:01,953-[lfw][56000]Accuracy-Highest: 0.99800 Training: 2022-04-27 04:59:32,301-[cfp_fp][56000]XNorm: 18.822446 Training: 2022-04-27 04:59:32,302-[cfp_fp][56000]Accuracy-Flip: 0.95343+-0.00976 Training: 2022-04-27 04:59:32,302-[cfp_fp][56000]Accuracy-Highest: 0.95343 Training: 2022-04-27 04:59:58,503-[agedb_30][56000]XNorm: 21.008991 Training: 2022-04-27 04:59:58,503-[agedb_30][56000]Accuracy-Flip: 0.97550+-0.00863 Training: 2022-04-27 04:59:58,504-[agedb_30][56000]Accuracy-Highest: 0.97550 Training: 2022-04-27 05:00:00,368-Speed 121.04 samples/sec Loss 4.8828 LearningRate 0.0258 Epoch: 9 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:02,167-Speed 5692.17 samples/sec Loss 5.0275 LearningRate 0.0257 Epoch: 9 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:04,004-Speed 5577.47 samples/sec Loss 5.0176 LearningRate 0.0257 Epoch: 9 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:05,839-Speed 5581.26 samples/sec Loss 5.0194 LearningRate 0.0257 Epoch: 9 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:07,671-Speed 5590.61 samples/sec Loss 4.9002 LearningRate 0.0257 Epoch: 9 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:09,498-Speed 5607.43 samples/sec Loss 4.8781 LearningRate 0.0257 Epoch: 9 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:11,323-Speed 5612.42 samples/sec Loss 4.9169 LearningRate 0.0257 Epoch: 9 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:13,144-Speed 5626.02 samples/sec Loss 5.0792 LearningRate 0.0257 Epoch: 9 Global Step: 56080 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:14,965-Speed 5625.68 samples/sec Loss 4.9939 LearningRate 0.0257 Epoch: 9 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:16,794-Speed 5599.91 samples/sec Loss 5.0508 LearningRate 0.0257 Epoch: 9 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:18,619-Speed 5610.99 samples/sec Loss 4.9827 LearningRate 0.0257 Epoch: 9 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:20,506-Speed 5430.76 samples/sec Loss 4.8680 LearningRate 0.0257 Epoch: 9 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:22,366-Speed 5506.65 samples/sec Loss 4.9050 LearningRate 0.0256 Epoch: 9 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:24,297-Speed 5305.06 samples/sec Loss 4.9420 LearningRate 0.0256 Epoch: 9 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:26,215-Speed 5340.44 samples/sec Loss 4.9624 LearningRate 0.0256 Epoch: 9 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:28,135-Speed 5333.29 samples/sec Loss 4.9165 LearningRate 0.0256 Epoch: 9 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:29,956-Speed 5625.38 samples/sec Loss 4.8365 LearningRate 0.0256 Epoch: 9 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:31,777-Speed 5626.62 samples/sec Loss 5.0046 LearningRate 0.0256 Epoch: 9 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:33,593-Speed 5641.66 samples/sec Loss 4.8703 LearningRate 0.0256 Epoch: 9 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:35,435-Speed 5561.04 samples/sec Loss 5.0148 LearningRate 0.0256 Epoch: 9 Global Step: 56200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
05:00:37,259-Speed 5615.33 samples/sec Loss 4.9365 LearningRate 0.0256 Epoch: 9 Global Step: 56210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:39,074-Speed 5643.31 samples/sec Loss 4.8826 LearningRate 0.0256 Epoch: 9 Global Step: 56220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:40,885-Speed 5656.06 samples/sec Loss 4.9482 LearningRate 0.0256 Epoch: 9 Global Step: 56230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:42,699-Speed 5648.45 samples/sec Loss 5.0571 LearningRate 0.0255 Epoch: 9 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:44,515-Speed 5639.89 samples/sec Loss 4.8822 LearningRate 0.0255 Epoch: 9 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:46,337-Speed 5623.91 samples/sec Loss 5.0605 LearningRate 0.0255 Epoch: 9 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:48,181-Speed 5554.83 samples/sec Loss 4.9012 LearningRate 0.0255 Epoch: 9 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:50,099-Speed 5341.57 samples/sec Loss 5.0598 LearningRate 0.0255 Epoch: 9 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:51,963-Speed 5493.44 samples/sec Loss 4.9175 LearningRate 0.0255 Epoch: 9 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:53,796-Speed 5588.39 samples/sec Loss 4.8813 LearningRate 0.0255 Epoch: 9 Global Step: 56300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:00:55,597-Speed 5686.63 samples/sec Loss 4.9345 LearningRate 0.0255 Epoch: 9 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:57,413-Speed 5642.89 samples/sec Loss 4.8966 LearningRate 0.0255 Epoch: 9 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:00:59,238-Speed 5612.41 samples/sec Loss 4.9170 LearningRate 
0.0255 Epoch: 9 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:01,062-Speed 5617.59 samples/sec Loss 5.0847 LearningRate 0.0255 Epoch: 9 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:02,899-Speed 5576.91 samples/sec Loss 5.0005 LearningRate 0.0255 Epoch: 9 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:04,720-Speed 5623.07 samples/sec Loss 4.8526 LearningRate 0.0254 Epoch: 9 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:06,550-Speed 5597.26 samples/sec Loss 4.8735 LearningRate 0.0254 Epoch: 9 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:08,371-Speed 5628.47 samples/sec Loss 4.8645 LearningRate 0.0254 Epoch: 9 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:10,184-Speed 5648.00 samples/sec Loss 5.0451 LearningRate 0.0254 Epoch: 9 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:11,997-Speed 5651.39 samples/sec Loss 4.8522 LearningRate 0.0254 Epoch: 9 Global Step: 56400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:13,836-Speed 5570.59 samples/sec Loss 4.8755 LearningRate 0.0254 Epoch: 9 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:01:15,656-Speed 5626.84 samples/sec Loss 4.9617 LearningRate 0.0254 Epoch: 9 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:01:17,480-Speed 5616.42 samples/sec Loss 4.8875 LearningRate 0.0254 Epoch: 9 Global Step: 56430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:19,315-Speed 5583.70 samples/sec Loss 4.9031 LearningRate 0.0254 Epoch: 9 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:21,135-Speed 5627.26 samples/sec Loss 4.9886 LearningRate 0.0254 Epoch: 9 Global Step: 56450 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 05:01:22,981-Speed 5550.50 samples/sec Loss 4.8244 LearningRate 0.0254 Epoch: 9 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:24,833-Speed 5532.21 samples/sec Loss 4.9562 LearningRate 0.0253 Epoch: 9 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:26,646-Speed 5649.35 samples/sec Loss 5.0169 LearningRate 0.0253 Epoch: 9 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:28,469-Speed 5618.43 samples/sec Loss 4.8231 LearningRate 0.0253 Epoch: 9 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:30,286-Speed 5637.36 samples/sec Loss 4.9658 LearningRate 0.0253 Epoch: 9 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:32,112-Speed 5608.89 samples/sec Loss 4.7503 LearningRate 0.0253 Epoch: 9 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:33,916-Speed 5679.42 samples/sec Loss 4.8455 LearningRate 0.0253 Epoch: 9 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:35,723-Speed 5666.59 samples/sec Loss 4.8389 LearningRate 0.0253 Epoch: 9 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:01:37,535-Speed 5654.88 samples/sec Loss 5.1127 LearningRate 0.0253 Epoch: 9 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:39,344-Speed 5662.62 samples/sec Loss 4.8945 LearningRate 0.0253 Epoch: 9 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:41,162-Speed 5634.70 samples/sec Loss 4.9236 LearningRate 0.0253 Epoch: 9 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:42,975-Speed 5650.23 samples/sec Loss 4.8898 LearningRate 0.0253 Epoch: 9 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:44,804-Speed 5600.10 
samples/sec Loss 5.0410 LearningRate 0.0252 Epoch: 9 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:46,618-Speed 5648.03 samples/sec Loss 4.9111 LearningRate 0.0252 Epoch: 9 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:48,439-Speed 5623.75 samples/sec Loss 5.0208 LearningRate 0.0252 Epoch: 9 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:50,253-Speed 5648.54 samples/sec Loss 5.0086 LearningRate 0.0252 Epoch: 9 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:01:52,067-Speed 5643.85 samples/sec Loss 4.8853 LearningRate 0.0252 Epoch: 9 Global Step: 56620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:01:53,883-Speed 5644.21 samples/sec Loss 4.8917 LearningRate 0.0252 Epoch: 9 Global Step: 56630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:01:55,696-Speed 5648.66 samples/sec Loss 4.9016 LearningRate 0.0252 Epoch: 9 Global Step: 56640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:01:57,515-Speed 5630.65 samples/sec Loss 4.9225 LearningRate 0.0252 Epoch: 9 Global Step: 56650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:01:59,328-Speed 5649.39 samples/sec Loss 4.9694 LearningRate 0.0252 Epoch: 9 Global Step: 56660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:01,142-Speed 5648.86 samples/sec Loss 4.8715 LearningRate 0.0252 Epoch: 9 Global Step: 56670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:02,963-Speed 5625.76 samples/sec Loss 4.8261 LearningRate 0.0252 Epoch: 9 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:04,774-Speed 5655.66 samples/sec Loss 4.8664 LearningRate 0.0251 Epoch: 9 Global Step: 56690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:06,640-Speed 5490.58 samples/sec Loss 4.7604 LearningRate 0.0251 Epoch: 9 Global Step: 
56700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:08,496-Speed 5519.27 samples/sec Loss 4.8850 LearningRate 0.0251 Epoch: 9 Global Step: 56710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:02:10,303-Speed 5667.96 samples/sec Loss 4.9692 LearningRate 0.0251 Epoch: 9 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:12,123-Speed 5628.38 samples/sec Loss 4.9550 LearningRate 0.0251 Epoch: 9 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:13,931-Speed 5665.43 samples/sec Loss 4.7731 LearningRate 0.0251 Epoch: 9 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:15,747-Speed 5639.51 samples/sec Loss 4.8675 LearningRate 0.0251 Epoch: 9 Global Step: 56750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:17,601-Speed 5526.47 samples/sec Loss 5.0239 LearningRate 0.0251 Epoch: 9 Global Step: 56760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:19,441-Speed 5565.67 samples/sec Loss 4.9827 LearningRate 0.0251 Epoch: 9 Global Step: 56770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:21,266-Speed 5614.59 samples/sec Loss 5.0762 LearningRate 0.0251 Epoch: 9 Global Step: 56780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:23,099-Speed 5588.24 samples/sec Loss 4.7508 LearningRate 0.0251 Epoch: 9 Global Step: 56790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:24,922-Speed 5620.89 samples/sec Loss 4.9572 LearningRate 0.0251 Epoch: 9 Global Step: 56800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:26,747-Speed 5612.46 samples/sec Loss 4.8768 LearningRate 0.0250 Epoch: 9 Global Step: 56810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:28,586-Speed 5570.12 samples/sec Loss 4.9997 LearningRate 0.0250 Epoch: 9 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-27 05:02:30,414-Speed 5601.50 samples/sec Loss 4.8415 LearningRate 0.0250 Epoch: 9 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:02:32,230-Speed 5642.96 samples/sec Loss 4.8339 LearningRate 0.0250 Epoch: 9 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:02:34,103-Speed 5467.51 samples/sec Loss 4.7529 LearningRate 0.0250 Epoch: 9 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:35,896-Speed 5713.03 samples/sec Loss 4.8096 LearningRate 0.0250 Epoch: 9 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:50,686-Speed 692.41 samples/sec Loss 4.3587 LearningRate 0.0250 Epoch: 10 Global Step: 56870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:52,513-Speed 5608.05 samples/sec Loss 4.2520 LearningRate 0.0250 Epoch: 10 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:54,349-Speed 5581.20 samples/sec Loss 4.2726 LearningRate 0.0250 Epoch: 10 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:56,319-Speed 5198.33 samples/sec Loss 4.3551 LearningRate 0.0250 Epoch: 10 Global Step: 56900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:58,143-Speed 5616.54 samples/sec Loss 4.3397 LearningRate 0.0250 Epoch: 10 Global Step: 56910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:02:59,961-Speed 5635.06 samples/sec Loss 4.2820 LearningRate 0.0249 Epoch: 10 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:01,808-Speed 5546.44 samples/sec Loss 4.2223 LearningRate 0.0249 Epoch: 10 Global Step: 56930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:03,652-Speed 5554.11 samples/sec Loss 4.2938 LearningRate 0.0249 Epoch: 10 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:05,494-Speed 5560.14 samples/sec Loss 
4.3930 LearningRate 0.0249 Epoch: 10 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:07,323-Speed 5600.27 samples/sec Loss 4.1599 LearningRate 0.0249 Epoch: 10 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:09,182-Speed 5511.07 samples/sec Loss 4.1753 LearningRate 0.0249 Epoch: 10 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:11,025-Speed 5555.30 samples/sec Loss 4.2332 LearningRate 0.0249 Epoch: 10 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:12,874-Speed 5542.63 samples/sec Loss 4.2924 LearningRate 0.0249 Epoch: 10 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:14,722-Speed 5542.22 samples/sec Loss 4.2469 LearningRate 0.0249 Epoch: 10 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:16,557-Speed 5582.66 samples/sec Loss 4.1955 LearningRate 0.0249 Epoch: 10 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:18,391-Speed 5585.28 samples/sec Loss 4.4133 LearningRate 0.0249 Epoch: 10 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:20,221-Speed 5598.46 samples/sec Loss 4.3402 LearningRate 0.0249 Epoch: 10 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:22,050-Speed 5599.78 samples/sec Loss 4.3105 LearningRate 0.0248 Epoch: 10 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:23,880-Speed 5598.16 samples/sec Loss 4.2179 LearningRate 0.0248 Epoch: 10 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:25,711-Speed 5594.42 samples/sec Loss 4.3905 LearningRate 0.0248 Epoch: 10 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:27,547-Speed 5578.14 samples/sec Loss 4.2840 LearningRate 0.0248 Epoch: 10 Global Step: 57070 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:29,364-Speed 5636.40 samples/sec Loss 4.3691 LearningRate 0.0248 Epoch: 10 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:31,238-Speed 5467.35 samples/sec Loss 4.4490 LearningRate 0.0248 Epoch: 10 Global Step: 57090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:33,070-Speed 5590.40 samples/sec Loss 4.4429 LearningRate 0.0248 Epoch: 10 Global Step: 57100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:03:34,889-Speed 5632.11 samples/sec Loss 4.4298 LearningRate 0.0248 Epoch: 10 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:36,709-Speed 5629.01 samples/sec Loss 4.4688 LearningRate 0.0248 Epoch: 10 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:38,557-Speed 5541.43 samples/sec Loss 4.2764 LearningRate 0.0248 Epoch: 10 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:40,394-Speed 5576.94 samples/sec Loss 4.4180 LearningRate 0.0248 Epoch: 10 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:42,242-Speed 5543.23 samples/sec Loss 4.4892 LearningRate 0.0247 Epoch: 10 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:44,078-Speed 5580.01 samples/sec Loss 4.4035 LearningRate 0.0247 Epoch: 10 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:45,893-Speed 5642.78 samples/sec Loss 4.3229 LearningRate 0.0247 Epoch: 10 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:47,710-Speed 5637.30 samples/sec Loss 4.4855 LearningRate 0.0247 Epoch: 10 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:49,554-Speed 5555.57 samples/sec Loss 4.2845 LearningRate 0.0247 Epoch: 10 Global Step: 57190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-27 05:03:51,378-Speed 5614.49 samples/sec Loss 4.3127 LearningRate 0.0247 Epoch: 10 Global Step: 57200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:03:53,203-Speed 5613.85 samples/sec Loss 4.4230 LearningRate 0.0247 Epoch: 10 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:03:55,047-Speed 5556.74 samples/sec Loss 4.5290 LearningRate 0.0247 Epoch: 10 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:03:56,868-Speed 5623.88 samples/sec Loss 4.4267 LearningRate 0.0247 Epoch: 10 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:03:58,697-Speed 5600.34 samples/sec Loss 4.4565 LearningRate 0.0247 Epoch: 10 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:04:00,530-Speed 5589.12 samples/sec Loss 4.5058 LearningRate 0.0247 Epoch: 10 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:04:02,366-Speed 5579.73 samples/sec Loss 4.4772 LearningRate 0.0246 Epoch: 10 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:04,196-Speed 5597.24 samples/sec Loss 4.4476 LearningRate 0.0246 Epoch: 10 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:06,017-Speed 5623.74 samples/sec Loss 4.4231 LearningRate 0.0246 Epoch: 10 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:07,829-Speed 5655.45 samples/sec Loss 4.4708 LearningRate 0.0246 Epoch: 10 Global Step: 57290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:09,662-Speed 5587.90 samples/sec Loss 4.4389 LearningRate 0.0246 Epoch: 10 Global Step: 57300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:11,485-Speed 5617.37 samples/sec Loss 4.3724 LearningRate 0.0246 Epoch: 10 Global Step: 57310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:13,306-Speed 5624.80 samples/sec 
Loss 4.4860 LearningRate 0.0246 Epoch: 10 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:15,142-Speed 5581.17 samples/sec Loss 4.3985 LearningRate 0.0246 Epoch: 10 Global Step: 57330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:16,971-Speed 5598.67 samples/sec Loss 4.4752 LearningRate 0.0246 Epoch: 10 Global Step: 57340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:18,792-Speed 5625.89 samples/sec Loss 4.5273 LearningRate 0.0246 Epoch: 10 Global Step: 57350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:20,612-Speed 5626.85 samples/sec Loss 4.3586 LearningRate 0.0246 Epoch: 10 Global Step: 57360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:22,433-Speed 5627.53 samples/sec Loss 4.5711 LearningRate 0.0246 Epoch: 10 Global Step: 57370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:24,285-Speed 5531.29 samples/sec Loss 4.5175 LearningRate 0.0245 Epoch: 10 Global Step: 57380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 05:04:26,110-Speed 5613.62 samples/sec Loss 4.3886 LearningRate 0.0245 Epoch: 10 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:27,933-Speed 5619.32 samples/sec Loss 4.5478 LearningRate 0.0245 Epoch: 10 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:29,767-Speed 5583.28 samples/sec Loss 4.3713 LearningRate 0.0245 Epoch: 10 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:31,593-Speed 5609.45 samples/sec Loss 4.5375 LearningRate 0.0245 Epoch: 10 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:33,416-Speed 5619.02 samples/sec Loss 4.6612 LearningRate 0.0245 Epoch: 10 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:35,253-Speed 5576.52 samples/sec Loss 4.4781 LearningRate 0.0245 Epoch: 10 Global Step: 
57440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:37,092-Speed 5571.66 samples/sec Loss 4.5369 LearningRate 0.0245 Epoch: 10 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:38,945-Speed 5525.85 samples/sec Loss 4.6088 LearningRate 0.0245 Epoch: 10 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:40,838-Speed 5412.21 samples/sec Loss 4.5112 LearningRate 0.0245 Epoch: 10 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:42,679-Speed 5563.34 samples/sec Loss 4.6104 LearningRate 0.0245 Epoch: 10 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:44,513-Speed 5585.86 samples/sec Loss 4.5018 LearningRate 0.0244 Epoch: 10 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:04:46,344-Speed 5594.67 samples/sec Loss 4.5267 LearningRate 0.0244 Epoch: 10 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:04:48,175-Speed 5594.67 samples/sec Loss 4.5496 LearningRate 0.0244 Epoch: 10 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:50,025-Speed 5536.48 samples/sec Loss 4.5991 LearningRate 0.0244 Epoch: 10 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:51,894-Speed 5480.73 samples/sec Loss 4.4899 LearningRate 0.0244 Epoch: 10 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:53,750-Speed 5517.64 samples/sec Loss 4.4969 LearningRate 0.0244 Epoch: 10 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:55,591-Speed 5565.96 samples/sec Loss 4.4913 LearningRate 0.0244 Epoch: 10 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:04:57,424-Speed 5589.26 samples/sec Loss 4.5461 LearningRate 0.0244 Epoch: 10 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-27 05:04:59,268-Speed 5551.93 samples/sec Loss 4.5490 LearningRate 0.0244 Epoch: 10 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:01,101-Speed 5590.16 samples/sec Loss 4.4957 LearningRate 0.0244 Epoch: 10 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:02,933-Speed 5590.28 samples/sec Loss 4.5012 LearningRate 0.0244 Epoch: 10 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:05,011-Speed 4929.67 samples/sec Loss 4.4454 LearningRate 0.0244 Epoch: 10 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:06,855-Speed 5553.40 samples/sec Loss 4.5657 LearningRate 0.0243 Epoch: 10 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:08,682-Speed 5610.03 samples/sec Loss 4.4353 LearningRate 0.0243 Epoch: 10 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:10,515-Speed 5586.70 samples/sec Loss 4.5838 LearningRate 0.0243 Epoch: 10 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:12,352-Speed 5575.23 samples/sec Loss 4.5595 LearningRate 0.0243 Epoch: 10 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:14,185-Speed 5588.37 samples/sec Loss 4.4156 LearningRate 0.0243 Epoch: 10 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:16,023-Speed 5573.61 samples/sec Loss 4.5190 LearningRate 0.0243 Epoch: 10 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:17,842-Speed 5633.17 samples/sec Loss 4.5961 LearningRate 0.0243 Epoch: 10 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:19,662-Speed 5628.16 samples/sec Loss 4.6427 LearningRate 0.0243 Epoch: 10 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:21,498-Speed 5579.52 
samples/sec Loss 4.5994 LearningRate 0.0243 Epoch: 10 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:23,344-Speed 5547.16 samples/sec Loss 4.4926 LearningRate 0.0243 Epoch: 10 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:25,258-Speed 5353.55 samples/sec Loss 4.7063 LearningRate 0.0243 Epoch: 10 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:27,096-Speed 5571.63 samples/sec Loss 4.6565 LearningRate 0.0242 Epoch: 10 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:28,916-Speed 5626.96 samples/sec Loss 4.5549 LearningRate 0.0242 Epoch: 10 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:30,742-Speed 5610.25 samples/sec Loss 4.5504 LearningRate 0.0242 Epoch: 10 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:32,576-Speed 5585.48 samples/sec Loss 4.5214 LearningRate 0.0242 Epoch: 10 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:34,416-Speed 5567.27 samples/sec Loss 4.6197 LearningRate 0.0242 Epoch: 10 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:36,267-Speed 5533.80 samples/sec Loss 4.5866 LearningRate 0.0242 Epoch: 10 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:38,087-Speed 5627.50 samples/sec Loss 4.6669 LearningRate 0.0242 Epoch: 10 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:39,907-Speed 5628.16 samples/sec Loss 4.6195 LearningRate 0.0242 Epoch: 10 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:41,738-Speed 5596.03 samples/sec Loss 4.5654 LearningRate 0.0242 Epoch: 10 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:05:43,563-Speed 5611.88 samples/sec Loss 4.6089 LearningRate 0.0242 Epoch: 
10 Global Step: 57810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:45,409-Speed 5549.32 samples/sec Loss 4.6092 LearningRate 0.0242 Epoch: 10 Global Step: 57820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:47,245-Speed 5580.48 samples/sec Loss 4.7157 LearningRate 0.0242 Epoch: 10 Global Step: 57830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:49,077-Speed 5589.34 samples/sec Loss 4.6165 LearningRate 0.0241 Epoch: 10 Global Step: 57840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:50,916-Speed 5568.87 samples/sec Loss 4.5863 LearningRate 0.0241 Epoch: 10 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:52,757-Speed 5566.25 samples/sec Loss 4.5239 LearningRate 0.0241 Epoch: 10 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:54,596-Speed 5571.62 samples/sec Loss 4.5795 LearningRate 0.0241 Epoch: 10 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:05:56,421-Speed 5612.60 samples/sec Loss 4.5702 LearningRate 0.0241 Epoch: 10 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:05:58,278-Speed 5515.73 samples/sec Loss 4.5398 LearningRate 0.0241 Epoch: 10 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:00,112-Speed 5584.35 samples/sec Loss 4.6201 LearningRate 0.0241 Epoch: 10 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:01,935-Speed 5619.35 samples/sec Loss 4.4907 LearningRate 0.0241 Epoch: 10 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:06:03,770-Speed 5580.77 samples/sec Loss 4.6918 LearningRate 0.0241 Epoch: 10 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:05,629-Speed 5511.78 samples/sec Loss 4.6248 LearningRate 0.0241 Epoch: 10 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 
3 hours Training: 2022-04-27 05:06:07,483-Speed 5525.59 samples/sec Loss 4.5872 LearningRate 0.0241 Epoch: 10 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:09,322-Speed 5569.36 samples/sec Loss 4.4442 LearningRate 0.0241 Epoch: 10 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:11,142-Speed 5626.99 samples/sec Loss 4.6024 LearningRate 0.0240 Epoch: 10 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:12,981-Speed 5570.49 samples/sec Loss 4.5021 LearningRate 0.0240 Epoch: 10 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:14,821-Speed 5567.14 samples/sec Loss 4.5745 LearningRate 0.0240 Epoch: 10 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:16,644-Speed 5619.46 samples/sec Loss 4.5436 LearningRate 0.0240 Epoch: 10 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:18,478-Speed 5586.39 samples/sec Loss 4.5630 LearningRate 0.0240 Epoch: 10 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:06:44,638-[lfw][58000]XNorm: 22.345384 Training: 2022-04-27 05:06:44,639-[lfw][58000]Accuracy-Flip: 0.99733+-0.00260 Training: 2022-04-27 05:06:44,639-[lfw][58000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:07:15,040-[cfp_fp][58000]XNorm: 19.903842 Training: 2022-04-27 05:07:15,040-[cfp_fp][58000]Accuracy-Flip: 0.95557+-0.01053 Training: 2022-04-27 05:07:15,041-[cfp_fp][58000]Accuracy-Highest: 0.95557 Training: 2022-04-27 05:07:41,202-[agedb_30][58000]XNorm: 22.140580 Training: 2022-04-27 05:07:41,202-[agedb_30][58000]Accuracy-Flip: 0.97483+-0.00976 Training: 2022-04-27 05:07:41,203-[agedb_30][58000]Accuracy-Highest: 0.97550 Training: 2022-04-27 05:07:43,041-Speed 121.09 samples/sec Loss 4.5631 LearningRate 0.0240 Epoch: 10 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 
05:07:44,870-Speed 5601.38 samples/sec Loss 4.5854 LearningRate 0.0240 Epoch: 10 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:07:46,679-Speed 5662.85 samples/sec Loss 4.6195 LearningRate 0.0240 Epoch: 10 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:48,498-Speed 5630.62 samples/sec Loss 4.5550 LearningRate 0.0240 Epoch: 10 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:50,346-Speed 5542.40 samples/sec Loss 4.5675 LearningRate 0.0240 Epoch: 10 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:52,186-Speed 5566.89 samples/sec Loss 4.4772 LearningRate 0.0240 Epoch: 10 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:54,025-Speed 5572.19 samples/sec Loss 4.4751 LearningRate 0.0239 Epoch: 10 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:55,906-Speed 5444.54 samples/sec Loss 4.6576 LearningRate 0.0239 Epoch: 10 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:57,736-Speed 5598.80 samples/sec Loss 4.5922 LearningRate 0.0239 Epoch: 10 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:07:59,563-Speed 5605.99 samples/sec Loss 4.4795 LearningRate 0.0239 Epoch: 10 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:01,410-Speed 5544.95 samples/sec Loss 4.6372 LearningRate 0.0239 Epoch: 10 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:03,239-Speed 5599.93 samples/sec Loss 4.4964 LearningRate 0.0239 Epoch: 10 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:05,077-Speed 5574.52 samples/sec Loss 4.4934 LearningRate 0.0239 Epoch: 10 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-27 05:08:06,900-Speed 5616.08 samples/sec Loss 4.6738 
LearningRate 0.0239 Epoch: 10 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:08,748-Speed 5545.89 samples/sec Loss 4.5912 LearningRate 0.0239 Epoch: 10 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:10,572-Speed 5613.87 samples/sec Loss 4.4512 LearningRate 0.0239 Epoch: 10 Global Step: 58160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:12,394-Speed 5622.87 samples/sec Loss 4.4865 LearningRate 0.0239 Epoch: 10 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 05:08:14,230-Speed 5578.75 samples/sec Loss 4.7460 LearningRate 0.0239 Epoch: 10 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:16,061-Speed 5594.13 samples/sec Loss 4.6408 LearningRate 0.0238 Epoch: 10 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:17,886-Speed 5615.01 samples/sec Loss 4.5518 LearningRate 0.0238 Epoch: 10 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:19,724-Speed 5571.76 samples/sec Loss 4.7193 LearningRate 0.0238 Epoch: 10 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:21,550-Speed 5611.98 samples/sec Loss 4.6480 LearningRate 0.0238 Epoch: 10 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:23,388-Speed 5572.48 samples/sec Loss 4.5909 LearningRate 0.0238 Epoch: 10 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:25,208-Speed 5628.24 samples/sec Loss 4.5139 LearningRate 0.0238 Epoch: 10 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:27,032-Speed 5615.63 samples/sec Loss 4.6763 LearningRate 0.0238 Epoch: 10 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:28,876-Speed 5555.56 samples/sec Loss 4.6929 LearningRate 0.0238 Epoch: 10 Global Step: 58260 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:30,700-Speed 5613.01 samples/sec Loss 4.7078 LearningRate 0.0238 Epoch: 10 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:32,537-Speed 5578.57 samples/sec Loss 4.5525 LearningRate 0.0238 Epoch: 10 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:34,383-Speed 5547.20 samples/sec Loss 4.6367 LearningRate 0.0238 Epoch: 10 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:36,221-Speed 5574.27 samples/sec Loss 4.6005 LearningRate 0.0237 Epoch: 10 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:38,039-Speed 5636.55 samples/sec Loss 4.6269 LearningRate 0.0237 Epoch: 10 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:39,869-Speed 5596.56 samples/sec Loss 4.6397 LearningRate 0.0237 Epoch: 10 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:41,693-Speed 5615.44 samples/sec Loss 4.6761 LearningRate 0.0237 Epoch: 10 Global Step: 58330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:43,549-Speed 5519.20 samples/sec Loss 4.6566 LearningRate 0.0237 Epoch: 10 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:45,374-Speed 5610.81 samples/sec Loss 4.7146 LearningRate 0.0237 Epoch: 10 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:47,192-Speed 5634.14 samples/sec Loss 4.6862 LearningRate 0.0237 Epoch: 10 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:08:49,011-Speed 5631.68 samples/sec Loss 4.6885 LearningRate 0.0237 Epoch: 10 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:50,851-Speed 5566.47 samples/sec Loss 4.7859 LearningRate 0.0237 Epoch: 10 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-27 05:08:52,672-Speed 5627.49 samples/sec Loss 4.5682 LearningRate 0.0237 Epoch: 10 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:54,511-Speed 5568.19 samples/sec Loss 4.6796 LearningRate 0.0237 Epoch: 10 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:56,390-Speed 5452.67 samples/sec Loss 4.6508 LearningRate 0.0237 Epoch: 10 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:08:58,221-Speed 5593.40 samples/sec Loss 4.6033 LearningRate 0.0236 Epoch: 10 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:00,052-Speed 5596.05 samples/sec Loss 4.6528 LearningRate 0.0236 Epoch: 10 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:01,910-Speed 5514.04 samples/sec Loss 4.5086 LearningRate 0.0236 Epoch: 10 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:03,754-Speed 5554.94 samples/sec Loss 4.6059 LearningRate 0.0236 Epoch: 10 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:05,575-Speed 5624.52 samples/sec Loss 4.6487 LearningRate 0.0236 Epoch: 10 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:07,395-Speed 5627.34 samples/sec Loss 4.4799 LearningRate 0.0236 Epoch: 10 Global Step: 58470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:09,240-Speed 5552.06 samples/sec Loss 4.6724 LearningRate 0.0236 Epoch: 10 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:11,064-Speed 5615.31 samples/sec Loss 4.5565 LearningRate 0.0236 Epoch: 10 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:12,891-Speed 5607.34 samples/sec Loss 4.6177 LearningRate 0.0236 Epoch: 10 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:14,717-Speed 5610.40 samples/sec 
Loss 4.7200 LearningRate 0.0236 Epoch: 10 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:16,537-Speed 5628.71 samples/sec Loss 4.5827 LearningRate 0.0236 Epoch: 10 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:09:18,359-Speed 5621.27 samples/sec Loss 4.5379 LearningRate 0.0236 Epoch: 10 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:20,190-Speed 5595.53 samples/sec Loss 4.5901 LearningRate 0.0235 Epoch: 10 Global Step: 58540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:22,018-Speed 5602.57 samples/sec Loss 4.5594 LearningRate 0.0235 Epoch: 10 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:23,838-Speed 5629.15 samples/sec Loss 4.6704 LearningRate 0.0235 Epoch: 10 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:25,665-Speed 5605.93 samples/sec Loss 4.5561 LearningRate 0.0235 Epoch: 10 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:27,485-Speed 5629.39 samples/sec Loss 4.5657 LearningRate 0.0235 Epoch: 10 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:29,308-Speed 5616.45 samples/sec Loss 4.7030 LearningRate 0.0235 Epoch: 10 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:31,155-Speed 5546.20 samples/sec Loss 4.6289 LearningRate 0.0235 Epoch: 10 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:32,993-Speed 5572.81 samples/sec Loss 4.7262 LearningRate 0.0235 Epoch: 10 Global Step: 58610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:34,821-Speed 5603.85 samples/sec Loss 4.5491 LearningRate 0.0235 Epoch: 10 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:36,651-Speed 5598.37 samples/sec Loss 4.4630 LearningRate 0.0235 Epoch: 10 Global Step: 
58630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:09:38,480-Speed 5599.12 samples/sec Loss 4.5833 LearningRate 0.0235 Epoch: 10 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:09:40,294-Speed 5646.50 samples/sec Loss 4.5966 LearningRate 0.0235 Epoch: 10 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:42,138-Speed 5555.43 samples/sec Loss 4.6284 LearningRate 0.0234 Epoch: 10 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:43,985-Speed 5547.88 samples/sec Loss 4.6196 LearningRate 0.0234 Epoch: 10 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:45,871-Speed 5431.12 samples/sec Loss 4.6058 LearningRate 0.0234 Epoch: 10 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:47,707-Speed 5578.65 samples/sec Loss 4.5931 LearningRate 0.0234 Epoch: 10 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:49,542-Speed 5582.36 samples/sec Loss 4.7120 LearningRate 0.0234 Epoch: 10 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:51,375-Speed 5588.74 samples/sec Loss 4.6605 LearningRate 0.0234 Epoch: 10 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:53,201-Speed 5608.34 samples/sec Loss 4.7342 LearningRate 0.0234 Epoch: 10 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:55,041-Speed 5566.98 samples/sec Loss 4.5958 LearningRate 0.0234 Epoch: 10 Global Step: 58730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:56,876-Speed 5582.31 samples/sec Loss 4.5535 LearningRate 0.0234 Epoch: 10 Global Step: 58740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:09:58,762-Speed 5432.78 samples/sec Loss 4.6704 LearningRate 0.0234 Epoch: 10 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-27 05:10:00,610-Speed 5543.72 samples/sec Loss 4.5975 LearningRate 0.0234 Epoch: 10 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:02,447-Speed 5573.47 samples/sec Loss 4.6579 LearningRate 0.0233 Epoch: 10 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:04,269-Speed 5622.24 samples/sec Loss 4.6675 LearningRate 0.0233 Epoch: 10 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:06,111-Speed 5564.00 samples/sec Loss 4.5356 LearningRate 0.0233 Epoch: 10 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:07,933-Speed 5621.88 samples/sec Loss 4.6765 LearningRate 0.0233 Epoch: 10 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:09,800-Speed 5485.07 samples/sec Loss 4.5958 LearningRate 0.0233 Epoch: 10 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:11,648-Speed 5543.56 samples/sec Loss 4.6198 LearningRate 0.0233 Epoch: 10 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:13,489-Speed 5565.09 samples/sec Loss 4.6402 LearningRate 0.0233 Epoch: 10 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:15,380-Speed 5415.96 samples/sec Loss 4.7327 LearningRate 0.0233 Epoch: 10 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:17,291-Speed 5358.85 samples/sec Loss 4.6610 LearningRate 0.0233 Epoch: 10 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:19,119-Speed 5603.14 samples/sec Loss 4.6895 LearningRate 0.0233 Epoch: 10 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:20,947-Speed 5605.05 samples/sec Loss 4.5239 LearningRate 0.0233 Epoch: 10 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:22,797-Speed 5536.48 
samples/sec Loss 4.5698 LearningRate 0.0233 Epoch: 10 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:24,654-Speed 5516.80 samples/sec Loss 4.7398 LearningRate 0.0232 Epoch: 10 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:26,478-Speed 5616.71 samples/sec Loss 4.7044 LearningRate 0.0232 Epoch: 10 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:28,322-Speed 5555.25 samples/sec Loss 4.7469 LearningRate 0.0232 Epoch: 10 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:30,167-Speed 5550.54 samples/sec Loss 4.4811 LearningRate 0.0232 Epoch: 10 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:31,992-Speed 5615.59 samples/sec Loss 4.7015 LearningRate 0.0232 Epoch: 10 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:33,818-Speed 5609.47 samples/sec Loss 4.6701 LearningRate 0.0232 Epoch: 10 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:35,634-Speed 5638.00 samples/sec Loss 4.5162 LearningRate 0.0232 Epoch: 10 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:37,459-Speed 5612.78 samples/sec Loss 4.5816 LearningRate 0.0232 Epoch: 10 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:39,297-Speed 5574.92 samples/sec Loss 4.5104 LearningRate 0.0232 Epoch: 10 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:41,123-Speed 5608.23 samples/sec Loss 4.5846 LearningRate 0.0232 Epoch: 10 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:42,961-Speed 5574.59 samples/sec Loss 4.5867 LearningRate 0.0232 Epoch: 10 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:44,789-Speed 5602.78 samples/sec Loss 4.6731 LearningRate 0.0232 Epoch: 
10 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:46,665-Speed 5458.47 samples/sec Loss 4.5099 LearningRate 0.0231 Epoch: 10 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:10:48,504-Speed 5572.02 samples/sec Loss 4.6727 LearningRate 0.0231 Epoch: 10 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:50,319-Speed 5643.29 samples/sec Loss 4.5513 LearningRate 0.0231 Epoch: 10 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:52,142-Speed 5618.75 samples/sec Loss 4.5649 LearningRate 0.0231 Epoch: 10 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:53,964-Speed 5624.39 samples/sec Loss 4.5224 LearningRate 0.0231 Epoch: 10 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:55,784-Speed 5627.31 samples/sec Loss 4.6222 LearningRate 0.0231 Epoch: 10 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:57,605-Speed 5623.54 samples/sec Loss 4.6270 LearningRate 0.0231 Epoch: 10 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:10:59,418-Speed 5651.09 samples/sec Loss 4.7082 LearningRate 0.0231 Epoch: 10 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:01,239-Speed 5624.32 samples/sec Loss 4.6469 LearningRate 0.0231 Epoch: 10 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:03,073-Speed 5585.42 samples/sec Loss 4.5953 LearningRate 0.0231 Epoch: 10 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:04,892-Speed 5630.74 samples/sec Loss 4.7005 LearningRate 0.0231 Epoch: 10 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:06,705-Speed 5650.24 samples/sec Loss 4.7470 LearningRate 0.0231 Epoch: 10 Global Step: 59120 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 05:11:08,526-Speed 5625.42 samples/sec Loss 4.5357 LearningRate 0.0230 Epoch: 10 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:10,360-Speed 5584.59 samples/sec Loss 4.5553 LearningRate 0.0230 Epoch: 10 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:12,190-Speed 5599.02 samples/sec Loss 4.6489 LearningRate 0.0230 Epoch: 10 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:14,024-Speed 5587.29 samples/sec Loss 4.8099 LearningRate 0.0230 Epoch: 10 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:15,845-Speed 5622.73 samples/sec Loss 4.6744 LearningRate 0.0230 Epoch: 10 Global Step: 59170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:17,689-Speed 5555.20 samples/sec Loss 4.4819 LearningRate 0.0230 Epoch: 10 Global Step: 59180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:19,532-Speed 5558.53 samples/sec Loss 4.6441 LearningRate 0.0230 Epoch: 10 Global Step: 59190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:21,382-Speed 5536.90 samples/sec Loss 4.6223 LearningRate 0.0230 Epoch: 10 Global Step: 59200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:23,200-Speed 5635.47 samples/sec Loss 4.6573 LearningRate 0.0230 Epoch: 10 Global Step: 59210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:11:25,026-Speed 5608.84 samples/sec Loss 4.6093 LearningRate 0.0230 Epoch: 10 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:26,893-Speed 5485.57 samples/sec Loss 4.6524 LearningRate 0.0230 Epoch: 10 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:28,716-Speed 5618.95 samples/sec Loss 4.6541 LearningRate 0.0230 Epoch: 10 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
05:11:30,554-Speed 5573.95 samples/sec Loss 4.6524 LearningRate 0.0229 Epoch: 10 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:32,396-Speed 5560.99 samples/sec Loss 4.6395 LearningRate 0.0229 Epoch: 10 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:34,217-Speed 5623.61 samples/sec Loss 4.5787 LearningRate 0.0229 Epoch: 10 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:36,040-Speed 5622.07 samples/sec Loss 4.7819 LearningRate 0.0229 Epoch: 10 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:37,869-Speed 5601.29 samples/sec Loss 4.5577 LearningRate 0.0229 Epoch: 10 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:39,694-Speed 5609.99 samples/sec Loss 4.6437 LearningRate 0.0229 Epoch: 10 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:41,523-Speed 5600.43 samples/sec Loss 4.8036 LearningRate 0.0229 Epoch: 10 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:43,361-Speed 5574.38 samples/sec Loss 4.5397 LearningRate 0.0229 Epoch: 10 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:11:45,181-Speed 5630.03 samples/sec Loss 4.6497 LearningRate 0.0229 Epoch: 10 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:11:47,001-Speed 5627.23 samples/sec Loss 4.5642 LearningRate 0.0229 Epoch: 10 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:48,847-Speed 5546.80 samples/sec Loss 4.5763 LearningRate 0.0229 Epoch: 10 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:50,686-Speed 5570.77 samples/sec Loss 4.7002 LearningRate 0.0228 Epoch: 10 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:52,539-Speed 5527.52 samples/sec Loss 4.5981 
LearningRate 0.0228 Epoch: 10 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:54,384-Speed 5552.08 samples/sec Loss 4.6699 LearningRate 0.0228 Epoch: 10 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:56,234-Speed 5538.00 samples/sec Loss 4.5796 LearningRate 0.0228 Epoch: 10 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:58,087-Speed 5528.28 samples/sec Loss 4.5305 LearningRate 0.0228 Epoch: 10 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:11:59,911-Speed 5617.28 samples/sec Loss 4.6111 LearningRate 0.0228 Epoch: 10 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:01,760-Speed 5541.18 samples/sec Loss 4.6847 LearningRate 0.0228 Epoch: 10 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:03,629-Speed 5480.28 samples/sec Loss 4.6013 LearningRate 0.0228 Epoch: 10 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:05,459-Speed 5595.91 samples/sec Loss 4.5371 LearningRate 0.0228 Epoch: 10 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:12:07,291-Speed 5589.98 samples/sec Loss 4.5444 LearningRate 0.0228 Epoch: 10 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:09,126-Speed 5583.32 samples/sec Loss 4.6315 LearningRate 0.0228 Epoch: 10 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:10,980-Speed 5525.53 samples/sec Loss 4.6301 LearningRate 0.0228 Epoch: 10 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:12,808-Speed 5603.49 samples/sec Loss 4.6450 LearningRate 0.0227 Epoch: 10 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:14,633-Speed 5613.26 samples/sec Loss 4.6787 LearningRate 0.0227 Epoch: 10 Global Step: 59490 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:16,453-Speed 5627.83 samples/sec Loss 4.5409 LearningRate 0.0227 Epoch: 10 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:18,280-Speed 5606.74 samples/sec Loss 4.5697 LearningRate 0.0227 Epoch: 10 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:20,133-Speed 5527.34 samples/sec Loss 4.6231 LearningRate 0.0227 Epoch: 10 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:21,959-Speed 5610.54 samples/sec Loss 4.5640 LearningRate 0.0227 Epoch: 10 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:23,793-Speed 5586.93 samples/sec Loss 4.6306 LearningRate 0.0227 Epoch: 10 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:25,643-Speed 5534.40 samples/sec Loss 4.5398 LearningRate 0.0227 Epoch: 10 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:12:27,474-Speed 5596.79 samples/sec Loss 4.6675 LearningRate 0.0227 Epoch: 10 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:12:29,318-Speed 5552.02 samples/sec Loss 4.6506 LearningRate 0.0227 Epoch: 10 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:12:31,147-Speed 5601.82 samples/sec Loss 4.5497 LearningRate 0.0227 Epoch: 10 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:12:32,965-Speed 5633.48 samples/sec Loss 4.5898 LearningRate 0.0227 Epoch: 10 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:34,795-Speed 5599.03 samples/sec Loss 4.5409 LearningRate 0.0226 Epoch: 10 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:36,623-Speed 5603.37 samples/sec Loss 4.5441 LearningRate 0.0226 Epoch: 10 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:12:38,488-Speed 5492.97 samples/sec Loss 4.5168 LearningRate 0.0226 Epoch: 10 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:40,316-Speed 5603.16 samples/sec Loss 4.5964 LearningRate 0.0226 Epoch: 10 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:42,144-Speed 5603.26 samples/sec Loss 4.5764 LearningRate 0.0226 Epoch: 10 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:43,978-Speed 5587.98 samples/sec Loss 4.6174 LearningRate 0.0226 Epoch: 10 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:45,807-Speed 5598.14 samples/sec Loss 4.5654 LearningRate 0.0226 Epoch: 10 Global Step: 59660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:47,624-Speed 5638.02 samples/sec Loss 4.5465 LearningRate 0.0226 Epoch: 10 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:49,441-Speed 5636.34 samples/sec Loss 4.6528 LearningRate 0.0226 Epoch: 10 Global Step: 59680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:12:51,265-Speed 5617.48 samples/sec Loss 4.5593 LearningRate 0.0226 Epoch: 10 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:12:53,188-Speed 5326.01 samples/sec Loss 4.5650 LearningRate 0.0226 Epoch: 10 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:12:55,050-Speed 5500.89 samples/sec Loss 4.6584 LearningRate 0.0226 Epoch: 10 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:12:56,887-Speed 5576.33 samples/sec Loss 4.5555 LearningRate 0.0225 Epoch: 10 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:12:58,733-Speed 5550.93 samples/sec Loss 4.5574 LearningRate 0.0225 Epoch: 10 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:00,591-Speed 5513.41 samples/sec Loss 
4.5916 LearningRate 0.0225 Epoch: 10 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:02,420-Speed 5599.68 samples/sec Loss 4.6519 LearningRate 0.0225 Epoch: 10 Global Step: 59750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:04,275-Speed 5521.55 samples/sec Loss 4.6028 LearningRate 0.0225 Epoch: 10 Global Step: 59760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:06,111-Speed 5581.47 samples/sec Loss 4.5817 LearningRate 0.0225 Epoch: 10 Global Step: 59770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:07,942-Speed 5593.42 samples/sec Loss 4.5963 LearningRate 0.0225 Epoch: 10 Global Step: 59780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:13:09,769-Speed 5606.90 samples/sec Loss 4.6366 LearningRate 0.0225 Epoch: 10 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:11,600-Speed 5593.34 samples/sec Loss 4.6759 LearningRate 0.0225 Epoch: 10 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:13,447-Speed 5546.99 samples/sec Loss 4.6675 LearningRate 0.0225 Epoch: 10 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:15,278-Speed 5593.62 samples/sec Loss 4.5606 LearningRate 0.0225 Epoch: 10 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:17,105-Speed 5606.63 samples/sec Loss 4.6082 LearningRate 0.0225 Epoch: 10 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:18,944-Speed 5569.58 samples/sec Loss 4.5148 LearningRate 0.0224 Epoch: 10 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:20,784-Speed 5565.78 samples/sec Loss 4.4946 LearningRate 0.0224 Epoch: 10 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:22,608-Speed 5616.91 samples/sec Loss 4.6306 LearningRate 0.0224 Epoch: 10 Global Step: 59860 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:24,438-Speed 5597.81 samples/sec Loss 4.5872 LearningRate 0.0224 Epoch: 10 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:26,272-Speed 5585.12 samples/sec Loss 4.4770 LearningRate 0.0224 Epoch: 10 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:28,113-Speed 5565.97 samples/sec Loss 4.5423 LearningRate 0.0224 Epoch: 10 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:29,945-Speed 5589.79 samples/sec Loss 4.5691 LearningRate 0.0224 Epoch: 10 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:31,786-Speed 5563.20 samples/sec Loss 4.6101 LearningRate 0.0224 Epoch: 10 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:33,615-Speed 5602.44 samples/sec Loss 4.7559 LearningRate 0.0224 Epoch: 10 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:35,437-Speed 5621.06 samples/sec Loss 4.5123 LearningRate 0.0224 Epoch: 10 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:37,278-Speed 5563.10 samples/sec Loss 4.6039 LearningRate 0.0224 Epoch: 10 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:39,116-Speed 5572.14 samples/sec Loss 4.5863 LearningRate 0.0224 Epoch: 10 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:40,954-Speed 5574.77 samples/sec Loss 4.6892 LearningRate 0.0223 Epoch: 10 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:42,781-Speed 5607.45 samples/sec Loss 4.5716 LearningRate 0.0223 Epoch: 10 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:13:44,616-Speed 5580.34 samples/sec Loss 4.4574 LearningRate 0.0223 Epoch: 10 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:13:46,450-Speed 5586.83 samples/sec Loss 4.6288 LearningRate 0.0223 Epoch: 10 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:13:48,298-Speed 5542.05 samples/sec Loss 4.6075 LearningRate 0.0223 Epoch: 10 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:14:14,524-[lfw][60000]XNorm: 22.201556 Training: 2022-04-27 05:14:14,525-[lfw][60000]Accuracy-Flip: 0.99683+-0.00337 Training: 2022-04-27 05:14:14,525-[lfw][60000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:14:44,793-[cfp_fp][60000]XNorm: 19.690361 Training: 2022-04-27 05:14:44,794-[cfp_fp][60000]Accuracy-Flip: 0.95771+-0.00926 Training: 2022-04-27 05:14:44,794-[cfp_fp][60000]Accuracy-Highest: 0.95771 Training: 2022-04-27 05:15:10,942-[agedb_30][60000]XNorm: 21.847595 Training: 2022-04-27 05:15:10,943-[agedb_30][60000]Accuracy-Flip: 0.97533+-0.00891 Training: 2022-04-27 05:15:10,943-[agedb_30][60000]Accuracy-Highest: 0.97550 Training: 2022-04-27 05:15:12,774-Speed 121.22 samples/sec Loss 4.5299 LearningRate 0.0223 Epoch: 10 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:15:14,588-Speed 5647.89 samples/sec Loss 4.5984 LearningRate 0.0223 Epoch: 10 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:16,419-Speed 5594.05 samples/sec Loss 4.6005 LearningRate 0.0223 Epoch: 10 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:18,241-Speed 5619.73 samples/sec Loss 4.3832 LearningRate 0.0223 Epoch: 10 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:20,068-Speed 5608.55 samples/sec Loss 4.4697 LearningRate 0.0223 Epoch: 10 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:21,930-Speed 5499.20 samples/sec Loss 4.6012 LearningRate 0.0223 Epoch: 10 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:23,804-Speed 5466.53 
samples/sec Loss 4.5342 LearningRate 0.0223 Epoch: 10 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:25,663-Speed 5511.64 samples/sec Loss 4.5494 LearningRate 0.0222 Epoch: 10 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:27,508-Speed 5551.41 samples/sec Loss 4.5144 LearningRate 0.0222 Epoch: 10 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:29,342-Speed 5585.85 samples/sec Loss 4.7174 LearningRate 0.0222 Epoch: 10 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:31,167-Speed 5613.60 samples/sec Loss 4.5153 LearningRate 0.0222 Epoch: 10 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:33,001-Speed 5585.69 samples/sec Loss 4.5513 LearningRate 0.0222 Epoch: 10 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:34,844-Speed 5557.58 samples/sec Loss 4.5722 LearningRate 0.0222 Epoch: 10 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:36,684-Speed 5565.56 samples/sec Loss 4.5878 LearningRate 0.0222 Epoch: 10 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:38,553-Speed 5481.90 samples/sec Loss 4.4690 LearningRate 0.0222 Epoch: 10 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:40,390-Speed 5575.72 samples/sec Loss 4.7503 LearningRate 0.0222 Epoch: 10 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:42,234-Speed 5555.51 samples/sec Loss 4.6014 LearningRate 0.0222 Epoch: 10 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:44,065-Speed 5595.25 samples/sec Loss 4.6692 LearningRate 0.0222 Epoch: 10 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:45,934-Speed 5480.40 samples/sec Loss 4.6928 LearningRate 0.0222 Epoch: 10 
Global Step: 60190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:47,820-Speed 5430.99 samples/sec Loss 4.6306 LearningRate 0.0221 Epoch: 10 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:49,649-Speed 5599.57 samples/sec Loss 4.7116 LearningRate 0.0221 Epoch: 10 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:15:51,483-Speed 5586.35 samples/sec Loss 4.6761 LearningRate 0.0221 Epoch: 10 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:15:53,315-Speed 5590.23 samples/sec Loss 4.5627 LearningRate 0.0221 Epoch: 10 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:15:55,137-Speed 5622.27 samples/sec Loss 4.5742 LearningRate 0.0221 Epoch: 10 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:15:56,952-Speed 5643.24 samples/sec Loss 4.6161 LearningRate 0.0221 Epoch: 10 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:15:58,787-Speed 5583.78 samples/sec Loss 4.5302 LearningRate 0.0221 Epoch: 10 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:00,609-Speed 5621.46 samples/sec Loss 4.6081 LearningRate 0.0221 Epoch: 10 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:02,437-Speed 5604.71 samples/sec Loss 4.5245 LearningRate 0.0221 Epoch: 10 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:04,265-Speed 5601.62 samples/sec Loss 4.6777 LearningRate 0.0221 Epoch: 10 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:06,093-Speed 5605.28 samples/sec Loss 4.7464 LearningRate 0.0221 Epoch: 10 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:07,939-Speed 5546.82 samples/sec Loss 4.4824 LearningRate 0.0221 Epoch: 10 Global Step: 60310 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-27 05:16:09,759-Speed 5630.38 samples/sec Loss 4.5370 LearningRate 0.0221 Epoch: 10 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:11,596-Speed 5574.70 samples/sec Loss 4.5903 LearningRate 0.0220 Epoch: 10 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:13,417-Speed 5625.81 samples/sec Loss 4.5111 LearningRate 0.0220 Epoch: 10 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:15,276-Speed 5511.93 samples/sec Loss 4.5896 LearningRate 0.0220 Epoch: 10 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:17,118-Speed 5558.85 samples/sec Loss 4.7387 LearningRate 0.0220 Epoch: 10 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:18,953-Speed 5582.51 samples/sec Loss 4.5176 LearningRate 0.0220 Epoch: 10 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:20,788-Speed 5582.26 samples/sec Loss 4.5487 LearningRate 0.0220 Epoch: 10 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:22,635-Speed 5544.42 samples/sec Loss 4.5906 LearningRate 0.0220 Epoch: 10 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:24,476-Speed 5567.60 samples/sec Loss 4.4980 LearningRate 0.0220 Epoch: 10 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:26,303-Speed 5603.98 samples/sec Loss 4.4375 LearningRate 0.0220 Epoch: 10 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:28,122-Speed 5631.92 samples/sec Loss 4.5843 LearningRate 0.0220 Epoch: 10 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:29,946-Speed 5616.89 samples/sec Loss 4.5214 LearningRate 0.0220 Epoch: 10 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
05:16:31,776-Speed 5596.55 samples/sec Loss 4.4254 LearningRate 0.0220 Epoch: 10 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:33,614-Speed 5572.48 samples/sec Loss 4.6859 LearningRate 0.0219 Epoch: 10 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:35,441-Speed 5609.02 samples/sec Loss 4.5283 LearningRate 0.0219 Epoch: 10 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:37,279-Speed 5572.39 samples/sec Loss 4.7627 LearningRate 0.0219 Epoch: 10 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:39,119-Speed 5567.13 samples/sec Loss 4.4856 LearningRate 0.0219 Epoch: 10 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:40,955-Speed 5578.48 samples/sec Loss 4.6692 LearningRate 0.0219 Epoch: 10 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:42,801-Speed 5550.18 samples/sec Loss 4.5302 LearningRate 0.0219 Epoch: 10 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:44,627-Speed 5608.76 samples/sec Loss 4.5050 LearningRate 0.0219 Epoch: 10 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:46,458-Speed 5595.89 samples/sec Loss 4.3980 LearningRate 0.0219 Epoch: 10 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:48,289-Speed 5594.47 samples/sec Loss 4.5596 LearningRate 0.0219 Epoch: 10 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:50,115-Speed 5607.79 samples/sec Loss 4.5836 LearningRate 0.0219 Epoch: 10 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:16:51,960-Speed 5553.57 samples/sec Loss 4.6485 LearningRate 0.0219 Epoch: 10 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:53,794-Speed 5585.32 samples/sec Loss 
4.5156 LearningRate 0.0219 Epoch: 10 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:55,627-Speed 5588.25 samples/sec Loss 4.4909 LearningRate 0.0218 Epoch: 10 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:57,466-Speed 5570.70 samples/sec Loss 4.5674 LearningRate 0.0218 Epoch: 10 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:16:59,318-Speed 5531.91 samples/sec Loss 4.5690 LearningRate 0.0218 Epoch: 10 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:01,146-Speed 5602.19 samples/sec Loss 4.5895 LearningRate 0.0218 Epoch: 10 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:03,002-Speed 5518.77 samples/sec Loss 4.5482 LearningRate 0.0218 Epoch: 10 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:04,834-Speed 5592.60 samples/sec Loss 4.4785 LearningRate 0.0218 Epoch: 10 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:06,658-Speed 5615.09 samples/sec Loss 4.5675 LearningRate 0.0218 Epoch: 10 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:08,485-Speed 5605.87 samples/sec Loss 4.6049 LearningRate 0.0218 Epoch: 10 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:10,331-Speed 5550.65 samples/sec Loss 4.6467 LearningRate 0.0218 Epoch: 10 Global Step: 60650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:12,186-Speed 5520.02 samples/sec Loss 4.5828 LearningRate 0.0218 Epoch: 10 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:14,019-Speed 5587.41 samples/sec Loss 4.5382 LearningRate 0.0218 Epoch: 10 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:15,840-Speed 5626.63 samples/sec Loss 4.6209 LearningRate 0.0218 Epoch: 10 Global Step: 
60680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:17,666-Speed 5610.40 samples/sec Loss 4.5428 LearningRate 0.0217 Epoch: 10 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:19,498-Speed 5590.91 samples/sec Loss 4.4437 LearningRate 0.0217 Epoch: 10 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:21,389-Speed 5417.45 samples/sec Loss 4.6144 LearningRate 0.0217 Epoch: 10 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:23,211-Speed 5623.00 samples/sec Loss 4.5562 LearningRate 0.0217 Epoch: 10 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:25,041-Speed 5595.63 samples/sec Loss 4.4992 LearningRate 0.0217 Epoch: 10 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:26,858-Speed 5638.02 samples/sec Loss 4.4323 LearningRate 0.0217 Epoch: 10 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:28,701-Speed 5559.01 samples/sec Loss 4.6138 LearningRate 0.0217 Epoch: 10 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:30,538-Speed 5576.35 samples/sec Loss 4.5672 LearningRate 0.0217 Epoch: 10 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:32,364-Speed 5608.90 samples/sec Loss 4.5376 LearningRate 0.0217 Epoch: 10 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:34,210-Speed 5549.23 samples/sec Loss 4.6730 LearningRate 0.0217 Epoch: 10 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:36,030-Speed 5626.80 samples/sec Loss 4.5159 LearningRate 0.0217 Epoch: 10 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:37,863-Speed 5588.12 samples/sec Loss 4.6577 LearningRate 0.0217 Epoch: 10 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:17:39,699-Speed 5582.34 samples/sec Loss 4.4460 LearningRate 0.0216 Epoch: 10 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:41,541-Speed 5559.25 samples/sec Loss 4.5243 LearningRate 0.0216 Epoch: 10 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:17:43,378-Speed 5577.57 samples/sec Loss 4.5109 LearningRate 0.0216 Epoch: 10 Global Step: 60830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:45,223-Speed 5552.32 samples/sec Loss 4.5655 LearningRate 0.0216 Epoch: 10 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:47,070-Speed 5544.65 samples/sec Loss 4.5478 LearningRate 0.0216 Epoch: 10 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:48,902-Speed 5592.71 samples/sec Loss 4.4470 LearningRate 0.0216 Epoch: 10 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:50,730-Speed 5602.11 samples/sec Loss 4.5107 LearningRate 0.0216 Epoch: 10 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:52,546-Speed 5641.89 samples/sec Loss 4.5497 LearningRate 0.0216 Epoch: 10 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:54,367-Speed 5624.11 samples/sec Loss 4.5308 LearningRate 0.0216 Epoch: 10 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:56,188-Speed 5625.60 samples/sec Loss 4.6371 LearningRate 0.0216 Epoch: 10 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:58,020-Speed 5590.83 samples/sec Loss 4.6113 LearningRate 0.0216 Epoch: 10 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:17:59,866-Speed 5550.49 samples/sec Loss 4.4224 LearningRate 0.0216 Epoch: 10 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:01,693-Speed 5606.52 
samples/sec Loss 4.5135 LearningRate 0.0215 Epoch: 10 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:18:03,524-Speed 5594.43 samples/sec Loss 4.4969 LearningRate 0.0215 Epoch: 10 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:18:05,336-Speed 5652.74 samples/sec Loss 4.6317 LearningRate 0.0215 Epoch: 10 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:07,172-Speed 5581.22 samples/sec Loss 4.5190 LearningRate 0.0215 Epoch: 10 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:09,017-Speed 5549.22 samples/sec Loss 4.6395 LearningRate 0.0215 Epoch: 10 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:10,858-Speed 5564.63 samples/sec Loss 4.4404 LearningRate 0.0215 Epoch: 10 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:12,694-Speed 5578.52 samples/sec Loss 4.6320 LearningRate 0.0215 Epoch: 10 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:14,526-Speed 5591.18 samples/sec Loss 4.6312 LearningRate 0.0215 Epoch: 10 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:16,363-Speed 5575.96 samples/sec Loss 4.5920 LearningRate 0.0215 Epoch: 10 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:18,218-Speed 5522.22 samples/sec Loss 4.4994 LearningRate 0.0215 Epoch: 10 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:20,088-Speed 5478.45 samples/sec Loss 4.5266 LearningRate 0.0215 Epoch: 10 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:21,924-Speed 5580.18 samples/sec Loss 4.6379 LearningRate 0.0215 Epoch: 10 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:23,763-Speed 5569.45 samples/sec Loss 4.7722 LearningRate 0.0215 Epoch: 10 
Global Step: 61050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:25,588-Speed 5613.52 samples/sec Loss 4.4233 LearningRate 0.0214 Epoch: 10 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:27,414-Speed 5609.69 samples/sec Loss 4.4399 LearningRate 0.0214 Epoch: 10 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:29,242-Speed 5604.68 samples/sec Loss 4.6775 LearningRate 0.0214 Epoch: 10 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:31,073-Speed 5593.20 samples/sec Loss 4.6106 LearningRate 0.0214 Epoch: 10 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:32,939-Speed 5489.25 samples/sec Loss 4.4871 LearningRate 0.0214 Epoch: 10 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:34,761-Speed 5622.94 samples/sec Loss 4.4860 LearningRate 0.0214 Epoch: 10 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:36,612-Speed 5534.70 samples/sec Loss 4.5680 LearningRate 0.0214 Epoch: 10 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:38,534-Speed 5329.99 samples/sec Loss 4.5989 LearningRate 0.0214 Epoch: 10 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:40,398-Speed 5493.79 samples/sec Loss 4.5646 LearningRate 0.0214 Epoch: 10 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:42,227-Speed 5602.51 samples/sec Loss 4.5164 LearningRate 0.0214 Epoch: 10 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:18:44,045-Speed 5633.83 samples/sec Loss 4.6007 LearningRate 0.0214 Epoch: 10 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:45,917-Speed 5471.63 samples/sec Loss 4.4740 LearningRate 0.0214 Epoch: 10 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:18:47,738-Speed 5624.31 samples/sec Loss 4.4951 LearningRate 0.0213 Epoch: 10 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:49,573-Speed 5584.75 samples/sec Loss 4.6061 LearningRate 0.0213 Epoch: 10 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:51,410-Speed 5575.70 samples/sec Loss 4.3930 LearningRate 0.0213 Epoch: 10 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:53,280-Speed 5476.41 samples/sec Loss 4.5089 LearningRate 0.0213 Epoch: 10 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:55,121-Speed 5563.14 samples/sec Loss 4.4162 LearningRate 0.0213 Epoch: 10 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:56,944-Speed 5619.00 samples/sec Loss 4.5749 LearningRate 0.0213 Epoch: 10 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:18:58,786-Speed 5562.61 samples/sec Loss 4.6292 LearningRate 0.0213 Epoch: 10 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:00,619-Speed 5587.87 samples/sec Loss 4.4369 LearningRate 0.0213 Epoch: 10 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:02,442-Speed 5619.29 samples/sec Loss 4.5500 LearningRate 0.0213 Epoch: 10 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:04,272-Speed 5596.67 samples/sec Loss 4.6176 LearningRate 0.0213 Epoch: 10 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:06,101-Speed 5601.39 samples/sec Loss 4.4485 LearningRate 0.0213 Epoch: 10 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:07,945-Speed 5554.92 samples/sec Loss 4.6615 LearningRate 0.0213 Epoch: 10 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:09,789-Speed 5555.30 
samples/sec Loss 4.6612 LearningRate 0.0212 Epoch: 10 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:11,631-Speed 5559.62 samples/sec Loss 4.3878 LearningRate 0.0212 Epoch: 10 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:13,457-Speed 5610.99 samples/sec Loss 4.4647 LearningRate 0.0212 Epoch: 10 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:15,285-Speed 5603.89 samples/sec Loss 4.5537 LearningRate 0.0212 Epoch: 10 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:17,125-Speed 5567.07 samples/sec Loss 4.4552 LearningRate 0.0212 Epoch: 10 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:18,946-Speed 5625.76 samples/sec Loss 4.4991 LearningRate 0.0212 Epoch: 10 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:20,779-Speed 5587.66 samples/sec Loss 4.5220 LearningRate 0.0212 Epoch: 10 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:19:22,594-Speed 5644.38 samples/sec Loss 4.4572 LearningRate 0.0212 Epoch: 10 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:24,430-Speed 5579.03 samples/sec Loss 4.5925 LearningRate 0.0212 Epoch: 10 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:26,281-Speed 5532.47 samples/sec Loss 4.5438 LearningRate 0.0212 Epoch: 10 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:28,127-Speed 5550.27 samples/sec Loss 4.4212 LearningRate 0.0212 Epoch: 10 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:29,955-Speed 5603.70 samples/sec Loss 4.5046 LearningRate 0.0212 Epoch: 10 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:31,782-Speed 5604.38 samples/sec Loss 4.3194 LearningRate 0.0212 Epoch: 10 
Global Step: 61420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:33,621-Speed 5573.14 samples/sec Loss 4.5112 LearningRate 0.0211 Epoch: 10 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:35,448-Speed 5604.37 samples/sec Loss 4.5409 LearningRate 0.0211 Epoch: 10 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:37,299-Speed 5536.15 samples/sec Loss 4.5471 LearningRate 0.0211 Epoch: 10 Global Step: 61450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:39,126-Speed 5604.73 samples/sec Loss 4.6534 LearningRate 0.0211 Epoch: 10 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:40,958-Speed 5593.59 samples/sec Loss 4.5761 LearningRate 0.0211 Epoch: 10 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:42,799-Speed 5562.35 samples/sec Loss 4.4923 LearningRate 0.0211 Epoch: 10 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:44,638-Speed 5569.49 samples/sec Loss 4.5546 LearningRate 0.0211 Epoch: 10 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:46,484-Speed 5550.87 samples/sec Loss 4.4841 LearningRate 0.0211 Epoch: 10 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:48,324-Speed 5564.28 samples/sec Loss 4.5714 LearningRate 0.0211 Epoch: 10 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:50,152-Speed 5604.50 samples/sec Loss 4.3936 LearningRate 0.0211 Epoch: 10 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:52,011-Speed 5512.12 samples/sec Loss 4.4746 LearningRate 0.0211 Epoch: 10 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:53,850-Speed 5568.58 samples/sec Loss 4.5204 LearningRate 0.0211 Epoch: 10 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:19:55,694-Speed 5557.01 samples/sec Loss 4.5247 LearningRate 0.0210 Epoch: 10 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:57,534-Speed 5566.12 samples/sec Loss 4.4211 LearningRate 0.0210 Epoch: 10 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:19:59,390-Speed 5518.29 samples/sec Loss 4.5284 LearningRate 0.0210 Epoch: 10 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:20:01,246-Speed 5519.32 samples/sec Loss 4.5767 LearningRate 0.0210 Epoch: 10 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:20:03,079-Speed 5589.22 samples/sec Loss 4.4953 LearningRate 0.0210 Epoch: 10 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:04,904-Speed 5610.86 samples/sec Loss 4.5498 LearningRate 0.0210 Epoch: 10 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:06,726-Speed 5624.82 samples/sec Loss 4.4856 LearningRate 0.0210 Epoch: 10 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:08,553-Speed 5604.22 samples/sec Loss 4.4942 LearningRate 0.0210 Epoch: 10 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:10,395-Speed 5562.19 samples/sec Loss 4.3982 LearningRate 0.0210 Epoch: 10 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:12,240-Speed 5552.90 samples/sec Loss 4.5606 LearningRate 0.0210 Epoch: 10 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:14,087-Speed 5543.83 samples/sec Loss 4.5364 LearningRate 0.0210 Epoch: 10 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:15,911-Speed 5616.28 samples/sec Loss 4.3451 LearningRate 0.0210 Epoch: 10 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:17,781-Speed 5479.55 
samples/sec Loss 4.4143 LearningRate 0.0209 Epoch: 10 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:19,606-Speed 5612.89 samples/sec Loss 4.5194 LearningRate 0.0209 Epoch: 10 Global Step: 61680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:21,449-Speed 5558.40 samples/sec Loss 4.4823 LearningRate 0.0209 Epoch: 10 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:20:23,279-Speed 5598.08 samples/sec Loss 4.4792 LearningRate 0.0209 Epoch: 10 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:20:25,105-Speed 5609.56 samples/sec Loss 4.5184 LearningRate 0.0209 Epoch: 10 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:26,942-Speed 5575.41 samples/sec Loss 4.4518 LearningRate 0.0209 Epoch: 10 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:28,768-Speed 5608.23 samples/sec Loss 4.4312 LearningRate 0.0209 Epoch: 10 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:30,594-Speed 5610.60 samples/sec Loss 4.4439 LearningRate 0.0209 Epoch: 10 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:32,432-Speed 5571.82 samples/sec Loss 4.5203 LearningRate 0.0209 Epoch: 10 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:34,298-Speed 5490.28 samples/sec Loss 4.5699 LearningRate 0.0209 Epoch: 10 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:36,138-Speed 5566.10 samples/sec Loss 4.4669 LearningRate 0.0209 Epoch: 10 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:37,970-Speed 5593.84 samples/sec Loss 4.5561 LearningRate 0.0209 Epoch: 10 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:39,802-Speed 5588.63 samples/sec Loss 4.3281 LearningRate 0.0209 Epoch: 10 
Global Step: 61790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:41,650-Speed 5546.34 samples/sec Loss 4.4782 LearningRate 0.0208 Epoch: 10 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:43,474-Speed 5613.36 samples/sec Loss 4.5588 LearningRate 0.0208 Epoch: 10 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:20:45,294-Speed 5629.65 samples/sec Loss 4.4248 LearningRate 0.0208 Epoch: 10 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:20:47,116-Speed 5623.13 samples/sec Loss 4.3902 LearningRate 0.0208 Epoch: 10 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:48,959-Speed 5557.53 samples/sec Loss 4.5064 LearningRate 0.0208 Epoch: 10 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:50,793-Speed 5584.19 samples/sec Loss 4.5249 LearningRate 0.0208 Epoch: 10 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:52,625-Speed 5589.84 samples/sec Loss 4.5357 LearningRate 0.0208 Epoch: 10 Global Step: 61860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:54,454-Speed 5602.40 samples/sec Loss 4.5010 LearningRate 0.0208 Epoch: 10 Global Step: 61870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:56,299-Speed 5551.97 samples/sec Loss 4.4034 LearningRate 0.0208 Epoch: 10 Global Step: 61880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:58,143-Speed 5553.38 samples/sec Loss 4.6025 LearningRate 0.0208 Epoch: 10 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:20:59,986-Speed 5558.05 samples/sec Loss 4.3961 LearningRate 0.0208 Epoch: 10 Global Step: 61900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:21:01,813-Speed 5607.66 samples/sec Loss 4.4602 LearningRate 0.0208 Epoch: 10 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 05:21:03,644-Speed 5596.57 samples/sec Loss 4.4605 LearningRate 0.0207 Epoch: 10 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:21:05,466-Speed 5620.15 samples/sec Loss 4.5576 LearningRate 0.0207 Epoch: 10 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:07,294-Speed 5605.06 samples/sec Loss 4.4110 LearningRate 0.0207 Epoch: 10 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:09,119-Speed 5611.12 samples/sec Loss 4.5623 LearningRate 0.0207 Epoch: 10 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:10,945-Speed 5612.51 samples/sec Loss 4.5873 LearningRate 0.0207 Epoch: 10 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:12,782-Speed 5575.28 samples/sec Loss 4.4703 LearningRate 0.0207 Epoch: 10 Global Step: 61970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:14,613-Speed 5593.01 samples/sec Loss 4.4496 LearningRate 0.0207 Epoch: 10 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:16,435-Speed 5623.56 samples/sec Loss 4.5342 LearningRate 0.0207 Epoch: 10 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:18,290-Speed 5519.87 samples/sec Loss 4.3617 LearningRate 0.0207 Epoch: 10 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:21:44,520-[lfw][62000]XNorm: 22.538425 Training: 2022-04-27 05:21:44,521-[lfw][62000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-04-27 05:21:44,521-[lfw][62000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:22:14,900-[cfp_fp][62000]XNorm: 19.937683 Training: 2022-04-27 05:22:14,901-[cfp_fp][62000]Accuracy-Flip: 0.95514+-0.00853 Training: 2022-04-27 05:22:14,901-[cfp_fp][62000]Accuracy-Highest: 0.95771 Training: 2022-04-27 05:22:41,080-[agedb_30][62000]XNorm: 22.216626 Training: 2022-04-27 
05:22:41,080-[agedb_30][62000]Accuracy-Flip: 0.97667+-0.00813 Training: 2022-04-27 05:22:41,081-[agedb_30][62000]Accuracy-Highest: 0.97667 Training: 2022-04-27 05:22:42,936-Speed 120.98 samples/sec Loss 4.4760 LearningRate 0.0207 Epoch: 10 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:44,767-Speed 5592.94 samples/sec Loss 4.5009 LearningRate 0.0207 Epoch: 10 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:46,607-Speed 5567.19 samples/sec Loss 4.4485 LearningRate 0.0207 Epoch: 10 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:22:48,431-Speed 5615.66 samples/sec Loss 4.4586 LearningRate 0.0207 Epoch: 10 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:50,264-Speed 5590.84 samples/sec Loss 4.4742 LearningRate 0.0206 Epoch: 10 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:52,107-Speed 5557.49 samples/sec Loss 4.5137 LearningRate 0.0206 Epoch: 10 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:53,928-Speed 5624.68 samples/sec Loss 4.5501 LearningRate 0.0206 Epoch: 10 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:55,758-Speed 5595.64 samples/sec Loss 4.3908 LearningRate 0.0206 Epoch: 10 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:57,595-Speed 5578.48 samples/sec Loss 4.4322 LearningRate 0.0206 Epoch: 10 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:22:59,431-Speed 5576.93 samples/sec Loss 4.5530 LearningRate 0.0206 Epoch: 10 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:01,293-Speed 5500.69 samples/sec Loss 4.4420 LearningRate 0.0206 Epoch: 10 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:03,111-Speed 5636.67 samples/sec Loss 4.4723 
LearningRate 0.0206 Epoch: 10 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:04,947-Speed 5579.96 samples/sec Loss 4.5015 LearningRate 0.0206 Epoch: 10 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:06,765-Speed 5634.47 samples/sec Loss 4.4759 LearningRate 0.0206 Epoch: 10 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:23:08,584-Speed 5630.87 samples/sec Loss 4.5513 LearningRate 0.0206 Epoch: 10 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:10,423-Speed 5570.14 samples/sec Loss 4.4076 LearningRate 0.0206 Epoch: 10 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:12,261-Speed 5573.84 samples/sec Loss 4.5388 LearningRate 0.0205 Epoch: 10 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:14,141-Speed 5448.77 samples/sec Loss 4.3649 LearningRate 0.0205 Epoch: 10 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:15,975-Speed 5583.80 samples/sec Loss 4.4062 LearningRate 0.0205 Epoch: 10 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:17,845-Speed 5477.57 samples/sec Loss 4.4977 LearningRate 0.0205 Epoch: 10 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:19,678-Speed 5588.71 samples/sec Loss 4.4546 LearningRate 0.0205 Epoch: 10 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:21,505-Speed 5607.98 samples/sec Loss 4.5295 LearningRate 0.0205 Epoch: 10 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:23,360-Speed 5521.46 samples/sec Loss 4.4217 LearningRate 0.0205 Epoch: 10 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:25,233-Speed 5468.25 samples/sec Loss 4.5305 LearningRate 0.0205 Epoch: 10 Global Step: 62240 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:27,061-Speed 5604.57 samples/sec Loss 4.4426 LearningRate 0.0205 Epoch: 10 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:23:28,906-Speed 5553.48 samples/sec Loss 4.4536 LearningRate 0.0205 Epoch: 10 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:30,778-Speed 5471.62 samples/sec Loss 4.4651 LearningRate 0.0205 Epoch: 10 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:32,609-Speed 5592.73 samples/sec Loss 4.3908 LearningRate 0.0205 Epoch: 10 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:34,438-Speed 5601.06 samples/sec Loss 4.4036 LearningRate 0.0205 Epoch: 10 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:36,271-Speed 5588.44 samples/sec Loss 4.5897 LearningRate 0.0204 Epoch: 10 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:38,096-Speed 5613.00 samples/sec Loss 4.5094 LearningRate 0.0204 Epoch: 10 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:39,942-Speed 5548.60 samples/sec Loss 4.3928 LearningRate 0.0204 Epoch: 10 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:41,776-Speed 5586.28 samples/sec Loss 4.4201 LearningRate 0.0204 Epoch: 10 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:43,603-Speed 5604.92 samples/sec Loss 4.6243 LearningRate 0.0204 Epoch: 10 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:45,436-Speed 5588.96 samples/sec Loss 4.4337 LearningRate 0.0204 Epoch: 10 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:47,284-Speed 5541.54 samples/sec Loss 4.4221 LearningRate 0.0204 Epoch: 10 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-27 05:23:49,116-Speed 5592.93 samples/sec Loss 4.4901 LearningRate 0.0204 Epoch: 10 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:50,942-Speed 5609.22 samples/sec Loss 4.4182 LearningRate 0.0204 Epoch: 10 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:52,768-Speed 5610.07 samples/sec Loss 4.3786 LearningRate 0.0204 Epoch: 10 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:54,605-Speed 5578.02 samples/sec Loss 4.4529 LearningRate 0.0204 Epoch: 10 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:56,440-Speed 5580.16 samples/sec Loss 4.4241 LearningRate 0.0204 Epoch: 10 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:23:58,261-Speed 5625.51 samples/sec Loss 4.6358 LearningRate 0.0203 Epoch: 10 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:00,097-Speed 5578.59 samples/sec Loss 4.4164 LearningRate 0.0203 Epoch: 10 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:01,908-Speed 5658.22 samples/sec Loss 4.4454 LearningRate 0.0203 Epoch: 10 Global Step: 62440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:03,751-Speed 5558.62 samples/sec Loss 4.3246 LearningRate 0.0203 Epoch: 10 Global Step: 62450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:05,573-Speed 5619.41 samples/sec Loss 4.4838 LearningRate 0.0203 Epoch: 10 Global Step: 62460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:07,399-Speed 5612.10 samples/sec Loss 4.3285 LearningRate 0.0203 Epoch: 10 Global Step: 62470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:09,233-Speed 5584.15 samples/sec Loss 4.5545 LearningRate 0.0203 Epoch: 10 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:11,054-Speed 5625.84 samples/sec Loss 
4.4178 LearningRate 0.0203 Epoch: 10 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:12,883-Speed 5600.37 samples/sec Loss 4.5136 LearningRate 0.0203 Epoch: 10 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:14,699-Speed 5643.30 samples/sec Loss 4.4277 LearningRate 0.0203 Epoch: 10 Global Step: 62510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:16,537-Speed 5572.10 samples/sec Loss 4.4276 LearningRate 0.0203 Epoch: 10 Global Step: 62520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:18,372-Speed 5580.76 samples/sec Loss 4.4983 LearningRate 0.0203 Epoch: 10 Global Step: 62530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:20,267-Speed 5406.64 samples/sec Loss 4.2919 LearningRate 0.0203 Epoch: 10 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:33,426-Speed 778.23 samples/sec Loss 4.2073 LearningRate 0.0202 Epoch: 11 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:35,281-Speed 5521.24 samples/sec Loss 3.8368 LearningRate 0.0202 Epoch: 11 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:37,111-Speed 5597.93 samples/sec Loss 3.7461 LearningRate 0.0202 Epoch: 11 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:38,948-Speed 5598.92 samples/sec Loss 3.7399 LearningRate 0.0202 Epoch: 11 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:40,781-Speed 5587.03 samples/sec Loss 3.7553 LearningRate 0.0202 Epoch: 11 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:42,627-Speed 5550.78 samples/sec Loss 3.7314 LearningRate 0.0202 Epoch: 11 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:44,456-Speed 5599.34 samples/sec Loss 3.8382 LearningRate 0.0202 Epoch: 11 Global Step: 62610 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:46,317-Speed 5503.67 samples/sec Loss 3.8189 LearningRate 0.0202 Epoch: 11 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:48,166-Speed 5542.05 samples/sec Loss 3.8280 LearningRate 0.0202 Epoch: 11 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:50,001-Speed 5583.14 samples/sec Loss 3.8739 LearningRate 0.0202 Epoch: 11 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:24:51,829-Speed 5601.32 samples/sec Loss 3.9360 LearningRate 0.0202 Epoch: 11 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:53,655-Speed 5609.82 samples/sec Loss 3.8099 LearningRate 0.0202 Epoch: 11 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:55,518-Speed 5499.03 samples/sec Loss 3.8299 LearningRate 0.0202 Epoch: 11 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:24:57,335-Speed 5637.84 samples/sec Loss 3.8328 LearningRate 0.0201 Epoch: 11 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:24:59,159-Speed 5614.46 samples/sec Loss 3.9533 LearningRate 0.0201 Epoch: 11 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:00,981-Speed 5622.52 samples/sec Loss 3.9334 LearningRate 0.0201 Epoch: 11 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:02,822-Speed 5564.72 samples/sec Loss 3.8829 LearningRate 0.0201 Epoch: 11 Global Step: 62710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:04,645-Speed 5619.97 samples/sec Loss 3.8800 LearningRate 0.0201 Epoch: 11 Global Step: 62720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:06,474-Speed 5598.05 samples/sec Loss 3.7700 LearningRate 0.0201 Epoch: 11 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-27 05:25:08,331-Speed 5516.04 samples/sec Loss 3.8563 LearningRate 0.0201 Epoch: 11 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:10,164-Speed 5587.77 samples/sec Loss 3.8431 LearningRate 0.0201 Epoch: 11 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:12,017-Speed 5529.74 samples/sec Loss 3.9162 LearningRate 0.0201 Epoch: 11 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:13,854-Speed 5577.96 samples/sec Loss 3.9427 LearningRate 0.0201 Epoch: 11 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:25:15,694-Speed 5567.58 samples/sec Loss 3.9264 LearningRate 0.0201 Epoch: 11 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:17,512-Speed 5633.20 samples/sec Loss 3.8574 LearningRate 0.0201 Epoch: 11 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:19,340-Speed 5603.78 samples/sec Loss 3.8921 LearningRate 0.0200 Epoch: 11 Global Step: 62800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:21,180-Speed 5565.03 samples/sec Loss 3.9116 LearningRate 0.0200 Epoch: 11 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:22,994-Speed 5648.31 samples/sec Loss 3.8053 LearningRate 0.0200 Epoch: 11 Global Step: 62820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:24,831-Speed 5576.77 samples/sec Loss 3.8929 LearningRate 0.0200 Epoch: 11 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:26,662-Speed 5593.69 samples/sec Loss 4.0412 LearningRate 0.0200 Epoch: 11 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:28,505-Speed 5557.66 samples/sec Loss 3.9880 LearningRate 0.0200 Epoch: 11 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:30,336-Speed 5595.17 samples/sec Loss 
4.0296 LearningRate 0.0200 Epoch: 11 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:32,200-Speed 5494.52 samples/sec Loss 3.9067 LearningRate 0.0200 Epoch: 11 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:34,058-Speed 5514.82 samples/sec Loss 3.9104 LearningRate 0.0200 Epoch: 11 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:35,903-Speed 5551.06 samples/sec Loss 3.8989 LearningRate 0.0200 Epoch: 11 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:37,841-Speed 5286.44 samples/sec Loss 3.9068 LearningRate 0.0200 Epoch: 11 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:39,764-Speed 5326.85 samples/sec Loss 3.8726 LearningRate 0.0200 Epoch: 11 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:41,699-Speed 5293.88 samples/sec Loss 3.9421 LearningRate 0.0200 Epoch: 11 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:43,639-Speed 5279.19 samples/sec Loss 3.9665 LearningRate 0.0199 Epoch: 11 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:45,607-Speed 5205.65 samples/sec Loss 3.9894 LearningRate 0.0199 Epoch: 11 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:25:47,614-Speed 5103.54 samples/sec Loss 3.7872 LearningRate 0.0199 Epoch: 11 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:49,449-Speed 5581.65 samples/sec Loss 4.0219 LearningRate 0.0199 Epoch: 11 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:51,295-Speed 5549.13 samples/sec Loss 3.9227 LearningRate 0.0199 Epoch: 11 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:53,146-Speed 5534.84 samples/sec Loss 4.0288 LearningRate 0.0199 Epoch: 11 Global 
Step: 62980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:54,976-Speed 5597.76 samples/sec Loss 3.9250 LearningRate 0.0199 Epoch: 11 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:56,810-Speed 5586.64 samples/sec Loss 3.8688 LearningRate 0.0199 Epoch: 11 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:25:58,625-Speed 5641.51 samples/sec Loss 3.8895 LearningRate 0.0199 Epoch: 11 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:00,442-Speed 5637.26 samples/sec Loss 3.9044 LearningRate 0.0199 Epoch: 11 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:02,285-Speed 5558.69 samples/sec Loss 4.0403 LearningRate 0.0199 Epoch: 11 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:04,119-Speed 5584.23 samples/sec Loss 4.0510 LearningRate 0.0199 Epoch: 11 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:05,938-Speed 5633.62 samples/sec Loss 3.9087 LearningRate 0.0199 Epoch: 11 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:07,753-Speed 5643.62 samples/sec Loss 4.0361 LearningRate 0.0198 Epoch: 11 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:09,566-Speed 5647.55 samples/sec Loss 4.0251 LearningRate 0.0198 Epoch: 11 Global Step: 63070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:11,437-Speed 5475.38 samples/sec Loss 4.0570 LearningRate 0.0198 Epoch: 11 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:13,266-Speed 5600.48 samples/sec Loss 4.0507 LearningRate 0.0198 Epoch: 11 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:15,119-Speed 5527.70 samples/sec Loss 4.0073 LearningRate 0.0198 Epoch: 11 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:26:16,936-Speed 5638.65 samples/sec Loss 3.9061 LearningRate 0.0198 Epoch: 11 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:18,807-Speed 5474.29 samples/sec Loss 4.1130 LearningRate 0.0198 Epoch: 11 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:20,633-Speed 5609.36 samples/sec Loss 3.9275 LearningRate 0.0198 Epoch: 11 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:22,464-Speed 5595.25 samples/sec Loss 4.0088 LearningRate 0.0198 Epoch: 11 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:24,285-Speed 5626.90 samples/sec Loss 4.1730 LearningRate 0.0198 Epoch: 11 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:26,105-Speed 5626.29 samples/sec Loss 3.8849 LearningRate 0.0198 Epoch: 11 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:27,919-Speed 5646.45 samples/sec Loss 4.0218 LearningRate 0.0198 Epoch: 11 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:29,747-Speed 5604.90 samples/sec Loss 3.9826 LearningRate 0.0198 Epoch: 11 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:31,559-Speed 5652.14 samples/sec Loss 3.8888 LearningRate 0.0197 Epoch: 11 Global Step: 63190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:33,370-Speed 5655.23 samples/sec Loss 3.9418 LearningRate 0.0197 Epoch: 11 Global Step: 63200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:35,188-Speed 5636.87 samples/sec Loss 4.1116 LearningRate 0.0197 Epoch: 11 Global Step: 63210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:26:37,014-Speed 5610.54 samples/sec Loss 4.0272 LearningRate 0.0197 Epoch: 11 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:38,858-Speed 5554.44 
samples/sec Loss 4.1030 LearningRate 0.0197 Epoch: 11 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:40,680-Speed 5621.23 samples/sec Loss 4.0592 LearningRate 0.0197 Epoch: 11 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:42,521-Speed 5564.90 samples/sec Loss 4.0461 LearningRate 0.0197 Epoch: 11 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:44,343-Speed 5622.46 samples/sec Loss 4.0424 LearningRate 0.0197 Epoch: 11 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:46,158-Speed 5643.86 samples/sec Loss 4.0179 LearningRate 0.0197 Epoch: 11 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:47,970-Speed 5653.97 samples/sec Loss 4.0952 LearningRate 0.0197 Epoch: 11 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:49,788-Speed 5633.28 samples/sec Loss 4.0781 LearningRate 0.0197 Epoch: 11 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:51,597-Speed 5661.48 samples/sec Loss 4.1533 LearningRate 0.0197 Epoch: 11 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:53,405-Speed 5665.74 samples/sec Loss 4.1865 LearningRate 0.0196 Epoch: 11 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:55,226-Speed 5627.51 samples/sec Loss 3.8831 LearningRate 0.0196 Epoch: 11 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:57,040-Speed 5644.85 samples/sec Loss 4.1552 LearningRate 0.0196 Epoch: 11 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:26:58,867-Speed 5608.57 samples/sec Loss 4.1336 LearningRate 0.0196 Epoch: 11 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:00,701-Speed 5585.15 samples/sec Loss 3.9843 LearningRate 0.0196 Epoch: 11 
Global Step: 63350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:02,543-Speed 5559.51 samples/sec Loss 3.9905 LearningRate 0.0196 Epoch: 11 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:04,350-Speed 5671.58 samples/sec Loss 4.0433 LearningRate 0.0196 Epoch: 11 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:06,157-Speed 5666.56 samples/sec Loss 3.9115 LearningRate 0.0196 Epoch: 11 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:07,990-Speed 5589.32 samples/sec Loss 3.9862 LearningRate 0.0196 Epoch: 11 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:09,855-Speed 5490.59 samples/sec Loss 3.9803 LearningRate 0.0196 Epoch: 11 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:11,676-Speed 5625.56 samples/sec Loss 3.9852 LearningRate 0.0196 Epoch: 11 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:13,493-Speed 5639.13 samples/sec Loss 4.0906 LearningRate 0.0196 Epoch: 11 Global Step: 63420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:27:15,298-Speed 5673.10 samples/sec Loss 3.9095 LearningRate 0.0196 Epoch: 11 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:17,115-Speed 5637.32 samples/sec Loss 4.0761 LearningRate 0.0195 Epoch: 11 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:18,939-Speed 5616.14 samples/sec Loss 4.1187 LearningRate 0.0195 Epoch: 11 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:20,756-Speed 5639.14 samples/sec Loss 4.0010 LearningRate 0.0195 Epoch: 11 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:22,568-Speed 5652.05 samples/sec Loss 3.9719 LearningRate 0.0195 Epoch: 11 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:27:24,406-Speed 5575.85 samples/sec Loss 4.0177 LearningRate 0.0195 Epoch: 11 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:26,213-Speed 5667.56 samples/sec Loss 3.9930 LearningRate 0.0195 Epoch: 11 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:28,055-Speed 5560.04 samples/sec Loss 4.1699 LearningRate 0.0195 Epoch: 11 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:29,861-Speed 5671.56 samples/sec Loss 4.0923 LearningRate 0.0195 Epoch: 11 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:31,669-Speed 5667.47 samples/sec Loss 4.1405 LearningRate 0.0195 Epoch: 11 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:33,477-Speed 5664.41 samples/sec Loss 4.2071 LearningRate 0.0195 Epoch: 11 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:35,284-Speed 5670.24 samples/sec Loss 4.1218 LearningRate 0.0195 Epoch: 11 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:37,090-Speed 5668.93 samples/sec Loss 4.1105 LearningRate 0.0195 Epoch: 11 Global Step: 63550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:38,900-Speed 5659.62 samples/sec Loss 4.0404 LearningRate 0.0195 Epoch: 11 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:40,705-Speed 5674.43 samples/sec Loss 4.1002 LearningRate 0.0194 Epoch: 11 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:42,571-Speed 5489.74 samples/sec Loss 3.9883 LearningRate 0.0194 Epoch: 11 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:44,437-Speed 5489.87 samples/sec Loss 4.0967 LearningRate 0.0194 Epoch: 11 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:46,256-Speed 5633.60 
samples/sec Loss 4.1694 LearningRate 0.0194 Epoch: 11 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:48,093-Speed 5574.23 samples/sec Loss 4.0558 LearningRate 0.0194 Epoch: 11 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:27:49,939-Speed 5548.46 samples/sec Loss 4.1712 LearningRate 0.0194 Epoch: 11 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:51,767-Speed 5604.32 samples/sec Loss 4.0259 LearningRate 0.0194 Epoch: 11 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:53,595-Speed 5602.92 samples/sec Loss 4.1087 LearningRate 0.0194 Epoch: 11 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:55,427-Speed 5594.36 samples/sec Loss 4.0275 LearningRate 0.0194 Epoch: 11 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:57,253-Speed 5607.55 samples/sec Loss 4.0594 LearningRate 0.0194 Epoch: 11 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:27:59,070-Speed 5637.81 samples/sec Loss 4.1048 LearningRate 0.0194 Epoch: 11 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:00,906-Speed 5577.61 samples/sec Loss 4.1519 LearningRate 0.0194 Epoch: 11 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:02,724-Speed 5637.05 samples/sec Loss 4.1429 LearningRate 0.0194 Epoch: 11 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:04,570-Speed 5548.63 samples/sec Loss 4.1736 LearningRate 0.0193 Epoch: 11 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:06,464-Speed 5407.00 samples/sec Loss 4.1476 LearningRate 0.0193 Epoch: 11 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:08,283-Speed 5634.10 samples/sec Loss 4.1286 LearningRate 0.0193 Epoch: 11 
Global Step: 63720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:10,112-Speed 5599.92 samples/sec Loss 4.1550 LearningRate 0.0193 Epoch: 11 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:11,928-Speed 5641.56 samples/sec Loss 4.0745 LearningRate 0.0193 Epoch: 11 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:13,734-Speed 5669.77 samples/sec Loss 4.1118 LearningRate 0.0193 Epoch: 11 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:15,571-Speed 5575.52 samples/sec Loss 4.1296 LearningRate 0.0193 Epoch: 11 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:17,384-Speed 5652.88 samples/sec Loss 4.1628 LearningRate 0.0193 Epoch: 11 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:19,219-Speed 5579.59 samples/sec Loss 4.1129 LearningRate 0.0193 Epoch: 11 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:21,027-Speed 5665.57 samples/sec Loss 4.1635 LearningRate 0.0193 Epoch: 11 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:22,835-Speed 5667.65 samples/sec Loss 4.0915 LearningRate 0.0193 Epoch: 11 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:24,647-Speed 5651.59 samples/sec Loss 4.0543 LearningRate 0.0193 Epoch: 11 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:26,466-Speed 5631.34 samples/sec Loss 4.0008 LearningRate 0.0193 Epoch: 11 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:28:28,282-Speed 5640.91 samples/sec Loss 4.1615 LearningRate 0.0192 Epoch: 11 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:28:30,076-Speed 5709.13 samples/sec Loss 4.0345 LearningRate 0.0192 Epoch: 11 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:28:31,901-Speed 5613.95 samples/sec Loss 4.0253 LearningRate 0.0192 Epoch: 11 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:33,722-Speed 5626.69 samples/sec Loss 4.0967 LearningRate 0.0192 Epoch: 11 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:35,542-Speed 5626.22 samples/sec Loss 4.1207 LearningRate 0.0192 Epoch: 11 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:37,361-Speed 5631.14 samples/sec Loss 4.0881 LearningRate 0.0192 Epoch: 11 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:39,191-Speed 5598.47 samples/sec Loss 4.0871 LearningRate 0.0192 Epoch: 11 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:41,028-Speed 5577.67 samples/sec Loss 4.1055 LearningRate 0.0192 Epoch: 11 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:42,860-Speed 5590.97 samples/sec Loss 4.0241 LearningRate 0.0192 Epoch: 11 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:44,706-Speed 5546.34 samples/sec Loss 4.0654 LearningRate 0.0192 Epoch: 11 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:46,532-Speed 5612.92 samples/sec Loss 4.0299 LearningRate 0.0192 Epoch: 11 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:48,354-Speed 5621.73 samples/sec Loss 4.2390 LearningRate 0.0192 Epoch: 11 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:50,173-Speed 5631.16 samples/sec Loss 3.9651 LearningRate 0.0192 Epoch: 11 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:52,002-Speed 5600.98 samples/sec Loss 4.0771 LearningRate 0.0191 Epoch: 11 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:53,834-Speed 5590.27 
samples/sec Loss 4.1275 LearningRate 0.0191 Epoch: 11 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:55,666-Speed 5590.69 samples/sec Loss 4.1831 LearningRate 0.0191 Epoch: 11 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:57,499-Speed 5587.63 samples/sec Loss 4.1252 LearningRate 0.0191 Epoch: 11 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:28:59,326-Speed 5608.62 samples/sec Loss 4.1103 LearningRate 0.0191 Epoch: 11 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:29:25,328-[lfw][64000]XNorm: 22.442670 Training: 2022-04-27 05:29:25,329-[lfw][64000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-04-27 05:29:25,329-[lfw][64000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:29:55,458-[cfp_fp][64000]XNorm: 20.453871 Training: 2022-04-27 05:29:55,459-[cfp_fp][64000]Accuracy-Flip: 0.96386+-0.00965 Training: 2022-04-27 05:29:55,459-[cfp_fp][64000]Accuracy-Highest: 0.96386 Training: 2022-04-27 05:30:21,661-[agedb_30][64000]XNorm: 22.180480 Training: 2022-04-27 05:30:21,662-[agedb_30][64000]Accuracy-Flip: 0.97517+-0.00929 Training: 2022-04-27 05:30:21,662-[agedb_30][64000]Accuracy-Highest: 0.97667 Training: 2022-04-27 05:30:23,502-Speed 121.65 samples/sec Loss 4.1059 LearningRate 0.0191 Epoch: 11 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:25,326-Speed 5615.28 samples/sec Loss 4.1771 LearningRate 0.0191 Epoch: 11 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:27,131-Speed 5675.95 samples/sec Loss 4.1508 LearningRate 0.0191 Epoch: 11 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:28,948-Speed 5638.49 samples/sec Loss 4.0892 LearningRate 0.0191 Epoch: 11 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:30,774-Speed 5609.30 samples/sec Loss 4.2496 LearningRate 
0.0191 Epoch: 11 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:32,618-Speed 5554.25 samples/sec Loss 4.0958 LearningRate 0.0191 Epoch: 11 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:34,426-Speed 5667.13 samples/sec Loss 4.2357 LearningRate 0.0191 Epoch: 11 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:36,236-Speed 5657.60 samples/sec Loss 4.0600 LearningRate 0.0191 Epoch: 11 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:38,107-Speed 5474.41 samples/sec Loss 4.1062 LearningRate 0.0190 Epoch: 11 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:39,934-Speed 5607.40 samples/sec Loss 4.0405 LearningRate 0.0190 Epoch: 11 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:41,840-Speed 5372.90 samples/sec Loss 4.1547 LearningRate 0.0190 Epoch: 11 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:43,694-Speed 5525.19 samples/sec Loss 4.2374 LearningRate 0.0190 Epoch: 11 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:45,529-Speed 5581.89 samples/sec Loss 4.1658 LearningRate 0.0190 Epoch: 11 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:47,345-Speed 5641.91 samples/sec Loss 4.1129 LearningRate 0.0190 Epoch: 11 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:30:49,197-Speed 5530.78 samples/sec Loss 4.2278 LearningRate 0.0190 Epoch: 11 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:51,048-Speed 5534.94 samples/sec Loss 4.0131 LearningRate 0.0190 Epoch: 11 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:52,904-Speed 5519.25 samples/sec Loss 4.1590 LearningRate 0.0190 Epoch: 11 Global Step: 64170 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-27 05:30:54,740-Speed 5577.87 samples/sec Loss 4.3001 LearningRate 0.0190 Epoch: 11 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:56,610-Speed 5477.77 samples/sec Loss 3.9834 LearningRate 0.0190 Epoch: 11 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:30:58,456-Speed 5550.74 samples/sec Loss 4.1320 LearningRate 0.0190 Epoch: 11 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:00,269-Speed 5650.18 samples/sec Loss 4.1945 LearningRate 0.0190 Epoch: 11 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:02,072-Speed 5680.64 samples/sec Loss 4.1701 LearningRate 0.0189 Epoch: 11 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:03,891-Speed 5630.62 samples/sec Loss 4.1292 LearningRate 0.0189 Epoch: 11 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:05,710-Speed 5632.86 samples/sec Loss 4.2427 LearningRate 0.0189 Epoch: 11 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:07,546-Speed 5578.85 samples/sec Loss 4.1388 LearningRate 0.0189 Epoch: 11 Global Step: 64250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:09,366-Speed 5625.98 samples/sec Loss 4.1862 LearningRate 0.0189 Epoch: 11 Global Step: 64260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:11,186-Speed 5629.23 samples/sec Loss 4.1106 LearningRate 0.0189 Epoch: 11 Global Step: 64270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:13,044-Speed 5513.20 samples/sec Loss 4.1223 LearningRate 0.0189 Epoch: 11 Global Step: 64280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:14,861-Speed 5636.80 samples/sec Loss 4.1109 LearningRate 0.0189 Epoch: 11 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
05:31:16,686-Speed 5613.32 samples/sec Loss 4.3019 LearningRate 0.0189 Epoch: 11 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:18,511-Speed 5613.49 samples/sec Loss 4.0709 LearningRate 0.0189 Epoch: 11 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:31:20,340-Speed 5600.67 samples/sec Loss 4.3000 LearningRate 0.0189 Epoch: 11 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:22,155-Speed 5642.89 samples/sec Loss 4.2332 LearningRate 0.0189 Epoch: 11 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:23,995-Speed 5567.46 samples/sec Loss 4.1391 LearningRate 0.0189 Epoch: 11 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:25,825-Speed 5597.27 samples/sec Loss 4.0957 LearningRate 0.0188 Epoch: 11 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:27,717-Speed 5416.17 samples/sec Loss 4.1636 LearningRate 0.0188 Epoch: 11 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:29,541-Speed 5615.20 samples/sec Loss 4.1003 LearningRate 0.0188 Epoch: 11 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:31,369-Speed 5602.56 samples/sec Loss 4.0951 LearningRate 0.0188 Epoch: 11 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:33,217-Speed 5542.22 samples/sec Loss 4.0991 LearningRate 0.0188 Epoch: 11 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:35,052-Speed 5581.73 samples/sec Loss 4.1137 LearningRate 0.0188 Epoch: 11 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:36,879-Speed 5607.40 samples/sec Loss 4.2700 LearningRate 0.0188 Epoch: 11 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:38,758-Speed 5452.77 samples/sec Loss 4.1361 
LearningRate 0.0188 Epoch: 11 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:31:40,594-Speed 5579.73 samples/sec Loss 4.0916 LearningRate 0.0188 Epoch: 11 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:31:42,419-Speed 5610.55 samples/sec Loss 4.1231 LearningRate 0.0188 Epoch: 11 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:44,252-Speed 5589.86 samples/sec Loss 4.1385 LearningRate 0.0188 Epoch: 11 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:46,115-Speed 5497.89 samples/sec Loss 4.1720 LearningRate 0.0188 Epoch: 11 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:47,968-Speed 5528.21 samples/sec Loss 4.1653 LearningRate 0.0188 Epoch: 11 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:49,785-Speed 5636.48 samples/sec Loss 4.1952 LearningRate 0.0187 Epoch: 11 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:51,617-Speed 5592.19 samples/sec Loss 4.2180 LearningRate 0.0187 Epoch: 11 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:53,461-Speed 5554.11 samples/sec Loss 4.1090 LearningRate 0.0187 Epoch: 11 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:55,309-Speed 5544.05 samples/sec Loss 4.2547 LearningRate 0.0187 Epoch: 11 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:57,139-Speed 5599.03 samples/sec Loss 4.1167 LearningRate 0.0187 Epoch: 11 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:31:58,970-Speed 5592.35 samples/sec Loss 4.1619 LearningRate 0.0187 Epoch: 11 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:00,795-Speed 5614.55 samples/sec Loss 4.0080 LearningRate 0.0187 Epoch: 11 Global Step: 64540 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:32:02,610-Speed 5641.35 samples/sec Loss 4.2670 LearningRate 0.0187 Epoch: 11 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:04,466-Speed 5519.47 samples/sec Loss 4.1675 LearningRate 0.0187 Epoch: 11 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:06,302-Speed 5581.42 samples/sec Loss 4.1814 LearningRate 0.0187 Epoch: 11 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:08,140-Speed 5572.30 samples/sec Loss 4.1475 LearningRate 0.0187 Epoch: 11 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:10,062-Speed 5330.31 samples/sec Loss 4.1850 LearningRate 0.0187 Epoch: 11 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:11,897-Speed 5582.26 samples/sec Loss 4.1584 LearningRate 0.0187 Epoch: 11 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:13,761-Speed 5492.88 samples/sec Loss 4.1018 LearningRate 0.0186 Epoch: 11 Global Step: 64610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:15,588-Speed 5608.55 samples/sec Loss 4.1626 LearningRate 0.0186 Epoch: 11 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:17,408-Speed 5629.30 samples/sec Loss 4.2070 LearningRate 0.0186 Epoch: 11 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:19,242-Speed 5584.79 samples/sec Loss 4.1697 LearningRate 0.0186 Epoch: 11 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:21,060-Speed 5635.27 samples/sec Loss 4.1142 LearningRate 0.0186 Epoch: 11 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:22,896-Speed 5579.57 samples/sec Loss 4.0567 LearningRate 0.0186 Epoch: 11 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:32:24,724-Speed 5602.58 samples/sec Loss 4.0652 LearningRate 0.0186 Epoch: 11 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:26,562-Speed 5572.64 samples/sec Loss 4.2346 LearningRate 0.0186 Epoch: 11 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:28,395-Speed 5590.12 samples/sec Loss 4.1418 LearningRate 0.0186 Epoch: 11 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:30,237-Speed 5559.60 samples/sec Loss 4.1059 LearningRate 0.0186 Epoch: 11 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:32,051-Speed 5645.19 samples/sec Loss 4.2225 LearningRate 0.0186 Epoch: 11 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:33,860-Speed 5662.38 samples/sec Loss 4.1183 LearningRate 0.0186 Epoch: 11 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:35,692-Speed 5594.31 samples/sec Loss 4.1162 LearningRate 0.0186 Epoch: 11 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:37,522-Speed 5596.55 samples/sec Loss 4.0905 LearningRate 0.0186 Epoch: 11 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:39,327-Speed 5675.74 samples/sec Loss 4.0654 LearningRate 0.0185 Epoch: 11 Global Step: 64750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:41,149-Speed 5621.66 samples/sec Loss 4.1524 LearningRate 0.0185 Epoch: 11 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:42,956-Speed 5669.19 samples/sec Loss 4.1784 LearningRate 0.0185 Epoch: 11 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:44,770-Speed 5645.84 samples/sec Loss 4.0272 LearningRate 0.0185 Epoch: 11 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:46,588-Speed 5634.71 samples/sec Loss 
4.1407 LearningRate 0.0185 Epoch: 11 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:48,410-Speed 5624.08 samples/sec Loss 4.0798 LearningRate 0.0185 Epoch: 11 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:50,223-Speed 5648.48 samples/sec Loss 4.1817 LearningRate 0.0185 Epoch: 11 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:52,074-Speed 5533.74 samples/sec Loss 4.1974 LearningRate 0.0185 Epoch: 11 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:53,896-Speed 5620.44 samples/sec Loss 4.2074 LearningRate 0.0185 Epoch: 11 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:55,707-Speed 5656.56 samples/sec Loss 4.1970 LearningRate 0.0185 Epoch: 11 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:57,513-Speed 5673.10 samples/sec Loss 4.1508 LearningRate 0.0185 Epoch: 11 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:32:59,342-Speed 5599.92 samples/sec Loss 4.2833 LearningRate 0.0185 Epoch: 11 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:01,160-Speed 5634.18 samples/sec Loss 4.0664 LearningRate 0.0185 Epoch: 11 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:02,980-Speed 5629.78 samples/sec Loss 4.1610 LearningRate 0.0184 Epoch: 11 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:04,821-Speed 5562.83 samples/sec Loss 4.1911 LearningRate 0.0184 Epoch: 11 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:06,641-Speed 5629.59 samples/sec Loss 4.2842 LearningRate 0.0184 Epoch: 11 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:08,447-Speed 5671.81 samples/sec Loss 4.1578 LearningRate 0.0184 Epoch: 11 Global Step: 64910 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:10,275-Speed 5603.35 samples/sec Loss 4.0800 LearningRate 0.0184 Epoch: 11 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:12,097-Speed 5621.10 samples/sec Loss 4.1625 LearningRate 0.0184 Epoch: 11 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:13,921-Speed 5615.59 samples/sec Loss 4.1888 LearningRate 0.0184 Epoch: 11 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:15,746-Speed 5614.91 samples/sec Loss 4.1677 LearningRate 0.0184 Epoch: 11 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:17,558-Speed 5653.63 samples/sec Loss 4.1744 LearningRate 0.0184 Epoch: 11 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:19,372-Speed 5646.81 samples/sec Loss 4.1425 LearningRate 0.0184 Epoch: 11 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:21,205-Speed 5585.86 samples/sec Loss 4.1784 LearningRate 0.0184 Epoch: 11 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:23,059-Speed 5524.44 samples/sec Loss 4.0353 LearningRate 0.0184 Epoch: 11 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:24,902-Speed 5558.62 samples/sec Loss 4.1691 LearningRate 0.0184 Epoch: 11 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:26,725-Speed 5618.79 samples/sec Loss 4.1127 LearningRate 0.0183 Epoch: 11 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:28,577-Speed 5532.51 samples/sec Loss 4.1771 LearningRate 0.0183 Epoch: 11 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:30,509-Speed 5301.06 samples/sec Loss 4.1369 LearningRate 0.0183 Epoch: 11 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:33:32,443-Speed 5298.71 samples/sec Loss 4.2642 LearningRate 0.0183 Epoch: 11 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:34,285-Speed 5558.58 samples/sec Loss 4.1399 LearningRate 0.0183 Epoch: 11 Global Step: 65050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:33:36,115-Speed 5597.16 samples/sec Loss 4.1215 LearningRate 0.0183 Epoch: 11 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:33:37,920-Speed 5676.03 samples/sec Loss 4.1928 LearningRate 0.0183 Epoch: 11 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:39,757-Speed 5574.80 samples/sec Loss 4.1224 LearningRate 0.0183 Epoch: 11 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:41,572-Speed 5645.48 samples/sec Loss 4.0452 LearningRate 0.0183 Epoch: 11 Global Step: 65090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:43,411-Speed 5570.16 samples/sec Loss 4.3086 LearningRate 0.0183 Epoch: 11 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:45,234-Speed 5619.14 samples/sec Loss 4.2315 LearningRate 0.0183 Epoch: 11 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:47,057-Speed 5618.43 samples/sec Loss 4.1127 LearningRate 0.0183 Epoch: 11 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:48,869-Speed 5654.15 samples/sec Loss 4.1508 LearningRate 0.0183 Epoch: 11 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:50,689-Speed 5627.13 samples/sec Loss 4.0349 LearningRate 0.0182 Epoch: 11 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:52,531-Speed 5562.28 samples/sec Loss 4.0835 LearningRate 0.0182 Epoch: 11 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:54,405-Speed 5466.65 samples/sec 
Loss 4.1570 LearningRate 0.0182 Epoch: 11 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:33:56,299-Speed 5405.95 samples/sec Loss 4.0571 LearningRate 0.0182 Epoch: 11 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:33:58,231-Speed 5302.11 samples/sec Loss 4.2298 LearningRate 0.0182 Epoch: 11 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:34:00,084-Speed 5530.47 samples/sec Loss 4.2315 LearningRate 0.0182 Epoch: 11 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:01,900-Speed 5639.20 samples/sec Loss 4.1537 LearningRate 0.0182 Epoch: 11 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:03,797-Speed 5398.62 samples/sec Loss 4.1004 LearningRate 0.0182 Epoch: 11 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:05,723-Speed 5318.73 samples/sec Loss 4.1306 LearningRate 0.0182 Epoch: 11 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:07,649-Speed 5318.32 samples/sec Loss 4.2766 LearningRate 0.0182 Epoch: 11 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:09,528-Speed 5452.88 samples/sec Loss 4.2521 LearningRate 0.0182 Epoch: 11 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:11,445-Speed 5345.13 samples/sec Loss 4.1292 LearningRate 0.0182 Epoch: 11 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:13,300-Speed 5521.30 samples/sec Loss 4.2607 LearningRate 0.0182 Epoch: 11 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:15,143-Speed 5556.69 samples/sec Loss 4.1133 LearningRate 0.0182 Epoch: 11 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:16,984-Speed 5565.67 samples/sec Loss 4.1245 LearningRate 0.0181 Epoch: 11 Global 
Step: 65280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:18,803-Speed 5631.39 samples/sec Loss 4.0628 LearningRate 0.0181 Epoch: 11 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:34:20,603-Speed 5693.10 samples/sec Loss 4.2005 LearningRate 0.0181 Epoch: 11 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:22,416-Speed 5649.83 samples/sec Loss 4.1739 LearningRate 0.0181 Epoch: 11 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:24,253-Speed 5576.97 samples/sec Loss 4.1283 LearningRate 0.0181 Epoch: 11 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:26,075-Speed 5620.98 samples/sec Loss 4.1365 LearningRate 0.0181 Epoch: 11 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:27,908-Speed 5590.12 samples/sec Loss 4.1409 LearningRate 0.0181 Epoch: 11 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:29,727-Speed 5629.39 samples/sec Loss 4.1912 LearningRate 0.0181 Epoch: 11 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:31,539-Speed 5655.23 samples/sec Loss 4.1067 LearningRate 0.0181 Epoch: 11 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:33,364-Speed 5613.65 samples/sec Loss 4.3103 LearningRate 0.0181 Epoch: 11 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:35,174-Speed 5658.12 samples/sec Loss 4.1033 LearningRate 0.0181 Epoch: 11 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:36,999-Speed 5612.22 samples/sec Loss 4.2638 LearningRate 0.0181 Epoch: 11 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:38,806-Speed 5669.53 samples/sec Loss 4.0970 LearningRate 0.0181 Epoch: 11 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:34:40,647-Speed 5563.85 samples/sec Loss 4.1349 LearningRate 0.0180 Epoch: 11 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:42,569-Speed 5330.94 samples/sec Loss 4.1269 LearningRate 0.0180 Epoch: 11 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:44,383-Speed 5647.15 samples/sec Loss 4.1085 LearningRate 0.0180 Epoch: 11 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:46,193-Speed 5658.31 samples/sec Loss 4.0675 LearningRate 0.0180 Epoch: 11 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:48,028-Speed 5581.12 samples/sec Loss 4.1550 LearningRate 0.0180 Epoch: 11 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:49,854-Speed 5610.04 samples/sec Loss 4.1283 LearningRate 0.0180 Epoch: 11 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:51,687-Speed 5589.46 samples/sec Loss 4.1895 LearningRate 0.0180 Epoch: 11 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:53,534-Speed 5544.94 samples/sec Loss 4.1257 LearningRate 0.0180 Epoch: 11 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:55,380-Speed 5549.98 samples/sec Loss 4.0664 LearningRate 0.0180 Epoch: 11 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:34:57,209-Speed 5599.86 samples/sec Loss 4.0663 LearningRate 0.0180 Epoch: 11 Global Step: 65500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:34:59,035-Speed 5611.17 samples/sec Loss 4.1951 LearningRate 0.0180 Epoch: 11 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:00,863-Speed 5603.71 samples/sec Loss 4.1415 LearningRate 0.0180 Epoch: 11 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:02,673-Speed 5657.20 
samples/sec Loss 4.0903 LearningRate 0.0180 Epoch: 11 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:04,488-Speed 5646.13 samples/sec Loss 4.1982 LearningRate 0.0179 Epoch: 11 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:06,318-Speed 5596.12 samples/sec Loss 4.1841 LearningRate 0.0179 Epoch: 11 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:08,157-Speed 5570.92 samples/sec Loss 4.2248 LearningRate 0.0179 Epoch: 11 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:09,965-Speed 5663.13 samples/sec Loss 4.1415 LearningRate 0.0179 Epoch: 11 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:11,782-Speed 5638.16 samples/sec Loss 4.1264 LearningRate 0.0179 Epoch: 11 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:13,619-Speed 5576.94 samples/sec Loss 3.9973 LearningRate 0.0179 Epoch: 11 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:15,438-Speed 5630.86 samples/sec Loss 4.0831 LearningRate 0.0179 Epoch: 11 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:17,239-Speed 5689.08 samples/sec Loss 4.2105 LearningRate 0.0179 Epoch: 11 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:19,053-Speed 5647.63 samples/sec Loss 4.1282 LearningRate 0.0179 Epoch: 11 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:20,883-Speed 5596.41 samples/sec Loss 4.1283 LearningRate 0.0179 Epoch: 11 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:22,714-Speed 5593.83 samples/sec Loss 4.1811 LearningRate 0.0179 Epoch: 11 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:24,547-Speed 5589.09 samples/sec Loss 4.1638 LearningRate 0.0179 Epoch: 11 
Global Step: 65650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:26,358-Speed 5654.04 samples/sec Loss 4.0798 LearningRate 0.0179 Epoch: 11 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:28,171-Speed 5650.65 samples/sec Loss 4.2358 LearningRate 0.0179 Epoch: 11 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:29,987-Speed 5641.83 samples/sec Loss 4.0980 LearningRate 0.0178 Epoch: 11 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:31,809-Speed 5620.44 samples/sec Loss 4.1575 LearningRate 0.0178 Epoch: 11 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:33,626-Speed 5637.20 samples/sec Loss 4.1981 LearningRate 0.0178 Epoch: 11 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:35,451-Speed 5615.98 samples/sec Loss 4.1157 LearningRate 0.0178 Epoch: 11 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:37,286-Speed 5580.83 samples/sec Loss 4.1302 LearningRate 0.0178 Epoch: 11 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:39,115-Speed 5600.69 samples/sec Loss 4.0947 LearningRate 0.0178 Epoch: 11 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:40,925-Speed 5661.85 samples/sec Loss 4.1745 LearningRate 0.0178 Epoch: 11 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:42,749-Speed 5615.79 samples/sec Loss 4.0883 LearningRate 0.0178 Epoch: 11 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:44,560-Speed 5653.71 samples/sec Loss 4.0839 LearningRate 0.0178 Epoch: 11 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:46,391-Speed 5594.89 samples/sec Loss 4.2247 LearningRate 0.0178 Epoch: 11 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:35:48,226-Speed 5581.27 samples/sec Loss 4.1872 LearningRate 0.0178 Epoch: 11 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:50,052-Speed 5611.79 samples/sec Loss 4.2334 LearningRate 0.0178 Epoch: 11 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:51,879-Speed 5605.30 samples/sec Loss 4.1301 LearningRate 0.0178 Epoch: 11 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:53,717-Speed 5572.39 samples/sec Loss 4.0461 LearningRate 0.0177 Epoch: 11 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:35:55,531-Speed 5646.22 samples/sec Loss 4.1582 LearningRate 0.0177 Epoch: 11 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:57,349-Speed 5635.47 samples/sec Loss 4.1553 LearningRate 0.0177 Epoch: 11 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:35:59,156-Speed 5668.85 samples/sec Loss 4.1825 LearningRate 0.0177 Epoch: 11 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:00,976-Speed 5630.32 samples/sec Loss 4.2167 LearningRate 0.0177 Epoch: 11 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:02,786-Speed 5659.99 samples/sec Loss 4.2121 LearningRate 0.0177 Epoch: 11 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:04,606-Speed 5628.38 samples/sec Loss 4.0992 LearningRate 0.0177 Epoch: 11 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:06,434-Speed 5601.82 samples/sec Loss 3.9387 LearningRate 0.0177 Epoch: 11 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:08,253-Speed 5631.74 samples/sec Loss 4.2410 LearningRate 0.0177 Epoch: 11 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:10,095-Speed 5560.91 
samples/sec Loss 4.0943 LearningRate 0.0177 Epoch: 11 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:11,926-Speed 5593.73 samples/sec Loss 4.0715 LearningRate 0.0177 Epoch: 11 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:13,740-Speed 5645.74 samples/sec Loss 4.0910 LearningRate 0.0177 Epoch: 11 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:36:15,546-Speed 5673.31 samples/sec Loss 3.9667 LearningRate 0.0177 Epoch: 11 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:17,360-Speed 5645.09 samples/sec Loss 4.1747 LearningRate 0.0177 Epoch: 11 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:19,174-Speed 5647.67 samples/sec Loss 4.1379 LearningRate 0.0176 Epoch: 11 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:20,986-Speed 5652.83 samples/sec Loss 4.0623 LearningRate 0.0176 Epoch: 11 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:22,855-Speed 5484.23 samples/sec Loss 4.1739 LearningRate 0.0176 Epoch: 11 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:24,688-Speed 5587.86 samples/sec Loss 4.0967 LearningRate 0.0176 Epoch: 11 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:26,525-Speed 5576.15 samples/sec Loss 4.1166 LearningRate 0.0176 Epoch: 11 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:28,383-Speed 5512.95 samples/sec Loss 4.0829 LearningRate 0.0176 Epoch: 11 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:36:58,473-[lfw][66000]XNorm: 22.807010 Training: 2022-04-27 05:36:58,474-[lfw][66000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-04-27 05:36:58,474-[lfw][66000]Accuracy-Highest: 0.99800 Training: 2022-04-27 
05:37:28,797-[cfp_fp][66000]XNorm: 20.418062 Training: 2022-04-27 05:37:28,797-[cfp_fp][66000]Accuracy-Flip: 0.96314+-0.00722 Training: 2022-04-27 05:37:28,798-[cfp_fp][66000]Accuracy-Highest: 0.96386 Training: 2022-04-27 05:37:54,975-[agedb_30][66000]XNorm: 22.413620 Training: 2022-04-27 05:37:54,976-[agedb_30][66000]Accuracy-Flip: 0.97817+-0.00797 Training: 2022-04-27 05:37:54,976-[agedb_30][66000]Accuracy-Highest: 0.97817 Training: 2022-04-27 05:37:56,810-Speed 115.80 samples/sec Loss 4.0261 LearningRate 0.0176 Epoch: 11 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:37:58,616-Speed 5673.71 samples/sec Loss 4.0662 LearningRate 0.0176 Epoch: 11 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:00,423-Speed 5668.34 samples/sec Loss 4.1404 LearningRate 0.0176 Epoch: 11 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:02,272-Speed 5540.91 samples/sec Loss 4.1219 LearningRate 0.0176 Epoch: 11 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:04,098-Speed 5609.85 samples/sec Loss 4.1520 LearningRate 0.0176 Epoch: 11 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:05,928-Speed 5596.93 samples/sec Loss 4.0330 LearningRate 0.0176 Epoch: 11 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:07,733-Speed 5673.43 samples/sec Loss 4.0165 LearningRate 0.0176 Epoch: 11 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:09,536-Speed 5682.91 samples/sec Loss 4.0177 LearningRate 0.0175 Epoch: 11 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:11,345-Speed 5661.61 samples/sec Loss 4.1571 LearningRate 0.0175 Epoch: 11 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:13,171-Speed 5610.08 samples/sec Loss 4.0840 LearningRate 0.0175 Epoch: 11 Global 
Step: 66100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:14,981-Speed 5658.32 samples/sec Loss 4.1630 LearningRate 0.0175 Epoch: 11 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:16,795-Speed 5645.66 samples/sec Loss 4.1560 LearningRate 0.0175 Epoch: 11 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:18,619-Speed 5617.33 samples/sec Loss 4.1818 LearningRate 0.0175 Epoch: 11 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:38:20,445-Speed 5610.42 samples/sec Loss 4.1469 LearningRate 0.0175 Epoch: 11 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:22,269-Speed 5615.28 samples/sec Loss 4.0392 LearningRate 0.0175 Epoch: 11 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:24,089-Speed 5629.83 samples/sec Loss 4.1283 LearningRate 0.0175 Epoch: 11 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:25,918-Speed 5600.98 samples/sec Loss 4.2726 LearningRate 0.0175 Epoch: 11 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:27,737-Speed 5628.73 samples/sec Loss 4.1914 LearningRate 0.0175 Epoch: 11 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:29,554-Speed 5639.96 samples/sec Loss 4.0146 LearningRate 0.0175 Epoch: 11 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:31,380-Speed 5608.49 samples/sec Loss 4.1421 LearningRate 0.0175 Epoch: 11 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:33,201-Speed 5624.89 samples/sec Loss 4.1473 LearningRate 0.0175 Epoch: 11 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:35,016-Speed 5643.27 samples/sec Loss 4.1222 LearningRate 0.0174 Epoch: 11 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:38:36,832-Speed 5642.93 samples/sec Loss 4.1242 LearningRate 0.0174 Epoch: 11 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:38,634-Speed 5682.86 samples/sec Loss 4.1407 LearningRate 0.0174 Epoch: 11 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:40,443-Speed 5664.07 samples/sec Loss 4.0492 LearningRate 0.0174 Epoch: 11 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:42,262-Speed 5629.52 samples/sec Loss 4.1135 LearningRate 0.0174 Epoch: 11 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:44,075-Speed 5651.95 samples/sec Loss 4.1630 LearningRate 0.0174 Epoch: 11 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:45,894-Speed 5631.42 samples/sec Loss 3.9906 LearningRate 0.0174 Epoch: 11 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:47,718-Speed 5614.41 samples/sec Loss 4.1569 LearningRate 0.0174 Epoch: 11 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:49,548-Speed 5598.48 samples/sec Loss 4.1094 LearningRate 0.0174 Epoch: 11 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:51,362-Speed 5646.28 samples/sec Loss 4.0198 LearningRate 0.0174 Epoch: 11 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:53,192-Speed 5596.71 samples/sec Loss 4.1570 LearningRate 0.0174 Epoch: 11 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:55,018-Speed 5610.04 samples/sec Loss 4.0664 LearningRate 0.0174 Epoch: 11 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:38:56,836-Speed 5633.88 samples/sec Loss 4.1547 LearningRate 0.0174 Epoch: 11 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:38:58,659-Speed 5620.70 
samples/sec Loss 3.9959 LearningRate 0.0174 Epoch: 11 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:39:00,467-Speed 5663.47 samples/sec Loss 4.0559 LearningRate 0.0173 Epoch: 11 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:02,304-Speed 5578.75 samples/sec Loss 4.0537 LearningRate 0.0173 Epoch: 11 Global Step: 66370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:04,119-Speed 5642.50 samples/sec Loss 4.0636 LearningRate 0.0173 Epoch: 11 Global Step: 66380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:05,966-Speed 5546.24 samples/sec Loss 4.1413 LearningRate 0.0173 Epoch: 11 Global Step: 66390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:07,785-Speed 5633.12 samples/sec Loss 3.9739 LearningRate 0.0173 Epoch: 11 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:09,613-Speed 5604.12 samples/sec Loss 4.0612 LearningRate 0.0173 Epoch: 11 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:11,445-Speed 5591.68 samples/sec Loss 4.2173 LearningRate 0.0173 Epoch: 11 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:13,260-Speed 5641.04 samples/sec Loss 4.0281 LearningRate 0.0173 Epoch: 11 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:15,090-Speed 5599.57 samples/sec Loss 4.0722 LearningRate 0.0173 Epoch: 11 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:16,933-Speed 5555.61 samples/sec Loss 4.1937 LearningRate 0.0173 Epoch: 11 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:18,758-Speed 5613.10 samples/sec Loss 4.0819 LearningRate 0.0173 Epoch: 11 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:39:20,587-Speed 5599.33 samples/sec Loss 4.1622 LearningRate 0.0173 Epoch: 11 
Global Step: 66470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:39:22,416-Speed 5602.68 samples/sec Loss 4.0290 LearningRate 0.0173 Epoch: 11 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:39:24,236-Speed 5626.42 samples/sec Loss 4.0413 LearningRate 0.0172 Epoch: 11 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:39:26,051-Speed 5644.32 samples/sec Loss 4.1780 LearningRate 0.0172 Epoch: 11 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:27,867-Speed 5643.10 samples/sec Loss 4.2047 LearningRate 0.0172 Epoch: 11 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:29,677-Speed 5657.18 samples/sec Loss 4.1066 LearningRate 0.0172 Epoch: 11 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:31,504-Speed 5608.39 samples/sec Loss 4.0607 LearningRate 0.0172 Epoch: 11 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:33,318-Speed 5645.38 samples/sec Loss 4.0003 LearningRate 0.0172 Epoch: 11 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:35,133-Speed 5643.66 samples/sec Loss 4.0947 LearningRate 0.0172 Epoch: 11 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:36,954-Speed 5627.59 samples/sec Loss 3.9493 LearningRate 0.0172 Epoch: 11 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:38,837-Speed 5438.95 samples/sec Loss 3.9587 LearningRate 0.0172 Epoch: 11 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:40,656-Speed 5630.79 samples/sec Loss 4.0140 LearningRate 0.0172 Epoch: 11 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:42,499-Speed 5557.39 samples/sec Loss 4.0564 LearningRate 0.0172 Epoch: 11 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 
3 hours Training: 2022-04-27 05:39:44,336-Speed 5575.51 samples/sec Loss 4.1433 LearningRate 0.0172 Epoch: 11 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:46,151-Speed 5643.87 samples/sec Loss 4.1818 LearningRate 0.0172 Epoch: 11 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:47,978-Speed 5608.76 samples/sec Loss 4.1383 LearningRate 0.0172 Epoch: 11 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:49,786-Speed 5666.65 samples/sec Loss 3.9265 LearningRate 0.0171 Epoch: 11 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:51,616-Speed 5595.01 samples/sec Loss 4.0750 LearningRate 0.0171 Epoch: 11 Global Step: 66640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:53,426-Speed 5660.22 samples/sec Loss 4.0406 LearningRate 0.0171 Epoch: 11 Global Step: 66650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:55,236-Speed 5661.80 samples/sec Loss 4.0491 LearningRate 0.0171 Epoch: 11 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:57,062-Speed 5609.35 samples/sec Loss 4.0872 LearningRate 0.0171 Epoch: 11 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:39:58,877-Speed 5642.74 samples/sec Loss 4.0494 LearningRate 0.0171 Epoch: 11 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:00,689-Speed 5651.43 samples/sec Loss 4.0618 LearningRate 0.0171 Epoch: 11 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:02,517-Speed 5605.37 samples/sec Loss 4.0408 LearningRate 0.0171 Epoch: 11 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:40:04,338-Speed 5626.09 samples/sec Loss 4.0910 LearningRate 0.0171 Epoch: 11 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:06,159-Speed 
5622.76 samples/sec Loss 4.0789 LearningRate 0.0171 Epoch: 11 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:07,982-Speed 5618.39 samples/sec Loss 4.0634 LearningRate 0.0171 Epoch: 11 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:09,799-Speed 5638.33 samples/sec Loss 4.0833 LearningRate 0.0171 Epoch: 11 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:11,607-Speed 5667.44 samples/sec Loss 4.1838 LearningRate 0.0171 Epoch: 11 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:13,432-Speed 5611.21 samples/sec Loss 4.0560 LearningRate 0.0171 Epoch: 11 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:15,240-Speed 5668.04 samples/sec Loss 4.1470 LearningRate 0.0170 Epoch: 11 Global Step: 66770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:17,061-Speed 5625.25 samples/sec Loss 4.1857 LearningRate 0.0170 Epoch: 11 Global Step: 66780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:18,886-Speed 5612.25 samples/sec Loss 4.0526 LearningRate 0.0170 Epoch: 11 Global Step: 66790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:20,721-Speed 5579.97 samples/sec Loss 4.1550 LearningRate 0.0170 Epoch: 11 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:22,542-Speed 5626.03 samples/sec Loss 4.1369 LearningRate 0.0170 Epoch: 11 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:24,354-Speed 5654.48 samples/sec Loss 4.0127 LearningRate 0.0170 Epoch: 11 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:26,171-Speed 5637.20 samples/sec Loss 4.0808 LearningRate 0.0170 Epoch: 11 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:27,986-Speed 5641.86 samples/sec Loss 4.1512 LearningRate 0.0170 
Epoch: 11 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:29,803-Speed 5637.65 samples/sec Loss 4.1790 LearningRate 0.0170 Epoch: 11 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:31,613-Speed 5661.97 samples/sec Loss 3.9810 LearningRate 0.0170 Epoch: 11 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:33,433-Speed 5626.31 samples/sec Loss 4.0135 LearningRate 0.0170 Epoch: 11 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:35,249-Speed 5642.41 samples/sec Loss 4.0844 LearningRate 0.0170 Epoch: 11 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:37,070-Speed 5623.32 samples/sec Loss 4.1064 LearningRate 0.0170 Epoch: 11 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:38,894-Speed 5616.31 samples/sec Loss 4.0336 LearningRate 0.0170 Epoch: 11 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:40,720-Speed 5609.27 samples/sec Loss 4.0717 LearningRate 0.0169 Epoch: 11 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:40:42,533-Speed 5651.78 samples/sec Loss 4.0174 LearningRate 0.0169 Epoch: 11 Global Step: 66920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:44,365-Speed 5591.16 samples/sec Loss 4.0409 LearningRate 0.0169 Epoch: 11 Global Step: 66930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:46,203-Speed 5572.03 samples/sec Loss 3.9655 LearningRate 0.0169 Epoch: 11 Global Step: 66940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:48,015-Speed 5653.67 samples/sec Loss 4.1258 LearningRate 0.0169 Epoch: 11 Global Step: 66950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:49,851-Speed 5579.04 samples/sec Loss 4.0823 LearningRate 0.0169 Epoch: 11 Global Step: 66960 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 05:40:51,684-Speed 5587.61 samples/sec Loss 4.0760 LearningRate 0.0169 Epoch: 11 Global Step: 66970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:53,514-Speed 5597.84 samples/sec Loss 4.1326 LearningRate 0.0169 Epoch: 11 Global Step: 66980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:55,347-Speed 5588.86 samples/sec Loss 4.1049 LearningRate 0.0169 Epoch: 11 Global Step: 66990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:57,161-Speed 5646.28 samples/sec Loss 4.0624 LearningRate 0.0169 Epoch: 11 Global Step: 67000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:40:58,976-Speed 5646.33 samples/sec Loss 4.0650 LearningRate 0.0169 Epoch: 11 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:41:00,803-Speed 5606.90 samples/sec Loss 4.0703 LearningRate 0.0169 Epoch: 11 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:02,630-Speed 5604.85 samples/sec Loss 4.0422 LearningRate 0.0169 Epoch: 11 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:04,452-Speed 5623.22 samples/sec Loss 4.2379 LearningRate 0.0168 Epoch: 11 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:06,298-Speed 5546.88 samples/sec Loss 3.9984 LearningRate 0.0168 Epoch: 11 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:08,119-Speed 5627.36 samples/sec Loss 3.9991 LearningRate 0.0168 Epoch: 11 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:09,941-Speed 5620.00 samples/sec Loss 4.1391 LearningRate 0.0168 Epoch: 11 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:11,751-Speed 5659.88 samples/sec Loss 4.2303 LearningRate 0.0168 Epoch: 11 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
05:41:13,571-Speed 5627.77 samples/sec Loss 4.0349 LearningRate 0.0168 Epoch: 11 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:15,393-Speed 5622.04 samples/sec Loss 4.1243 LearningRate 0.0168 Epoch: 11 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:17,219-Speed 5610.04 samples/sec Loss 4.1459 LearningRate 0.0168 Epoch: 11 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:19,045-Speed 5610.13 samples/sec Loss 4.0427 LearningRate 0.0168 Epoch: 11 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:41:20,877-Speed 5593.19 samples/sec Loss 4.0164 LearningRate 0.0168 Epoch: 11 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:41:22,705-Speed 5603.24 samples/sec Loss 4.0731 LearningRate 0.0168 Epoch: 11 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:24,535-Speed 5596.29 samples/sec Loss 3.9351 LearningRate 0.0168 Epoch: 11 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:26,367-Speed 5592.32 samples/sec Loss 3.9976 LearningRate 0.0168 Epoch: 11 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:28,208-Speed 5564.68 samples/sec Loss 4.0873 LearningRate 0.0168 Epoch: 11 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:30,051-Speed 5557.44 samples/sec Loss 4.0600 LearningRate 0.0167 Epoch: 11 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:31,887-Speed 5577.62 samples/sec Loss 3.9802 LearningRate 0.0167 Epoch: 11 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:33,719-Speed 5591.87 samples/sec Loss 4.1020 LearningRate 0.0167 Epoch: 11 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:35,544-Speed 5614.21 samples/sec Loss 4.1500 
LearningRate 0.0167 Epoch: 11 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:37,367-Speed 5617.23 samples/sec Loss 3.9144 LearningRate 0.0167 Epoch: 11 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:39,189-Speed 5625.28 samples/sec Loss 4.0076 LearningRate 0.0167 Epoch: 11 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:41,008-Speed 5631.11 samples/sec Loss 4.0859 LearningRate 0.0167 Epoch: 11 Global Step: 67240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:41:42,815-Speed 5667.06 samples/sec Loss 4.2236 LearningRate 0.0167 Epoch: 11 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:44,635-Speed 5627.82 samples/sec Loss 4.0382 LearningRate 0.0167 Epoch: 11 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:46,450-Speed 5643.72 samples/sec Loss 4.1873 LearningRate 0.0167 Epoch: 11 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:48,267-Speed 5639.11 samples/sec Loss 4.0605 LearningRate 0.0167 Epoch: 11 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:50,081-Speed 5644.78 samples/sec Loss 4.1379 LearningRate 0.0167 Epoch: 11 Global Step: 67290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:51,900-Speed 5634.04 samples/sec Loss 3.8987 LearningRate 0.0167 Epoch: 11 Global Step: 67300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:53,734-Speed 5585.17 samples/sec Loss 3.9920 LearningRate 0.0167 Epoch: 11 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:55,555-Speed 5624.89 samples/sec Loss 4.1673 LearningRate 0.0166 Epoch: 11 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:57,384-Speed 5601.26 samples/sec Loss 4.0126 LearningRate 0.0166 Epoch: 11 Global Step: 67330 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:41:59,196-Speed 5651.98 samples/sec Loss 4.0759 LearningRate 0.0166 Epoch: 11 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:01,014-Speed 5634.43 samples/sec Loss 4.0003 LearningRate 0.0166 Epoch: 11 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:02,858-Speed 5554.49 samples/sec Loss 4.0894 LearningRate 0.0166 Epoch: 11 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:04,683-Speed 5614.59 samples/sec Loss 4.1011 LearningRate 0.0166 Epoch: 11 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:06,514-Speed 5593.06 samples/sec Loss 4.0125 LearningRate 0.0166 Epoch: 11 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:08,334-Speed 5629.26 samples/sec Loss 4.1065 LearningRate 0.0166 Epoch: 11 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:10,140-Speed 5671.60 samples/sec Loss 4.0033 LearningRate 0.0166 Epoch: 11 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:11,950-Speed 5658.59 samples/sec Loss 3.8839 LearningRate 0.0166 Epoch: 11 Global Step: 67410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:13,786-Speed 5580.54 samples/sec Loss 4.0206 LearningRate 0.0166 Epoch: 11 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:15,596-Speed 5658.90 samples/sec Loss 4.0138 LearningRate 0.0166 Epoch: 11 Global Step: 67430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:17,429-Speed 5588.91 samples/sec Loss 4.0183 LearningRate 0.0166 Epoch: 11 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:19,241-Speed 5653.41 samples/sec Loss 4.1507 LearningRate 0.0166 Epoch: 11 Global Step: 67450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:42:21,054-Speed 5649.53 samples/sec Loss 4.0132 LearningRate 0.0165 Epoch: 11 Global Step: 67460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:22,866-Speed 5651.48 samples/sec Loss 4.0787 LearningRate 0.0165 Epoch: 11 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:24,688-Speed 5623.90 samples/sec Loss 4.0942 LearningRate 0.0165 Epoch: 11 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:26,526-Speed 5574.13 samples/sec Loss 3.9828 LearningRate 0.0165 Epoch: 11 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:28,337-Speed 5655.85 samples/sec Loss 3.9299 LearningRate 0.0165 Epoch: 11 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:30,141-Speed 5678.00 samples/sec Loss 4.0806 LearningRate 0.0165 Epoch: 11 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:31,967-Speed 5609.46 samples/sec Loss 3.9108 LearningRate 0.0165 Epoch: 11 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:33,779-Speed 5653.01 samples/sec Loss 4.0409 LearningRate 0.0165 Epoch: 11 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:35,596-Speed 5635.94 samples/sec Loss 4.1255 LearningRate 0.0165 Epoch: 11 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:37,425-Speed 5599.83 samples/sec Loss 4.0674 LearningRate 0.0165 Epoch: 11 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:39,252-Speed 5608.96 samples/sec Loss 4.0284 LearningRate 0.0165 Epoch: 11 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:41,064-Speed 5653.36 samples/sec Loss 4.0883 LearningRate 0.0165 Epoch: 11 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:42,880-Speed 5638.99 samples/sec Loss 
4.0772 LearningRate 0.0165 Epoch: 11 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:44,694-Speed 5649.44 samples/sec Loss 4.1107 LearningRate 0.0165 Epoch: 11 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:46,506-Speed 5653.56 samples/sec Loss 3.9903 LearningRate 0.0164 Epoch: 11 Global Step: 67600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:48,326-Speed 5625.41 samples/sec Loss 4.0287 LearningRate 0.0164 Epoch: 11 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:42:50,135-Speed 5664.80 samples/sec Loss 3.9983 LearningRate 0.0164 Epoch: 11 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:51,955-Speed 5626.23 samples/sec Loss 4.0533 LearningRate 0.0164 Epoch: 11 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:53,770-Speed 5644.76 samples/sec Loss 4.0781 LearningRate 0.0164 Epoch: 11 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:55,581-Speed 5654.73 samples/sec Loss 4.0782 LearningRate 0.0164 Epoch: 11 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:57,401-Speed 5629.75 samples/sec Loss 3.9586 LearningRate 0.0164 Epoch: 11 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:42:59,215-Speed 5646.32 samples/sec Loss 4.1006 LearningRate 0.0164 Epoch: 11 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:01,076-Speed 5503.45 samples/sec Loss 4.0172 LearningRate 0.0164 Epoch: 11 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:02,897-Speed 5626.96 samples/sec Loss 3.9895 LearningRate 0.0164 Epoch: 11 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:04,748-Speed 5533.94 samples/sec Loss 4.1810 LearningRate 0.0164 Epoch: 11 Global Step: 
67700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:06,576-Speed 5604.15 samples/sec Loss 3.9820 LearningRate 0.0164 Epoch: 11 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:08,412-Speed 5579.49 samples/sec Loss 4.0307 LearningRate 0.0164 Epoch: 11 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:43:10,241-Speed 5600.85 samples/sec Loss 3.9025 LearningRate 0.0164 Epoch: 11 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:12,083-Speed 5560.81 samples/sec Loss 4.0696 LearningRate 0.0163 Epoch: 11 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:13,937-Speed 5523.88 samples/sec Loss 4.1559 LearningRate 0.0163 Epoch: 11 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:15,870-Speed 5298.89 samples/sec Loss 3.9234 LearningRate 0.0163 Epoch: 11 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:17,704-Speed 5587.54 samples/sec Loss 4.0383 LearningRate 0.0163 Epoch: 11 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:19,544-Speed 5564.73 samples/sec Loss 4.0319 LearningRate 0.0163 Epoch: 11 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:21,367-Speed 5619.41 samples/sec Loss 4.1196 LearningRate 0.0163 Epoch: 11 Global Step: 67790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:23,195-Speed 5603.74 samples/sec Loss 3.8739 LearningRate 0.0163 Epoch: 11 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:25,014-Speed 5630.46 samples/sec Loss 3.9422 LearningRate 0.0163 Epoch: 11 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:26,836-Speed 5623.44 samples/sec Loss 4.2219 LearningRate 0.0163 Epoch: 11 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:43:28,644-Speed 5667.18 samples/sec Loss 4.0331 LearningRate 0.0163 Epoch: 11 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:43:30,467-Speed 5619.09 samples/sec Loss 4.0694 LearningRate 0.0163 Epoch: 11 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:32,313-Speed 5547.26 samples/sec Loss 4.0691 LearningRate 0.0163 Epoch: 11 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:34,139-Speed 5609.67 samples/sec Loss 4.1544 LearningRate 0.0163 Epoch: 11 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:35,960-Speed 5625.88 samples/sec Loss 4.0198 LearningRate 0.0163 Epoch: 11 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:37,778-Speed 5635.38 samples/sec Loss 4.1138 LearningRate 0.0162 Epoch: 11 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:39,589-Speed 5655.62 samples/sec Loss 4.0376 LearningRate 0.0162 Epoch: 11 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:41,400-Speed 5654.51 samples/sec Loss 3.9660 LearningRate 0.0162 Epoch: 11 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:43,236-Speed 5579.80 samples/sec Loss 4.1833 LearningRate 0.0162 Epoch: 11 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:45,057-Speed 5627.17 samples/sec Loss 4.0620 LearningRate 0.0162 Epoch: 11 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:46,892-Speed 5580.83 samples/sec Loss 4.0288 LearningRate 0.0162 Epoch: 11 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:48,702-Speed 5658.07 samples/sec Loss 4.0272 LearningRate 0.0162 Epoch: 11 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:50,516-Speed 5648.43 
samples/sec Loss 4.1174 LearningRate 0.0162 Epoch: 11 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:52,335-Speed 5630.86 samples/sec Loss 4.0132 LearningRate 0.0162 Epoch: 11 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:54,190-Speed 5521.82 samples/sec Loss 4.0802 LearningRate 0.0162 Epoch: 11 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:56,012-Speed 5624.07 samples/sec Loss 3.9370 LearningRate 0.0162 Epoch: 11 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:57,845-Speed 5587.11 samples/sec Loss 4.0809 LearningRate 0.0162 Epoch: 11 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:43:59,676-Speed 5593.17 samples/sec Loss 4.0267 LearningRate 0.0162 Epoch: 11 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:44:26,106-[lfw][68000]XNorm: 22.783929 Training: 2022-04-27 05:44:26,106-[lfw][68000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-04-27 05:44:26,107-[lfw][68000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:44:56,279-[cfp_fp][68000]XNorm: 20.524930 Training: 2022-04-27 05:44:56,280-[cfp_fp][68000]Accuracy-Flip: 0.96257+-0.00889 Training: 2022-04-27 05:44:56,280-[cfp_fp][68000]Accuracy-Highest: 0.96386 Training: 2022-04-27 05:45:22,336-[agedb_30][68000]XNorm: 22.781881 Training: 2022-04-27 05:45:22,337-[agedb_30][68000]Accuracy-Flip: 0.97483+-0.00626 Training: 2022-04-27 05:45:22,337-[agedb_30][68000]Accuracy-Highest: 0.97817 Training: 2022-04-27 05:45:24,168-Speed 121.20 samples/sec Loss 4.0359 LearningRate 0.0162 Epoch: 11 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:25,995-Speed 5606.67 samples/sec Loss 3.9886 LearningRate 0.0161 Epoch: 11 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:27,825-Speed 5595.44 samples/sec Loss 4.0592 LearningRate 
0.0161 Epoch: 11 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:29,643-Speed 5633.74 samples/sec Loss 3.9196 LearningRate 0.0161 Epoch: 11 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:45:31,446-Speed 5682.79 samples/sec Loss 3.9861 LearningRate 0.0161 Epoch: 11 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:33,267-Speed 5623.63 samples/sec Loss 4.0640 LearningRate 0.0161 Epoch: 11 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:35,087-Speed 5631.03 samples/sec Loss 3.9414 LearningRate 0.0161 Epoch: 11 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:36,905-Speed 5631.85 samples/sec Loss 3.9536 LearningRate 0.0161 Epoch: 11 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:38,711-Speed 5674.06 samples/sec Loss 4.0717 LearningRate 0.0161 Epoch: 11 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:40,547-Speed 5577.05 samples/sec Loss 3.8548 LearningRate 0.0161 Epoch: 11 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:42,360-Speed 5651.39 samples/sec Loss 4.0474 LearningRate 0.0161 Epoch: 11 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:44,194-Speed 5585.82 samples/sec Loss 3.9758 LearningRate 0.0161 Epoch: 11 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:46,023-Speed 5598.65 samples/sec Loss 4.0273 LearningRate 0.0161 Epoch: 11 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:47,857-Speed 5586.17 samples/sec Loss 3.9963 LearningRate 0.0161 Epoch: 11 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:49,691-Speed 5584.91 samples/sec Loss 3.9529 LearningRate 0.0161 Epoch: 11 Global Step: 68150 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-27 05:45:51,564-Speed 5468.99 samples/sec Loss 4.0381 LearningRate 0.0161 Epoch: 11 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:53,378-Speed 5648.85 samples/sec Loss 3.8302 LearningRate 0.0160 Epoch: 11 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:55,212-Speed 5583.71 samples/sec Loss 4.0484 LearningRate 0.0160 Epoch: 11 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:57,044-Speed 5591.57 samples/sec Loss 4.0249 LearningRate 0.0160 Epoch: 11 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:45:58,857-Speed 5650.78 samples/sec Loss 4.0923 LearningRate 0.0160 Epoch: 11 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:00,666-Speed 5661.09 samples/sec Loss 4.1515 LearningRate 0.0160 Epoch: 11 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:02,479-Speed 5652.58 samples/sec Loss 3.9643 LearningRate 0.0160 Epoch: 11 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:04,349-Speed 5476.42 samples/sec Loss 4.0473 LearningRate 0.0160 Epoch: 11 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:17,262-Speed 793.03 samples/sec Loss 3.5282 LearningRate 0.0160 Epoch: 12 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:19,447-Speed 4688.10 samples/sec Loss 3.2558 LearningRate 0.0160 Epoch: 12 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:21,284-Speed 5577.22 samples/sec Loss 3.3061 LearningRate 0.0160 Epoch: 12 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:46:23,128-Speed 5556.52 samples/sec Loss 3.3569 LearningRate 0.0160 Epoch: 12 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 
05:46:24,958-Speed 5597.02 samples/sec Loss 3.3194 LearningRate 0.0160 Epoch: 12 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:26,819-Speed 5505.14 samples/sec Loss 3.3084 LearningRate 0.0160 Epoch: 12 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:28,670-Speed 5532.67 samples/sec Loss 3.3332 LearningRate 0.0160 Epoch: 12 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:30,521-Speed 5535.38 samples/sec Loss 3.3529 LearningRate 0.0159 Epoch: 12 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:32,347-Speed 5609.75 samples/sec Loss 3.4192 LearningRate 0.0159 Epoch: 12 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:34,189-Speed 5559.22 samples/sec Loss 3.3243 LearningRate 0.0159 Epoch: 12 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:36,029-Speed 5566.87 samples/sec Loss 3.3765 LearningRate 0.0159 Epoch: 12 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:37,886-Speed 5516.55 samples/sec Loss 3.2775 LearningRate 0.0159 Epoch: 12 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:39,728-Speed 5560.66 samples/sec Loss 3.3495 LearningRate 0.0159 Epoch: 12 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:41,542-Speed 5646.74 samples/sec Loss 3.4960 LearningRate 0.0159 Epoch: 12 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:43,385-Speed 5557.67 samples/sec Loss 3.4469 LearningRate 0.0159 Epoch: 12 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:45,199-Speed 5647.75 samples/sec Loss 3.4963 LearningRate 0.0159 Epoch: 12 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:47,028-Speed 5602.94 samples/sec Loss 3.4108 
LearningRate 0.0159 Epoch: 12 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:48,950-Speed 5328.24 samples/sec Loss 3.5143 LearningRate 0.0159 Epoch: 12 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:50,811-Speed 5504.22 samples/sec Loss 3.4531 LearningRate 0.0159 Epoch: 12 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:52,631-Speed 5628.96 samples/sec Loss 3.3936 LearningRate 0.0159 Epoch: 12 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:54,458-Speed 5606.09 samples/sec Loss 3.5374 LearningRate 0.0159 Epoch: 12 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:56,288-Speed 5599.12 samples/sec Loss 3.3250 LearningRate 0.0158 Epoch: 12 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:58,122-Speed 5582.78 samples/sec Loss 3.4187 LearningRate 0.0158 Epoch: 12 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:46:59,984-Speed 5502.46 samples/sec Loss 3.3751 LearningRate 0.0158 Epoch: 12 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:01,810-Speed 5610.71 samples/sec Loss 3.3792 LearningRate 0.0158 Epoch: 12 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:03,627-Speed 5635.06 samples/sec Loss 3.4451 LearningRate 0.0158 Epoch: 12 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:05,446-Speed 5633.34 samples/sec Loss 3.3779 LearningRate 0.0158 Epoch: 12 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:07,271-Speed 5612.43 samples/sec Loss 3.4847 LearningRate 0.0158 Epoch: 12 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:09,120-Speed 5542.57 samples/sec Loss 3.3724 LearningRate 0.0158 Epoch: 12 Global Step: 68520 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:10,969-Speed 5539.80 samples/sec Loss 3.3700 LearningRate 0.0158 Epoch: 12 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:12,801-Speed 5589.86 samples/sec Loss 3.4956 LearningRate 0.0158 Epoch: 12 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:14,638-Speed 5575.87 samples/sec Loss 3.4068 LearningRate 0.0158 Epoch: 12 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:16,453-Speed 5644.81 samples/sec Loss 3.5586 LearningRate 0.0158 Epoch: 12 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:18,280-Speed 5605.89 samples/sec Loss 3.4221 LearningRate 0.0158 Epoch: 12 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:20,115-Speed 5582.31 samples/sec Loss 3.4997 LearningRate 0.0158 Epoch: 12 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:21,944-Speed 5600.71 samples/sec Loss 3.5171 LearningRate 0.0157 Epoch: 12 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:23,771-Speed 5604.49 samples/sec Loss 3.4724 LearningRate 0.0157 Epoch: 12 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:25,585-Speed 5648.27 samples/sec Loss 3.4811 LearningRate 0.0157 Epoch: 12 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:27,407-Speed 5620.75 samples/sec Loss 3.4821 LearningRate 0.0157 Epoch: 12 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:29,223-Speed 5642.62 samples/sec Loss 3.4630 LearningRate 0.0157 Epoch: 12 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:31,040-Speed 5638.42 samples/sec Loss 3.6473 LearningRate 0.0157 Epoch: 12 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-27 05:47:32,860-Speed 5628.41 samples/sec Loss 3.4116 LearningRate 0.0157 Epoch: 12 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:34,687-Speed 5607.25 samples/sec Loss 3.4968 LearningRate 0.0157 Epoch: 12 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:36,499-Speed 5652.13 samples/sec Loss 3.6271 LearningRate 0.0157 Epoch: 12 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:38,315-Speed 5642.77 samples/sec Loss 3.5448 LearningRate 0.0157 Epoch: 12 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:40,154-Speed 5568.32 samples/sec Loss 3.5231 LearningRate 0.0157 Epoch: 12 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:41,979-Speed 5613.90 samples/sec Loss 3.6093 LearningRate 0.0157 Epoch: 12 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:43,809-Speed 5594.90 samples/sec Loss 3.4645 LearningRate 0.0157 Epoch: 12 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:45,628-Speed 5634.13 samples/sec Loss 3.5540 LearningRate 0.0157 Epoch: 12 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:47,468-Speed 5566.37 samples/sec Loss 3.4051 LearningRate 0.0157 Epoch: 12 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:49,294-Speed 5608.24 samples/sec Loss 3.4567 LearningRate 0.0156 Epoch: 12 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:51,115-Speed 5624.57 samples/sec Loss 3.5605 LearningRate 0.0156 Epoch: 12 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:47:52,944-Speed 5601.69 samples/sec Loss 3.6068 LearningRate 0.0156 Epoch: 12 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:54,767-Speed 5619.23 samples/sec Loss 
3.5128 LearningRate 0.0156 Epoch: 12 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:56,585-Speed 5635.68 samples/sec Loss 3.6318 LearningRate 0.0156 Epoch: 12 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:47:58,399-Speed 5646.20 samples/sec Loss 3.5082 LearningRate 0.0156 Epoch: 12 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:00,209-Speed 5660.62 samples/sec Loss 3.5466 LearningRate 0.0156 Epoch: 12 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:02,019-Speed 5658.83 samples/sec Loss 3.6569 LearningRate 0.0156 Epoch: 12 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:03,843-Speed 5613.90 samples/sec Loss 3.5126 LearningRate 0.0156 Epoch: 12 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:05,671-Speed 5605.54 samples/sec Loss 3.4999 LearningRate 0.0156 Epoch: 12 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:07,499-Speed 5602.37 samples/sec Loss 3.6157 LearningRate 0.0156 Epoch: 12 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:09,309-Speed 5659.18 samples/sec Loss 3.5440 LearningRate 0.0156 Epoch: 12 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:11,146-Speed 5576.76 samples/sec Loss 3.5319 LearningRate 0.0156 Epoch: 12 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:12,969-Speed 5617.74 samples/sec Loss 3.5964 LearningRate 0.0156 Epoch: 12 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:14,789-Speed 5629.77 samples/sec Loss 3.5642 LearningRate 0.0155 Epoch: 12 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:16,610-Speed 5626.26 samples/sec Loss 3.6743 LearningRate 0.0155 Epoch: 12 Global Step: 68890 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:18,430-Speed 5627.55 samples/sec Loss 3.5420 LearningRate 0.0155 Epoch: 12 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:20,279-Speed 5539.62 samples/sec Loss 3.4827 LearningRate 0.0155 Epoch: 12 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:22,111-Speed 5592.03 samples/sec Loss 3.5546 LearningRate 0.0155 Epoch: 12 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:23,943-Speed 5592.86 samples/sec Loss 3.5675 LearningRate 0.0155 Epoch: 12 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:25,757-Speed 5646.32 samples/sec Loss 3.5439 LearningRate 0.0155 Epoch: 12 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:27,587-Speed 5597.05 samples/sec Loss 3.5014 LearningRate 0.0155 Epoch: 12 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:29,408-Speed 5626.34 samples/sec Loss 3.6196 LearningRate 0.0155 Epoch: 12 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:31,224-Speed 5640.55 samples/sec Loss 3.5703 LearningRate 0.0155 Epoch: 12 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:33,053-Speed 5597.87 samples/sec Loss 3.5233 LearningRate 0.0155 Epoch: 12 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:34,896-Speed 5558.39 samples/sec Loss 3.5898 LearningRate 0.0155 Epoch: 12 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:36,716-Speed 5629.03 samples/sec Loss 3.4973 LearningRate 0.0155 Epoch: 12 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:38,536-Speed 5627.85 samples/sec Loss 3.6195 LearningRate 0.0155 Epoch: 12 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:48:40,346-Speed 5661.33 samples/sec Loss 3.5077 LearningRate 0.0155 Epoch: 12 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:42,181-Speed 5580.99 samples/sec Loss 3.5277 LearningRate 0.0154 Epoch: 12 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:43,999-Speed 5635.10 samples/sec Loss 3.5388 LearningRate 0.0154 Epoch: 12 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:45,840-Speed 5563.83 samples/sec Loss 3.6483 LearningRate 0.0154 Epoch: 12 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:47,679-Speed 5571.38 samples/sec Loss 3.5528 LearningRate 0.0154 Epoch: 12 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:48:49,499-Speed 5626.83 samples/sec Loss 3.5833 LearningRate 0.0154 Epoch: 12 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:51,346-Speed 5545.32 samples/sec Loss 3.6973 LearningRate 0.0154 Epoch: 12 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:53,162-Speed 5639.86 samples/sec Loss 3.6703 LearningRate 0.0154 Epoch: 12 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:54,988-Speed 5610.88 samples/sec Loss 3.5797 LearningRate 0.0154 Epoch: 12 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:56,814-Speed 5608.50 samples/sec Loss 3.6364 LearningRate 0.0154 Epoch: 12 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:48:58,636-Speed 5622.57 samples/sec Loss 3.6010 LearningRate 0.0154 Epoch: 12 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:00,511-Speed 5464.78 samples/sec Loss 3.7065 LearningRate 0.0154 Epoch: 12 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:02,346-Speed 5582.58 samples/sec Loss 
3.5551 LearningRate 0.0154 Epoch: 12 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:04,173-Speed 5604.77 samples/sec Loss 3.5436 LearningRate 0.0154 Epoch: 12 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:06,569-Speed 4274.88 samples/sec Loss 3.5484 LearningRate 0.0154 Epoch: 12 Global Step: 69160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:08,390-Speed 5625.33 samples/sec Loss 3.6053 LearningRate 0.0153 Epoch: 12 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:10,222-Speed 5593.85 samples/sec Loss 3.5667 LearningRate 0.0153 Epoch: 12 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:12,045-Speed 5617.10 samples/sec Loss 3.5571 LearningRate 0.0153 Epoch: 12 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:13,857-Speed 5653.77 samples/sec Loss 3.4414 LearningRate 0.0153 Epoch: 12 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:15,670-Speed 5649.53 samples/sec Loss 3.5795 LearningRate 0.0153 Epoch: 12 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:17,480-Speed 5660.31 samples/sec Loss 3.5972 LearningRate 0.0153 Epoch: 12 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:19,317-Speed 5574.06 samples/sec Loss 3.6024 LearningRate 0.0153 Epoch: 12 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:21,143-Speed 5612.10 samples/sec Loss 3.5827 LearningRate 0.0153 Epoch: 12 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:23,015-Speed 5471.04 samples/sec Loss 3.6178 LearningRate 0.0153 Epoch: 12 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:24,856-Speed 5563.95 samples/sec Loss 3.5637 LearningRate 0.0153 Epoch: 12 Global Step: 69260 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:26,697-Speed 5563.97 samples/sec Loss 3.5761 LearningRate 0.0153 Epoch: 12 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:49:28,542-Speed 5552.15 samples/sec Loss 3.5583 LearningRate 0.0153 Epoch: 12 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:49:30,345-Speed 5683.39 samples/sec Loss 3.5606 LearningRate 0.0153 Epoch: 12 Global Step: 69290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:32,163-Speed 5632.53 samples/sec Loss 3.6264 LearningRate 0.0153 Epoch: 12 Global Step: 69300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:33,982-Speed 5632.36 samples/sec Loss 3.6840 LearningRate 0.0153 Epoch: 12 Global Step: 69310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:35,798-Speed 5638.70 samples/sec Loss 3.7003 LearningRate 0.0152 Epoch: 12 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:37,619-Speed 5625.11 samples/sec Loss 3.6912 LearningRate 0.0152 Epoch: 12 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:39,444-Speed 5613.65 samples/sec Loss 3.5602 LearningRate 0.0152 Epoch: 12 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:41,256-Speed 5653.62 samples/sec Loss 3.6674 LearningRate 0.0152 Epoch: 12 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:43,069-Speed 5651.05 samples/sec Loss 3.6498 LearningRate 0.0152 Epoch: 12 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:44,897-Speed 5601.57 samples/sec Loss 3.5785 LearningRate 0.0152 Epoch: 12 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:46,732-Speed 5582.64 samples/sec Loss 3.6561 LearningRate 0.0152 Epoch: 12 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:49:48,569-Speed 5578.46 samples/sec Loss 3.6579 LearningRate 0.0152 Epoch: 12 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:49:50,397-Speed 5602.49 samples/sec Loss 3.5901 LearningRate 0.0152 Epoch: 12 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:52,217-Speed 5626.60 samples/sec Loss 3.6229 LearningRate 0.0152 Epoch: 12 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:54,104-Speed 5429.49 samples/sec Loss 3.7408 LearningRate 0.0152 Epoch: 12 Global Step: 69420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:55,967-Speed 5498.47 samples/sec Loss 3.6682 LearningRate 0.0152 Epoch: 12 Global Step: 69430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:57,795-Speed 5604.16 samples/sec Loss 3.6554 LearningRate 0.0152 Epoch: 12 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:49:59,626-Speed 5594.20 samples/sec Loss 3.6683 LearningRate 0.0152 Epoch: 12 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:01,456-Speed 5597.86 samples/sec Loss 3.6127 LearningRate 0.0151 Epoch: 12 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:03,290-Speed 5585.24 samples/sec Loss 3.5913 LearningRate 0.0151 Epoch: 12 Global Step: 69470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:05,120-Speed 5594.99 samples/sec Loss 3.6389 LearningRate 0.0151 Epoch: 12 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:06,963-Speed 5561.57 samples/sec Loss 3.6674 LearningRate 0.0151 Epoch: 12 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:08,791-Speed 5601.96 samples/sec Loss 3.5311 LearningRate 0.0151 Epoch: 12 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:50:10,662-Speed 5475.58 samples/sec 
Loss 3.6068 LearningRate 0.0151 Epoch: 12 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:50:12,489-Speed 5605.52 samples/sec Loss 3.6586 LearningRate 0.0151 Epoch: 12 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:14,331-Speed 5561.75 samples/sec Loss 3.5032 LearningRate 0.0151 Epoch: 12 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:16,147-Speed 5642.93 samples/sec Loss 3.7403 LearningRate 0.0151 Epoch: 12 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:17,983-Speed 5578.56 samples/sec Loss 3.6895 LearningRate 0.0151 Epoch: 12 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:19,862-Speed 5450.70 samples/sec Loss 3.7084 LearningRate 0.0151 Epoch: 12 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:21,765-Speed 5381.91 samples/sec Loss 3.5801 LearningRate 0.0151 Epoch: 12 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:23,584-Speed 5631.17 samples/sec Loss 3.6862 LearningRate 0.0151 Epoch: 12 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:25,409-Speed 5611.64 samples/sec Loss 3.6328 LearningRate 0.0151 Epoch: 12 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:27,232-Speed 5619.18 samples/sec Loss 3.7559 LearningRate 0.0151 Epoch: 12 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:29,067-Speed 5585.21 samples/sec Loss 3.7037 LearningRate 0.0150 Epoch: 12 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:30,881-Speed 5646.97 samples/sec Loss 3.7406 LearningRate 0.0150 Epoch: 12 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:50:32,687-Speed 5670.84 samples/sec Loss 3.5902 LearningRate 0.0150 Epoch: 12 Global 
Step: 69630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:34,503-Speed 5639.79 samples/sec Loss 3.7122 LearningRate 0.0150 Epoch: 12 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:36,327-Speed 5618.47 samples/sec Loss 3.5892 LearningRate 0.0150 Epoch: 12 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:38,146-Speed 5629.28 samples/sec Loss 3.7504 LearningRate 0.0150 Epoch: 12 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:39,962-Speed 5642.33 samples/sec Loss 3.6373 LearningRate 0.0150 Epoch: 12 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:41,783-Speed 5624.17 samples/sec Loss 3.6794 LearningRate 0.0150 Epoch: 12 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:43,618-Speed 5583.03 samples/sec Loss 3.6664 LearningRate 0.0150 Epoch: 12 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:45,469-Speed 5534.17 samples/sec Loss 3.6712 LearningRate 0.0150 Epoch: 12 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:47,317-Speed 5540.79 samples/sec Loss 3.5967 LearningRate 0.0150 Epoch: 12 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:49,142-Speed 5615.12 samples/sec Loss 3.6649 LearningRate 0.0150 Epoch: 12 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:50,967-Speed 5613.70 samples/sec Loss 3.6927 LearningRate 0.0150 Epoch: 12 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:52,810-Speed 5557.02 samples/sec Loss 3.6206 LearningRate 0.0150 Epoch: 12 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:54,647-Speed 5577.52 samples/sec Loss 3.5471 LearningRate 0.0149 Epoch: 12 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 05:50:56,495-Speed 5542.70 samples/sec Loss 3.7185 LearningRate 0.0149 Epoch: 12 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:50:58,319-Speed 5616.04 samples/sec Loss 3.7466 LearningRate 0.0149 Epoch: 12 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:00,154-Speed 5579.98 samples/sec Loss 3.6655 LearningRate 0.0149 Epoch: 12 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:01,990-Speed 5580.98 samples/sec Loss 3.6423 LearningRate 0.0149 Epoch: 12 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:03,821-Speed 5592.64 samples/sec Loss 3.6833 LearningRate 0.0149 Epoch: 12 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:05,658-Speed 5575.89 samples/sec Loss 3.6156 LearningRate 0.0149 Epoch: 12 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:07,482-Speed 5615.16 samples/sec Loss 3.6373 LearningRate 0.0149 Epoch: 12 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:09,835-Speed 4354.73 samples/sec Loss 3.6411 LearningRate 0.0149 Epoch: 12 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:51:12,581-Speed 3729.70 samples/sec Loss 3.6334 LearningRate 0.0149 Epoch: 12 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:14,758-Speed 4705.16 samples/sec Loss 3.6394 LearningRate 0.0149 Epoch: 12 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:16,585-Speed 5608.55 samples/sec Loss 3.6380 LearningRate 0.0149 Epoch: 12 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:18,415-Speed 5597.31 samples/sec Loss 3.6918 LearningRate 0.0149 Epoch: 12 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:20,236-Speed 5625.05 
samples/sec Loss 3.6183 LearningRate 0.0149 Epoch: 12 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:22,073-Speed 5573.62 samples/sec Loss 3.5748 LearningRate 0.0149 Epoch: 12 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:23,904-Speed 5596.14 samples/sec Loss 3.7036 LearningRate 0.0148 Epoch: 12 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:25,747-Speed 5557.19 samples/sec Loss 3.6520 LearningRate 0.0148 Epoch: 12 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:27,594-Speed 5550.18 samples/sec Loss 3.5940 LearningRate 0.0148 Epoch: 12 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:29,430-Speed 5577.33 samples/sec Loss 3.7539 LearningRate 0.0148 Epoch: 12 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:31,269-Speed 5571.72 samples/sec Loss 3.6495 LearningRate 0.0148 Epoch: 12 Global Step: 69940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:51:33,088-Speed 5631.51 samples/sec Loss 3.5938 LearningRate 0.0148 Epoch: 12 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:34,901-Speed 5649.78 samples/sec Loss 3.6673 LearningRate 0.0148 Epoch: 12 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:36,719-Speed 5635.71 samples/sec Loss 3.5350 LearningRate 0.0148 Epoch: 12 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:38,532-Speed 5650.32 samples/sec Loss 3.6612 LearningRate 0.0148 Epoch: 12 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:40,372-Speed 5566.88 samples/sec Loss 3.6389 LearningRate 0.0148 Epoch: 12 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:51:42,183-Speed 5654.04 samples/sec Loss 3.6137 LearningRate 0.0148 Epoch: 12 
Global Step: 70000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:52:08,136-[lfw][70000]XNorm: 21.865118 Training: 2022-04-27 05:52:08,136-[lfw][70000]Accuracy-Flip: 0.99717+-0.00308 Training: 2022-04-27 05:52:08,137-[lfw][70000]Accuracy-Highest: 0.99800 Training: 2022-04-27 05:52:38,214-[cfp_fp][70000]XNorm: 19.692456 Training: 2022-04-27 05:52:38,215-[cfp_fp][70000]Accuracy-Flip: 0.96543+-0.00763 Training: 2022-04-27 05:52:38,215-[cfp_fp][70000]Accuracy-Highest: 0.96543 Training: 2022-04-27 05:53:04,170-[agedb_30][70000]XNorm: 21.599249 Training: 2022-04-27 05:53:04,171-[agedb_30][70000]Accuracy-Flip: 0.97917+-0.00724 Training: 2022-04-27 05:53:04,171-[agedb_30][70000]Accuracy-Highest: 0.97917 Training: 2022-04-27 05:53:06,004-Speed 122.17 samples/sec Loss 3.7741 LearningRate 0.0148 Epoch: 12 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:07,812-Speed 5665.03 samples/sec Loss 3.6307 LearningRate 0.0148 Epoch: 12 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:09,632-Speed 5631.02 samples/sec Loss 3.6379 LearningRate 0.0148 Epoch: 12 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:11,449-Speed 5634.68 samples/sec Loss 3.7226 LearningRate 0.0148 Epoch: 12 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:13,264-Speed 5643.98 samples/sec Loss 3.6355 LearningRate 0.0147 Epoch: 12 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:15,100-Speed 5580.02 samples/sec Loss 3.6602 LearningRate 0.0147 Epoch: 12 Global Step: 70060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:16,927-Speed 5605.39 samples/sec Loss 3.6579 LearningRate 0.0147 Epoch: 12 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:18,749-Speed 5624.19 samples/sec Loss 3.6306 LearningRate 0.0147 Epoch: 12 Global Step: 70080 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-27 05:53:20,562-Speed 5647.61 samples/sec Loss 3.6663 LearningRate 0.0147 Epoch: 12 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:22,377-Speed 5646.46 samples/sec Loss 3.8069 LearningRate 0.0147 Epoch: 12 Global Step: 70100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:24,206-Speed 5600.32 samples/sec Loss 3.6779 LearningRate 0.0147 Epoch: 12 Global Step: 70110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:26,023-Speed 5636.13 samples/sec Loss 3.7836 LearningRate 0.0147 Epoch: 12 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:27,845-Speed 5622.79 samples/sec Loss 3.6218 LearningRate 0.0147 Epoch: 12 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:29,682-Speed 5575.76 samples/sec Loss 3.5986 LearningRate 0.0147 Epoch: 12 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:31,512-Speed 5596.86 samples/sec Loss 3.6438 LearningRate 0.0147 Epoch: 12 Global Step: 70150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:53:33,348-Speed 5580.45 samples/sec Loss 3.6329 LearningRate 0.0147 Epoch: 12 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:35,171-Speed 5619.89 samples/sec Loss 3.6440 LearningRate 0.0147 Epoch: 12 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:36,995-Speed 5616.16 samples/sec Loss 3.7462 LearningRate 0.0147 Epoch: 12 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:38,855-Speed 5507.36 samples/sec Loss 3.7331 LearningRate 0.0147 Epoch: 12 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:40,734-Speed 5452.03 samples/sec Loss 3.7226 LearningRate 0.0146 Epoch: 12 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
05:53:42,632-Speed 5396.96 samples/sec Loss 3.6787 LearningRate 0.0146 Epoch: 12 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:44,488-Speed 5518.04 samples/sec Loss 3.6771 LearningRate 0.0146 Epoch: 12 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:46,308-Speed 5629.57 samples/sec Loss 3.6538 LearningRate 0.0146 Epoch: 12 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:48,124-Speed 5638.25 samples/sec Loss 3.7374 LearningRate 0.0146 Epoch: 12 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:49,934-Speed 5658.67 samples/sec Loss 3.6491 LearningRate 0.0146 Epoch: 12 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:51,762-Speed 5604.70 samples/sec Loss 3.6023 LearningRate 0.0146 Epoch: 12 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:53:53,588-Speed 5609.69 samples/sec Loss 3.7337 LearningRate 0.0146 Epoch: 12 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:55,454-Speed 5491.25 samples/sec Loss 3.6742 LearningRate 0.0146 Epoch: 12 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:57,292-Speed 5571.19 samples/sec Loss 3.5855 LearningRate 0.0146 Epoch: 12 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:53:59,137-Speed 5553.29 samples/sec Loss 3.6416 LearningRate 0.0146 Epoch: 12 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:00,987-Speed 5536.82 samples/sec Loss 3.6321 LearningRate 0.0146 Epoch: 12 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:02,827-Speed 5569.75 samples/sec Loss 3.6805 LearningRate 0.0146 Epoch: 12 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:04,665-Speed 5572.98 samples/sec Loss 3.6351 
LearningRate 0.0146 Epoch: 12 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:06,503-Speed 5572.77 samples/sec Loss 3.8148 LearningRate 0.0146 Epoch: 12 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:08,329-Speed 5607.19 samples/sec Loss 3.6572 LearningRate 0.0145 Epoch: 12 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:10,171-Speed 5563.70 samples/sec Loss 3.7544 LearningRate 0.0145 Epoch: 12 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:11,985-Speed 5645.34 samples/sec Loss 3.6965 LearningRate 0.0145 Epoch: 12 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:13,900-Speed 5351.28 samples/sec Loss 3.5360 LearningRate 0.0145 Epoch: 12 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:15,767-Speed 5485.88 samples/sec Loss 3.6771 LearningRate 0.0145 Epoch: 12 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:17,622-Speed 5521.29 samples/sec Loss 3.5467 LearningRate 0.0145 Epoch: 12 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:19,440-Speed 5636.09 samples/sec Loss 3.6571 LearningRate 0.0145 Epoch: 12 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:21,269-Speed 5601.44 samples/sec Loss 3.5917 LearningRate 0.0145 Epoch: 12 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:23,089-Speed 5626.56 samples/sec Loss 3.5082 LearningRate 0.0145 Epoch: 12 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:24,908-Speed 5631.80 samples/sec Loss 3.7445 LearningRate 0.0145 Epoch: 12 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:26,723-Speed 5642.60 samples/sec Loss 3.5600 LearningRate 0.0145 Epoch: 12 Global Step: 70450 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:28,533-Speed 5658.41 samples/sec Loss 3.6932 LearningRate 0.0145 Epoch: 12 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:30,352-Speed 5634.03 samples/sec Loss 3.6207 LearningRate 0.0145 Epoch: 12 Global Step: 70470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:54:32,155-Speed 5681.49 samples/sec Loss 3.6291 LearningRate 0.0145 Epoch: 12 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:33,983-Speed 5603.22 samples/sec Loss 3.5362 LearningRate 0.0145 Epoch: 12 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:35,798-Speed 5644.13 samples/sec Loss 3.7800 LearningRate 0.0144 Epoch: 12 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:37,609-Speed 5654.09 samples/sec Loss 3.5257 LearningRate 0.0144 Epoch: 12 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:39,431-Speed 5622.89 samples/sec Loss 3.6877 LearningRate 0.0144 Epoch: 12 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:41,264-Speed 5587.65 samples/sec Loss 3.7460 LearningRate 0.0144 Epoch: 12 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:43,089-Speed 5614.19 samples/sec Loss 3.7438 LearningRate 0.0144 Epoch: 12 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:44,910-Speed 5624.01 samples/sec Loss 3.7802 LearningRate 0.0144 Epoch: 12 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:46,735-Speed 5614.05 samples/sec Loss 3.6041 LearningRate 0.0144 Epoch: 12 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:48,566-Speed 5595.45 samples/sec Loss 3.6759 LearningRate 0.0144 Epoch: 12 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 05:54:50,391-Speed 5611.83 samples/sec Loss 3.7163 LearningRate 0.0144 Epoch: 12 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:52,213-Speed 5622.41 samples/sec Loss 3.7144 LearningRate 0.0144 Epoch: 12 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:54,041-Speed 5603.27 samples/sec Loss 3.5370 LearningRate 0.0144 Epoch: 12 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:55,878-Speed 5575.89 samples/sec Loss 3.7178 LearningRate 0.0144 Epoch: 12 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:57,771-Speed 5410.48 samples/sec Loss 3.7629 LearningRate 0.0144 Epoch: 12 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:54:59,606-Speed 5584.52 samples/sec Loss 3.6978 LearningRate 0.0144 Epoch: 12 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:01,435-Speed 5598.33 samples/sec Loss 3.6550 LearningRate 0.0144 Epoch: 12 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:03,259-Speed 5616.60 samples/sec Loss 3.6699 LearningRate 0.0143 Epoch: 12 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:05,078-Speed 5631.35 samples/sec Loss 3.6170 LearningRate 0.0143 Epoch: 12 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:06,889-Speed 5656.36 samples/sec Loss 3.5884 LearningRate 0.0143 Epoch: 12 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:08,722-Speed 5588.23 samples/sec Loss 3.6604 LearningRate 0.0143 Epoch: 12 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:55:10,530-Speed 5665.02 samples/sec Loss 3.6583 LearningRate 0.0143 Epoch: 12 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:12,342-Speed 5655.58 samples/sec Loss 
3.6974 LearningRate 0.0143 Epoch: 12 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:14,165-Speed 5618.12 samples/sec Loss 3.7395 LearningRate 0.0143 Epoch: 12 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:15,990-Speed 5613.42 samples/sec Loss 3.6832 LearningRate 0.0143 Epoch: 12 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:17,826-Speed 5578.48 samples/sec Loss 3.7270 LearningRate 0.0143 Epoch: 12 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:19,660-Speed 5583.35 samples/sec Loss 3.6823 LearningRate 0.0143 Epoch: 12 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:21,499-Speed 5572.44 samples/sec Loss 3.5945 LearningRate 0.0143 Epoch: 12 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:23,339-Speed 5564.87 samples/sec Loss 3.5989 LearningRate 0.0143 Epoch: 12 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:25,173-Speed 5587.54 samples/sec Loss 3.6251 LearningRate 0.0143 Epoch: 12 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:27,007-Speed 5583.58 samples/sec Loss 3.7668 LearningRate 0.0143 Epoch: 12 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:28,843-Speed 5580.79 samples/sec Loss 3.6546 LearningRate 0.0143 Epoch: 12 Global Step: 70790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:55:30,658-Speed 5643.28 samples/sec Loss 3.5438 LearningRate 0.0142 Epoch: 12 Global Step: 70800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:55:32,463-Speed 5676.00 samples/sec Loss 3.7212 LearningRate 0.0142 Epoch: 12 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:34,274-Speed 5655.86 samples/sec Loss 3.7784 LearningRate 0.0142 Epoch: 12 Global Step: 
70820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:36,095-Speed 5624.18 samples/sec Loss 3.6687 LearningRate 0.0142 Epoch: 12 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:37,928-Speed 5589.13 samples/sec Loss 3.6763 LearningRate 0.0142 Epoch: 12 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:39,747-Speed 5631.90 samples/sec Loss 3.6546 LearningRate 0.0142 Epoch: 12 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:41,560-Speed 5649.39 samples/sec Loss 3.6766 LearningRate 0.0142 Epoch: 12 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:43,392-Speed 5592.15 samples/sec Loss 3.6104 LearningRate 0.0142 Epoch: 12 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:45,211-Speed 5630.42 samples/sec Loss 3.6142 LearningRate 0.0142 Epoch: 12 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:47,026-Speed 5645.48 samples/sec Loss 3.5580 LearningRate 0.0142 Epoch: 12 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:48,845-Speed 5630.21 samples/sec Loss 3.6450 LearningRate 0.0142 Epoch: 12 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:50,683-Speed 5574.42 samples/sec Loss 3.7261 LearningRate 0.0142 Epoch: 12 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:52,512-Speed 5598.45 samples/sec Loss 3.7843 LearningRate 0.0142 Epoch: 12 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:55:54,333-Speed 5626.01 samples/sec Loss 3.6327 LearningRate 0.0142 Epoch: 12 Global Step: 70930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:55:56,162-Speed 5600.54 samples/sec Loss 3.7608 LearningRate 0.0142 Epoch: 12 Global Step: 70940 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-27 05:55:57,997-Speed 5581.03 samples/sec Loss 3.7202 LearningRate 0.0141 Epoch: 12 Global Step: 70950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:55:59,820-Speed 5620.91 samples/sec Loss 3.6580 LearningRate 0.0141 Epoch: 12 Global Step: 70960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:01,670-Speed 5537.19 samples/sec Loss 3.7076 LearningRate 0.0141 Epoch: 12 Global Step: 70970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:03,503-Speed 5588.18 samples/sec Loss 3.5943 LearningRate 0.0141 Epoch: 12 Global Step: 70980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:05,344-Speed 5562.02 samples/sec Loss 3.5847 LearningRate 0.0141 Epoch: 12 Global Step: 70990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:07,165-Speed 5627.68 samples/sec Loss 3.5361 LearningRate 0.0141 Epoch: 12 Global Step: 71000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:08,997-Speed 5590.57 samples/sec Loss 3.5788 LearningRate 0.0141 Epoch: 12 Global Step: 71010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:10,816-Speed 5632.19 samples/sec Loss 3.8058 LearningRate 0.0141 Epoch: 12 Global Step: 71020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 05:56:12,634-Speed 5633.28 samples/sec Loss 3.6152 LearningRate 0.0141 Epoch: 12 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:14,509-Speed 5463.70 samples/sec Loss 3.6652 LearningRate 0.0141 Epoch: 12 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:16,419-Speed 5363.64 samples/sec Loss 3.7881 LearningRate 0.0141 Epoch: 12 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:18,328-Speed 5366.89 samples/sec Loss 3.5508 LearningRate 0.0141 Epoch: 12 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:20,179-Speed 5532.82 
samples/sec Loss 3.7637 LearningRate 0.0141 Epoch: 12 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:22,005-Speed 5609.02 samples/sec Loss 3.6718 LearningRate 0.0141 Epoch: 12 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:23,827-Speed 5623.00 samples/sec Loss 3.7138 LearningRate 0.0141 Epoch: 12 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:25,664-Speed 5576.30 samples/sec Loss 3.6612 LearningRate 0.0140 Epoch: 12 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:27,486-Speed 5620.74 samples/sec Loss 3.6472 LearningRate 0.0140 Epoch: 12 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:29,309-Speed 5620.12 samples/sec Loss 3.6166 LearningRate 0.0140 Epoch: 12 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:31,144-Speed 5584.29 samples/sec Loss 3.6685 LearningRate 0.0140 Epoch: 12 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:56:32,954-Speed 5657.51 samples/sec Loss 3.8220 LearningRate 0.0140 Epoch: 12 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:34,773-Speed 5632.86 samples/sec Loss 3.6823 LearningRate 0.0140 Epoch: 12 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:36,599-Speed 5609.64 samples/sec Loss 3.6076 LearningRate 0.0140 Epoch: 12 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:38,419-Speed 5627.65 samples/sec Loss 3.7500 LearningRate 0.0140 Epoch: 12 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:40,263-Speed 5555.99 samples/sec Loss 3.7096 LearningRate 0.0140 Epoch: 12 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:42,081-Speed 5632.82 samples/sec Loss 3.5926 LearningRate 0.0140 Epoch: 12 
Global Step: 71190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:43,904-Speed 5618.77 samples/sec Loss 3.7035 LearningRate 0.0140 Epoch: 12 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:45,748-Speed 5555.45 samples/sec Loss 3.7151 LearningRate 0.0140 Epoch: 12 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:47,592-Speed 5554.33 samples/sec Loss 3.5813 LearningRate 0.0140 Epoch: 12 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:49,415-Speed 5619.64 samples/sec Loss 3.7663 LearningRate 0.0140 Epoch: 12 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:51,226-Speed 5654.50 samples/sec Loss 3.5934 LearningRate 0.0140 Epoch: 12 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:53,113-Speed 5429.07 samples/sec Loss 3.7442 LearningRate 0.0139 Epoch: 12 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:55,037-Speed 5324.80 samples/sec Loss 3.7097 LearningRate 0.0139 Epoch: 12 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:56,870-Speed 5587.77 samples/sec Loss 3.6126 LearningRate 0.0139 Epoch: 12 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:56:58,697-Speed 5608.33 samples/sec Loss 3.7716 LearningRate 0.0139 Epoch: 12 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:00,544-Speed 5546.79 samples/sec Loss 3.7190 LearningRate 0.0139 Epoch: 12 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:02,397-Speed 5525.36 samples/sec Loss 3.6633 LearningRate 0.0139 Epoch: 12 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:04,236-Speed 5570.43 samples/sec Loss 3.6338 LearningRate 0.0139 Epoch: 12 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:57:06,055-Speed 5633.28 samples/sec Loss 3.6155 LearningRate 0.0139 Epoch: 12 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:07,879-Speed 5614.36 samples/sec Loss 3.6787 LearningRate 0.0139 Epoch: 12 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:09,693-Speed 5648.77 samples/sec Loss 3.6690 LearningRate 0.0139 Epoch: 12 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:11,520-Speed 5607.09 samples/sec Loss 3.6964 LearningRate 0.0139 Epoch: 12 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:13,351-Speed 5593.28 samples/sec Loss 3.6526 LearningRate 0.0139 Epoch: 12 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:15,180-Speed 5600.44 samples/sec Loss 3.6826 LearningRate 0.0139 Epoch: 12 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:16,995-Speed 5644.78 samples/sec Loss 3.8524 LearningRate 0.0139 Epoch: 12 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:18,835-Speed 5567.49 samples/sec Loss 3.6108 LearningRate 0.0139 Epoch: 12 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:20,677-Speed 5560.05 samples/sec Loss 3.6645 LearningRate 0.0138 Epoch: 12 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:22,502-Speed 5613.89 samples/sec Loss 3.6754 LearningRate 0.0138 Epoch: 12 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:24,331-Speed 5598.48 samples/sec Loss 3.6858 LearningRate 0.0138 Epoch: 12 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:26,170-Speed 5570.30 samples/sec Loss 3.6117 LearningRate 0.0138 Epoch: 12 Global Step: 71430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:27,995-Speed 5614.56 
samples/sec Loss 3.6246 LearningRate 0.0138 Epoch: 12 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:57:29,800-Speed 5672.96 samples/sec Loss 3.6506 LearningRate 0.0138 Epoch: 12 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:31,639-Speed 5569.94 samples/sec Loss 3.6090 LearningRate 0.0138 Epoch: 12 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:33,474-Speed 5583.30 samples/sec Loss 3.6147 LearningRate 0.0138 Epoch: 12 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:35,315-Speed 5564.34 samples/sec Loss 3.8033 LearningRate 0.0138 Epoch: 12 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:37,241-Speed 5317.99 samples/sec Loss 3.6407 LearningRate 0.0138 Epoch: 12 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:39,164-Speed 5327.50 samples/sec Loss 3.6099 LearningRate 0.0138 Epoch: 12 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:41,004-Speed 5568.85 samples/sec Loss 3.6246 LearningRate 0.0138 Epoch: 12 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:42,847-Speed 5556.16 samples/sec Loss 3.6070 LearningRate 0.0138 Epoch: 12 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:44,666-Speed 5632.16 samples/sec Loss 3.6909 LearningRate 0.0138 Epoch: 12 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:46,492-Speed 5609.34 samples/sec Loss 3.6883 LearningRate 0.0138 Epoch: 12 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:48,330-Speed 5575.08 samples/sec Loss 3.5969 LearningRate 0.0138 Epoch: 12 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:57:50,170-Speed 5564.47 samples/sec Loss 3.6731 LearningRate 0.0137 Epoch: 12 
Global Step: 71560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:52,001-Speed 5594.04 samples/sec Loss 3.5463 LearningRate 0.0137 Epoch: 12 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:53,824-Speed 5620.25 samples/sec Loss 3.5795 LearningRate 0.0137 Epoch: 12 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:55,661-Speed 5575.91 samples/sec Loss 3.7568 LearningRate 0.0137 Epoch: 12 Global Step: 71590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:57,501-Speed 5566.44 samples/sec Loss 3.6323 LearningRate 0.0137 Epoch: 12 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:57:59,359-Speed 5514.52 samples/sec Loss 3.7333 LearningRate 0.0137 Epoch: 12 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:01,195-Speed 5578.89 samples/sec Loss 3.7078 LearningRate 0.0137 Epoch: 12 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:03,026-Speed 5594.43 samples/sec Loss 3.6881 LearningRate 0.0137 Epoch: 12 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:04,887-Speed 5506.00 samples/sec Loss 3.5576 LearningRate 0.0137 Epoch: 12 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:06,737-Speed 5536.86 samples/sec Loss 3.7043 LearningRate 0.0137 Epoch: 12 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:08,573-Speed 5577.25 samples/sec Loss 3.7599 LearningRate 0.0137 Epoch: 12 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:10,393-Speed 5627.54 samples/sec Loss 3.7436 LearningRate 0.0137 Epoch: 12 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:12,227-Speed 5586.88 samples/sec Loss 3.7428 LearningRate 0.0137 Epoch: 12 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 05:58:14,051-Speed 5615.59 samples/sec Loss 3.7268 LearningRate 0.0137 Epoch: 12 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:15,894-Speed 5557.53 samples/sec Loss 3.6672 LearningRate 0.0137 Epoch: 12 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:17,723-Speed 5601.69 samples/sec Loss 3.6941 LearningRate 0.0136 Epoch: 12 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:19,550-Speed 5607.05 samples/sec Loss 3.7641 LearningRate 0.0136 Epoch: 12 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:21,472-Speed 5330.23 samples/sec Loss 3.5677 LearningRate 0.0136 Epoch: 12 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:23,394-Speed 5328.43 samples/sec Loss 3.7131 LearningRate 0.0136 Epoch: 12 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:25,316-Speed 5328.91 samples/sec Loss 3.5687 LearningRate 0.0136 Epoch: 12 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:27,221-Speed 5376.24 samples/sec Loss 3.6388 LearningRate 0.0136 Epoch: 12 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:29,144-Speed 5327.81 samples/sec Loss 3.7026 LearningRate 0.0136 Epoch: 12 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:31,067-Speed 5326.40 samples/sec Loss 3.7063 LearningRate 0.0136 Epoch: 12 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:32,914-Speed 5545.74 samples/sec Loss 3.6617 LearningRate 0.0136 Epoch: 12 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:34,737-Speed 5620.16 samples/sec Loss 3.6563 LearningRate 0.0136 Epoch: 12 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:36,563-Speed 5608.82 
samples/sec Loss 3.6069 LearningRate 0.0136 Epoch: 12 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:38,385-Speed 5622.79 samples/sec Loss 3.6928 LearningRate 0.0136 Epoch: 12 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:40,228-Speed 5557.03 samples/sec Loss 3.5889 LearningRate 0.0136 Epoch: 12 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:42,059-Speed 5595.63 samples/sec Loss 3.5561 LearningRate 0.0136 Epoch: 12 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:43,896-Speed 5576.98 samples/sec Loss 3.6468 LearningRate 0.0136 Epoch: 12 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:45,735-Speed 5570.49 samples/sec Loss 3.6844 LearningRate 0.0135 Epoch: 12 Global Step: 71860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 05:58:47,565-Speed 5598.51 samples/sec Loss 3.6441 LearningRate 0.0135 Epoch: 12 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:49,391-Speed 5610.28 samples/sec Loss 3.7332 LearningRate 0.0135 Epoch: 12 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:51,229-Speed 5572.23 samples/sec Loss 3.6629 LearningRate 0.0135 Epoch: 12 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:53,047-Speed 5634.01 samples/sec Loss 3.7071 LearningRate 0.0135 Epoch: 12 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:54,884-Speed 5576.88 samples/sec Loss 3.6923 LearningRate 0.0135 Epoch: 12 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:56,733-Speed 5541.31 samples/sec Loss 3.7206 LearningRate 0.0135 Epoch: 12 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:58:58,567-Speed 5582.81 samples/sec Loss 3.5797 LearningRate 0.0135 Epoch: 12 
Global Step: 71930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:00,411-Speed 5555.87 samples/sec Loss 3.7053 LearningRate 0.0135 Epoch: 12 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:02,243-Speed 5591.87 samples/sec Loss 3.5710 LearningRate 0.0135 Epoch: 12 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:04,071-Speed 5604.00 samples/sec Loss 3.6314 LearningRate 0.0135 Epoch: 12 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:05,891-Speed 5628.35 samples/sec Loss 3.6507 LearningRate 0.0135 Epoch: 12 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:07,721-Speed 5598.73 samples/sec Loss 3.5902 LearningRate 0.0135 Epoch: 12 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:09,544-Speed 5619.69 samples/sec Loss 3.6358 LearningRate 0.0135 Epoch: 12 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:11,377-Speed 5586.12 samples/sec Loss 3.6165 LearningRate 0.0135 Epoch: 12 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 05:59:37,629-[lfw][72000]XNorm: 21.907372 Training: 2022-04-27 05:59:37,629-[lfw][72000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-27 05:59:37,630-[lfw][72000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:00:08,141-[cfp_fp][72000]XNorm: 19.808723 Training: 2022-04-27 06:00:08,142-[cfp_fp][72000]Accuracy-Flip: 0.96386+-0.01002 Training: 2022-04-27 06:00:08,142-[cfp_fp][72000]Accuracy-Highest: 0.96543 Training: 2022-04-27 06:00:34,470-[agedb_30][72000]XNorm: 21.776772 Training: 2022-04-27 06:00:34,470-[agedb_30][72000]Accuracy-Flip: 0.97883+-0.00882 Training: 2022-04-27 06:00:34,471-[agedb_30][72000]Accuracy-Highest: 0.97917 Training: 2022-04-27 06:00:36,379-Speed 120.47 samples/sec Loss 3.7351 LearningRate 0.0135 Epoch: 12 Global Step: 72010 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-27 06:00:38,190-Speed 5656.74 samples/sec Loss 3.6815 LearningRate 0.0134 Epoch: 12 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:40,029-Speed 5570.94 samples/sec Loss 3.6099 LearningRate 0.0134 Epoch: 12 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:41,917-Speed 5423.53 samples/sec Loss 3.6174 LearningRate 0.0134 Epoch: 12 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:43,800-Speed 5441.59 samples/sec Loss 3.6142 LearningRate 0.0134 Epoch: 12 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:45,625-Speed 5609.73 samples/sec Loss 3.6035 LearningRate 0.0134 Epoch: 12 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:47,429-Speed 5680.06 samples/sec Loss 3.5804 LearningRate 0.0134 Epoch: 12 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:49,257-Speed 5602.67 samples/sec Loss 3.5877 LearningRate 0.0134 Epoch: 12 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:51,074-Speed 5637.40 samples/sec Loss 3.5866 LearningRate 0.0134 Epoch: 12 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:52,884-Speed 5660.43 samples/sec Loss 3.6825 LearningRate 0.0134 Epoch: 12 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:54,701-Speed 5635.77 samples/sec Loss 3.5286 LearningRate 0.0134 Epoch: 12 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:00:56,526-Speed 5612.57 samples/sec Loss 3.5627 LearningRate 0.0134 Epoch: 12 Global Step: 72120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:00:58,342-Speed 5640.61 samples/sec Loss 3.5888 LearningRate 0.0134 Epoch: 12 Global Step: 72130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
06:01:00,164-Speed 5622.68 samples/sec Loss 3.6896 LearningRate 0.0134 Epoch: 12 Global Step: 72140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:01,989-Speed 5614.42 samples/sec Loss 3.6889 LearningRate 0.0134 Epoch: 12 Global Step: 72150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:03,801-Speed 5651.51 samples/sec Loss 3.6710 LearningRate 0.0134 Epoch: 12 Global Step: 72160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:05,640-Speed 5571.23 samples/sec Loss 3.5687 LearningRate 0.0133 Epoch: 12 Global Step: 72170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:07,452-Speed 5654.35 samples/sec Loss 3.6610 LearningRate 0.0133 Epoch: 12 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:09,279-Speed 5604.76 samples/sec Loss 3.6533 LearningRate 0.0133 Epoch: 12 Global Step: 72190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:11,095-Speed 5641.06 samples/sec Loss 3.5383 LearningRate 0.0133 Epoch: 12 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:12,911-Speed 5641.83 samples/sec Loss 3.6292 LearningRate 0.0133 Epoch: 12 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 06:01:14,722-Speed 5655.04 samples/sec Loss 3.7124 LearningRate 0.0133 Epoch: 12 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:16,530-Speed 5665.99 samples/sec Loss 3.7635 LearningRate 0.0133 Epoch: 12 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:18,345-Speed 5643.16 samples/sec Loss 3.5492 LearningRate 0.0133 Epoch: 12 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:20,213-Speed 5485.34 samples/sec Loss 3.7207 LearningRate 0.0133 Epoch: 12 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:22,136-Speed 5325.73 samples/sec Loss 3.5894 
LearningRate 0.0133 Epoch: 12 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:24,002-Speed 5488.84 samples/sec Loss 3.6154 LearningRate 0.0133 Epoch: 12 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:25,820-Speed 5634.50 samples/sec Loss 3.5666 LearningRate 0.0133 Epoch: 12 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:27,639-Speed 5632.08 samples/sec Loss 3.6472 LearningRate 0.0133 Epoch: 12 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:29,453-Speed 5645.90 samples/sec Loss 3.5759 LearningRate 0.0133 Epoch: 12 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:31,278-Speed 5614.85 samples/sec Loss 3.6489 LearningRate 0.0133 Epoch: 12 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:33,102-Speed 5614.76 samples/sec Loss 3.6548 LearningRate 0.0133 Epoch: 12 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:34,938-Speed 5579.04 samples/sec Loss 3.6993 LearningRate 0.0132 Epoch: 12 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:36,764-Speed 5608.35 samples/sec Loss 3.6561 LearningRate 0.0132 Epoch: 12 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:38,575-Speed 5656.26 samples/sec Loss 3.6066 LearningRate 0.0132 Epoch: 12 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:40,398-Speed 5620.02 samples/sec Loss 3.5577 LearningRate 0.0132 Epoch: 12 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:42,224-Speed 5611.54 samples/sec Loss 3.5985 LearningRate 0.0132 Epoch: 12 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:44,060-Speed 5578.56 samples/sec Loss 3.6475 LearningRate 0.0132 Epoch: 12 Global Step: 72380 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:45,876-Speed 5640.03 samples/sec Loss 3.5924 LearningRate 0.0132 Epoch: 12 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:47,700-Speed 5618.54 samples/sec Loss 3.6302 LearningRate 0.0132 Epoch: 12 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:49,526-Speed 5608.35 samples/sec Loss 3.5708 LearningRate 0.0132 Epoch: 12 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:51,362-Speed 5578.41 samples/sec Loss 3.5766 LearningRate 0.0132 Epoch: 12 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:01:53,204-Speed 5561.86 samples/sec Loss 3.5325 LearningRate 0.0132 Epoch: 12 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:55,041-Speed 5574.18 samples/sec Loss 3.5403 LearningRate 0.0132 Epoch: 12 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:56,861-Speed 5630.52 samples/sec Loss 3.6170 LearningRate 0.0132 Epoch: 12 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:01:58,685-Speed 5614.68 samples/sec Loss 3.6839 LearningRate 0.0132 Epoch: 12 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:00,510-Speed 5611.78 samples/sec Loss 3.5080 LearningRate 0.0132 Epoch: 12 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:02,341-Speed 5596.94 samples/sec Loss 3.6466 LearningRate 0.0132 Epoch: 12 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:04,161-Speed 5626.05 samples/sec Loss 3.4963 LearningRate 0.0131 Epoch: 12 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:05,992-Speed 5595.65 samples/sec Loss 3.6210 LearningRate 0.0131 Epoch: 12 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 06:02:07,811-Speed 5629.31 samples/sec Loss 3.6355 LearningRate 0.0131 Epoch: 12 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:09,621-Speed 5662.48 samples/sec Loss 3.6398 LearningRate 0.0131 Epoch: 12 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:11,432-Speed 5656.54 samples/sec Loss 3.5389 LearningRate 0.0131 Epoch: 12 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:13,243-Speed 5656.83 samples/sec Loss 3.6228 LearningRate 0.0131 Epoch: 12 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:15,062-Speed 5630.09 samples/sec Loss 3.6139 LearningRate 0.0131 Epoch: 12 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:16,887-Speed 5612.79 samples/sec Loss 3.5629 LearningRate 0.0131 Epoch: 12 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:18,698-Speed 5657.19 samples/sec Loss 3.6139 LearningRate 0.0131 Epoch: 12 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:20,528-Speed 5597.73 samples/sec Loss 3.6245 LearningRate 0.0131 Epoch: 12 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:22,364-Speed 5578.89 samples/sec Loss 3.5709 LearningRate 0.0131 Epoch: 12 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:24,201-Speed 5575.29 samples/sec Loss 3.4997 LearningRate 0.0131 Epoch: 12 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:26,068-Speed 5487.86 samples/sec Loss 3.6125 LearningRate 0.0131 Epoch: 12 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:27,896-Speed 5601.14 samples/sec Loss 3.6580 LearningRate 0.0131 Epoch: 12 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:29,756-Speed 5508.31 samples/sec Loss 
3.5800 LearningRate 0.0131 Epoch: 12 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:02:31,572-Speed 5641.73 samples/sec Loss 3.6044 LearningRate 0.0130 Epoch: 12 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:33,387-Speed 5643.70 samples/sec Loss 3.6628 LearningRate 0.0130 Epoch: 12 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:35,205-Speed 5634.13 samples/sec Loss 3.6013 LearningRate 0.0130 Epoch: 12 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:37,017-Speed 5651.95 samples/sec Loss 3.5365 LearningRate 0.0130 Epoch: 12 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:38,839-Speed 5623.79 samples/sec Loss 3.5361 LearningRate 0.0130 Epoch: 12 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:40,674-Speed 5581.51 samples/sec Loss 3.6520 LearningRate 0.0130 Epoch: 12 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:42,483-Speed 5661.50 samples/sec Loss 3.6598 LearningRate 0.0130 Epoch: 12 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:44,330-Speed 5545.75 samples/sec Loss 3.6417 LearningRate 0.0130 Epoch: 12 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:46,169-Speed 5570.03 samples/sec Loss 3.5939 LearningRate 0.0130 Epoch: 12 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:47,994-Speed 5614.26 samples/sec Loss 3.5109 LearningRate 0.0130 Epoch: 12 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:49,821-Speed 5606.11 samples/sec Loss 3.6049 LearningRate 0.0130 Epoch: 12 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:51,639-Speed 5633.69 samples/sec Loss 3.5551 LearningRate 0.0130 Epoch: 12 Global Step: 
72750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:53,489-Speed 5538.05 samples/sec Loss 3.7208 LearningRate 0.0130 Epoch: 12 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:55,303-Speed 5647.79 samples/sec Loss 3.5098 LearningRate 0.0130 Epoch: 12 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:57,130-Speed 5606.45 samples/sec Loss 3.5407 LearningRate 0.0130 Epoch: 12 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:02:58,956-Speed 5609.69 samples/sec Loss 3.6752 LearningRate 0.0130 Epoch: 12 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:00,773-Speed 5636.82 samples/sec Loss 3.5484 LearningRate 0.0129 Epoch: 12 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:02,588-Speed 5643.12 samples/sec Loss 3.6242 LearningRate 0.0129 Epoch: 12 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:04,410-Speed 5622.73 samples/sec Loss 3.6491 LearningRate 0.0129 Epoch: 12 Global Step: 72820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:06,287-Speed 5457.17 samples/sec Loss 3.5706 LearningRate 0.0129 Epoch: 12 Global Step: 72830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:08,113-Speed 5608.78 samples/sec Loss 3.5832 LearningRate 0.0129 Epoch: 12 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:03:09,928-Speed 5645.29 samples/sec Loss 3.5217 LearningRate 0.0129 Epoch: 12 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:03:11,731-Speed 5682.29 samples/sec Loss 3.5577 LearningRate 0.0129 Epoch: 12 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:13,556-Speed 5613.29 samples/sec Loss 3.5527 LearningRate 0.0129 Epoch: 12 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-27 06:03:15,383-Speed 5606.26 samples/sec Loss 3.6711 LearningRate 0.0129 Epoch: 12 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:17,212-Speed 5599.45 samples/sec Loss 3.6287 LearningRate 0.0129 Epoch: 12 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:19,047-Speed 5585.46 samples/sec Loss 3.5181 LearningRate 0.0129 Epoch: 12 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:20,879-Speed 5590.49 samples/sec Loss 3.7087 LearningRate 0.0129 Epoch: 12 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:22,692-Speed 5647.84 samples/sec Loss 3.6757 LearningRate 0.0129 Epoch: 12 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:24,508-Speed 5642.90 samples/sec Loss 3.5768 LearningRate 0.0129 Epoch: 12 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:26,328-Speed 5628.52 samples/sec Loss 3.6803 LearningRate 0.0129 Epoch: 12 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:28,138-Speed 5658.11 samples/sec Loss 3.5095 LearningRate 0.0129 Epoch: 12 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:29,938-Speed 5691.77 samples/sec Loss 3.5486 LearningRate 0.0128 Epoch: 12 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:31,752-Speed 5647.36 samples/sec Loss 3.6744 LearningRate 0.0128 Epoch: 12 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:33,586-Speed 5583.75 samples/sec Loss 3.6300 LearningRate 0.0128 Epoch: 12 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:35,413-Speed 5607.13 samples/sec Loss 3.6880 LearningRate 0.0128 Epoch: 12 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:37,235-Speed 5621.61 
samples/sec Loss 3.6395 LearningRate 0.0128 Epoch: 12 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:39,044-Speed 5662.78 samples/sec Loss 3.7277 LearningRate 0.0128 Epoch: 12 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:40,862-Speed 5634.00 samples/sec Loss 3.6745 LearningRate 0.0128 Epoch: 12 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:42,681-Speed 5632.20 samples/sec Loss 3.6313 LearningRate 0.0128 Epoch: 12 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:44,507-Speed 5609.90 samples/sec Loss 3.6353 LearningRate 0.0128 Epoch: 12 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:46,360-Speed 5528.14 samples/sec Loss 3.4840 LearningRate 0.0128 Epoch: 12 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:48,170-Speed 5661.00 samples/sec Loss 3.6462 LearningRate 0.0128 Epoch: 12 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:49,992-Speed 5621.72 samples/sec Loss 3.6832 LearningRate 0.0128 Epoch: 12 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:51,825-Speed 5585.52 samples/sec Loss 3.6225 LearningRate 0.0128 Epoch: 12 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:53,675-Speed 5539.44 samples/sec Loss 3.5711 LearningRate 0.0128 Epoch: 12 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:55,501-Speed 5609.12 samples/sec Loss 3.5759 LearningRate 0.0128 Epoch: 12 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:57,323-Speed 5621.45 samples/sec Loss 3.5282 LearningRate 0.0128 Epoch: 12 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:03:59,145-Speed 5621.11 samples/sec Loss 3.7241 LearningRate 0.0127 Epoch: 12 
Global Step: 73120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:00,977-Speed 5593.04 samples/sec Loss 3.5618 LearningRate 0.0127 Epoch: 12 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:02,814-Speed 5575.53 samples/sec Loss 3.6412 LearningRate 0.0127 Epoch: 12 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:04,630-Speed 5641.78 samples/sec Loss 3.6559 LearningRate 0.0127 Epoch: 12 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:06,435-Speed 5673.04 samples/sec Loss 3.5886 LearningRate 0.0127 Epoch: 12 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:08,254-Speed 5633.32 samples/sec Loss 3.5567 LearningRate 0.0127 Epoch: 12 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:10,092-Speed 5573.54 samples/sec Loss 3.6227 LearningRate 0.0127 Epoch: 12 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:11,903-Speed 5655.58 samples/sec Loss 3.4938 LearningRate 0.0127 Epoch: 12 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:13,724-Speed 5624.76 samples/sec Loss 3.6276 LearningRate 0.0127 Epoch: 12 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:15,545-Speed 5626.11 samples/sec Loss 3.5708 LearningRate 0.0127 Epoch: 12 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:17,364-Speed 5630.93 samples/sec Loss 3.4671 LearningRate 0.0127 Epoch: 12 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:19,184-Speed 5626.90 samples/sec Loss 3.5487 LearningRate 0.0127 Epoch: 12 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:21,006-Speed 5624.22 samples/sec Loss 3.4993 LearningRate 0.0127 Epoch: 12 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 06:04:22,838-Speed 5590.20 samples/sec Loss 3.5367 LearningRate 0.0127 Epoch: 12 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:24,657-Speed 5631.38 samples/sec Loss 3.6368 LearningRate 0.0127 Epoch: 12 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:04:26,473-Speed 5642.78 samples/sec Loss 3.5861 LearningRate 0.0127 Epoch: 12 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:28,289-Speed 5639.79 samples/sec Loss 3.5868 LearningRate 0.0126 Epoch: 12 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:30,105-Speed 5640.17 samples/sec Loss 3.5788 LearningRate 0.0126 Epoch: 12 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:31,928-Speed 5620.18 samples/sec Loss 3.5926 LearningRate 0.0126 Epoch: 12 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:33,741-Speed 5647.67 samples/sec Loss 3.5986 LearningRate 0.0126 Epoch: 12 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:35,560-Speed 5631.83 samples/sec Loss 3.5423 LearningRate 0.0126 Epoch: 12 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:37,367-Speed 5668.78 samples/sec Loss 3.5262 LearningRate 0.0126 Epoch: 12 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:39,190-Speed 5619.84 samples/sec Loss 3.6998 LearningRate 0.0126 Epoch: 12 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:41,034-Speed 5555.31 samples/sec Loss 3.5739 LearningRate 0.0126 Epoch: 12 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:42,855-Speed 5623.08 samples/sec Loss 3.5748 LearningRate 0.0126 Epoch: 12 Global Step: 73360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:44,668-Speed 5650.08 
samples/sec Loss 3.5495 LearningRate 0.0126 Epoch: 12 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:46,481-Speed 5651.60 samples/sec Loss 3.5479 LearningRate 0.0126 Epoch: 12 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:48,291-Speed 5661.53 samples/sec Loss 3.5598 LearningRate 0.0126 Epoch: 12 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:50,156-Speed 5490.34 samples/sec Loss 3.5331 LearningRate 0.0126 Epoch: 12 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:51,989-Speed 5589.29 samples/sec Loss 3.5583 LearningRate 0.0126 Epoch: 12 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:53,835-Speed 5547.76 samples/sec Loss 3.5466 LearningRate 0.0126 Epoch: 12 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:55,672-Speed 5575.57 samples/sec Loss 3.5270 LearningRate 0.0126 Epoch: 12 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:57,494-Speed 5623.34 samples/sec Loss 3.5416 LearningRate 0.0125 Epoch: 12 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:04:59,320-Speed 5608.98 samples/sec Loss 3.5140 LearningRate 0.0125 Epoch: 12 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:01,146-Speed 5608.92 samples/sec Loss 3.5220 LearningRate 0.0125 Epoch: 12 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:02,964-Speed 5636.00 samples/sec Loss 3.5401 LearningRate 0.0125 Epoch: 12 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-27 06:05:04,773-Speed 5662.51 samples/sec Loss 3.4073 LearningRate 0.0125 Epoch: 12 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:06,589-Speed 5640.41 samples/sec Loss 3.5462 LearningRate 0.0125 Epoch: 12 
Global Step: 73490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:08,412-Speed 5618.38 samples/sec Loss 3.6665 LearningRate 0.0125 Epoch: 12 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:10,226-Speed 5648.24 samples/sec Loss 3.4653 LearningRate 0.0125 Epoch: 12 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:12,046-Speed 5628.13 samples/sec Loss 3.6293 LearningRate 0.0125 Epoch: 12 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:13,860-Speed 5647.35 samples/sec Loss 3.5193 LearningRate 0.0125 Epoch: 12 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:15,668-Speed 5663.72 samples/sec Loss 3.5628 LearningRate 0.0125 Epoch: 12 Global Step: 73540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:17,520-Speed 5533.25 samples/sec Loss 3.5718 LearningRate 0.0125 Epoch: 12 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:19,358-Speed 5572.62 samples/sec Loss 3.4294 LearningRate 0.0125 Epoch: 12 Global Step: 73560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:21,210-Speed 5530.30 samples/sec Loss 3.5675 LearningRate 0.0125 Epoch: 12 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:23,034-Speed 5616.10 samples/sec Loss 3.5138 LearningRate 0.0125 Epoch: 12 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:24,872-Speed 5572.55 samples/sec Loss 3.6458 LearningRate 0.0125 Epoch: 12 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:26,707-Speed 5581.82 samples/sec Loss 3.3997 LearningRate 0.0124 Epoch: 12 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:28,533-Speed 5609.67 samples/sec Loss 3.4863 LearningRate 0.0124 Epoch: 12 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-27 06:05:30,360-Speed 5609.47 samples/sec Loss 3.5266 LearningRate 0.0124 Epoch: 12 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:32,213-Speed 5528.44 samples/sec Loss 3.4802 LearningRate 0.0124 Epoch: 12 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:34,047-Speed 5583.07 samples/sec Loss 3.5136 LearningRate 0.0124 Epoch: 12 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:35,862-Speed 5643.59 samples/sec Loss 3.5619 LearningRate 0.0124 Epoch: 12 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:37,692-Speed 5597.03 samples/sec Loss 3.4493 LearningRate 0.0124 Epoch: 12 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:39,529-Speed 5576.19 samples/sec Loss 3.5472 LearningRate 0.0124 Epoch: 12 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:41,338-Speed 5663.35 samples/sec Loss 3.5575 LearningRate 0.0124 Epoch: 12 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:43,159-Speed 5624.43 samples/sec Loss 3.4424 LearningRate 0.0124 Epoch: 12 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:44,986-Speed 5608.12 samples/sec Loss 3.4252 LearningRate 0.0124 Epoch: 12 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:46,800-Speed 5645.97 samples/sec Loss 3.4503 LearningRate 0.0124 Epoch: 12 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:48,630-Speed 5599.89 samples/sec Loss 3.5784 LearningRate 0.0124 Epoch: 12 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:50,443-Speed 5650.08 samples/sec Loss 3.5460 LearningRate 0.0124 Epoch: 12 Global Step: 73730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:52,250-Speed 5666.40 
samples/sec Loss 3.5763 LearningRate 0.0124 Epoch: 12 Global Step: 73740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:54,060-Speed 5660.78 samples/sec Loss 3.4613 LearningRate 0.0124 Epoch: 12 Global Step: 73750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:55,877-Speed 5637.25 samples/sec Loss 3.5146 LearningRate 0.0123 Epoch: 12 Global Step: 73760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:57,695-Speed 5633.77 samples/sec Loss 3.5379 LearningRate 0.0123 Epoch: 12 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:05:59,555-Speed 5505.80 samples/sec Loss 3.6010 LearningRate 0.0123 Epoch: 12 Global Step: 73780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:01,421-Speed 5491.75 samples/sec Loss 3.4767 LearningRate 0.0123 Epoch: 12 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:03,308-Speed 5427.36 samples/sec Loss 3.4448 LearningRate 0.0123 Epoch: 12 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:05,127-Speed 5632.85 samples/sec Loss 3.5469 LearningRate 0.0123 Epoch: 12 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:06,942-Speed 5640.80 samples/sec Loss 3.6000 LearningRate 0.0123 Epoch: 12 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:08,768-Speed 5610.93 samples/sec Loss 3.6282 LearningRate 0.0123 Epoch: 12 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:10,588-Speed 5628.74 samples/sec Loss 3.3539 LearningRate 0.0123 Epoch: 12 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:12,433-Speed 5553.84 samples/sec Loss 3.5555 LearningRate 0.0123 Epoch: 12 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:14,258-Speed 5612.89 samples/sec Loss 3.5496 LearningRate 0.0123 Epoch: 12 
Global Step: 73860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:06:16,071-Speed 5649.62 samples/sec Loss 3.5099 LearningRate 0.0123 Epoch: 12 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:17,893-Speed 5622.30 samples/sec Loss 3.6031 LearningRate 0.0123 Epoch: 12 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:19,720-Speed 5605.84 samples/sec Loss 3.4014 LearningRate 0.0123 Epoch: 12 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:21,536-Speed 5639.15 samples/sec Loss 3.4222 LearningRate 0.0123 Epoch: 12 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:23,427-Speed 5419.15 samples/sec Loss 3.4035 LearningRate 0.0123 Epoch: 12 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:35,146-Speed 873.85 samples/sec Loss 3.2095 LearningRate 0.0122 Epoch: 13 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:36,995-Speed 5541.68 samples/sec Loss 2.8364 LearningRate 0.0122 Epoch: 13 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:38,824-Speed 5598.40 samples/sec Loss 2.8550 LearningRate 0.0122 Epoch: 13 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:40,705-Speed 5445.29 samples/sec Loss 2.8814 LearningRate 0.0122 Epoch: 13 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:42,589-Speed 5438.23 samples/sec Loss 2.9108 LearningRate 0.0122 Epoch: 13 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:44,418-Speed 5603.15 samples/sec Loss 2.9216 LearningRate 0.0122 Epoch: 13 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:46,234-Speed 5638.41 samples/sec Loss 2.8080 LearningRate 0.0122 Epoch: 13 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 2 
hours Training: 2022-04-27 06:06:48,064-Speed 5598.69 samples/sec Loss 2.8353 LearningRate 0.0122 Epoch: 13 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:06:49,898-Speed 5585.69 samples/sec Loss 2.9502 LearningRate 0.0122 Epoch: 13 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:07:16,352-[lfw][74000]XNorm: 22.042191 Training: 2022-04-27 06:07:16,352-[lfw][74000]Accuracy-Flip: 0.99733+-0.00327 Training: 2022-04-27 06:07:16,353-[lfw][74000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:07:47,008-[cfp_fp][74000]XNorm: 19.978743 Training: 2022-04-27 06:07:47,009-[cfp_fp][74000]Accuracy-Flip: 0.95914+-0.01046 Training: 2022-04-27 06:07:47,009-[cfp_fp][74000]Accuracy-Highest: 0.96543 Training: 2022-04-27 06:08:13,414-[agedb_30][74000]XNorm: 21.946677 Training: 2022-04-27 06:08:13,415-[agedb_30][74000]Accuracy-Flip: 0.97683+-0.00529 Training: 2022-04-27 06:08:13,415-[agedb_30][74000]Accuracy-Highest: 0.97917 Training: 2022-04-27 06:08:15,244-Speed 119.98 samples/sec Loss 2.8585 LearningRate 0.0122 Epoch: 13 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:08:17,061-Speed 5638.26 samples/sec Loss 2.9743 LearningRate 0.0122 Epoch: 13 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:08:18,878-Speed 5636.22 samples/sec Loss 2.9519 LearningRate 0.0122 Epoch: 13 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:08:20,700-Speed 5621.87 samples/sec Loss 2.9285 LearningRate 0.0122 Epoch: 13 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:08:22,512-Speed 5653.31 samples/sec Loss 2.8828 LearningRate 0.0122 Epoch: 13 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 06:08:24,345-Speed 5589.09 samples/sec Loss 2.9773 LearningRate 0.0122 Epoch: 13 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
06:08:26,188-Speed 5559.64 samples/sec Loss 2.9241 LearningRate 0.0122 Epoch: 13 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:28,013-Speed 5613.71 samples/sec Loss 2.8985 LearningRate 0.0122 Epoch: 13 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:29,829-Speed 5637.79 samples/sec Loss 2.8883 LearningRate 0.0121 Epoch: 13 Global Step: 74090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:08:31,649-Speed 5627.90 samples/sec Loss 3.0163 LearningRate 0.0121 Epoch: 13 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:33,481-Speed 5591.71 samples/sec Loss 2.8685 LearningRate 0.0121 Epoch: 13 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:35,300-Speed 5632.34 samples/sec Loss 2.9357 LearningRate 0.0121 Epoch: 13 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:37,138-Speed 5573.58 samples/sec Loss 2.9911 LearningRate 0.0121 Epoch: 13 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:38,962-Speed 5614.20 samples/sec Loss 2.8796 LearningRate 0.0121 Epoch: 13 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:40,801-Speed 5573.46 samples/sec Loss 2.9803 LearningRate 0.0121 Epoch: 13 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:42,632-Speed 5592.56 samples/sec Loss 2.9298 LearningRate 0.0121 Epoch: 13 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:44,453-Speed 5625.56 samples/sec Loss 3.0325 LearningRate 0.0121 Epoch: 13 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:46,277-Speed 5618.06 samples/sec Loss 2.9818 LearningRate 0.0121 Epoch: 13 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:48,091-Speed 5645.51 samples/sec Loss 2.9616 
LearningRate 0.0121 Epoch: 13 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:49,893-Speed 5685.99 samples/sec Loss 3.0393 LearningRate 0.0121 Epoch: 13 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:51,712-Speed 5629.97 samples/sec Loss 2.9320 LearningRate 0.0121 Epoch: 13 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:53,532-Speed 5627.57 samples/sec Loss 3.0314 LearningRate 0.0121 Epoch: 13 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:08:55,358-Speed 5611.12 samples/sec Loss 2.9535 LearningRate 0.0121 Epoch: 13 Global Step: 74230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:08:57,164-Speed 5670.86 samples/sec Loss 2.9504 LearningRate 0.0121 Epoch: 13 Global Step: 74240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:08:59,007-Speed 5556.86 samples/sec Loss 3.0340 LearningRate 0.0120 Epoch: 13 Global Step: 74250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:00,839-Speed 5592.60 samples/sec Loss 2.9700 LearningRate 0.0120 Epoch: 13 Global Step: 74260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:02,654-Speed 5643.69 samples/sec Loss 2.9952 LearningRate 0.0120 Epoch: 13 Global Step: 74270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:04,468-Speed 5648.35 samples/sec Loss 3.0304 LearningRate 0.0120 Epoch: 13 Global Step: 74280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:06,280-Speed 5651.74 samples/sec Loss 2.9860 LearningRate 0.0120 Epoch: 13 Global Step: 74290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:08,093-Speed 5649.08 samples/sec Loss 3.0692 LearningRate 0.0120 Epoch: 13 Global Step: 74300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:09,909-Speed 5643.07 samples/sec Loss 3.0436 LearningRate 0.0120 Epoch: 13 Global Step: 74310 Fp16 
Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:11,762-Speed 5527.90 samples/sec Loss 3.0901 LearningRate 0.0120 Epoch: 13 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:09:13,612-Speed 5534.76 samples/sec Loss 3.0245 LearningRate 0.0120 Epoch: 13 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:15,444-Speed 5591.44 samples/sec Loss 3.0009 LearningRate 0.0120 Epoch: 13 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:17,260-Speed 5642.29 samples/sec Loss 3.0597 LearningRate 0.0120 Epoch: 13 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:19,089-Speed 5599.84 samples/sec Loss 2.9544 LearningRate 0.0120 Epoch: 13 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:20,941-Speed 5532.36 samples/sec Loss 3.0365 LearningRate 0.0120 Epoch: 13 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:22,772-Speed 5591.59 samples/sec Loss 2.9892 LearningRate 0.0120 Epoch: 13 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:24,588-Speed 5643.73 samples/sec Loss 3.0777 LearningRate 0.0120 Epoch: 13 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:26,405-Speed 5636.45 samples/sec Loss 3.0011 LearningRate 0.0120 Epoch: 13 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:28,243-Speed 5572.14 samples/sec Loss 3.0755 LearningRate 0.0119 Epoch: 13 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:30,064-Speed 5626.95 samples/sec Loss 3.0551 LearningRate 0.0119 Epoch: 13 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:31,873-Speed 5661.15 samples/sec Loss 2.9760 LearningRate 0.0119 Epoch: 13 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:09:33,702-Speed 5600.04 samples/sec Loss 3.0782 LearningRate 0.0119 Epoch: 13 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:35,520-Speed 5634.79 samples/sec Loss 3.0229 LearningRate 0.0119 Epoch: 13 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:37,331-Speed 5657.57 samples/sec Loss 3.0392 LearningRate 0.0119 Epoch: 13 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:39,156-Speed 5610.54 samples/sec Loss 3.0100 LearningRate 0.0119 Epoch: 13 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:40,979-Speed 5619.98 samples/sec Loss 3.0158 LearningRate 0.0119 Epoch: 13 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:42,806-Speed 5605.90 samples/sec Loss 3.1844 LearningRate 0.0119 Epoch: 13 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:44,629-Speed 5620.55 samples/sec Loss 3.0100 LearningRate 0.0119 Epoch: 13 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:46,448-Speed 5631.81 samples/sec Loss 3.0355 LearningRate 0.0119 Epoch: 13 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:48,288-Speed 5567.30 samples/sec Loss 2.9949 LearningRate 0.0119 Epoch: 13 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:50,144-Speed 5518.87 samples/sec Loss 3.1019 LearningRate 0.0119 Epoch: 13 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:09:51,963-Speed 5630.58 samples/sec Loss 3.0998 LearningRate 0.0119 Epoch: 13 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:53,834-Speed 5474.89 samples/sec Loss 3.0075 LearningRate 0.0119 Epoch: 13 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:55,656-Speed 5623.27 samples/sec Loss 
3.0555 LearningRate 0.0119 Epoch: 13 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:57,488-Speed 5589.75 samples/sec Loss 3.0336 LearningRate 0.0119 Epoch: 13 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:09:59,358-Speed 5477.92 samples/sec Loss 3.0633 LearningRate 0.0118 Epoch: 13 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:01,451-Speed 4895.95 samples/sec Loss 3.0885 LearningRate 0.0118 Epoch: 13 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:03,374-Speed 5326.68 samples/sec Loss 3.1218 LearningRate 0.0118 Epoch: 13 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:05,280-Speed 5372.86 samples/sec Loss 3.1142 LearningRate 0.0118 Epoch: 13 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:07,095-Speed 5643.45 samples/sec Loss 3.0387 LearningRate 0.0118 Epoch: 13 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:08,909-Speed 5649.44 samples/sec Loss 3.1248 LearningRate 0.0118 Epoch: 13 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:10,723-Speed 5646.35 samples/sec Loss 3.0958 LearningRate 0.0118 Epoch: 13 Global Step: 74640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:10:12,525-Speed 5683.92 samples/sec Loss 3.0918 LearningRate 0.0118 Epoch: 13 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:14,346-Speed 5626.39 samples/sec Loss 3.0701 LearningRate 0.0118 Epoch: 13 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:16,157-Speed 5656.27 samples/sec Loss 3.0556 LearningRate 0.0118 Epoch: 13 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:17,965-Speed 5663.23 samples/sec Loss 2.9564 LearningRate 0.0118 Epoch: 13 Global Step: 
74680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:19,784-Speed 5631.53 samples/sec Loss 3.0598 LearningRate 0.0118 Epoch: 13 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:21,605-Speed 5626.61 samples/sec Loss 3.0157 LearningRate 0.0118 Epoch: 13 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:23,416-Speed 5655.60 samples/sec Loss 3.1559 LearningRate 0.0118 Epoch: 13 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:25,222-Speed 5670.08 samples/sec Loss 3.0939 LearningRate 0.0118 Epoch: 13 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:27,032-Speed 5660.82 samples/sec Loss 3.1043 LearningRate 0.0118 Epoch: 13 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:28,858-Speed 5607.91 samples/sec Loss 3.1017 LearningRate 0.0117 Epoch: 13 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:30,667-Speed 5663.80 samples/sec Loss 3.1065 LearningRate 0.0117 Epoch: 13 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:10:32,472-Speed 5677.83 samples/sec Loss 3.1513 LearningRate 0.0117 Epoch: 13 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:34,285-Speed 5649.35 samples/sec Loss 3.0832 LearningRate 0.0117 Epoch: 13 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:36,107-Speed 5621.60 samples/sec Loss 3.1520 LearningRate 0.0117 Epoch: 13 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:37,929-Speed 5622.37 samples/sec Loss 3.1499 LearningRate 0.0117 Epoch: 13 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:39,749-Speed 5627.73 samples/sec Loss 3.0031 LearningRate 0.0117 Epoch: 13 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-27 06:10:41,559-Speed 5657.94 samples/sec Loss 3.0944 LearningRate 0.0117 Epoch: 13 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:43,397-Speed 5575.60 samples/sec Loss 3.0335 LearningRate 0.0117 Epoch: 13 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:45,212-Speed 5641.21 samples/sec Loss 3.1629 LearningRate 0.0117 Epoch: 13 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:47,037-Speed 5614.52 samples/sec Loss 3.0634 LearningRate 0.0117 Epoch: 13 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:48,868-Speed 5592.31 samples/sec Loss 3.2085 LearningRate 0.0117 Epoch: 13 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:50,700-Speed 5591.92 samples/sec Loss 2.9375 LearningRate 0.0117 Epoch: 13 Global Step: 74860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:10:52,517-Speed 5638.52 samples/sec Loss 3.0560 LearningRate 0.0117 Epoch: 13 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:54,341-Speed 5615.73 samples/sec Loss 3.0832 LearningRate 0.0117 Epoch: 13 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:56,243-Speed 5385.52 samples/sec Loss 3.1551 LearningRate 0.0117 Epoch: 13 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:58,161-Speed 5340.77 samples/sec Loss 3.1650 LearningRate 0.0117 Epoch: 13 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:10:59,979-Speed 5634.68 samples/sec Loss 3.0245 LearningRate 0.0116 Epoch: 13 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:01,820-Speed 5566.16 samples/sec Loss 3.1911 LearningRate 0.0116 Epoch: 13 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:03,639-Speed 5629.66 
samples/sec Loss 3.0598 LearningRate 0.0116 Epoch: 13 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:05,457-Speed 5636.01 samples/sec Loss 3.1354 LearningRate 0.0116 Epoch: 13 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:07,273-Speed 5638.30 samples/sec Loss 3.1674 LearningRate 0.0116 Epoch: 13 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:09,100-Speed 5607.11 samples/sec Loss 3.0551 LearningRate 0.0116 Epoch: 13 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:10,926-Speed 5609.52 samples/sec Loss 3.0416 LearningRate 0.0116 Epoch: 13 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:11:12,743-Speed 5638.08 samples/sec Loss 3.1167 LearningRate 0.0116 Epoch: 13 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:14,572-Speed 5600.43 samples/sec Loss 3.1358 LearningRate 0.0116 Epoch: 13 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:16,393-Speed 5627.03 samples/sec Loss 3.1331 LearningRate 0.0116 Epoch: 13 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:18,203-Speed 5658.29 samples/sec Loss 3.1539 LearningRate 0.0116 Epoch: 13 Global Step: 75010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:20,017-Speed 5647.73 samples/sec Loss 3.0132 LearningRate 0.0116 Epoch: 13 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:21,830-Speed 5650.30 samples/sec Loss 3.0255 LearningRate 0.0116 Epoch: 13 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:23,644-Speed 5647.00 samples/sec Loss 3.1156 LearningRate 0.0116 Epoch: 13 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:25,463-Speed 5631.94 samples/sec Loss 3.1292 LearningRate 0.0116 Epoch: 13 
Global Step: 75050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:27,304-Speed 5563.87 samples/sec Loss 3.1482 LearningRate 0.0116 Epoch: 13 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:29,131-Speed 5604.49 samples/sec Loss 3.3105 LearningRate 0.0116 Epoch: 13 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:30,966-Speed 5582.59 samples/sec Loss 3.1419 LearningRate 0.0115 Epoch: 13 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:11:32,793-Speed 5606.81 samples/sec Loss 3.0131 LearningRate 0.0115 Epoch: 13 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:34,626-Speed 5590.14 samples/sec Loss 3.0964 LearningRate 0.0115 Epoch: 13 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:36,454-Speed 5602.85 samples/sec Loss 3.1470 LearningRate 0.0115 Epoch: 13 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:38,269-Speed 5643.46 samples/sec Loss 3.1819 LearningRate 0.0115 Epoch: 13 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:40,082-Speed 5650.92 samples/sec Loss 2.9800 LearningRate 0.0115 Epoch: 13 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:41,896-Speed 5647.66 samples/sec Loss 3.1565 LearningRate 0.0115 Epoch: 13 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:43,712-Speed 5638.21 samples/sec Loss 3.2309 LearningRate 0.0115 Epoch: 13 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:45,539-Speed 5607.17 samples/sec Loss 3.1053 LearningRate 0.0115 Epoch: 13 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:47,362-Speed 5618.82 samples/sec Loss 3.2080 LearningRate 0.0115 Epoch: 13 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:11:49,204-Speed 5560.32 samples/sec Loss 3.2118 LearningRate 0.0115 Epoch: 13 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:51,024-Speed 5630.37 samples/sec Loss 3.1995 LearningRate 0.0115 Epoch: 13 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:52,852-Speed 5603.89 samples/sec Loss 3.1282 LearningRate 0.0115 Epoch: 13 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:54,705-Speed 5526.45 samples/sec Loss 3.1657 LearningRate 0.0115 Epoch: 13 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:56,524-Speed 5631.15 samples/sec Loss 2.9877 LearningRate 0.0115 Epoch: 13 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:11:58,338-Speed 5649.87 samples/sec Loss 3.1259 LearningRate 0.0115 Epoch: 13 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:00,152-Speed 5645.00 samples/sec Loss 3.1659 LearningRate 0.0114 Epoch: 13 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:02,000-Speed 5543.21 samples/sec Loss 3.2069 LearningRate 0.0114 Epoch: 13 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:03,820-Speed 5627.41 samples/sec Loss 3.1757 LearningRate 0.0114 Epoch: 13 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:05,633-Speed 5652.14 samples/sec Loss 3.1555 LearningRate 0.0114 Epoch: 13 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:07,450-Speed 5635.83 samples/sec Loss 3.0970 LearningRate 0.0114 Epoch: 13 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:09,266-Speed 5641.90 samples/sec Loss 3.0688 LearningRate 0.0114 Epoch: 13 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:11,091-Speed 5610.54 
samples/sec Loss 3.1363 LearningRate 0.0114 Epoch: 13 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:12,919-Speed 5606.19 samples/sec Loss 3.1751 LearningRate 0.0114 Epoch: 13 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:14,758-Speed 5567.00 samples/sec Loss 3.1900 LearningRate 0.0114 Epoch: 13 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:16,580-Speed 5626.10 samples/sec Loss 3.1085 LearningRate 0.0114 Epoch: 13 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:18,394-Speed 5646.29 samples/sec Loss 3.2697 LearningRate 0.0114 Epoch: 13 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:20,239-Speed 5552.08 samples/sec Loss 3.1521 LearningRate 0.0114 Epoch: 13 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:22,072-Speed 5588.62 samples/sec Loss 3.0800 LearningRate 0.0114 Epoch: 13 Global Step: 75360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:23,900-Speed 5602.28 samples/sec Loss 3.1793 LearningRate 0.0114 Epoch: 13 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:25,731-Speed 5594.94 samples/sec Loss 3.0312 LearningRate 0.0114 Epoch: 13 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:27,567-Speed 5578.47 samples/sec Loss 3.1249 LearningRate 0.0114 Epoch: 13 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:29,409-Speed 5560.57 samples/sec Loss 3.1799 LearningRate 0.0114 Epoch: 13 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:31,307-Speed 5397.79 samples/sec Loss 3.0859 LearningRate 0.0113 Epoch: 13 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:33,117-Speed 5659.21 samples/sec Loss 3.1828 LearningRate 0.0113 Epoch: 13 
Global Step: 75420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:34,958-Speed 5564.02 samples/sec Loss 3.1250 LearningRate 0.0113 Epoch: 13 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:36,801-Speed 5558.36 samples/sec Loss 3.1640 LearningRate 0.0113 Epoch: 13 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:38,653-Speed 5532.04 samples/sec Loss 3.0948 LearningRate 0.0113 Epoch: 13 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:40,477-Speed 5616.49 samples/sec Loss 3.2059 LearningRate 0.0113 Epoch: 13 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:42,292-Speed 5642.42 samples/sec Loss 3.2045 LearningRate 0.0113 Epoch: 13 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:44,108-Speed 5641.98 samples/sec Loss 3.2102 LearningRate 0.0113 Epoch: 13 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:45,917-Speed 5660.86 samples/sec Loss 3.0962 LearningRate 0.0113 Epoch: 13 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:47,740-Speed 5621.38 samples/sec Loss 3.1564 LearningRate 0.0113 Epoch: 13 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:49,553-Speed 5647.62 samples/sec Loss 3.2450 LearningRate 0.0113 Epoch: 13 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:51,387-Speed 5586.45 samples/sec Loss 3.1933 LearningRate 0.0113 Epoch: 13 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:53,250-Speed 5498.59 samples/sec Loss 3.2699 LearningRate 0.0113 Epoch: 13 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:55,072-Speed 5622.80 samples/sec Loss 3.0938 LearningRate 0.0113 Epoch: 13 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:12:56,918-Speed 5549.59 samples/sec Loss 3.2161 LearningRate 0.0113 Epoch: 13 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:12:58,760-Speed 5560.35 samples/sec Loss 3.0855 LearningRate 0.0113 Epoch: 13 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:00,590-Speed 5599.06 samples/sec Loss 3.1909 LearningRate 0.0113 Epoch: 13 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:02,415-Speed 5612.27 samples/sec Loss 3.1767 LearningRate 0.0112 Epoch: 13 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:04,231-Speed 5640.86 samples/sec Loss 3.2231 LearningRate 0.0112 Epoch: 13 Global Step: 75590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:13:06,051-Speed 5628.06 samples/sec Loss 3.1746 LearningRate 0.0112 Epoch: 13 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:07,881-Speed 5595.54 samples/sec Loss 3.2913 LearningRate 0.0112 Epoch: 13 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:09,699-Speed 5636.40 samples/sec Loss 3.1554 LearningRate 0.0112 Epoch: 13 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:11,539-Speed 5566.89 samples/sec Loss 3.0817 LearningRate 0.0112 Epoch: 13 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:13,391-Speed 5530.47 samples/sec Loss 3.1641 LearningRate 0.0112 Epoch: 13 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:15,216-Speed 5612.75 samples/sec Loss 3.2308 LearningRate 0.0112 Epoch: 13 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:17,045-Speed 5599.47 samples/sec Loss 3.1718 LearningRate 0.0112 Epoch: 13 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:18,859-Speed 5646.87 
samples/sec Loss 3.1473 LearningRate 0.0112 Epoch: 13 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:20,677-Speed 5634.53 samples/sec Loss 3.0761 LearningRate 0.0112 Epoch: 13 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:22,510-Speed 5589.18 samples/sec Loss 3.0220 LearningRate 0.0112 Epoch: 13 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:24,311-Speed 5689.44 samples/sec Loss 3.2356 LearningRate 0.0112 Epoch: 13 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:26,120-Speed 5662.80 samples/sec Loss 3.1555 LearningRate 0.0112 Epoch: 13 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:27,935-Speed 5643.49 samples/sec Loss 3.2341 LearningRate 0.0112 Epoch: 13 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:29,749-Speed 5646.58 samples/sec Loss 3.1507 LearningRate 0.0112 Epoch: 13 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:31,580-Speed 5594.47 samples/sec Loss 3.1900 LearningRate 0.0112 Epoch: 13 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:33,415-Speed 5580.49 samples/sec Loss 3.2230 LearningRate 0.0111 Epoch: 13 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:35,251-Speed 5579.74 samples/sec Loss 3.2018 LearningRate 0.0111 Epoch: 13 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:37,079-Speed 5602.13 samples/sec Loss 3.2378 LearningRate 0.0111 Epoch: 13 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:38,897-Speed 5636.08 samples/sec Loss 3.3169 LearningRate 0.0111 Epoch: 13 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:40,718-Speed 5624.96 samples/sec Loss 3.2672 LearningRate 0.0111 Epoch: 13 
Global Step: 75790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:42,535-Speed 5638.41 samples/sec Loss 3.1899 LearningRate 0.0111 Epoch: 13 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:44,364-Speed 5600.30 samples/sec Loss 3.1967 LearningRate 0.0111 Epoch: 13 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:46,179-Speed 5644.41 samples/sec Loss 3.3121 LearningRate 0.0111 Epoch: 13 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:48,018-Speed 5570.99 samples/sec Loss 3.2560 LearningRate 0.0111 Epoch: 13 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:49,839-Speed 5624.56 samples/sec Loss 3.1888 LearningRate 0.0111 Epoch: 13 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:51,670-Speed 5593.94 samples/sec Loss 3.2734 LearningRate 0.0111 Epoch: 13 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:53,481-Speed 5657.59 samples/sec Loss 3.1744 LearningRate 0.0111 Epoch: 13 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:55,307-Speed 5607.42 samples/sec Loss 3.1267 LearningRate 0.0111 Epoch: 13 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:57,144-Speed 5575.53 samples/sec Loss 3.1724 LearningRate 0.0111 Epoch: 13 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:13:58,969-Speed 5613.49 samples/sec Loss 3.1432 LearningRate 0.0111 Epoch: 13 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:00,786-Speed 5636.89 samples/sec Loss 3.2163 LearningRate 0.0111 Epoch: 13 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:02,612-Speed 5609.05 samples/sec Loss 3.2025 LearningRate 0.0111 Epoch: 13 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:14:04,435-Speed 5621.92 samples/sec Loss 3.2734 LearningRate 0.0110 Epoch: 13 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:06,262-Speed 5607.27 samples/sec Loss 3.2749 LearningRate 0.0110 Epoch: 13 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:08,113-Speed 5532.25 samples/sec Loss 3.1258 LearningRate 0.0110 Epoch: 13 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:09,969-Speed 5518.86 samples/sec Loss 3.0513 LearningRate 0.0110 Epoch: 13 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:11,799-Speed 5598.55 samples/sec Loss 3.2046 LearningRate 0.0110 Epoch: 13 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:13,616-Speed 5638.42 samples/sec Loss 3.1237 LearningRate 0.0110 Epoch: 13 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:15,474-Speed 5512.75 samples/sec Loss 3.1061 LearningRate 0.0110 Epoch: 13 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:17,292-Speed 5633.29 samples/sec Loss 2.9823 LearningRate 0.0110 Epoch: 13 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:14:19,114-Speed 5622.98 samples/sec Loss 3.1666 LearningRate 0.0110 Epoch: 13 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:14:45,637-[lfw][76000]XNorm: 22.480857 Training: 2022-04-27 06:14:45,637-[lfw][76000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-27 06:14:45,638-[lfw][76000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:15:16,357-[cfp_fp][76000]XNorm: 20.312722 Training: 2022-04-27 06:15:16,357-[cfp_fp][76000]Accuracy-Flip: 0.96657+-0.00600 Training: 2022-04-27 06:15:16,358-[cfp_fp][76000]Accuracy-Highest: 0.96657 Training: 2022-04-27 06:15:42,926-[agedb_30][76000]XNorm: 22.537930 Training: 2022-04-27 
06:15:42,926-[agedb_30][76000]Accuracy-Flip: 0.97933+-0.00680 Training: 2022-04-27 06:15:42,927-[agedb_30][76000]Accuracy-Highest: 0.97933 Training: 2022-04-27 06:15:44,740-Speed 119.59 samples/sec Loss 3.2192 LearningRate 0.0110 Epoch: 13 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:46,558-Speed 5636.47 samples/sec Loss 3.1610 LearningRate 0.0110 Epoch: 13 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:48,383-Speed 5611.63 samples/sec Loss 3.1845 LearningRate 0.0110 Epoch: 13 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:50,205-Speed 5623.27 samples/sec Loss 3.1842 LearningRate 0.0110 Epoch: 13 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:52,013-Speed 5664.29 samples/sec Loss 3.1947 LearningRate 0.0110 Epoch: 13 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:53,824-Speed 5658.12 samples/sec Loss 3.0325 LearningRate 0.0110 Epoch: 13 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:55,711-Speed 5426.71 samples/sec Loss 3.1020 LearningRate 0.0110 Epoch: 13 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:57,518-Speed 5669.34 samples/sec Loss 3.0667 LearningRate 0.0110 Epoch: 13 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:15:59,339-Speed 5624.47 samples/sec Loss 3.1940 LearningRate 0.0109 Epoch: 13 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:01,225-Speed 5430.57 samples/sec Loss 3.1720 LearningRate 0.0109 Epoch: 13 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:03,022-Speed 5700.48 samples/sec Loss 3.2277 LearningRate 0.0109 Epoch: 13 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:04,848-Speed 5610.16 samples/sec Loss 3.2500 
LearningRate 0.0109 Epoch: 13 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:06,679-Speed 5596.48 samples/sec Loss 3.1608 LearningRate 0.0109 Epoch: 13 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:08,497-Speed 5632.60 samples/sec Loss 3.2804 LearningRate 0.0109 Epoch: 13 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:10,313-Speed 5640.91 samples/sec Loss 3.1058 LearningRate 0.0109 Epoch: 13 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:12,130-Speed 5636.88 samples/sec Loss 3.2166 LearningRate 0.0109 Epoch: 13 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:13,955-Speed 5613.50 samples/sec Loss 3.1363 LearningRate 0.0109 Epoch: 13 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:15,780-Speed 5612.84 samples/sec Loss 3.2661 LearningRate 0.0109 Epoch: 13 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:17,599-Speed 5632.11 samples/sec Loss 3.1670 LearningRate 0.0109 Epoch: 13 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:19,409-Speed 5657.18 samples/sec Loss 3.0980 LearningRate 0.0109 Epoch: 13 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:21,218-Speed 5664.13 samples/sec Loss 3.2082 LearningRate 0.0109 Epoch: 13 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:23,041-Speed 5617.39 samples/sec Loss 3.2395 LearningRate 0.0109 Epoch: 13 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:24,874-Speed 5590.25 samples/sec Loss 3.2090 LearningRate 0.0109 Epoch: 13 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:26,723-Speed 5539.32 samples/sec Loss 3.1198 LearningRate 0.0109 Epoch: 13 Global Step: 76240 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:28,556-Speed 5587.14 samples/sec Loss 3.1932 LearningRate 0.0109 Epoch: 13 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:30,389-Speed 5590.55 samples/sec Loss 3.1359 LearningRate 0.0109 Epoch: 13 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:32,214-Speed 5613.32 samples/sec Loss 3.1879 LearningRate 0.0108 Epoch: 13 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:34,026-Speed 5651.59 samples/sec Loss 3.2333 LearningRate 0.0108 Epoch: 13 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:35,849-Speed 5620.01 samples/sec Loss 3.2509 LearningRate 0.0108 Epoch: 13 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:37,659-Speed 5657.88 samples/sec Loss 3.2118 LearningRate 0.0108 Epoch: 13 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:39,486-Speed 5606.40 samples/sec Loss 3.2410 LearningRate 0.0108 Epoch: 13 Global Step: 76310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:16:41,290-Speed 5680.54 samples/sec Loss 3.2171 LearningRate 0.0108 Epoch: 13 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:43,103-Speed 5649.19 samples/sec Loss 3.2268 LearningRate 0.0108 Epoch: 13 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:44,914-Speed 5656.06 samples/sec Loss 3.1864 LearningRate 0.0108 Epoch: 13 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:46,730-Speed 5640.11 samples/sec Loss 3.1138 LearningRate 0.0108 Epoch: 13 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:48,542-Speed 5652.14 samples/sec Loss 3.2296 LearningRate 0.0108 Epoch: 13 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:16:50,364-Speed 5623.86 samples/sec Loss 3.2077 LearningRate 0.0108 Epoch: 13 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:52,186-Speed 5622.56 samples/sec Loss 3.1257 LearningRate 0.0108 Epoch: 13 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:54,006-Speed 5629.36 samples/sec Loss 3.1094 LearningRate 0.0108 Epoch: 13 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:55,842-Speed 5579.16 samples/sec Loss 3.1729 LearningRate 0.0108 Epoch: 13 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:57,666-Speed 5615.27 samples/sec Loss 3.1543 LearningRate 0.0108 Epoch: 13 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:16:59,467-Speed 5686.59 samples/sec Loss 3.1642 LearningRate 0.0108 Epoch: 13 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:01,282-Speed 5643.27 samples/sec Loss 3.2876 LearningRate 0.0108 Epoch: 13 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:03,097-Speed 5645.44 samples/sec Loss 3.1817 LearningRate 0.0107 Epoch: 13 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:04,913-Speed 5638.30 samples/sec Loss 3.2999 LearningRate 0.0107 Epoch: 13 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:06,765-Speed 5532.49 samples/sec Loss 3.1026 LearningRate 0.0107 Epoch: 13 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:08,580-Speed 5643.54 samples/sec Loss 3.1901 LearningRate 0.0107 Epoch: 13 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:10,394-Speed 5645.47 samples/sec Loss 3.0701 LearningRate 0.0107 Epoch: 13 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:12,222-Speed 5604.52 samples/sec Loss 
3.2317 LearningRate 0.0107 Epoch: 13 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:14,044-Speed 5622.23 samples/sec Loss 3.1760 LearningRate 0.0107 Epoch: 13 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:15,906-Speed 5502.43 samples/sec Loss 3.2176 LearningRate 0.0107 Epoch: 13 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:17,717-Speed 5657.87 samples/sec Loss 3.0989 LearningRate 0.0107 Epoch: 13 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:19,545-Speed 5601.71 samples/sec Loss 3.2129 LearningRate 0.0107 Epoch: 13 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:21,370-Speed 5614.11 samples/sec Loss 3.3102 LearningRate 0.0107 Epoch: 13 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:23,220-Speed 5536.83 samples/sec Loss 3.1038 LearningRate 0.0107 Epoch: 13 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:25,050-Speed 5597.45 samples/sec Loss 3.1778 LearningRate 0.0107 Epoch: 13 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:26,894-Speed 5555.17 samples/sec Loss 3.1590 LearningRate 0.0107 Epoch: 13 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:28,711-Speed 5635.59 samples/sec Loss 3.1158 LearningRate 0.0107 Epoch: 13 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:30,533-Speed 5621.93 samples/sec Loss 3.2116 LearningRate 0.0107 Epoch: 13 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:32,357-Speed 5614.87 samples/sec Loss 3.1653 LearningRate 0.0107 Epoch: 13 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:34,180-Speed 5619.06 samples/sec Loss 3.2298 LearningRate 0.0106 Epoch: 13 Global Step: 76610 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:35,995-Speed 5643.55 samples/sec Loss 3.1728 LearningRate 0.0106 Epoch: 13 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:37,825-Speed 5601.21 samples/sec Loss 3.1811 LearningRate 0.0106 Epoch: 13 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:39,663-Speed 5572.76 samples/sec Loss 3.2791 LearningRate 0.0106 Epoch: 13 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:41,505-Speed 5559.07 samples/sec Loss 3.1483 LearningRate 0.0106 Epoch: 13 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:43,324-Speed 5632.94 samples/sec Loss 3.2219 LearningRate 0.0106 Epoch: 13 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:45,141-Speed 5636.99 samples/sec Loss 3.2520 LearningRate 0.0106 Epoch: 13 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:46,961-Speed 5627.07 samples/sec Loss 3.1643 LearningRate 0.0106 Epoch: 13 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:48,795-Speed 5586.34 samples/sec Loss 3.2734 LearningRate 0.0106 Epoch: 13 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:50,624-Speed 5600.84 samples/sec Loss 3.1713 LearningRate 0.0106 Epoch: 13 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:52,498-Speed 5464.43 samples/sec Loss 3.0955 LearningRate 0.0106 Epoch: 13 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:54,344-Speed 5551.26 samples/sec Loss 3.1416 LearningRate 0.0106 Epoch: 13 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:56,153-Speed 5660.37 samples/sec Loss 3.3051 LearningRate 0.0106 Epoch: 13 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:17:57,967-Speed 5646.77 samples/sec Loss 3.1617 LearningRate 0.0106 Epoch: 13 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:17:59,812-Speed 5554.63 samples/sec Loss 3.2178 LearningRate 0.0106 Epoch: 13 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:01,629-Speed 5635.48 samples/sec Loss 3.1447 LearningRate 0.0106 Epoch: 13 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:03,457-Speed 5603.84 samples/sec Loss 3.1453 LearningRate 0.0106 Epoch: 13 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:05,285-Speed 5604.08 samples/sec Loss 3.2737 LearningRate 0.0106 Epoch: 13 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:07,117-Speed 5592.99 samples/sec Loss 3.1197 LearningRate 0.0105 Epoch: 13 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:08,945-Speed 5601.90 samples/sec Loss 3.1747 LearningRate 0.0105 Epoch: 13 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:10,802-Speed 5517.39 samples/sec Loss 3.1445 LearningRate 0.0105 Epoch: 13 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:12,630-Speed 5602.06 samples/sec Loss 3.1329 LearningRate 0.0105 Epoch: 13 Global Step: 76820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:18:14,430-Speed 5690.43 samples/sec Loss 3.2872 LearningRate 0.0105 Epoch: 13 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:16,249-Speed 5632.98 samples/sec Loss 3.1971 LearningRate 0.0105 Epoch: 13 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:18,152-Speed 5382.60 samples/sec Loss 3.1447 LearningRate 0.0105 Epoch: 13 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:20,098-Speed 5263.11 samples/sec Loss 
3.1172 LearningRate 0.0105 Epoch: 13 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:21,932-Speed 5585.84 samples/sec Loss 3.0666 LearningRate 0.0105 Epoch: 13 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:23,759-Speed 5606.90 samples/sec Loss 3.1078 LearningRate 0.0105 Epoch: 13 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:25,586-Speed 5607.65 samples/sec Loss 3.1723 LearningRate 0.0105 Epoch: 13 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:27,406-Speed 5625.77 samples/sec Loss 3.1015 LearningRate 0.0105 Epoch: 13 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:29,235-Speed 5603.34 samples/sec Loss 3.1190 LearningRate 0.0105 Epoch: 13 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:31,059-Speed 5613.93 samples/sec Loss 3.2040 LearningRate 0.0105 Epoch: 13 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:32,871-Speed 5653.33 samples/sec Loss 3.1503 LearningRate 0.0105 Epoch: 13 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:34,714-Speed 5557.25 samples/sec Loss 3.1643 LearningRate 0.0105 Epoch: 13 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:36,519-Speed 5675.27 samples/sec Loss 3.1161 LearningRate 0.0105 Epoch: 13 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:38,351-Speed 5592.08 samples/sec Loss 3.1628 LearningRate 0.0104 Epoch: 13 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:40,186-Speed 5582.73 samples/sec Loss 3.0821 LearningRate 0.0104 Epoch: 13 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:42,010-Speed 5617.45 samples/sec Loss 3.2289 LearningRate 0.0104 Epoch: 13 Global Step: 76980 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:43,851-Speed 5563.19 samples/sec Loss 3.0887 LearningRate 0.0104 Epoch: 13 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:45,706-Speed 5521.68 samples/sec Loss 3.0439 LearningRate 0.0104 Epoch: 13 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:47,543-Speed 5575.42 samples/sec Loss 3.1367 LearningRate 0.0104 Epoch: 13 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:49,356-Speed 5649.71 samples/sec Loss 3.3571 LearningRate 0.0104 Epoch: 13 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:51,166-Speed 5662.12 samples/sec Loss 3.2133 LearningRate 0.0104 Epoch: 13 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:52,984-Speed 5632.93 samples/sec Loss 3.2622 LearningRate 0.0104 Epoch: 13 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:54,805-Speed 5626.35 samples/sec Loss 3.1972 LearningRate 0.0104 Epoch: 13 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:56,613-Speed 5662.84 samples/sec Loss 3.1639 LearningRate 0.0104 Epoch: 13 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:18:58,427-Speed 5649.85 samples/sec Loss 3.1144 LearningRate 0.0104 Epoch: 13 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:00,255-Speed 5602.13 samples/sec Loss 3.2680 LearningRate 0.0104 Epoch: 13 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:02,073-Speed 5635.76 samples/sec Loss 3.1273 LearningRate 0.0104 Epoch: 13 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:03,900-Speed 5604.84 samples/sec Loss 3.3080 LearningRate 0.0104 Epoch: 13 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:19:05,729-Speed 5603.10 samples/sec Loss 3.1712 LearningRate 0.0104 Epoch: 13 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:07,542-Speed 5649.84 samples/sec Loss 3.1137 LearningRate 0.0104 Epoch: 13 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:09,356-Speed 5644.92 samples/sec Loss 3.2399 LearningRate 0.0104 Epoch: 13 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:11,191-Speed 5584.15 samples/sec Loss 3.1625 LearningRate 0.0103 Epoch: 13 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:13,009-Speed 5633.36 samples/sec Loss 3.1166 LearningRate 0.0103 Epoch: 13 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:14,827-Speed 5633.65 samples/sec Loss 3.0976 LearningRate 0.0103 Epoch: 13 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:16,641-Speed 5646.77 samples/sec Loss 3.1868 LearningRate 0.0103 Epoch: 13 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:18,465-Speed 5615.42 samples/sec Loss 3.2144 LearningRate 0.0103 Epoch: 13 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:20,304-Speed 5572.06 samples/sec Loss 3.1602 LearningRate 0.0103 Epoch: 13 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:22,121-Speed 5637.98 samples/sec Loss 3.2846 LearningRate 0.0103 Epoch: 13 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:23,946-Speed 5612.25 samples/sec Loss 3.1630 LearningRate 0.0103 Epoch: 13 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:25,761-Speed 5644.63 samples/sec Loss 3.0920 LearningRate 0.0103 Epoch: 13 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:27,593-Speed 5591.19 samples/sec Loss 
3.1506 LearningRate 0.0103 Epoch: 13 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:19:29,396-Speed 5680.10 samples/sec Loss 3.1876 LearningRate 0.0103 Epoch: 13 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:31,230-Speed 5586.46 samples/sec Loss 3.1890 LearningRate 0.0103 Epoch: 13 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:33,054-Speed 5616.70 samples/sec Loss 3.0875 LearningRate 0.0103 Epoch: 13 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:34,876-Speed 5621.22 samples/sec Loss 3.1046 LearningRate 0.0103 Epoch: 13 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:36,708-Speed 5589.97 samples/sec Loss 3.1851 LearningRate 0.0103 Epoch: 13 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:38,538-Speed 5597.82 samples/sec Loss 3.1329 LearningRate 0.0103 Epoch: 13 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:40,363-Speed 5611.83 samples/sec Loss 3.0834 LearningRate 0.0103 Epoch: 13 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:42,187-Speed 5618.37 samples/sec Loss 3.2092 LearningRate 0.0103 Epoch: 13 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:44,004-Speed 5635.95 samples/sec Loss 3.2547 LearningRate 0.0102 Epoch: 13 Global Step: 77320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:45,819-Speed 5643.30 samples/sec Loss 3.2605 LearningRate 0.0102 Epoch: 13 Global Step: 77330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:47,648-Speed 5602.71 samples/sec Loss 3.0288 LearningRate 0.0102 Epoch: 13 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:49,483-Speed 5581.88 samples/sec Loss 3.1550 LearningRate 0.0102 Epoch: 13 Global Step: 
77350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:51,293-Speed 5658.51 samples/sec Loss 3.2143 LearningRate 0.0102 Epoch: 13 Global Step: 77360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:53,121-Speed 5605.29 samples/sec Loss 3.1559 LearningRate 0.0102 Epoch: 13 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:54,942-Speed 5623.65 samples/sec Loss 3.0765 LearningRate 0.0102 Epoch: 13 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:56,758-Speed 5640.87 samples/sec Loss 3.1751 LearningRate 0.0102 Epoch: 13 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:19:58,608-Speed 5536.93 samples/sec Loss 3.1226 LearningRate 0.0102 Epoch: 13 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:00,454-Speed 5548.37 samples/sec Loss 3.1416 LearningRate 0.0102 Epoch: 13 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:02,320-Speed 5491.29 samples/sec Loss 3.1669 LearningRate 0.0102 Epoch: 13 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:04,154-Speed 5583.77 samples/sec Loss 3.1179 LearningRate 0.0102 Epoch: 13 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:05,983-Speed 5599.15 samples/sec Loss 3.1673 LearningRate 0.0102 Epoch: 13 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:20:07,805-Speed 5622.91 samples/sec Loss 3.1382 LearningRate 0.0102 Epoch: 13 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:09,627-Speed 5622.39 samples/sec Loss 3.1673 LearningRate 0.0102 Epoch: 13 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:11,463-Speed 5579.07 samples/sec Loss 3.2164 LearningRate 0.0102 Epoch: 13 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-27 06:20:13,283-Speed 5629.53 samples/sec Loss 3.2658 LearningRate 0.0102 Epoch: 13 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:15,109-Speed 5609.99 samples/sec Loss 3.1378 LearningRate 0.0101 Epoch: 13 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:16,921-Speed 5652.42 samples/sec Loss 3.2423 LearningRate 0.0101 Epoch: 13 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:18,741-Speed 5626.70 samples/sec Loss 3.1438 LearningRate 0.0101 Epoch: 13 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:20,569-Speed 5605.85 samples/sec Loss 3.1695 LearningRate 0.0101 Epoch: 13 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:22,381-Speed 5654.25 samples/sec Loss 3.1755 LearningRate 0.0101 Epoch: 13 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:24,230-Speed 5538.62 samples/sec Loss 3.0719 LearningRate 0.0101 Epoch: 13 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:26,041-Speed 5655.17 samples/sec Loss 3.1311 LearningRate 0.0101 Epoch: 13 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:27,853-Speed 5652.19 samples/sec Loss 3.2555 LearningRate 0.0101 Epoch: 13 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:29,667-Speed 5649.41 samples/sec Loss 3.1021 LearningRate 0.0101 Epoch: 13 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:31,482-Speed 5644.88 samples/sec Loss 3.1495 LearningRate 0.0101 Epoch: 13 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:33,318-Speed 5579.16 samples/sec Loss 3.2431 LearningRate 0.0101 Epoch: 13 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:35,137-Speed 5630.20 
samples/sec Loss 3.3041 LearningRate 0.0101 Epoch: 13 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:36,952-Speed 5642.94 samples/sec Loss 3.2893 LearningRate 0.0101 Epoch: 13 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:38,772-Speed 5629.06 samples/sec Loss 3.2195 LearningRate 0.0101 Epoch: 13 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:40,609-Speed 5576.75 samples/sec Loss 3.2884 LearningRate 0.0101 Epoch: 13 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:42,438-Speed 5598.68 samples/sec Loss 3.0691 LearningRate 0.0101 Epoch: 13 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:44,253-Speed 5643.72 samples/sec Loss 3.1709 LearningRate 0.0101 Epoch: 13 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:46,099-Speed 5549.76 samples/sec Loss 3.2866 LearningRate 0.0101 Epoch: 13 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:47,914-Speed 5643.78 samples/sec Loss 3.2249 LearningRate 0.0100 Epoch: 13 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:49,728-Speed 5646.13 samples/sec Loss 3.2084 LearningRate 0.0100 Epoch: 13 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:51,545-Speed 5639.08 samples/sec Loss 3.1901 LearningRate 0.0100 Epoch: 13 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:53,355-Speed 5659.12 samples/sec Loss 3.1538 LearningRate 0.0100 Epoch: 13 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:55,183-Speed 5602.93 samples/sec Loss 3.1789 LearningRate 0.0100 Epoch: 13 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:57,018-Speed 5584.35 samples/sec Loss 3.1045 LearningRate 0.0100 Epoch: 13 
Global Step: 77720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:20:58,835-Speed 5637.37 samples/sec Loss 3.1364 LearningRate 0.0100 Epoch: 13 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:00,648-Speed 5648.05 samples/sec Loss 3.2277 LearningRate 0.0100 Epoch: 13 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:02,479-Speed 5594.51 samples/sec Loss 3.1974 LearningRate 0.0100 Epoch: 13 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:04,296-Speed 5637.17 samples/sec Loss 3.1627 LearningRate 0.0100 Epoch: 13 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:06,120-Speed 5614.96 samples/sec Loss 3.2451 LearningRate 0.0100 Epoch: 13 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:07,934-Speed 5647.05 samples/sec Loss 3.1677 LearningRate 0.0100 Epoch: 13 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:09,760-Speed 5609.00 samples/sec Loss 3.0349 LearningRate 0.0100 Epoch: 13 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:11,589-Speed 5600.96 samples/sec Loss 3.1191 LearningRate 0.0100 Epoch: 13 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:13,416-Speed 5606.99 samples/sec Loss 3.1318 LearningRate 0.0100 Epoch: 13 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:15,228-Speed 5654.07 samples/sec Loss 3.1214 LearningRate 0.0100 Epoch: 13 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:17,060-Speed 5591.11 samples/sec Loss 3.2822 LearningRate 0.0100 Epoch: 13 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:18,890-Speed 5599.51 samples/sec Loss 3.2057 LearningRate 0.0100 Epoch: 13 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:21:20,711-Speed 5623.83 samples/sec Loss 3.1556 LearningRate 0.0099 Epoch: 13 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:21:22,530-Speed 5632.29 samples/sec Loss 3.1710 LearningRate 0.0099 Epoch: 13 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:24,349-Speed 5631.82 samples/sec Loss 3.0500 LearningRate 0.0099 Epoch: 13 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:26,162-Speed 5649.98 samples/sec Loss 3.1885 LearningRate 0.0099 Epoch: 13 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:27,984-Speed 5621.45 samples/sec Loss 3.1613 LearningRate 0.0099 Epoch: 13 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:29,793-Speed 5662.69 samples/sec Loss 3.1534 LearningRate 0.0099 Epoch: 13 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:31,604-Speed 5655.23 samples/sec Loss 3.1107 LearningRate 0.0099 Epoch: 13 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:33,414-Speed 5660.14 samples/sec Loss 3.1897 LearningRate 0.0099 Epoch: 13 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:35,230-Speed 5639.02 samples/sec Loss 3.0785 LearningRate 0.0099 Epoch: 13 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:37,048-Speed 5634.25 samples/sec Loss 3.1017 LearningRate 0.0099 Epoch: 13 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:38,861-Speed 5651.29 samples/sec Loss 3.3543 LearningRate 0.0099 Epoch: 13 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:40,691-Speed 5597.80 samples/sec Loss 3.3864 LearningRate 0.0099 Epoch: 13 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:42,521-Speed 5597.38 
samples/sec Loss 3.1575 LearningRate 0.0099 Epoch: 13 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:44,348-Speed 5607.41 samples/sec Loss 3.1280 LearningRate 0.0099 Epoch: 13 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:46,167-Speed 5631.37 samples/sec Loss 3.1218 LearningRate 0.0099 Epoch: 13 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:21:47,988-Speed 5624.69 samples/sec Loss 3.2023 LearningRate 0.0099 Epoch: 13 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:22:14,152-[lfw][78000]XNorm: 22.107671 Training: 2022-04-27 06:22:14,152-[lfw][78000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-27 06:22:14,153-[lfw][78000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:22:44,487-[cfp_fp][78000]XNorm: 20.108242 Training: 2022-04-27 06:22:44,488-[cfp_fp][78000]Accuracy-Flip: 0.96943+-0.00860 Training: 2022-04-27 06:22:44,488-[cfp_fp][78000]Accuracy-Highest: 0.96943 Training: 2022-04-27 06:23:10,722-[agedb_30][78000]XNorm: 22.114465 Training: 2022-04-27 06:23:10,723-[agedb_30][78000]Accuracy-Flip: 0.97633+-0.00785 Training: 2022-04-27 06:23:10,723-[agedb_30][78000]Accuracy-Highest: 0.97933 Training: 2022-04-27 06:23:12,547-Speed 121.10 samples/sec Loss 3.1804 LearningRate 0.0099 Epoch: 13 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:14,381-Speed 5587.30 samples/sec Loss 3.1165 LearningRate 0.0099 Epoch: 13 Global Step: 78020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:16,200-Speed 5629.67 samples/sec Loss 3.1369 LearningRate 0.0098 Epoch: 13 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:18,027-Speed 5607.22 samples/sec Loss 2.9713 LearningRate 0.0098 Epoch: 13 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:19,847-Speed 5627.11 samples/sec Loss 3.2287 LearningRate 
0.0098 Epoch: 13 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:21,647-Speed 5692.61 samples/sec Loss 3.1772 LearningRate 0.0098 Epoch: 13 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:23,480-Speed 5587.72 samples/sec Loss 3.1251 LearningRate 0.0098 Epoch: 13 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:25,307-Speed 5608.03 samples/sec Loss 3.1798 LearningRate 0.0098 Epoch: 13 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:27,147-Speed 5566.93 samples/sec Loss 3.2302 LearningRate 0.0098 Epoch: 13 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:29,000-Speed 5526.67 samples/sec Loss 3.0805 LearningRate 0.0098 Epoch: 13 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:30,806-Speed 5673.23 samples/sec Loss 3.0584 LearningRate 0.0098 Epoch: 13 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:32,636-Speed 5597.09 samples/sec Loss 3.1085 LearningRate 0.0098 Epoch: 13 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:34,476-Speed 5566.50 samples/sec Loss 3.1274 LearningRate 0.0098 Epoch: 13 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:36,301-Speed 5613.25 samples/sec Loss 3.1534 LearningRate 0.0098 Epoch: 13 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:38,142-Speed 5561.70 samples/sec Loss 3.0922 LearningRate 0.0098 Epoch: 13 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:39,949-Speed 5671.44 samples/sec Loss 3.0799 LearningRate 0.0098 Epoch: 13 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:41,762-Speed 5648.78 samples/sec Loss 3.1042 LearningRate 0.0098 Epoch: 13 Global Step: 78170 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-27 06:23:43,592-Speed 5596.98 samples/sec Loss 3.2598 LearningRate 0.0098 Epoch: 13 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:45,412-Speed 5629.37 samples/sec Loss 3.1319 LearningRate 0.0098 Epoch: 13 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:47,249-Speed 5575.54 samples/sec Loss 3.1932 LearningRate 0.0098 Epoch: 13 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:49,087-Speed 5572.14 samples/sec Loss 3.1226 LearningRate 0.0098 Epoch: 13 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:50,920-Speed 5590.14 samples/sec Loss 3.1560 LearningRate 0.0097 Epoch: 13 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:52,759-Speed 5569.10 samples/sec Loss 2.9904 LearningRate 0.0097 Epoch: 13 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:54,597-Speed 5573.31 samples/sec Loss 2.9770 LearningRate 0.0097 Epoch: 13 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:56,435-Speed 5572.75 samples/sec Loss 3.1198 LearningRate 0.0097 Epoch: 13 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:23:58,255-Speed 5626.49 samples/sec Loss 3.0945 LearningRate 0.0097 Epoch: 13 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:24:00,078-Speed 5621.58 samples/sec Loss 3.0997 LearningRate 0.0097 Epoch: 13 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:01,917-Speed 5570.30 samples/sec Loss 3.1348 LearningRate 0.0097 Epoch: 13 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:03,741-Speed 5615.64 samples/sec Loss 3.0764 LearningRate 0.0097 Epoch: 13 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 
06:24:05,558-Speed 5637.23 samples/sec Loss 3.0921 LearningRate 0.0097 Epoch: 13 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:07,375-Speed 5639.27 samples/sec Loss 3.1763 LearningRate 0.0097 Epoch: 13 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:09,183-Speed 5665.26 samples/sec Loss 3.1507 LearningRate 0.0097 Epoch: 13 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:10,995-Speed 5650.81 samples/sec Loss 3.1487 LearningRate 0.0097 Epoch: 13 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:12,808-Speed 5651.26 samples/sec Loss 3.1544 LearningRate 0.0097 Epoch: 13 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:14,626-Speed 5633.35 samples/sec Loss 3.1651 LearningRate 0.0097 Epoch: 13 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:16,444-Speed 5635.84 samples/sec Loss 3.2241 LearningRate 0.0097 Epoch: 13 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:18,258-Speed 5645.68 samples/sec Loss 3.0820 LearningRate 0.0097 Epoch: 13 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:20,093-Speed 5583.06 samples/sec Loss 3.2459 LearningRate 0.0097 Epoch: 13 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:21,923-Speed 5598.52 samples/sec Loss 3.1131 LearningRate 0.0097 Epoch: 13 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:23,749-Speed 5608.90 samples/sec Loss 3.1449 LearningRate 0.0096 Epoch: 13 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:25,657-Speed 5368.73 samples/sec Loss 3.1110 LearningRate 0.0096 Epoch: 13 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:27,504-Speed 5549.15 samples/sec Loss 3.0848 
LearningRate 0.0096 Epoch: 13 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:29,430-Speed 5316.40 samples/sec Loss 3.1122 LearningRate 0.0096 Epoch: 13 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:31,319-Speed 5422.65 samples/sec Loss 3.1801 LearningRate 0.0096 Epoch: 13 Global Step: 78440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:33,164-Speed 5553.47 samples/sec Loss 3.0818 LearningRate 0.0096 Epoch: 13 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:34,996-Speed 5591.43 samples/sec Loss 3.2032 LearningRate 0.0096 Epoch: 13 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:36,815-Speed 5630.39 samples/sec Loss 3.2297 LearningRate 0.0096 Epoch: 13 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:24:38,623-Speed 5666.28 samples/sec Loss 3.1776 LearningRate 0.0096 Epoch: 13 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:40,456-Speed 5589.11 samples/sec Loss 3.1482 LearningRate 0.0096 Epoch: 13 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:42,279-Speed 5616.63 samples/sec Loss 2.9060 LearningRate 0.0096 Epoch: 13 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:44,095-Speed 5640.49 samples/sec Loss 3.1430 LearningRate 0.0096 Epoch: 13 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:45,922-Speed 5607.44 samples/sec Loss 3.1637 LearningRate 0.0096 Epoch: 13 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:47,773-Speed 5534.54 samples/sec Loss 3.0902 LearningRate 0.0096 Epoch: 13 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:49,606-Speed 5589.85 samples/sec Loss 3.1520 LearningRate 0.0096 Epoch: 13 Global Step: 78540 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:51,440-Speed 5585.37 samples/sec Loss 3.0438 LearningRate 0.0096 Epoch: 13 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:53,260-Speed 5628.14 samples/sec Loss 3.1916 LearningRate 0.0096 Epoch: 13 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:55,090-Speed 5595.56 samples/sec Loss 2.9703 LearningRate 0.0096 Epoch: 13 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:56,932-Speed 5563.93 samples/sec Loss 3.0716 LearningRate 0.0095 Epoch: 13 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:24:58,773-Speed 5562.50 samples/sec Loss 3.1498 LearningRate 0.0095 Epoch: 13 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:00,664-Speed 5416.17 samples/sec Loss 3.1068 LearningRate 0.0095 Epoch: 13 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:02,514-Speed 5536.63 samples/sec Loss 3.1042 LearningRate 0.0095 Epoch: 13 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:04,342-Speed 5605.52 samples/sec Loss 3.2181 LearningRate 0.0095 Epoch: 13 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:06,164-Speed 5622.13 samples/sec Loss 2.9937 LearningRate 0.0095 Epoch: 13 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:07,984-Speed 5626.86 samples/sec Loss 3.2153 LearningRate 0.0095 Epoch: 13 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:09,814-Speed 5600.34 samples/sec Loss 3.0810 LearningRate 0.0095 Epoch: 13 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:11,658-Speed 5554.86 samples/sec Loss 3.1475 LearningRate 0.0095 Epoch: 13 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:25:13,512-Speed 5523.20 samples/sec Loss 3.2523 LearningRate 0.0095 Epoch: 13 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:15,326-Speed 5648.85 samples/sec Loss 3.0464 LearningRate 0.0095 Epoch: 13 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:17,159-Speed 5585.97 samples/sec Loss 3.1136 LearningRate 0.0095 Epoch: 13 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:18,992-Speed 5591.10 samples/sec Loss 3.1022 LearningRate 0.0095 Epoch: 13 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:20,827-Speed 5581.70 samples/sec Loss 3.0570 LearningRate 0.0095 Epoch: 13 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:22,654-Speed 5606.84 samples/sec Loss 3.0801 LearningRate 0.0095 Epoch: 13 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:24,482-Speed 5602.19 samples/sec Loss 3.1621 LearningRate 0.0095 Epoch: 13 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:26,295-Speed 5649.29 samples/sec Loss 3.2029 LearningRate 0.0095 Epoch: 13 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:28,113-Speed 5636.55 samples/sec Loss 3.0764 LearningRate 0.0095 Epoch: 13 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:29,943-Speed 5597.13 samples/sec Loss 3.1483 LearningRate 0.0095 Epoch: 13 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:31,868-Speed 5322.37 samples/sec Loss 2.9786 LearningRate 0.0094 Epoch: 13 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:33,752-Speed 5437.10 samples/sec Loss 3.1824 LearningRate 0.0094 Epoch: 13 Global Step: 78780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:25:35,560-Speed 5665.40 samples/sec Loss 
3.1558 LearningRate 0.0094 Epoch: 13 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:37,412-Speed 5529.63 samples/sec Loss 3.2244 LearningRate 0.0094 Epoch: 13 Global Step: 78800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:39,250-Speed 5572.13 samples/sec Loss 2.9813 LearningRate 0.0094 Epoch: 13 Global Step: 78810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:41,086-Speed 5579.86 samples/sec Loss 3.1393 LearningRate 0.0094 Epoch: 13 Global Step: 78820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:42,919-Speed 5589.52 samples/sec Loss 3.0371 LearningRate 0.0094 Epoch: 13 Global Step: 78830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:44,743-Speed 5614.41 samples/sec Loss 3.0898 LearningRate 0.0094 Epoch: 13 Global Step: 78840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:46,574-Speed 5593.53 samples/sec Loss 3.0856 LearningRate 0.0094 Epoch: 13 Global Step: 78850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:48,393-Speed 5632.03 samples/sec Loss 3.0339 LearningRate 0.0094 Epoch: 13 Global Step: 78860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:50,234-Speed 5563.35 samples/sec Loss 3.0570 LearningRate 0.0094 Epoch: 13 Global Step: 78870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:52,053-Speed 5633.84 samples/sec Loss 3.1556 LearningRate 0.0094 Epoch: 13 Global Step: 78880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:53,888-Speed 5583.05 samples/sec Loss 3.1092 LearningRate 0.0094 Epoch: 13 Global Step: 78890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:25:55,707-Speed 5631.12 samples/sec Loss 3.1993 LearningRate 0.0094 Epoch: 13 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:57,543-Speed 5577.74 samples/sec Loss 3.1474 LearningRate 0.0094 Epoch: 13 Global Step: 78910 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:25:59,396-Speed 5527.73 samples/sec Loss 3.0010 LearningRate 0.0094 Epoch: 13 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:01,231-Speed 5584.65 samples/sec Loss 3.1012 LearningRate 0.0094 Epoch: 13 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:03,064-Speed 5586.44 samples/sec Loss 3.0223 LearningRate 0.0094 Epoch: 13 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:04,897-Speed 5588.40 samples/sec Loss 2.9832 LearningRate 0.0093 Epoch: 13 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:06,746-Speed 5539.77 samples/sec Loss 3.0362 LearningRate 0.0093 Epoch: 13 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:08,567-Speed 5626.74 samples/sec Loss 3.0773 LearningRate 0.0093 Epoch: 13 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:10,397-Speed 5596.84 samples/sec Loss 3.0794 LearningRate 0.0093 Epoch: 13 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:12,230-Speed 5587.91 samples/sec Loss 3.0694 LearningRate 0.0093 Epoch: 13 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:14,031-Speed 5689.05 samples/sec Loss 2.9932 LearningRate 0.0093 Epoch: 13 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:15,844-Speed 5649.98 samples/sec Loss 3.1218 LearningRate 0.0093 Epoch: 13 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:17,682-Speed 5571.31 samples/sec Loss 3.1642 LearningRate 0.0093 Epoch: 13 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:19,497-Speed 5645.66 samples/sec Loss 2.9571 LearningRate 0.0093 Epoch: 13 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:26:21,324-Speed 5604.75 samples/sec Loss 3.0517 LearningRate 0.0093 Epoch: 13 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:23,168-Speed 5557.47 samples/sec Loss 3.1575 LearningRate 0.0093 Epoch: 13 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:25,001-Speed 5585.84 samples/sec Loss 3.0474 LearningRate 0.0093 Epoch: 13 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:26,840-Speed 5571.44 samples/sec Loss 3.0525 LearningRate 0.0093 Epoch: 13 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:28,667-Speed 5607.58 samples/sec Loss 3.0767 LearningRate 0.0093 Epoch: 13 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:30,500-Speed 5586.47 samples/sec Loss 3.1379 LearningRate 0.0093 Epoch: 13 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:32,316-Speed 5642.31 samples/sec Loss 3.0973 LearningRate 0.0093 Epoch: 13 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:34,140-Speed 5615.15 samples/sec Loss 3.1189 LearningRate 0.0093 Epoch: 13 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:35,982-Speed 5561.97 samples/sec Loss 3.1245 LearningRate 0.0093 Epoch: 13 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:37,810-Speed 5602.56 samples/sec Loss 3.1728 LearningRate 0.0093 Epoch: 13 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:39,637-Speed 5609.22 samples/sec Loss 2.9669 LearningRate 0.0092 Epoch: 13 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:41,463-Speed 5609.29 samples/sec Loss 3.0826 LearningRate 0.0092 Epoch: 13 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:43,271-Speed 5665.22 samples/sec Loss 
3.1966 LearningRate 0.0092 Epoch: 13 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:45,102-Speed 5593.90 samples/sec Loss 3.0350 LearningRate 0.0092 Epoch: 13 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:46,928-Speed 5610.87 samples/sec Loss 3.1505 LearningRate 0.0092 Epoch: 13 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:48,742-Speed 5644.94 samples/sec Loss 3.1635 LearningRate 0.0092 Epoch: 13 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:50,567-Speed 5614.86 samples/sec Loss 2.9794 LearningRate 0.0092 Epoch: 13 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:52,380-Speed 5650.50 samples/sec Loss 3.1230 LearningRate 0.0092 Epoch: 13 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:54,200-Speed 5627.45 samples/sec Loss 3.0098 LearningRate 0.0092 Epoch: 13 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:56,035-Speed 5583.10 samples/sec Loss 3.0903 LearningRate 0.0092 Epoch: 13 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:57,847-Speed 5651.51 samples/sec Loss 3.1034 LearningRate 0.0092 Epoch: 13 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:26:59,668-Speed 5625.67 samples/sec Loss 3.1360 LearningRate 0.0092 Epoch: 13 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:01,518-Speed 5536.52 samples/sec Loss 3.1063 LearningRate 0.0092 Epoch: 13 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:03,366-Speed 5544.91 samples/sec Loss 3.0730 LearningRate 0.0092 Epoch: 13 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:05,200-Speed 5586.11 samples/sec Loss 3.2944 LearningRate 0.0092 Epoch: 13 Global Step: 79280 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:07,026-Speed 5609.48 samples/sec Loss 3.1384 LearningRate 0.0092 Epoch: 13 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:08,843-Speed 5635.49 samples/sec Loss 3.0454 LearningRate 0.0092 Epoch: 13 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:10,656-Speed 5651.10 samples/sec Loss 3.0608 LearningRate 0.0092 Epoch: 13 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:12,481-Speed 5612.80 samples/sec Loss 2.9834 LearningRate 0.0092 Epoch: 13 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:14,324-Speed 5557.92 samples/sec Loss 3.0983 LearningRate 0.0091 Epoch: 13 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:16,137-Speed 5650.26 samples/sec Loss 3.1103 LearningRate 0.0091 Epoch: 13 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:17,955-Speed 5633.65 samples/sec Loss 3.1127 LearningRate 0.0091 Epoch: 13 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:19,777-Speed 5621.39 samples/sec Loss 3.0927 LearningRate 0.0091 Epoch: 13 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:21,605-Speed 5604.96 samples/sec Loss 2.9989 LearningRate 0.0091 Epoch: 13 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:23,441-Speed 5581.09 samples/sec Loss 3.0729 LearningRate 0.0091 Epoch: 13 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:25,281-Speed 5565.35 samples/sec Loss 3.0597 LearningRate 0.0091 Epoch: 13 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:27,085-Speed 5677.94 samples/sec Loss 3.1341 LearningRate 0.0091 Epoch: 13 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:27:28,899-Speed 5646.10 samples/sec Loss 3.0121 LearningRate 0.0091 Epoch: 13 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:30,719-Speed 5627.54 samples/sec Loss 3.0863 LearningRate 0.0091 Epoch: 13 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:32,548-Speed 5602.94 samples/sec Loss 3.1486 LearningRate 0.0091 Epoch: 13 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:34,369-Speed 5624.65 samples/sec Loss 3.0912 LearningRate 0.0091 Epoch: 13 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:36,209-Speed 5566.25 samples/sec Loss 3.0433 LearningRate 0.0091 Epoch: 13 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:38,037-Speed 5603.26 samples/sec Loss 3.0184 LearningRate 0.0091 Epoch: 13 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:39,863-Speed 5611.52 samples/sec Loss 3.1382 LearningRate 0.0091 Epoch: 13 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:41,675-Speed 5654.10 samples/sec Loss 2.9810 LearningRate 0.0091 Epoch: 13 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:43,499-Speed 5616.74 samples/sec Loss 3.0112 LearningRate 0.0091 Epoch: 13 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:45,324-Speed 5612.62 samples/sec Loss 3.0422 LearningRate 0.0091 Epoch: 13 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:27:47,215-Speed 5415.47 samples/sec Loss 3.2039 LearningRate 0.0090 Epoch: 13 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:49,043-Speed 5604.56 samples/sec Loss 3.1363 LearningRate 0.0090 Epoch: 13 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:50,874-Speed 5595.00 samples/sec Loss 
3.0258 LearningRate 0.0090 Epoch: 13 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:52,699-Speed 5613.51 samples/sec Loss 3.1058 LearningRate 0.0090 Epoch: 13 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:54,519-Speed 5626.71 samples/sec Loss 2.8801 LearningRate 0.0090 Epoch: 13 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:56,340-Speed 5625.71 samples/sec Loss 3.0525 LearningRate 0.0090 Epoch: 13 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:58,173-Speed 5587.67 samples/sec Loss 3.0144 LearningRate 0.0090 Epoch: 13 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:27:59,988-Speed 5642.01 samples/sec Loss 3.1131 LearningRate 0.0090 Epoch: 13 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:01,816-Speed 5604.51 samples/sec Loss 3.0663 LearningRate 0.0090 Epoch: 13 Global Step: 79590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:03,699-Speed 5441.31 samples/sec Loss 3.0415 LearningRate 0.0090 Epoch: 13 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:15,201-Speed 890.38 samples/sec Loss 2.7140 LearningRate 0.0090 Epoch: 14 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:17,052-Speed 5535.03 samples/sec Loss 2.5417 LearningRate 0.0090 Epoch: 14 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:18,901-Speed 5540.43 samples/sec Loss 2.4799 LearningRate 0.0090 Epoch: 14 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:20,720-Speed 5630.64 samples/sec Loss 2.4051 LearningRate 0.0090 Epoch: 14 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:22,821-Speed 4874.23 samples/sec Loss 2.3981 LearningRate 0.0090 Epoch: 14 Global Step: 79650 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:24,670-Speed 5541.43 samples/sec Loss 2.4373 LearningRate 0.0090 Epoch: 14 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:26,491-Speed 5624.78 samples/sec Loss 2.4530 LearningRate 0.0090 Epoch: 14 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:28,333-Speed 5560.14 samples/sec Loss 2.3599 LearningRate 0.0090 Epoch: 14 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:30,166-Speed 5590.27 samples/sec Loss 2.4640 LearningRate 0.0090 Epoch: 14 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:31,995-Speed 5598.43 samples/sec Loss 2.4447 LearningRate 0.0089 Epoch: 14 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:33,846-Speed 5537.11 samples/sec Loss 2.4108 LearningRate 0.0089 Epoch: 14 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:35,658-Speed 5652.73 samples/sec Loss 2.5049 LearningRate 0.0089 Epoch: 14 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:37,480-Speed 5622.24 samples/sec Loss 2.4285 LearningRate 0.0089 Epoch: 14 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:39,300-Speed 5627.14 samples/sec Loss 2.3776 LearningRate 0.0089 Epoch: 14 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:41,114-Speed 5648.97 samples/sec Loss 2.5663 LearningRate 0.0089 Epoch: 14 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:42,962-Speed 5540.37 samples/sec Loss 2.2988 LearningRate 0.0089 Epoch: 14 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:44,793-Speed 5596.35 samples/sec Loss 2.4592 LearningRate 0.0089 Epoch: 14 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:28:46,610-Speed 5636.43 samples/sec Loss 2.4305 LearningRate 0.0089 Epoch: 14 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:48,446-Speed 5580.67 samples/sec Loss 2.5396 LearningRate 0.0089 Epoch: 14 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:50,267-Speed 5622.51 samples/sec Loss 2.4922 LearningRate 0.0089 Epoch: 14 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:52,093-Speed 5611.95 samples/sec Loss 2.6125 LearningRate 0.0089 Epoch: 14 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:53,924-Speed 5594.97 samples/sec Loss 2.4930 LearningRate 0.0089 Epoch: 14 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:55,824-Speed 5392.24 samples/sec Loss 2.4609 LearningRate 0.0089 Epoch: 14 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:57,653-Speed 5598.49 samples/sec Loss 2.5205 LearningRate 0.0089 Epoch: 14 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:28:59,473-Speed 5630.57 samples/sec Loss 2.4954 LearningRate 0.0089 Epoch: 14 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:01,298-Speed 5612.54 samples/sec Loss 2.3699 LearningRate 0.0089 Epoch: 14 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:03,134-Speed 5577.83 samples/sec Loss 2.4299 LearningRate 0.0089 Epoch: 14 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:04,965-Speed 5596.25 samples/sec Loss 2.5378 LearningRate 0.0089 Epoch: 14 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:06,786-Speed 5624.33 samples/sec Loss 2.5353 LearningRate 0.0088 Epoch: 14 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:08,632-Speed 5549.51 samples/sec Loss 
2.5685 LearningRate 0.0088 Epoch: 14 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:10,484-Speed 5529.03 samples/sec Loss 2.5203 LearningRate 0.0088 Epoch: 14 Global Step: 79910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:29:12,338-Speed 5527.48 samples/sec Loss 2.5097 LearningRate 0.0088 Epoch: 14 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:14,178-Speed 5566.29 samples/sec Loss 2.5782 LearningRate 0.0088 Epoch: 14 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:16,011-Speed 5588.59 samples/sec Loss 2.3907 LearningRate 0.0088 Epoch: 14 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:17,823-Speed 5653.79 samples/sec Loss 2.6297 LearningRate 0.0088 Epoch: 14 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:19,667-Speed 5555.70 samples/sec Loss 2.5892 LearningRate 0.0088 Epoch: 14 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:21,504-Speed 5575.75 samples/sec Loss 2.6074 LearningRate 0.0088 Epoch: 14 Global Step: 79970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:23,344-Speed 5568.19 samples/sec Loss 2.6253 LearningRate 0.0088 Epoch: 14 Global Step: 79980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:25,186-Speed 5558.69 samples/sec Loss 2.5336 LearningRate 0.0088 Epoch: 14 Global Step: 79990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:27,020-Speed 5585.86 samples/sec Loss 2.3842 LearningRate 0.0088 Epoch: 14 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:29:53,097-[lfw][80000]XNorm: 21.572815 Training: 2022-04-27 06:29:53,097-[lfw][80000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-27 06:29:53,098-[lfw][80000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:30:23,335-[cfp_fp][80000]XNorm: 20.100877 
Training: 2022-04-27 06:30:23,336-[cfp_fp][80000]Accuracy-Flip: 0.96543+-0.00792 Training: 2022-04-27 06:30:23,336-[cfp_fp][80000]Accuracy-Highest: 0.96943 Training: 2022-04-27 06:30:49,423-[agedb_30][80000]XNorm: 21.515273 Training: 2022-04-27 06:30:49,423-[agedb_30][80000]Accuracy-Flip: 0.98017+-0.00765 Training: 2022-04-27 06:30:49,424-[agedb_30][80000]Accuracy-Highest: 0.98017 Training: 2022-04-27 06:30:51,330-Speed 121.46 samples/sec Loss 2.5493 LearningRate 0.0088 Epoch: 14 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:30:53,206-Speed 5458.12 samples/sec Loss 2.5403 LearningRate 0.0088 Epoch: 14 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:30:55,017-Speed 5657.06 samples/sec Loss 2.5544 LearningRate 0.0088 Epoch: 14 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:30:56,846-Speed 5601.78 samples/sec Loss 2.4983 LearningRate 0.0088 Epoch: 14 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:30:58,678-Speed 5591.05 samples/sec Loss 2.5965 LearningRate 0.0088 Epoch: 14 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:00,574-Speed 5403.10 samples/sec Loss 2.6477 LearningRate 0.0088 Epoch: 14 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:02,422-Speed 5541.52 samples/sec Loss 2.5105 LearningRate 0.0088 Epoch: 14 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:04,236-Speed 5646.85 samples/sec Loss 2.5666 LearningRate 0.0088 Epoch: 14 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:06,044-Speed 5665.81 samples/sec Loss 2.5044 LearningRate 0.0087 Epoch: 14 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:07,862-Speed 5634.90 samples/sec Loss 2.5494 LearningRate 0.0087 Epoch: 14 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:31:09,699-Speed 5574.34 samples/sec Loss 2.4999 LearningRate 0.0087 Epoch: 14 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:11,538-Speed 5571.25 samples/sec Loss 2.5314 LearningRate 0.0087 Epoch: 14 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:13,404-Speed 5489.14 samples/sec Loss 2.5580 LearningRate 0.0087 Epoch: 14 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:15,234-Speed 5598.51 samples/sec Loss 2.5478 LearningRate 0.0087 Epoch: 14 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:17,074-Speed 5567.64 samples/sec Loss 2.5406 LearningRate 0.0087 Epoch: 14 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:18,896-Speed 5621.85 samples/sec Loss 2.4578 LearningRate 0.0087 Epoch: 14 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:20,739-Speed 5557.72 samples/sec Loss 2.6078 LearningRate 0.0087 Epoch: 14 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:22,567-Speed 5603.65 samples/sec Loss 2.5359 LearningRate 0.0087 Epoch: 14 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:24,378-Speed 5656.22 samples/sec Loss 2.6744 LearningRate 0.0087 Epoch: 14 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:26,192-Speed 5645.78 samples/sec Loss 2.5919 LearningRate 0.0087 Epoch: 14 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:27,998-Speed 5671.06 samples/sec Loss 2.6020 LearningRate 0.0087 Epoch: 14 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:29,817-Speed 5633.25 samples/sec Loss 2.5346 LearningRate 0.0087 Epoch: 14 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:31:31,626-Speed 5661.34 
samples/sec Loss 2.5246 LearningRate 0.0087 Epoch: 14 Global Step: 80230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:33,471-Speed 5553.40 samples/sec Loss 2.5538 LearningRate 0.0087 Epoch: 14 Global Step: 80240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:35,281-Speed 5657.46 samples/sec Loss 2.6539 LearningRate 0.0087 Epoch: 14 Global Step: 80250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:37,091-Speed 5659.75 samples/sec Loss 2.5824 LearningRate 0.0087 Epoch: 14 Global Step: 80260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:38,899-Speed 5667.57 samples/sec Loss 2.5378 LearningRate 0.0087 Epoch: 14 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:40,723-Speed 5616.27 samples/sec Loss 2.5661 LearningRate 0.0086 Epoch: 14 Global Step: 80280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:42,557-Speed 5583.13 samples/sec Loss 2.6032 LearningRate 0.0086 Epoch: 14 Global Step: 80290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:44,368-Speed 5657.98 samples/sec Loss 2.4726 LearningRate 0.0086 Epoch: 14 Global Step: 80300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:46,182-Speed 5647.09 samples/sec Loss 2.5398 LearningRate 0.0086 Epoch: 14 Global Step: 80310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:48,020-Speed 5571.84 samples/sec Loss 2.5625 LearningRate 0.0086 Epoch: 14 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:49,841-Speed 5625.62 samples/sec Loss 2.5828 LearningRate 0.0086 Epoch: 14 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:51,667-Speed 5609.13 samples/sec Loss 2.5556 LearningRate 0.0086 Epoch: 14 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:53,481-Speed 5647.71 samples/sec Loss 2.5860 LearningRate 0.0086 Epoch: 14 
Global Step: 80350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:55,296-Speed 5643.82 samples/sec Loss 2.7246 LearningRate 0.0086 Epoch: 14 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:57,121-Speed 5610.50 samples/sec Loss 2.6573 LearningRate 0.0086 Epoch: 14 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:31:58,949-Speed 5604.14 samples/sec Loss 2.5630 LearningRate 0.0086 Epoch: 14 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:00,791-Speed 5564.08 samples/sec Loss 2.5024 LearningRate 0.0086 Epoch: 14 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:02,623-Speed 5592.01 samples/sec Loss 2.6206 LearningRate 0.0086 Epoch: 14 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:04,441-Speed 5634.08 samples/sec Loss 2.6888 LearningRate 0.0086 Epoch: 14 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:06,265-Speed 5615.62 samples/sec Loss 2.5449 LearningRate 0.0086 Epoch: 14 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:08,059-Speed 5709.11 samples/sec Loss 2.4515 LearningRate 0.0086 Epoch: 14 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:09,885-Speed 5608.63 samples/sec Loss 2.6062 LearningRate 0.0086 Epoch: 14 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:11,727-Speed 5560.86 samples/sec Loss 2.6769 LearningRate 0.0086 Epoch: 14 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:13,566-Speed 5570.94 samples/sec Loss 2.6321 LearningRate 0.0086 Epoch: 14 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:15,392-Speed 5609.17 samples/sec Loss 2.5478 LearningRate 0.0085 Epoch: 14 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:32:17,216-Speed 5616.96 samples/sec Loss 2.5696 LearningRate 0.0085 Epoch: 14 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:19,021-Speed 5674.05 samples/sec Loss 2.6134 LearningRate 0.0085 Epoch: 14 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:20,830-Speed 5662.28 samples/sec Loss 2.6026 LearningRate 0.0085 Epoch: 14 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:22,647-Speed 5638.43 samples/sec Loss 2.6213 LearningRate 0.0085 Epoch: 14 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:24,470-Speed 5619.05 samples/sec Loss 2.6929 LearningRate 0.0085 Epoch: 14 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:26,285-Speed 5644.12 samples/sec Loss 2.6161 LearningRate 0.0085 Epoch: 14 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:32:28,101-Speed 5639.21 samples/sec Loss 2.6082 LearningRate 0.0085 Epoch: 14 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:29,931-Speed 5598.03 samples/sec Loss 2.6579 LearningRate 0.0085 Epoch: 14 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:31,749-Speed 5635.36 samples/sec Loss 2.6596 LearningRate 0.0085 Epoch: 14 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:33,566-Speed 5638.83 samples/sec Loss 2.7042 LearningRate 0.0085 Epoch: 14 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:35,391-Speed 5611.89 samples/sec Loss 2.6147 LearningRate 0.0085 Epoch: 14 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:37,200-Speed 5661.23 samples/sec Loss 2.5555 LearningRate 0.0085 Epoch: 14 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:39,023-Speed 5620.16 
samples/sec Loss 2.4880 LearningRate 0.0085 Epoch: 14 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:40,835-Speed 5651.85 samples/sec Loss 2.7156 LearningRate 0.0085 Epoch: 14 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:42,661-Speed 5609.29 samples/sec Loss 2.6421 LearningRate 0.0085 Epoch: 14 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:44,481-Speed 5628.12 samples/sec Loss 2.6440 LearningRate 0.0085 Epoch: 14 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:46,289-Speed 5668.84 samples/sec Loss 2.5904 LearningRate 0.0085 Epoch: 14 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:32:48,091-Speed 5683.35 samples/sec Loss 2.6400 LearningRate 0.0085 Epoch: 14 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:49,919-Speed 5603.08 samples/sec Loss 2.5493 LearningRate 0.0085 Epoch: 14 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:51,760-Speed 5565.00 samples/sec Loss 2.6076 LearningRate 0.0084 Epoch: 14 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:53,584-Speed 5614.55 samples/sec Loss 2.6607 LearningRate 0.0084 Epoch: 14 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:55,414-Speed 5598.28 samples/sec Loss 2.6424 LearningRate 0.0084 Epoch: 14 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:57,245-Speed 5594.87 samples/sec Loss 2.6506 LearningRate 0.0084 Epoch: 14 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:32:59,067-Speed 5622.16 samples/sec Loss 2.6532 LearningRate 0.0084 Epoch: 14 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:00,885-Speed 5632.47 samples/sec Loss 2.6301 LearningRate 0.0084 Epoch: 14 
Global Step: 80720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:02,716-Speed 5595.69 samples/sec Loss 2.6032 LearningRate 0.0084 Epoch: 14 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:04,545-Speed 5600.49 samples/sec Loss 2.6862 LearningRate 0.0084 Epoch: 14 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:06,364-Speed 5630.86 samples/sec Loss 2.6966 LearningRate 0.0084 Epoch: 14 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:08,181-Speed 5637.60 samples/sec Loss 2.6161 LearningRate 0.0084 Epoch: 14 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:10,006-Speed 5613.37 samples/sec Loss 2.6772 LearningRate 0.0084 Epoch: 14 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:11,830-Speed 5617.76 samples/sec Loss 2.6328 LearningRate 0.0084 Epoch: 14 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:13,664-Speed 5582.55 samples/sec Loss 2.6517 LearningRate 0.0084 Epoch: 14 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:15,482-Speed 5635.11 samples/sec Loss 2.6404 LearningRate 0.0084 Epoch: 14 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:17,319-Speed 5577.57 samples/sec Loss 2.6402 LearningRate 0.0084 Epoch: 14 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:19,153-Speed 5585.11 samples/sec Loss 2.5943 LearningRate 0.0084 Epoch: 14 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:20,978-Speed 5613.17 samples/sec Loss 2.6991 LearningRate 0.0084 Epoch: 14 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:22,802-Speed 5615.93 samples/sec Loss 2.6580 LearningRate 0.0084 Epoch: 14 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:33:24,637-Speed 5581.31 samples/sec Loss 2.6909 LearningRate 0.0084 Epoch: 14 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:26,500-Speed 5499.38 samples/sec Loss 2.6280 LearningRate 0.0083 Epoch: 14 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:28,349-Speed 5539.58 samples/sec Loss 2.5851 LearningRate 0.0083 Epoch: 14 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:30,190-Speed 5564.04 samples/sec Loss 2.5787 LearningRate 0.0083 Epoch: 14 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:32,018-Speed 5603.26 samples/sec Loss 2.6209 LearningRate 0.0083 Epoch: 14 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:33,833-Speed 5644.03 samples/sec Loss 2.6508 LearningRate 0.0083 Epoch: 14 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:35,656-Speed 5620.21 samples/sec Loss 2.6802 LearningRate 0.0083 Epoch: 14 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:37,469-Speed 5650.98 samples/sec Loss 2.6678 LearningRate 0.0083 Epoch: 14 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:39,281-Speed 5652.17 samples/sec Loss 2.5954 LearningRate 0.0083 Epoch: 14 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:41,110-Speed 5601.33 samples/sec Loss 2.6897 LearningRate 0.0083 Epoch: 14 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:42,920-Speed 5659.90 samples/sec Loss 2.6591 LearningRate 0.0083 Epoch: 14 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:44,734-Speed 5646.34 samples/sec Loss 2.6514 LearningRate 0.0083 Epoch: 14 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:46,562-Speed 5603.97 
samples/sec Loss 2.7448 LearningRate 0.0083 Epoch: 14 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:48,393-Speed 5594.61 samples/sec Loss 2.6299 LearningRate 0.0083 Epoch: 14 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:50,208-Speed 5641.45 samples/sec Loss 2.5948 LearningRate 0.0083 Epoch: 14 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:52,034-Speed 5611.55 samples/sec Loss 2.7235 LearningRate 0.0083 Epoch: 14 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:53,857-Speed 5619.83 samples/sec Loss 2.6646 LearningRate 0.0083 Epoch: 14 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:55,673-Speed 5639.86 samples/sec Loss 2.7433 LearningRate 0.0083 Epoch: 14 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:57,492-Speed 5630.40 samples/sec Loss 2.6813 LearningRate 0.0083 Epoch: 14 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:33:59,314-Speed 5622.59 samples/sec Loss 2.6954 LearningRate 0.0083 Epoch: 14 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:01,135-Speed 5625.72 samples/sec Loss 2.7293 LearningRate 0.0083 Epoch: 14 Global Step: 81050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:34:02,943-Speed 5664.73 samples/sec Loss 2.5844 LearningRate 0.0082 Epoch: 14 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:04,773-Speed 5599.81 samples/sec Loss 2.6387 LearningRate 0.0082 Epoch: 14 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:06,593-Speed 5626.10 samples/sec Loss 2.6054 LearningRate 0.0082 Epoch: 14 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:08,407-Speed 5647.33 samples/sec Loss 2.6985 LearningRate 0.0082 Epoch: 14 
Global Step: 81090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:10,220-Speed 5648.74 samples/sec Loss 2.6797 LearningRate 0.0082 Epoch: 14 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:12,041-Speed 5626.40 samples/sec Loss 2.6442 LearningRate 0.0082 Epoch: 14 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:13,878-Speed 5575.95 samples/sec Loss 2.7806 LearningRate 0.0082 Epoch: 14 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:15,693-Speed 5644.56 samples/sec Loss 2.6337 LearningRate 0.0082 Epoch: 14 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:17,517-Speed 5615.65 samples/sec Loss 2.7110 LearningRate 0.0082 Epoch: 14 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:19,363-Speed 5549.43 samples/sec Loss 2.6514 LearningRate 0.0082 Epoch: 14 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:21,176-Speed 5650.40 samples/sec Loss 2.6824 LearningRate 0.0082 Epoch: 14 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:22,995-Speed 5629.54 samples/sec Loss 2.5746 LearningRate 0.0082 Epoch: 14 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:24,823-Speed 5604.09 samples/sec Loss 2.6901 LearningRate 0.0082 Epoch: 14 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:26,640-Speed 5637.97 samples/sec Loss 2.6439 LearningRate 0.0082 Epoch: 14 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:28,480-Speed 5568.55 samples/sec Loss 2.6839 LearningRate 0.0082 Epoch: 14 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:30,321-Speed 5563.77 samples/sec Loss 2.6643 LearningRate 0.0082 Epoch: 14 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:34:32,152-Speed 5593.29 samples/sec Loss 2.6027 LearningRate 0.0082 Epoch: 14 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:33,974-Speed 5621.97 samples/sec Loss 2.6151 LearningRate 0.0082 Epoch: 14 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:35,796-Speed 5622.05 samples/sec Loss 2.7181 LearningRate 0.0082 Epoch: 14 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:37,693-Speed 5402.13 samples/sec Loss 2.6963 LearningRate 0.0082 Epoch: 14 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:39,556-Speed 5497.26 samples/sec Loss 2.6792 LearningRate 0.0081 Epoch: 14 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:41,381-Speed 5615.69 samples/sec Loss 2.5918 LearningRate 0.0081 Epoch: 14 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:43,232-Speed 5533.22 samples/sec Loss 2.7227 LearningRate 0.0081 Epoch: 14 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:45,090-Speed 5511.38 samples/sec Loss 2.6744 LearningRate 0.0081 Epoch: 14 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:46,921-Speed 5596.59 samples/sec Loss 2.6425 LearningRate 0.0081 Epoch: 14 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:48,754-Speed 5586.29 samples/sec Loss 2.7675 LearningRate 0.0081 Epoch: 14 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:50,591-Speed 5577.28 samples/sec Loss 2.6551 LearningRate 0.0081 Epoch: 14 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:52,443-Speed 5529.37 samples/sec Loss 2.6578 LearningRate 0.0081 Epoch: 14 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:54,318-Speed 5464.73 
samples/sec Loss 2.5974 LearningRate 0.0081 Epoch: 14 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:56,143-Speed 5612.15 samples/sec Loss 2.6856 LearningRate 0.0081 Epoch: 14 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:34:57,966-Speed 5619.77 samples/sec Loss 2.6348 LearningRate 0.0081 Epoch: 14 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:34:59,807-Speed 5564.67 samples/sec Loss 2.5623 LearningRate 0.0081 Epoch: 14 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:01,640-Speed 5587.53 samples/sec Loss 2.5640 LearningRate 0.0081 Epoch: 14 Global Step: 81380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:03,462-Speed 5624.03 samples/sec Loss 2.6257 LearningRate 0.0081 Epoch: 14 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:05,270-Speed 5665.73 samples/sec Loss 2.6826 LearningRate 0.0081 Epoch: 14 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:07,081-Speed 5654.55 samples/sec Loss 2.5440 LearningRate 0.0081 Epoch: 14 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:08,891-Speed 5658.98 samples/sec Loss 2.6861 LearningRate 0.0081 Epoch: 14 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:10,715-Speed 5618.23 samples/sec Loss 2.7037 LearningRate 0.0081 Epoch: 14 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:12,554-Speed 5568.96 samples/sec Loss 2.7588 LearningRate 0.0081 Epoch: 14 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:14,383-Speed 5600.77 samples/sec Loss 2.6221 LearningRate 0.0081 Epoch: 14 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:16,207-Speed 5614.78 samples/sec Loss 2.7288 LearningRate 0.0080 Epoch: 14 
Global Step: 81460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:18,031-Speed 5617.29 samples/sec Loss 2.6613 LearningRate 0.0080 Epoch: 14 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:19,847-Speed 5641.76 samples/sec Loss 2.6255 LearningRate 0.0080 Epoch: 14 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:21,738-Speed 5417.14 samples/sec Loss 2.7011 LearningRate 0.0080 Epoch: 14 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:23,555-Speed 5636.76 samples/sec Loss 2.7350 LearningRate 0.0080 Epoch: 14 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:25,372-Speed 5636.76 samples/sec Loss 2.6532 LearningRate 0.0080 Epoch: 14 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:27,201-Speed 5601.60 samples/sec Loss 2.6426 LearningRate 0.0080 Epoch: 14 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:29,011-Speed 5658.50 samples/sec Loss 2.7860 LearningRate 0.0080 Epoch: 14 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:30,824-Speed 5650.46 samples/sec Loss 2.6391 LearningRate 0.0080 Epoch: 14 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:32,633-Speed 5662.33 samples/sec Loss 2.6096 LearningRate 0.0080 Epoch: 14 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:34,449-Speed 5641.57 samples/sec Loss 2.6360 LearningRate 0.0080 Epoch: 14 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:36,275-Speed 5609.09 samples/sec Loss 2.7322 LearningRate 0.0080 Epoch: 14 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:38,156-Speed 5445.83 samples/sec Loss 2.6985 LearningRate 0.0080 Epoch: 14 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:35:39,991-Speed 5581.14 samples/sec Loss 2.6547 LearningRate 0.0080 Epoch: 14 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:41,827-Speed 5580.45 samples/sec Loss 2.7208 LearningRate 0.0080 Epoch: 14 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:43,746-Speed 5339.12 samples/sec Loss 2.5985 LearningRate 0.0080 Epoch: 14 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:45,617-Speed 5472.92 samples/sec Loss 2.7426 LearningRate 0.0080 Epoch: 14 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:47,463-Speed 5549.73 samples/sec Loss 2.7479 LearningRate 0.0080 Epoch: 14 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:49,298-Speed 5581.35 samples/sec Loss 2.6560 LearningRate 0.0080 Epoch: 14 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:51,122-Speed 5617.91 samples/sec Loss 2.6768 LearningRate 0.0080 Epoch: 14 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:52,987-Speed 5490.68 samples/sec Loss 2.8112 LearningRate 0.0079 Epoch: 14 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:54,813-Speed 5610.39 samples/sec Loss 2.7612 LearningRate 0.0079 Epoch: 14 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:56,637-Speed 5616.84 samples/sec Loss 2.6742 LearningRate 0.0079 Epoch: 14 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:35:58,454-Speed 5637.14 samples/sec Loss 2.6324 LearningRate 0.0079 Epoch: 14 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:00,267-Speed 5649.19 samples/sec Loss 2.6393 LearningRate 0.0079 Epoch: 14 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:02,091-Speed 5617.67 
samples/sec Loss 2.5426 LearningRate 0.0079 Epoch: 14 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:03,899-Speed 5665.49 samples/sec Loss 2.8641 LearningRate 0.0079 Epoch: 14 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:05,724-Speed 5612.37 samples/sec Loss 2.6688 LearningRate 0.0079 Epoch: 14 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:07,540-Speed 5642.09 samples/sec Loss 2.6387 LearningRate 0.0079 Epoch: 14 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:09,366-Speed 5609.05 samples/sec Loss 2.6579 LearningRate 0.0079 Epoch: 14 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:11,179-Speed 5649.40 samples/sec Loss 2.6414 LearningRate 0.0079 Epoch: 14 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:12,993-Speed 5647.86 samples/sec Loss 2.6118 LearningRate 0.0079 Epoch: 14 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:14,804-Speed 5653.79 samples/sec Loss 2.6959 LearningRate 0.0079 Epoch: 14 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:16,630-Speed 5611.89 samples/sec Loss 2.7080 LearningRate 0.0079 Epoch: 14 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:18,456-Speed 5609.34 samples/sec Loss 2.7896 LearningRate 0.0079 Epoch: 14 Global Step: 81800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:20,283-Speed 5607.04 samples/sec Loss 2.6712 LearningRate 0.0079 Epoch: 14 Global Step: 81810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:22,100-Speed 5634.54 samples/sec Loss 2.8046 LearningRate 0.0079 Epoch: 14 Global Step: 81820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:23,921-Speed 5626.95 samples/sec Loss 2.7270 LearningRate 0.0079 Epoch: 14 
Global Step: 81830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:25,751-Speed 5597.76 samples/sec Loss 2.6447 LearningRate 0.0079 Epoch: 14 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:27,590-Speed 5570.78 samples/sec Loss 2.7330 LearningRate 0.0079 Epoch: 14 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:29,430-Speed 5567.31 samples/sec Loss 2.6851 LearningRate 0.0078 Epoch: 14 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:31,240-Speed 5659.44 samples/sec Loss 2.5062 LearningRate 0.0078 Epoch: 14 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:33,083-Speed 5556.18 samples/sec Loss 2.7399 LearningRate 0.0078 Epoch: 14 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:34,902-Speed 5633.13 samples/sec Loss 2.7603 LearningRate 0.0078 Epoch: 14 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:36:36,727-Speed 5613.63 samples/sec Loss 2.8064 LearningRate 0.0078 Epoch: 14 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:38,569-Speed 5558.30 samples/sec Loss 2.6745 LearningRate 0.0078 Epoch: 14 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:40,394-Speed 5612.63 samples/sec Loss 2.7181 LearningRate 0.0078 Epoch: 14 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:42,231-Speed 5579.26 samples/sec Loss 2.6737 LearningRate 0.0078 Epoch: 14 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:44,069-Speed 5570.24 samples/sec Loss 2.7276 LearningRate 0.0078 Epoch: 14 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:45,926-Speed 5516.61 samples/sec Loss 2.7333 LearningRate 0.0078 Epoch: 14 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:36:47,772-Speed 5548.86 samples/sec Loss 2.7095 LearningRate 0.0078 Epoch: 14 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:49,609-Speed 5576.34 samples/sec Loss 2.7680 LearningRate 0.0078 Epoch: 14 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:51,436-Speed 5608.71 samples/sec Loss 2.6426 LearningRate 0.0078 Epoch: 14 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:53,262-Speed 5609.21 samples/sec Loss 2.8085 LearningRate 0.0078 Epoch: 14 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:36:55,101-Speed 5570.05 samples/sec Loss 2.6660 LearningRate 0.0078 Epoch: 14 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:37:21,391-[lfw][82000]XNorm: 21.839809 Training: 2022-04-27 06:37:21,392-[lfw][82000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-27 06:37:21,392-[lfw][82000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:37:51,847-[cfp_fp][82000]XNorm: 19.821888 Training: 2022-04-27 06:37:51,848-[cfp_fp][82000]Accuracy-Flip: 0.96857+-0.00761 Training: 2022-04-27 06:37:51,848-[cfp_fp][82000]Accuracy-Highest: 0.96943 Training: 2022-04-27 06:38:18,164-[agedb_30][82000]XNorm: 21.807388 Training: 2022-04-27 06:38:18,165-[agedb_30][82000]Accuracy-Flip: 0.97933+-0.00873 Training: 2022-04-27 06:38:18,165-[agedb_30][82000]Accuracy-Highest: 0.98017 Training: 2022-04-27 06:38:20,041-Speed 120.56 samples/sec Loss 2.7373 LearningRate 0.0078 Epoch: 14 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:21,868-Speed 5605.94 samples/sec Loss 2.8470 LearningRate 0.0078 Epoch: 14 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:23,687-Speed 5629.30 samples/sec Loss 2.7982 LearningRate 0.0078 Epoch: 14 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 
06:38:25,500-Speed 5649.47 samples/sec Loss 2.7957 LearningRate 0.0078 Epoch: 14 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:27,314-Speed 5647.91 samples/sec Loss 2.7459 LearningRate 0.0078 Epoch: 14 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:29,138-Speed 5617.91 samples/sec Loss 2.7259 LearningRate 0.0078 Epoch: 14 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:30,953-Speed 5642.46 samples/sec Loss 2.6716 LearningRate 0.0077 Epoch: 14 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:32,792-Speed 5569.69 samples/sec Loss 2.6854 LearningRate 0.0077 Epoch: 14 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:34,599-Speed 5669.82 samples/sec Loss 2.5657 LearningRate 0.0077 Epoch: 14 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:36,444-Speed 5550.07 samples/sec Loss 2.5992 LearningRate 0.0077 Epoch: 14 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:38,268-Speed 5617.91 samples/sec Loss 2.7919 LearningRate 0.0077 Epoch: 14 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:40,075-Speed 5667.92 samples/sec Loss 2.6596 LearningRate 0.0077 Epoch: 14 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:41,899-Speed 5616.15 samples/sec Loss 2.6829 LearningRate 0.0077 Epoch: 14 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:43,725-Speed 5607.63 samples/sec Loss 2.7246 LearningRate 0.0077 Epoch: 14 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:45,540-Speed 5645.12 samples/sec Loss 2.6864 LearningRate 0.0077 Epoch: 14 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:47,383-Speed 5558.74 samples/sec Loss 2.7558 
LearningRate 0.0077 Epoch: 14 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:49,229-Speed 5547.89 samples/sec Loss 2.6646 LearningRate 0.0077 Epoch: 14 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:51,057-Speed 5605.54 samples/sec Loss 2.7474 LearningRate 0.0077 Epoch: 14 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:52,884-Speed 5606.71 samples/sec Loss 2.7870 LearningRate 0.0077 Epoch: 14 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:54,722-Speed 5572.31 samples/sec Loss 2.6446 LearningRate 0.0077 Epoch: 14 Global Step: 82200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:38:56,539-Speed 5638.13 samples/sec Loss 2.6277 LearningRate 0.0077 Epoch: 14 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:38:58,347-Speed 5663.66 samples/sec Loss 2.6927 LearningRate 0.0077 Epoch: 14 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:00,175-Speed 5605.32 samples/sec Loss 2.6425 LearningRate 0.0077 Epoch: 14 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:02,001-Speed 5608.76 samples/sec Loss 2.6224 LearningRate 0.0077 Epoch: 14 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:03,836-Speed 5582.21 samples/sec Loss 2.7527 LearningRate 0.0077 Epoch: 14 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:05,659-Speed 5620.10 samples/sec Loss 2.7135 LearningRate 0.0077 Epoch: 14 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:07,475-Speed 5640.32 samples/sec Loss 2.6210 LearningRate 0.0076 Epoch: 14 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:09,316-Speed 5565.57 samples/sec Loss 2.6925 LearningRate 0.0076 Epoch: 14 Global Step: 82280 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:11,132-Speed 5638.55 samples/sec Loss 2.7719 LearningRate 0.0076 Epoch: 14 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:12,949-Speed 5637.83 samples/sec Loss 2.7166 LearningRate 0.0076 Epoch: 14 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:14,751-Speed 5685.02 samples/sec Loss 2.6750 LearningRate 0.0076 Epoch: 14 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:16,580-Speed 5599.99 samples/sec Loss 2.6227 LearningRate 0.0076 Epoch: 14 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:18,405-Speed 5613.15 samples/sec Loss 2.7521 LearningRate 0.0076 Epoch: 14 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:20,228-Speed 5617.40 samples/sec Loss 2.7377 LearningRate 0.0076 Epoch: 14 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:22,069-Speed 5564.06 samples/sec Loss 2.7015 LearningRate 0.0076 Epoch: 14 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:23,890-Speed 5625.21 samples/sec Loss 2.6515 LearningRate 0.0076 Epoch: 14 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:25,706-Speed 5642.04 samples/sec Loss 2.7520 LearningRate 0.0076 Epoch: 14 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:27,531-Speed 5611.73 samples/sec Loss 2.6887 LearningRate 0.0076 Epoch: 14 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:29,362-Speed 5594.22 samples/sec Loss 2.6301 LearningRate 0.0076 Epoch: 14 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:31,200-Speed 5576.08 samples/sec Loss 2.6647 LearningRate 0.0076 Epoch: 14 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:39:33,018-Speed 5633.94 samples/sec Loss 2.6344 LearningRate 0.0076 Epoch: 14 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:34,843-Speed 5611.57 samples/sec Loss 2.7101 LearningRate 0.0076 Epoch: 14 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:36,669-Speed 5609.61 samples/sec Loss 2.7256 LearningRate 0.0076 Epoch: 14 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:38,512-Speed 5560.28 samples/sec Loss 2.6763 LearningRate 0.0076 Epoch: 14 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:40,333-Speed 5622.33 samples/sec Loss 2.6189 LearningRate 0.0076 Epoch: 14 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:42,170-Speed 5576.28 samples/sec Loss 2.6847 LearningRate 0.0076 Epoch: 14 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:43,995-Speed 5612.16 samples/sec Loss 2.6296 LearningRate 0.0076 Epoch: 14 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:45,828-Speed 5588.02 samples/sec Loss 2.6817 LearningRate 0.0075 Epoch: 14 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:47,655-Speed 5607.33 samples/sec Loss 2.7218 LearningRate 0.0075 Epoch: 14 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:49,518-Speed 5498.92 samples/sec Loss 2.6633 LearningRate 0.0075 Epoch: 14 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:51,350-Speed 5591.46 samples/sec Loss 2.6887 LearningRate 0.0075 Epoch: 14 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:53,169-Speed 5633.58 samples/sec Loss 2.7310 LearningRate 0.0075 Epoch: 14 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:54,996-Speed 5606.55 samples/sec Loss 
2.6777 LearningRate 0.0075 Epoch: 14 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:56,816-Speed 5627.56 samples/sec Loss 2.7379 LearningRate 0.0075 Epoch: 14 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:39:58,634-Speed 5635.54 samples/sec Loss 2.7386 LearningRate 0.0075 Epoch: 14 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:00,467-Speed 5587.96 samples/sec Loss 2.6198 LearningRate 0.0075 Epoch: 14 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:02,283-Speed 5640.97 samples/sec Loss 2.7870 LearningRate 0.0075 Epoch: 14 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:04,110-Speed 5606.82 samples/sec Loss 2.5995 LearningRate 0.0075 Epoch: 14 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:05,935-Speed 5612.28 samples/sec Loss 2.7519 LearningRate 0.0075 Epoch: 14 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:07,763-Speed 5603.67 samples/sec Loss 2.7366 LearningRate 0.0075 Epoch: 14 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:09,579-Speed 5641.12 samples/sec Loss 2.7266 LearningRate 0.0075 Epoch: 14 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:11,399-Speed 5627.81 samples/sec Loss 2.5890 LearningRate 0.0075 Epoch: 14 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:13,266-Speed 5483.90 samples/sec Loss 2.7095 LearningRate 0.0075 Epoch: 14 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:15,090-Speed 5616.77 samples/sec Loss 2.7008 LearningRate 0.0075 Epoch: 14 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:16,928-Speed 5573.03 samples/sec Loss 2.6654 LearningRate 0.0075 Epoch: 14 Global Step: 82650 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:18,781-Speed 5530.16 samples/sec Loss 2.6664 LearningRate 0.0075 Epoch: 14 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:20,605-Speed 5614.08 samples/sec Loss 2.6383 LearningRate 0.0075 Epoch: 14 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:22,433-Speed 5603.11 samples/sec Loss 2.6662 LearningRate 0.0075 Epoch: 14 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:24,291-Speed 5514.90 samples/sec Loss 2.6801 LearningRate 0.0074 Epoch: 14 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:26,109-Speed 5634.58 samples/sec Loss 2.6465 LearningRate 0.0074 Epoch: 14 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:27,913-Speed 5677.15 samples/sec Loss 2.6234 LearningRate 0.0074 Epoch: 14 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:29,734-Speed 5624.95 samples/sec Loss 2.7013 LearningRate 0.0074 Epoch: 14 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:31,559-Speed 5612.57 samples/sec Loss 2.7149 LearningRate 0.0074 Epoch: 14 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:33,377-Speed 5634.36 samples/sec Loss 2.7435 LearningRate 0.0074 Epoch: 14 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:35,194-Speed 5639.02 samples/sec Loss 2.7435 LearningRate 0.0074 Epoch: 14 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:37,021-Speed 5606.68 samples/sec Loss 2.6208 LearningRate 0.0074 Epoch: 14 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:38,861-Speed 5568.19 samples/sec Loss 2.7513 LearningRate 0.0074 Epoch: 14 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:40:40,679-Speed 5633.58 samples/sec Loss 2.7506 LearningRate 0.0074 Epoch: 14 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:42,499-Speed 5626.87 samples/sec Loss 2.7034 LearningRate 0.0074 Epoch: 14 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:44,325-Speed 5609.96 samples/sec Loss 2.6048 LearningRate 0.0074 Epoch: 14 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:46,133-Speed 5665.34 samples/sec Loss 2.7114 LearningRate 0.0074 Epoch: 14 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:47,953-Speed 5629.66 samples/sec Loss 2.7167 LearningRate 0.0074 Epoch: 14 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:49,767-Speed 5645.54 samples/sec Loss 2.8593 LearningRate 0.0074 Epoch: 14 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:51,593-Speed 5610.14 samples/sec Loss 2.7151 LearningRate 0.0074 Epoch: 14 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:53,455-Speed 5500.42 samples/sec Loss 2.6705 LearningRate 0.0074 Epoch: 14 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:55,284-Speed 5602.68 samples/sec Loss 2.6260 LearningRate 0.0074 Epoch: 14 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:57,105-Speed 5623.20 samples/sec Loss 2.6487 LearningRate 0.0074 Epoch: 14 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:40:58,934-Speed 5599.77 samples/sec Loss 2.6384 LearningRate 0.0074 Epoch: 14 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:00,750-Speed 5641.95 samples/sec Loss 2.5977 LearningRate 0.0073 Epoch: 14 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:02,574-Speed 5616.83 samples/sec Loss 
2.6478 LearningRate 0.0073 Epoch: 14 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:04,396-Speed 5622.68 samples/sec Loss 2.7669 LearningRate 0.0073 Epoch: 14 Global Step: 82910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:41:06,206-Speed 5659.42 samples/sec Loss 2.7189 LearningRate 0.0073 Epoch: 14 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:08,018-Speed 5654.30 samples/sec Loss 2.7360 LearningRate 0.0073 Epoch: 14 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:09,849-Speed 5594.17 samples/sec Loss 2.6590 LearningRate 0.0073 Epoch: 14 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:11,663-Speed 5643.80 samples/sec Loss 2.7152 LearningRate 0.0073 Epoch: 14 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:13,482-Speed 5632.62 samples/sec Loss 2.7553 LearningRate 0.0073 Epoch: 14 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:15,307-Speed 5613.86 samples/sec Loss 2.6020 LearningRate 0.0073 Epoch: 14 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:17,126-Speed 5630.78 samples/sec Loss 2.6798 LearningRate 0.0073 Epoch: 14 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:18,936-Speed 5660.03 samples/sec Loss 2.6929 LearningRate 0.0073 Epoch: 14 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:20,755-Speed 5629.54 samples/sec Loss 2.5959 LearningRate 0.0073 Epoch: 14 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:22,573-Speed 5636.31 samples/sec Loss 2.5891 LearningRate 0.0073 Epoch: 14 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:24,399-Speed 5610.17 samples/sec Loss 2.6728 LearningRate 0.0073 Epoch: 14 Global Step: 
83020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:26,212-Speed 5650.02 samples/sec Loss 2.6180 LearningRate 0.0073 Epoch: 14 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:28,025-Speed 5648.64 samples/sec Loss 2.8103 LearningRate 0.0073 Epoch: 14 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:29,846-Speed 5624.61 samples/sec Loss 2.7299 LearningRate 0.0073 Epoch: 14 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:31,678-Speed 5593.51 samples/sec Loss 2.6272 LearningRate 0.0073 Epoch: 14 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:33,488-Speed 5656.97 samples/sec Loss 2.6588 LearningRate 0.0073 Epoch: 14 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:35,304-Speed 5642.48 samples/sec Loss 2.7562 LearningRate 0.0073 Epoch: 14 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:37,121-Speed 5635.91 samples/sec Loss 2.6569 LearningRate 0.0073 Epoch: 14 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:38,957-Speed 5580.46 samples/sec Loss 2.7221 LearningRate 0.0072 Epoch: 14 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:40,779-Speed 5620.82 samples/sec Loss 2.6524 LearningRate 0.0072 Epoch: 14 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:42,591-Speed 5652.76 samples/sec Loss 2.6699 LearningRate 0.0072 Epoch: 14 Global Step: 83120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:44,414-Speed 5619.97 samples/sec Loss 2.7128 LearningRate 0.0072 Epoch: 14 Global Step: 83130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:46,225-Speed 5658.11 samples/sec Loss 2.6122 LearningRate 0.0072 Epoch: 14 Global Step: 83140 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-27 06:41:48,046-Speed 5625.92 samples/sec Loss 2.7026 LearningRate 0.0072 Epoch: 14 Global Step: 83150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:49,858-Speed 5653.43 samples/sec Loss 2.7167 LearningRate 0.0072 Epoch: 14 Global Step: 83160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:51,724-Speed 5489.27 samples/sec Loss 2.7475 LearningRate 0.0072 Epoch: 14 Global Step: 83170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:53,603-Speed 5450.02 samples/sec Loss 2.6022 LearningRate 0.0072 Epoch: 14 Global Step: 83180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:41:55,445-Speed 5561.98 samples/sec Loss 2.6815 LearningRate 0.0072 Epoch: 14 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:57,299-Speed 5525.01 samples/sec Loss 2.6801 LearningRate 0.0072 Epoch: 14 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:41:59,130-Speed 5593.41 samples/sec Loss 2.6469 LearningRate 0.0072 Epoch: 14 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:00,968-Speed 5573.13 samples/sec Loss 2.7056 LearningRate 0.0072 Epoch: 14 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:02,807-Speed 5570.50 samples/sec Loss 2.6595 LearningRate 0.0072 Epoch: 14 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:04,651-Speed 5556.04 samples/sec Loss 2.6726 LearningRate 0.0072 Epoch: 14 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:06,474-Speed 5617.10 samples/sec Loss 2.7448 LearningRate 0.0072 Epoch: 14 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:08,290-Speed 5643.10 samples/sec Loss 2.6488 LearningRate 0.0072 Epoch: 14 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:10,140-Speed 5536.12 
samples/sec Loss 2.6764 LearningRate 0.0072 Epoch: 14 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:11,961-Speed 5625.45 samples/sec Loss 2.5847 LearningRate 0.0072 Epoch: 14 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:13,785-Speed 5616.64 samples/sec Loss 2.6389 LearningRate 0.0072 Epoch: 14 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:42:15,589-Speed 5675.82 samples/sec Loss 2.5873 LearningRate 0.0072 Epoch: 14 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:17,415-Speed 5612.38 samples/sec Loss 2.7126 LearningRate 0.0072 Epoch: 14 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:19,248-Speed 5587.52 samples/sec Loss 2.6321 LearningRate 0.0071 Epoch: 14 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:21,068-Speed 5627.21 samples/sec Loss 2.6771 LearningRate 0.0071 Epoch: 14 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:22,890-Speed 5623.63 samples/sec Loss 2.6589 LearningRate 0.0071 Epoch: 14 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:24,724-Speed 5582.21 samples/sec Loss 2.6939 LearningRate 0.0071 Epoch: 14 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:26,542-Speed 5635.43 samples/sec Loss 2.5897 LearningRate 0.0071 Epoch: 14 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:28,364-Speed 5622.19 samples/sec Loss 2.6019 LearningRate 0.0071 Epoch: 14 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:30,177-Speed 5652.35 samples/sec Loss 2.6322 LearningRate 0.0071 Epoch: 14 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:32,011-Speed 5585.37 samples/sec Loss 2.7290 LearningRate 0.0071 Epoch: 14 
Global Step: 83390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:33,818-Speed 5667.59 samples/sec Loss 2.6604 LearningRate 0.0071 Epoch: 14 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:35,649-Speed 5596.17 samples/sec Loss 2.7264 LearningRate 0.0071 Epoch: 14 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:37,479-Speed 5595.40 samples/sec Loss 2.5909 LearningRate 0.0071 Epoch: 14 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:39,305-Speed 5611.03 samples/sec Loss 2.6985 LearningRate 0.0071 Epoch: 14 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:41,131-Speed 5608.71 samples/sec Loss 2.6158 LearningRate 0.0071 Epoch: 14 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:42,960-Speed 5601.20 samples/sec Loss 2.5564 LearningRate 0.0071 Epoch: 14 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:44,795-Speed 5579.97 samples/sec Loss 2.7593 LearningRate 0.0071 Epoch: 14 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:46,633-Speed 5575.13 samples/sec Loss 2.6954 LearningRate 0.0071 Epoch: 14 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:48,471-Speed 5573.32 samples/sec Loss 2.4597 LearningRate 0.0071 Epoch: 14 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:50,310-Speed 5570.43 samples/sec Loss 2.6270 LearningRate 0.0071 Epoch: 14 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:52,130-Speed 5626.53 samples/sec Loss 2.5593 LearningRate 0.0071 Epoch: 14 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:53,951-Speed 5624.99 samples/sec Loss 2.6342 LearningRate 0.0071 Epoch: 14 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:42:55,776-Speed 5614.95 samples/sec Loss 2.6691 LearningRate 0.0071 Epoch: 14 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:57,620-Speed 5553.30 samples/sec Loss 2.6969 LearningRate 0.0070 Epoch: 14 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:42:59,437-Speed 5639.80 samples/sec Loss 2.7596 LearningRate 0.0070 Epoch: 14 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:01,277-Speed 5567.39 samples/sec Loss 2.7612 LearningRate 0.0070 Epoch: 14 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:03,125-Speed 5540.54 samples/sec Loss 2.7060 LearningRate 0.0070 Epoch: 14 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:04,943-Speed 5637.22 samples/sec Loss 2.6652 LearningRate 0.0070 Epoch: 14 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:06,766-Speed 5616.17 samples/sec Loss 2.6612 LearningRate 0.0070 Epoch: 14 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:08,583-Speed 5637.45 samples/sec Loss 2.5481 LearningRate 0.0070 Epoch: 14 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:10,390-Speed 5669.24 samples/sec Loss 2.6439 LearningRate 0.0070 Epoch: 14 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:12,228-Speed 5573.90 samples/sec Loss 2.6683 LearningRate 0.0070 Epoch: 14 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:14,057-Speed 5601.26 samples/sec Loss 2.6370 LearningRate 0.0070 Epoch: 14 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:15,884-Speed 5606.10 samples/sec Loss 2.6307 LearningRate 0.0070 Epoch: 14 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:17,731-Speed 5545.37 
samples/sec Loss 2.6809 LearningRate 0.0070 Epoch: 14 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:19,565-Speed 5586.23 samples/sec Loss 2.5866 LearningRate 0.0070 Epoch: 14 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:21,387-Speed 5623.43 samples/sec Loss 2.6670 LearningRate 0.0070 Epoch: 14 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:23,227-Speed 5564.18 samples/sec Loss 2.5566 LearningRate 0.0070 Epoch: 14 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:25,063-Speed 5579.05 samples/sec Loss 2.7552 LearningRate 0.0070 Epoch: 14 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:26,891-Speed 5604.43 samples/sec Loss 2.6978 LearningRate 0.0070 Epoch: 14 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:28,717-Speed 5609.57 samples/sec Loss 2.5905 LearningRate 0.0070 Epoch: 14 Global Step: 83700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:43:30,534-Speed 5637.31 samples/sec Loss 2.6565 LearningRate 0.0070 Epoch: 14 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:32,393-Speed 5512.22 samples/sec Loss 2.6497 LearningRate 0.0070 Epoch: 14 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:34,209-Speed 5637.56 samples/sec Loss 2.6513 LearningRate 0.0070 Epoch: 14 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:36,035-Speed 5613.25 samples/sec Loss 2.6481 LearningRate 0.0070 Epoch: 14 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:37,847-Speed 5652.68 samples/sec Loss 2.6149 LearningRate 0.0069 Epoch: 14 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:39,668-Speed 5624.04 samples/sec Loss 2.6596 LearningRate 0.0069 Epoch: 14 
Global Step: 83760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:41,494-Speed 5610.52 samples/sec Loss 2.6330 LearningRate 0.0069 Epoch: 14 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:43,315-Speed 5624.22 samples/sec Loss 2.4894 LearningRate 0.0069 Epoch: 14 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:45,146-Speed 5594.77 samples/sec Loss 2.5485 LearningRate 0.0069 Epoch: 14 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:46,971-Speed 5611.78 samples/sec Loss 2.6688 LearningRate 0.0069 Epoch: 14 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:48,816-Speed 5553.94 samples/sec Loss 2.6589 LearningRate 0.0069 Epoch: 14 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:50,644-Speed 5604.19 samples/sec Loss 2.6538 LearningRate 0.0069 Epoch: 14 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:52,476-Speed 5590.24 samples/sec Loss 2.6860 LearningRate 0.0069 Epoch: 14 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:54,310-Speed 5585.96 samples/sec Loss 2.5738 LearningRate 0.0069 Epoch: 14 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:56,132-Speed 5621.64 samples/sec Loss 2.5919 LearningRate 0.0069 Epoch: 14 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:57,966-Speed 5586.46 samples/sec Loss 2.6158 LearningRate 0.0069 Epoch: 14 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:43:59,816-Speed 5536.47 samples/sec Loss 2.6846 LearningRate 0.0069 Epoch: 14 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:01,661-Speed 5552.03 samples/sec Loss 2.6813 LearningRate 0.0069 Epoch: 14 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:44:03,489-Speed 5604.55 samples/sec Loss 2.6300 LearningRate 0.0069 Epoch: 14 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:05,310-Speed 5624.85 samples/sec Loss 2.6099 LearningRate 0.0069 Epoch: 14 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:07,135-Speed 5611.33 samples/sec Loss 2.7045 LearningRate 0.0069 Epoch: 14 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:08,978-Speed 5557.75 samples/sec Loss 2.7896 LearningRate 0.0069 Epoch: 14 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:10,800-Speed 5622.29 samples/sec Loss 2.7009 LearningRate 0.0069 Epoch: 14 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:12,637-Speed 5577.00 samples/sec Loss 2.6710 LearningRate 0.0069 Epoch: 14 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:14,452-Speed 5644.38 samples/sec Loss 2.6449 LearningRate 0.0069 Epoch: 14 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:16,260-Speed 5664.87 samples/sec Loss 2.7128 LearningRate 0.0068 Epoch: 14 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:18,088-Speed 5604.39 samples/sec Loss 2.6024 LearningRate 0.0068 Epoch: 14 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:19,912-Speed 5615.24 samples/sec Loss 2.5303 LearningRate 0.0068 Epoch: 14 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:21,727-Speed 5644.27 samples/sec Loss 2.7667 LearningRate 0.0068 Epoch: 14 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:44:23,571-Speed 5556.45 samples/sec Loss 2.5808 LearningRate 0.0068 Epoch: 14 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 
06:44:49,760-[lfw][84000]XNorm: 22.024569 Training: 2022-04-27 06:44:49,760-[lfw][84000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-04-27 06:44:49,761-[lfw][84000]Accuracy-Highest: 0.99800 Training: 2022-04-27 06:45:20,393-[cfp_fp][84000]XNorm: 20.142442 Training: 2022-04-27 06:45:20,393-[cfp_fp][84000]Accuracy-Flip: 0.97114+-0.00728 Training: 2022-04-27 06:45:20,394-[cfp_fp][84000]Accuracy-Highest: 0.97114 Training: 2022-04-27 06:45:46,759-[agedb_30][84000]XNorm: 22.211608 Training: 2022-04-27 06:45:46,760-[agedb_30][84000]Accuracy-Flip: 0.98117+-0.00730 Training: 2022-04-27 06:45:46,760-[agedb_30][84000]Accuracy-Highest: 0.98117 Training: 2022-04-27 06:45:48,574-Speed 120.47 samples/sec Loss 2.5513 LearningRate 0.0068 Epoch: 14 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:50,380-Speed 5673.84 samples/sec Loss 2.6131 LearningRate 0.0068 Epoch: 14 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:52,216-Speed 5577.93 samples/sec Loss 2.6097 LearningRate 0.0068 Epoch: 14 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:54,026-Speed 5660.10 samples/sec Loss 2.7747 LearningRate 0.0068 Epoch: 14 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:55,839-Speed 5650.24 samples/sec Loss 2.6565 LearningRate 0.0068 Epoch: 14 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:57,664-Speed 5612.97 samples/sec Loss 2.6472 LearningRate 0.0068 Epoch: 14 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:45:59,473-Speed 5660.26 samples/sec Loss 2.5819 LearningRate 0.0068 Epoch: 14 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:01,281-Speed 5667.19 samples/sec Loss 2.7385 LearningRate 0.0068 Epoch: 14 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:03,113-Speed 5589.58 samples/sec 
Loss 2.6308 LearningRate 0.0068 Epoch: 14 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:04,924-Speed 5657.00 samples/sec Loss 2.6345 LearningRate 0.0068 Epoch: 14 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:06,724-Speed 5690.78 samples/sec Loss 2.6060 LearningRate 0.0068 Epoch: 14 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:08,543-Speed 5630.46 samples/sec Loss 2.5064 LearningRate 0.0068 Epoch: 14 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:10,363-Speed 5628.45 samples/sec Loss 2.6954 LearningRate 0.0068 Epoch: 14 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:12,177-Speed 5646.77 samples/sec Loss 2.6785 LearningRate 0.0068 Epoch: 14 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:14,008-Speed 5594.30 samples/sec Loss 2.6318 LearningRate 0.0068 Epoch: 14 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:15,864-Speed 5518.06 samples/sec Loss 2.7280 LearningRate 0.0068 Epoch: 14 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:17,725-Speed 5506.68 samples/sec Loss 2.6775 LearningRate 0.0068 Epoch: 14 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:19,550-Speed 5614.26 samples/sec Loss 2.5960 LearningRate 0.0067 Epoch: 14 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:21,364-Speed 5644.85 samples/sec Loss 2.5573 LearningRate 0.0067 Epoch: 14 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:23,181-Speed 5638.56 samples/sec Loss 2.6570 LearningRate 0.0067 Epoch: 14 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:25,001-Speed 5626.66 samples/sec Loss 2.5749 LearningRate 0.0067 Epoch: 14 Global Step: 
84210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:26,808-Speed 5669.39 samples/sec Loss 2.6012 LearningRate 0.0067 Epoch: 14 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:28,624-Speed 5644.80 samples/sec Loss 2.6836 LearningRate 0.0067 Epoch: 14 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:30,460-Speed 5578.14 samples/sec Loss 2.6388 LearningRate 0.0067 Epoch: 14 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:32,276-Speed 5639.48 samples/sec Loss 2.6885 LearningRate 0.0067 Epoch: 14 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:34,105-Speed 5600.17 samples/sec Loss 2.6109 LearningRate 0.0067 Epoch: 14 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:35,920-Speed 5645.60 samples/sec Loss 2.6135 LearningRate 0.0067 Epoch: 14 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:37,786-Speed 5491.27 samples/sec Loss 2.5886 LearningRate 0.0067 Epoch: 14 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:39,619-Speed 5588.14 samples/sec Loss 2.6574 LearningRate 0.0067 Epoch: 14 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:41,457-Speed 5572.10 samples/sec Loss 2.5803 LearningRate 0.0067 Epoch: 14 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:43,261-Speed 5679.56 samples/sec Loss 2.6717 LearningRate 0.0067 Epoch: 14 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:45,094-Speed 5587.22 samples/sec Loss 2.6070 LearningRate 0.0067 Epoch: 14 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:46,915-Speed 5624.38 samples/sec Loss 2.5739 LearningRate 0.0067 Epoch: 14 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-27 06:46:48,746-Speed 5593.56 samples/sec Loss 2.6426 LearningRate 0.0067 Epoch: 14 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:50,621-Speed 5463.23 samples/sec Loss 2.6037 LearningRate 0.0067 Epoch: 14 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:52,542-Speed 5334.06 samples/sec Loss 2.5716 LearningRate 0.0067 Epoch: 14 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:54,382-Speed 5566.73 samples/sec Loss 2.5081 LearningRate 0.0067 Epoch: 14 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:56,211-Speed 5600.03 samples/sec Loss 2.6336 LearningRate 0.0067 Epoch: 14 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:46:58,045-Speed 5589.11 samples/sec Loss 2.7026 LearningRate 0.0067 Epoch: 14 Global Step: 84390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:46:59,885-Speed 5566.55 samples/sec Loss 2.5572 LearningRate 0.0066 Epoch: 14 Global Step: 84400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:01,726-Speed 5564.05 samples/sec Loss 2.7601 LearningRate 0.0066 Epoch: 14 Global Step: 84410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:03,544-Speed 5633.04 samples/sec Loss 2.6320 LearningRate 0.0066 Epoch: 14 Global Step: 84420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:05,391-Speed 5545.36 samples/sec Loss 2.5367 LearningRate 0.0066 Epoch: 14 Global Step: 84430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:07,260-Speed 5480.12 samples/sec Loss 2.6715 LearningRate 0.0066 Epoch: 14 Global Step: 84440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:09,111-Speed 5536.03 samples/sec Loss 2.6428 LearningRate 0.0066 Epoch: 14 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:10,931-Speed 5628.22 
samples/sec Loss 2.7021 LearningRate 0.0066 Epoch: 14 Global Step: 84460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:12,767-Speed 5577.65 samples/sec Loss 2.6363 LearningRate 0.0066 Epoch: 14 Global Step: 84470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:14,594-Speed 5608.74 samples/sec Loss 2.6622 LearningRate 0.0066 Epoch: 14 Global Step: 84480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:47:16,419-Speed 5611.56 samples/sec Loss 2.6062 LearningRate 0.0066 Epoch: 14 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:18,272-Speed 5529.36 samples/sec Loss 2.5801 LearningRate 0.0066 Epoch: 14 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:20,104-Speed 5589.79 samples/sec Loss 2.6283 LearningRate 0.0066 Epoch: 14 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:21,933-Speed 5601.20 samples/sec Loss 2.6897 LearningRate 0.0066 Epoch: 14 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:23,832-Speed 5394.16 samples/sec Loss 2.5790 LearningRate 0.0066 Epoch: 14 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:25,756-Speed 5324.09 samples/sec Loss 2.6156 LearningRate 0.0066 Epoch: 14 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:27,609-Speed 5529.68 samples/sec Loss 2.5165 LearningRate 0.0066 Epoch: 14 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:29,433-Speed 5614.99 samples/sec Loss 2.6698 LearningRate 0.0066 Epoch: 14 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:31,241-Speed 5664.25 samples/sec Loss 2.6042 LearningRate 0.0066 Epoch: 14 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:33,078-Speed 5576.64 samples/sec Loss 2.5851 LearningRate 0.0066 Epoch: 14 
Global Step: 84580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:34,894-Speed 5641.45 samples/sec Loss 2.6226 LearningRate 0.0066 Epoch: 14 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:36,715-Speed 5625.73 samples/sec Loss 2.5954 LearningRate 0.0066 Epoch: 14 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:38,566-Speed 5534.44 samples/sec Loss 2.6095 LearningRate 0.0066 Epoch: 14 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:40,400-Speed 5583.93 samples/sec Loss 2.5809 LearningRate 0.0065 Epoch: 14 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:42,227-Speed 5608.59 samples/sec Loss 2.6671 LearningRate 0.0065 Epoch: 14 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:44,055-Speed 5602.89 samples/sec Loss 2.5965 LearningRate 0.0065 Epoch: 14 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:45,907-Speed 5532.39 samples/sec Loss 2.6509 LearningRate 0.0065 Epoch: 14 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:47,728-Speed 5625.08 samples/sec Loss 2.5883 LearningRate 0.0065 Epoch: 14 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:49,537-Speed 5661.72 samples/sec Loss 2.6662 LearningRate 0.0065 Epoch: 14 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:51,366-Speed 5600.53 samples/sec Loss 2.5802 LearningRate 0.0065 Epoch: 14 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:53,185-Speed 5629.08 samples/sec Loss 2.4683 LearningRate 0.0065 Epoch: 14 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:55,001-Speed 5640.91 samples/sec Loss 2.6148 LearningRate 0.0065 Epoch: 14 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:47:56,840-Speed 5572.20 samples/sec Loss 2.5323 LearningRate 0.0065 Epoch: 14 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:47:58,679-Speed 5567.96 samples/sec Loss 2.4876 LearningRate 0.0065 Epoch: 14 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:00,504-Speed 5614.06 samples/sec Loss 2.5191 LearningRate 0.0065 Epoch: 14 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:02,322-Speed 5633.80 samples/sec Loss 2.5679 LearningRate 0.0065 Epoch: 14 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:04,149-Speed 5608.55 samples/sec Loss 2.7268 LearningRate 0.0065 Epoch: 14 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:05,981-Speed 5591.70 samples/sec Loss 2.5880 LearningRate 0.0065 Epoch: 14 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:07,826-Speed 5549.70 samples/sec Loss 2.5985 LearningRate 0.0065 Epoch: 14 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:09,679-Speed 5529.99 samples/sec Loss 2.6411 LearningRate 0.0065 Epoch: 14 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:11,490-Speed 5656.01 samples/sec Loss 2.6510 LearningRate 0.0065 Epoch: 14 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:13,299-Speed 5660.28 samples/sec Loss 2.5470 LearningRate 0.0065 Epoch: 14 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:15,120-Speed 5628.23 samples/sec Loss 2.5932 LearningRate 0.0065 Epoch: 14 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:16,952-Speed 5589.08 samples/sec Loss 2.5257 LearningRate 0.0065 Epoch: 14 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:18,761-Speed 5663.34 
samples/sec Loss 2.6099 LearningRate 0.0065 Epoch: 14 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:20,595-Speed 5584.46 samples/sec Loss 2.6484 LearningRate 0.0064 Epoch: 14 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:22,404-Speed 5661.79 samples/sec Loss 2.6330 LearningRate 0.0064 Epoch: 14 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:24,238-Speed 5587.85 samples/sec Loss 2.6793 LearningRate 0.0064 Epoch: 14 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:26,100-Speed 5500.09 samples/sec Loss 2.6867 LearningRate 0.0064 Epoch: 14 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:27,931-Speed 5595.04 samples/sec Loss 2.5743 LearningRate 0.0064 Epoch: 14 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:29,756-Speed 5611.04 samples/sec Loss 2.6052 LearningRate 0.0064 Epoch: 14 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:31,595-Speed 5572.41 samples/sec Loss 2.5374 LearningRate 0.0064 Epoch: 14 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:33,401-Speed 5671.04 samples/sec Loss 2.6160 LearningRate 0.0064 Epoch: 14 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:35,219-Speed 5633.50 samples/sec Loss 2.6342 LearningRate 0.0064 Epoch: 14 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:48:37,036-Speed 5638.78 samples/sec Loss 2.6894 LearningRate 0.0064 Epoch: 14 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:38,856-Speed 5628.12 samples/sec Loss 2.6456 LearningRate 0.0064 Epoch: 14 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:40,677-Speed 5623.43 samples/sec Loss 2.5845 LearningRate 0.0064 Epoch: 14 
Global Step: 84950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:42,497-Speed 5629.56 samples/sec Loss 2.6143 LearningRate 0.0064 Epoch: 14 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:44,344-Speed 5546.69 samples/sec Loss 2.6411 LearningRate 0.0064 Epoch: 14 Global Step: 84970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:46,172-Speed 5603.41 samples/sec Loss 2.6651 LearningRate 0.0064 Epoch: 14 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:48,013-Speed 5564.95 samples/sec Loss 2.5621 LearningRate 0.0064 Epoch: 14 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:49,846-Speed 5586.81 samples/sec Loss 2.5248 LearningRate 0.0064 Epoch: 14 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:51,681-Speed 5582.58 samples/sec Loss 2.5623 LearningRate 0.0064 Epoch: 14 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:53,511-Speed 5598.37 samples/sec Loss 2.6380 LearningRate 0.0064 Epoch: 14 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:55,327-Speed 5640.55 samples/sec Loss 2.5419 LearningRate 0.0064 Epoch: 14 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:57,153-Speed 5607.27 samples/sec Loss 2.5507 LearningRate 0.0064 Epoch: 14 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:48:58,966-Speed 5651.46 samples/sec Loss 2.5960 LearningRate 0.0064 Epoch: 14 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:00,792-Speed 5609.15 samples/sec Loss 2.6111 LearningRate 0.0064 Epoch: 14 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:02,613-Speed 5624.40 samples/sec Loss 2.5768 LearningRate 0.0063 Epoch: 14 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:49:04,432-Speed 5633.65 samples/sec Loss 2.5767 LearningRate 0.0063 Epoch: 14 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:06,251-Speed 5630.00 samples/sec Loss 2.5588 LearningRate 0.0063 Epoch: 14 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:08,069-Speed 5634.07 samples/sec Loss 2.6024 LearningRate 0.0063 Epoch: 14 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:09,881-Speed 5655.23 samples/sec Loss 2.5219 LearningRate 0.0063 Epoch: 14 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:11,733-Speed 5531.81 samples/sec Loss 2.4864 LearningRate 0.0063 Epoch: 14 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:13,557-Speed 5615.16 samples/sec Loss 2.4748 LearningRate 0.0063 Epoch: 14 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:15,395-Speed 5571.83 samples/sec Loss 2.6555 LearningRate 0.0063 Epoch: 14 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:17,222-Speed 5608.87 samples/sec Loss 2.5302 LearningRate 0.0063 Epoch: 14 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:19,063-Speed 5563.55 samples/sec Loss 2.5971 LearningRate 0.0063 Epoch: 14 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:20,879-Speed 5638.09 samples/sec Loss 2.5704 LearningRate 0.0063 Epoch: 14 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:22,690-Speed 5655.99 samples/sec Loss 2.5099 LearningRate 0.0063 Epoch: 14 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:24,514-Speed 5616.38 samples/sec Loss 2.5158 LearningRate 0.0063 Epoch: 14 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:26,346-Speed 5592.21 
samples/sec Loss 2.6176 LearningRate 0.0063 Epoch: 14 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:28,164-Speed 5632.82 samples/sec Loss 2.4966 LearningRate 0.0063 Epoch: 14 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:29,976-Speed 5653.85 samples/sec Loss 2.5662 LearningRate 0.0063 Epoch: 14 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:31,803-Speed 5608.87 samples/sec Loss 2.6640 LearningRate 0.0063 Epoch: 14 Global Step: 85230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:49:33,607-Speed 5677.45 samples/sec Loss 2.6414 LearningRate 0.0063 Epoch: 14 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:35,432-Speed 5613.60 samples/sec Loss 2.4453 LearningRate 0.0063 Epoch: 14 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:37,259-Speed 5607.83 samples/sec Loss 2.6367 LearningRate 0.0063 Epoch: 14 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:39,100-Speed 5564.58 samples/sec Loss 2.6090 LearningRate 0.0063 Epoch: 14 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:41,020-Speed 5332.76 samples/sec Loss 2.5434 LearningRate 0.0063 Epoch: 14 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:42,832-Speed 5653.43 samples/sec Loss 2.5866 LearningRate 0.0063 Epoch: 14 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:54,343-Speed 889.65 samples/sec Loss 2.0116 LearningRate 0.0062 Epoch: 15 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:56,310-Speed 5210.06 samples/sec Loss 1.9755 LearningRate 0.0062 Epoch: 15 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:49:58,163-Speed 5527.66 samples/sec Loss 2.0267 LearningRate 0.0062 Epoch: 15 
Global Step: 85320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:00,003-Speed 5566.26 samples/sec Loss 2.0159 LearningRate 0.0062 Epoch: 15 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:01,838-Speed 5580.75 samples/sec Loss 1.9352 LearningRate 0.0062 Epoch: 15 Global Step: 85340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:03,676-Speed 5576.19 samples/sec Loss 2.0299 LearningRate 0.0062 Epoch: 15 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:05,495-Speed 5632.22 samples/sec Loss 2.0057 LearningRate 0.0062 Epoch: 15 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:07,311-Speed 5640.57 samples/sec Loss 1.8824 LearningRate 0.0062 Epoch: 15 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:09,142-Speed 5594.61 samples/sec Loss 1.9820 LearningRate 0.0062 Epoch: 15 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:10,971-Speed 5599.16 samples/sec Loss 1.9518 LearningRate 0.0062 Epoch: 15 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:12,814-Speed 5557.74 samples/sec Loss 1.8721 LearningRate 0.0062 Epoch: 15 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:14,735-Speed 5334.64 samples/sec Loss 2.0760 LearningRate 0.0062 Epoch: 15 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:16,596-Speed 5503.75 samples/sec Loss 1.9916 LearningRate 0.0062 Epoch: 15 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:18,431-Speed 5582.76 samples/sec Loss 2.0600 LearningRate 0.0062 Epoch: 15 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:20,251-Speed 5627.47 samples/sec Loss 1.9937 LearningRate 0.0062 Epoch: 15 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 06:50:22,090-Speed 5569.95 samples/sec Loss 1.9375 LearningRate 0.0062 Epoch: 15 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:23,919-Speed 5599.90 samples/sec Loss 2.0550 LearningRate 0.0062 Epoch: 15 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:25,745-Speed 5610.68 samples/sec Loss 2.0321 LearningRate 0.0062 Epoch: 15 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:27,598-Speed 5527.62 samples/sec Loss 2.1211 LearningRate 0.0062 Epoch: 15 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:29,419-Speed 5624.53 samples/sec Loss 2.0959 LearningRate 0.0062 Epoch: 15 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:31,241-Speed 5623.22 samples/sec Loss 2.0064 LearningRate 0.0062 Epoch: 15 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:33,075-Speed 5584.10 samples/sec Loss 1.9898 LearningRate 0.0062 Epoch: 15 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:34,903-Speed 5604.37 samples/sec Loss 2.0049 LearningRate 0.0061 Epoch: 15 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:36,846-Speed 5270.85 samples/sec Loss 2.1517 LearningRate 0.0061 Epoch: 15 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:38,686-Speed 5567.88 samples/sec Loss 1.9408 LearningRate 0.0061 Epoch: 15 Global Step: 85540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:50:40,499-Speed 5649.67 samples/sec Loss 2.0618 LearningRate 0.0061 Epoch: 15 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:42,340-Speed 5565.77 samples/sec Loss 1.9871 LearningRate 0.0061 Epoch: 15 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:44,181-Speed 5563.55 
samples/sec Loss 1.9389 LearningRate 0.0061 Epoch: 15 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:46,001-Speed 5628.19 samples/sec Loss 2.0446 LearningRate 0.0061 Epoch: 15 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:47,825-Speed 5616.33 samples/sec Loss 2.0150 LearningRate 0.0061 Epoch: 15 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:49,663-Speed 5573.00 samples/sec Loss 2.0128 LearningRate 0.0061 Epoch: 15 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:51,506-Speed 5557.45 samples/sec Loss 1.9882 LearningRate 0.0061 Epoch: 15 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:53,355-Speed 5539.10 samples/sec Loss 2.1135 LearningRate 0.0061 Epoch: 15 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:55,182-Speed 5607.44 samples/sec Loss 1.9278 LearningRate 0.0061 Epoch: 15 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:57,025-Speed 5558.10 samples/sec Loss 1.9915 LearningRate 0.0061 Epoch: 15 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:50:58,869-Speed 5555.37 samples/sec Loss 2.0626 LearningRate 0.0061 Epoch: 15 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:00,711-Speed 5560.21 samples/sec Loss 2.0905 LearningRate 0.0061 Epoch: 15 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:02,546-Speed 5583.52 samples/sec Loss 2.2154 LearningRate 0.0061 Epoch: 15 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:04,388-Speed 5560.20 samples/sec Loss 1.9661 LearningRate 0.0061 Epoch: 15 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:06,211-Speed 5619.88 samples/sec Loss 2.0248 LearningRate 0.0061 Epoch: 15 
Global Step: 85690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:08,054-Speed 5557.90 samples/sec Loss 2.0430 LearningRate 0.0061 Epoch: 15 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:09,867-Speed 5649.45 samples/sec Loss 2.0308 LearningRate 0.0061 Epoch: 15 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:11,695-Speed 5605.21 samples/sec Loss 2.0825 LearningRate 0.0061 Epoch: 15 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:13,554-Speed 5509.27 samples/sec Loss 2.1556 LearningRate 0.0061 Epoch: 15 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:15,373-Speed 5632.69 samples/sec Loss 2.0923 LearningRate 0.0061 Epoch: 15 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:17,181-Speed 5664.25 samples/sec Loss 1.9388 LearningRate 0.0060 Epoch: 15 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:19,014-Speed 5587.78 samples/sec Loss 1.9458 LearningRate 0.0060 Epoch: 15 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:20,840-Speed 5610.95 samples/sec Loss 2.0412 LearningRate 0.0060 Epoch: 15 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:22,674-Speed 5586.08 samples/sec Loss 2.0337 LearningRate 0.0060 Epoch: 15 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:24,510-Speed 5578.71 samples/sec Loss 2.0914 LearningRate 0.0060 Epoch: 15 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:26,329-Speed 5633.97 samples/sec Loss 1.9600 LearningRate 0.0060 Epoch: 15 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:28,157-Speed 5603.11 samples/sec Loss 2.1200 LearningRate 0.0060 Epoch: 15 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 06:51:30,000-Speed 5558.04 samples/sec Loss 2.0049 LearningRate 0.0060 Epoch: 15 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:31,859-Speed 5508.92 samples/sec Loss 2.0686 LearningRate 0.0060 Epoch: 15 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:33,677-Speed 5637.45 samples/sec Loss 2.0484 LearningRate 0.0060 Epoch: 15 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:35,500-Speed 5619.33 samples/sec Loss 2.0797 LearningRate 0.0060 Epoch: 15 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:37,323-Speed 5618.72 samples/sec Loss 2.0786 LearningRate 0.0060 Epoch: 15 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:51:39,138-Speed 5643.46 samples/sec Loss 2.0327 LearningRate 0.0060 Epoch: 15 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:40,972-Speed 5582.89 samples/sec Loss 2.1208 LearningRate 0.0060 Epoch: 15 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:42,799-Speed 5608.83 samples/sec Loss 2.0279 LearningRate 0.0060 Epoch: 15 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:44,614-Speed 5642.24 samples/sec Loss 2.0078 LearningRate 0.0060 Epoch: 15 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:46,433-Speed 5631.13 samples/sec Loss 2.1016 LearningRate 0.0060 Epoch: 15 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:48,242-Speed 5663.93 samples/sec Loss 2.0810 LearningRate 0.0060 Epoch: 15 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:50,072-Speed 5596.32 samples/sec Loss 2.1386 LearningRate 0.0060 Epoch: 15 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:51:51,887-Speed 5656.12 
samples/sec Loss 2.1297 LearningRate 0.0060 Epoch: 15 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:51:53,711-Speed 5616.75 samples/sec Loss 2.1668 LearningRate 0.0060 Epoch: 15 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:51:55,534-Speed 5620.10 samples/sec Loss 2.1806 LearningRate 0.0060 Epoch: 15 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:51:57,358-Speed 5616.45 samples/sec Loss 2.0848 LearningRate 0.0060 Epoch: 15 Global Step: 85970 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-27 06:51:59,175-Speed 5636.50 samples/sec Loss 2.0757 LearningRate 0.0060 Epoch: 15 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:52:01,003-Speed 5602.65 samples/sec Loss 2.1096 LearningRate 0.0059 Epoch: 15 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:52:02,823-Speed 5628.43 samples/sec Loss 2.1358 LearningRate 0.0059 Epoch: 15 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:52:28,877-[lfw][86000]XNorm: 22.929928
Training: 2022-04-27 06:52:28,877-[lfw][86000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-27 06:52:28,878-[lfw][86000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:52:59,058-[cfp_fp][86000]XNorm: 21.044778
Training: 2022-04-27 06:52:59,058-[cfp_fp][86000]Accuracy-Flip: 0.97357+-0.00806
Training: 2022-04-27 06:52:59,059-[cfp_fp][86000]Accuracy-Highest: 0.97357
Training: 2022-04-27 06:53:25,118-[agedb_30][86000]XNorm: 22.775292
Training: 2022-04-27 06:53:25,119-[agedb_30][86000]Accuracy-Flip: 0.97967+-0.00802
Training: 2022-04-27 06:53:25,119-[agedb_30][86000]Accuracy-Highest: 0.98117
Training: 2022-04-27 06:53:26,976-Speed 121.69 samples/sec Loss 2.0402 LearningRate 0.0059 Epoch: 15 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:53:28,787-Speed 5655.39 samples/sec Loss 2.1865 LearningRate
0.0059 Epoch: 15 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:30,592-Speed 5673.03 samples/sec Loss 2.0303 LearningRate 0.0059 Epoch: 15 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:32,408-Speed 5642.07 samples/sec Loss 2.0746 LearningRate 0.0059 Epoch: 15 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:34,224-Speed 5641.36 samples/sec Loss 1.9870 LearningRate 0.0059 Epoch: 15 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:36,048-Speed 5614.25 samples/sec Loss 2.0376 LearningRate 0.0059 Epoch: 15 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:37,862-Speed 5647.34 samples/sec Loss 2.0254 LearningRate 0.0059 Epoch: 15 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:39,658-Speed 5703.03 samples/sec Loss 2.1255 LearningRate 0.0059 Epoch: 15 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:41,472-Speed 5647.96 samples/sec Loss 2.0680 LearningRate 0.0059 Epoch: 15 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:43,284-Speed 5653.14 samples/sec Loss 2.1092 LearningRate 0.0059 Epoch: 15 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:45,110-Speed 5609.40 samples/sec Loss 2.1103 LearningRate 0.0059 Epoch: 15 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:46,920-Speed 5659.60 samples/sec Loss 2.1186 LearningRate 0.0059 Epoch: 15 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:48,726-Speed 5671.18 samples/sec Loss 2.1459 LearningRate 0.0059 Epoch: 15 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:50,544-Speed 5633.07 samples/sec Loss 2.0053 LearningRate 0.0059 Epoch: 15 Global Step: 86140 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-27 06:53:52,368-Speed 5616.29 samples/sec Loss 2.0870 LearningRate 0.0059 Epoch: 15 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:54,217-Speed 5540.83 samples/sec Loss 2.1051 LearningRate 0.0059 Epoch: 15 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:56,059-Speed 5561.72 samples/sec Loss 2.1185 LearningRate 0.0059 Epoch: 15 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:53:57,877-Speed 5633.94 samples/sec Loss 2.0779 LearningRate 0.0059 Epoch: 15 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:53:59,703-Speed 5608.37 samples/sec Loss 2.1111 LearningRate 0.0059 Epoch: 15 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:01,546-Speed 5560.81 samples/sec Loss 2.1781 LearningRate 0.0059 Epoch: 15 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:03,379-Speed 5587.78 samples/sec Loss 2.1638 LearningRate 0.0059 Epoch: 15 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:05,215-Speed 5577.28 samples/sec Loss 2.1265 LearningRate 0.0058 Epoch: 15 Global Step: 86220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:07,032-Speed 5638.15 samples/sec Loss 2.0110 LearningRate 0.0058 Epoch: 15 Global Step: 86230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:08,861-Speed 5600.74 samples/sec Loss 2.1130 LearningRate 0.0058 Epoch: 15 Global Step: 86240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:10,696-Speed 5583.63 samples/sec Loss 2.0459 LearningRate 0.0058 Epoch: 15 Global Step: 86250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:12,514-Speed 5633.82 samples/sec Loss 2.2006 LearningRate 0.0058 Epoch: 15 Global Step: 86260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
06:54:14,329-Speed 5644.80 samples/sec Loss 2.1895 LearningRate 0.0058 Epoch: 15 Global Step: 86270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 06:54:16,146-Speed 5637.95 samples/sec Loss 2.1463 LearningRate 0.0058 Epoch: 15 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:17,977-Speed 5592.10 samples/sec Loss 2.1214 LearningRate 0.0058 Epoch: 15 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:19,802-Speed 5615.06 samples/sec Loss 2.1382 LearningRate 0.0058 Epoch: 15 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:21,626-Speed 5614.26 samples/sec Loss 2.1494 LearningRate 0.0058 Epoch: 15 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:23,463-Speed 5575.73 samples/sec Loss 2.1444 LearningRate 0.0058 Epoch: 15 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:25,316-Speed 5527.53 samples/sec Loss 2.1491 LearningRate 0.0058 Epoch: 15 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:27,154-Speed 5575.17 samples/sec Loss 2.2616 LearningRate 0.0058 Epoch: 15 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:28,986-Speed 5592.92 samples/sec Loss 2.1763 LearningRate 0.0058 Epoch: 15 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:30,853-Speed 5486.52 samples/sec Loss 2.1084 LearningRate 0.0058 Epoch: 15 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:32,676-Speed 5618.50 samples/sec Loss 2.1854 LearningRate 0.0058 Epoch: 15 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:34,478-Speed 5684.68 samples/sec Loss 2.2200 LearningRate 0.0058 Epoch: 15 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:36,329-Speed 5532.71 samples/sec Loss 2.1528 
LearningRate 0.0058 Epoch: 15 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:38,159-Speed 5598.56 samples/sec Loss 2.0905 LearningRate 0.0058 Epoch: 15 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:39,989-Speed 5599.80 samples/sec Loss 2.1203 LearningRate 0.0058 Epoch: 15 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:41,859-Speed 5475.49 samples/sec Loss 2.1240 LearningRate 0.0058 Epoch: 15 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:43,785-Speed 5319.51 samples/sec Loss 2.1421 LearningRate 0.0058 Epoch: 15 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:45,607-Speed 5620.84 samples/sec Loss 2.0460 LearningRate 0.0058 Epoch: 15 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:47,478-Speed 5475.88 samples/sec Loss 2.1705 LearningRate 0.0058 Epoch: 15 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:49,364-Speed 5433.30 samples/sec Loss 1.9865 LearningRate 0.0057 Epoch: 15 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:51,216-Speed 5530.27 samples/sec Loss 2.1136 LearningRate 0.0057 Epoch: 15 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:53,131-Speed 5350.57 samples/sec Loss 2.2682 LearningRate 0.0057 Epoch: 15 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:55,015-Speed 5437.84 samples/sec Loss 2.1597 LearningRate 0.0057 Epoch: 15 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:56,854-Speed 5569.62 samples/sec Loss 2.1872 LearningRate 0.0057 Epoch: 15 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:54:58,674-Speed 5627.71 samples/sec Loss 2.2565 LearningRate 0.0057 Epoch: 15 Global Step: 86510 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:00,515-Speed 5563.09 samples/sec Loss 2.1679 LearningRate 0.0057 Epoch: 15 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:02,396-Speed 5446.62 samples/sec Loss 2.2005 LearningRate 0.0057 Epoch: 15 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:04,204-Speed 5666.25 samples/sec Loss 2.0462 LearningRate 0.0057 Epoch: 15 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:06,018-Speed 5645.04 samples/sec Loss 2.1116 LearningRate 0.0057 Epoch: 15 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:07,857-Speed 5570.55 samples/sec Loss 2.1363 LearningRate 0.0057 Epoch: 15 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:09,693-Speed 5579.09 samples/sec Loss 2.2602 LearningRate 0.0057 Epoch: 15 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:11,515-Speed 5623.46 samples/sec Loss 2.1223 LearningRate 0.0057 Epoch: 15 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:13,358-Speed 5557.30 samples/sec Loss 2.1748 LearningRate 0.0057 Epoch: 15 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:15,196-Speed 5573.19 samples/sec Loss 2.2249 LearningRate 0.0057 Epoch: 15 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:17,021-Speed 5612.82 samples/sec Loss 2.1894 LearningRate 0.0057 Epoch: 15 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:18,852-Speed 5595.89 samples/sec Loss 2.1932 LearningRate 0.0057 Epoch: 15 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:20,671-Speed 5632.04 samples/sec Loss 2.1664 LearningRate 0.0057 Epoch: 15 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:55:22,509-Speed 5571.20 samples/sec Loss 2.0942 LearningRate 0.0057 Epoch: 15 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:24,324-Speed 5645.17 samples/sec Loss 2.1823 LearningRate 0.0057 Epoch: 15 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:26,172-Speed 5541.59 samples/sec Loss 2.1287 LearningRate 0.0057 Epoch: 15 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:27,988-Speed 5640.49 samples/sec Loss 2.2125 LearningRate 0.0057 Epoch: 15 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:29,797-Speed 5662.07 samples/sec Loss 2.0875 LearningRate 0.0057 Epoch: 15 Global Step: 86680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:55:31,596-Speed 5696.41 samples/sec Loss 2.2101 LearningRate 0.0056 Epoch: 15 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:33,417-Speed 5623.02 samples/sec Loss 2.1897 LearningRate 0.0056 Epoch: 15 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:35,235-Speed 5634.59 samples/sec Loss 2.1950 LearningRate 0.0056 Epoch: 15 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:37,067-Speed 5592.12 samples/sec Loss 2.1172 LearningRate 0.0056 Epoch: 15 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:38,875-Speed 5667.48 samples/sec Loss 2.1656 LearningRate 0.0056 Epoch: 15 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:40,682-Speed 5666.28 samples/sec Loss 2.1912 LearningRate 0.0056 Epoch: 15 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:42,495-Speed 5650.66 samples/sec Loss 2.1388 LearningRate 0.0056 Epoch: 15 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:44,322-Speed 5605.45 samples/sec Loss 
2.1590 LearningRate 0.0056 Epoch: 15 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:46,164-Speed 5562.60 samples/sec Loss 2.1299 LearningRate 0.0056 Epoch: 15 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:48,005-Speed 5563.55 samples/sec Loss 2.0321 LearningRate 0.0056 Epoch: 15 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:49,818-Speed 5647.92 samples/sec Loss 2.1530 LearningRate 0.0056 Epoch: 15 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:51,646-Speed 5606.74 samples/sec Loss 2.1921 LearningRate 0.0056 Epoch: 15 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:53,453-Speed 5667.28 samples/sec Loss 2.1252 LearningRate 0.0056 Epoch: 15 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:55,268-Speed 5644.68 samples/sec Loss 2.1614 LearningRate 0.0056 Epoch: 15 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:57,097-Speed 5600.83 samples/sec Loss 2.2855 LearningRate 0.0056 Epoch: 15 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:55:58,916-Speed 5630.86 samples/sec Loss 2.1968 LearningRate 0.0056 Epoch: 15 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:00,746-Speed 5599.69 samples/sec Loss 2.0245 LearningRate 0.0056 Epoch: 15 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:02,555-Speed 5659.73 samples/sec Loss 2.2508 LearningRate 0.0056 Epoch: 15 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:04,371-Speed 5642.21 samples/sec Loss 2.2915 LearningRate 0.0056 Epoch: 15 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:06,235-Speed 5493.86 samples/sec Loss 2.1983 LearningRate 0.0056 Epoch: 15 Global Step: 86880 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:08,061-Speed 5609.85 samples/sec Loss 2.2859 LearningRate 0.0056 Epoch: 15 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:09,876-Speed 5643.76 samples/sec Loss 2.1532 LearningRate 0.0056 Epoch: 15 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:11,704-Speed 5605.31 samples/sec Loss 2.1817 LearningRate 0.0056 Epoch: 15 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:13,539-Speed 5580.22 samples/sec Loss 2.2524 LearningRate 0.0056 Epoch: 15 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:15,350-Speed 5657.04 samples/sec Loss 2.1999 LearningRate 0.0055 Epoch: 15 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:17,169-Speed 5632.95 samples/sec Loss 2.1122 LearningRate 0.0055 Epoch: 15 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:18,995-Speed 5610.04 samples/sec Loss 2.2064 LearningRate 0.0055 Epoch: 15 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:20,818-Speed 5619.18 samples/sec Loss 2.1783 LearningRate 0.0055 Epoch: 15 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:22,628-Speed 5657.32 samples/sec Loss 2.1267 LearningRate 0.0055 Epoch: 15 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:24,448-Speed 5628.43 samples/sec Loss 2.0952 LearningRate 0.0055 Epoch: 15 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:26,246-Speed 5698.55 samples/sec Loss 2.2614 LearningRate 0.0055 Epoch: 15 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:28,064-Speed 5633.63 samples/sec Loss 2.1687 LearningRate 0.0055 Epoch: 15 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:56:29,885-Speed 5625.31 samples/sec Loss 2.1627 LearningRate 0.0055 Epoch: 15 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:31,730-Speed 5554.51 samples/sec Loss 2.2317 LearningRate 0.0055 Epoch: 15 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:33,568-Speed 5573.35 samples/sec Loss 2.0718 LearningRate 0.0055 Epoch: 15 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:35,486-Speed 5339.03 samples/sec Loss 2.1591 LearningRate 0.0055 Epoch: 15 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:37,377-Speed 5418.84 samples/sec Loss 2.1902 LearningRate 0.0055 Epoch: 15 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:39,278-Speed 5389.12 samples/sec Loss 2.2289 LearningRate 0.0055 Epoch: 15 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:41,214-Speed 5289.92 samples/sec Loss 2.1828 LearningRate 0.0055 Epoch: 15 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:43,062-Speed 5544.65 samples/sec Loss 2.1686 LearningRate 0.0055 Epoch: 15 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:44,904-Speed 5560.96 samples/sec Loss 2.2984 LearningRate 0.0055 Epoch: 15 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:46,733-Speed 5598.45 samples/sec Loss 2.2052 LearningRate 0.0055 Epoch: 15 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:48,600-Speed 5487.88 samples/sec Loss 2.1936 LearningRate 0.0055 Epoch: 15 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:50,421-Speed 5622.60 samples/sec Loss 2.0682 LearningRate 0.0055 Epoch: 15 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:52,251-Speed 5600.27 samples/sec Loss 
2.1940 LearningRate 0.0055 Epoch: 15 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:54,084-Speed 5588.12 samples/sec Loss 2.2478 LearningRate 0.0055 Epoch: 15 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:55,977-Speed 5411.26 samples/sec Loss 2.2196 LearningRate 0.0055 Epoch: 15 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:57,887-Speed 5362.03 samples/sec Loss 2.2099 LearningRate 0.0055 Epoch: 15 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:56:59,720-Speed 5587.99 samples/sec Loss 2.1208 LearningRate 0.0055 Epoch: 15 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:01,538-Speed 5635.80 samples/sec Loss 2.1677 LearningRate 0.0054 Epoch: 15 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:03,348-Speed 5660.14 samples/sec Loss 2.2846 LearningRate 0.0054 Epoch: 15 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:05,158-Speed 5659.48 samples/sec Loss 2.3086 LearningRate 0.0054 Epoch: 15 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:06,981-Speed 5617.54 samples/sec Loss 2.2349 LearningRate 0.0054 Epoch: 15 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:08,820-Speed 5572.75 samples/sec Loss 2.2294 LearningRate 0.0054 Epoch: 15 Global Step: 87220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:10,645-Speed 5612.86 samples/sec Loss 2.1731 LearningRate 0.0054 Epoch: 15 Global Step: 87230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:12,460-Speed 5643.43 samples/sec Loss 2.1518 LearningRate 0.0054 Epoch: 15 Global Step: 87240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:14,278-Speed 5631.48 samples/sec Loss 2.2446 LearningRate 0.0054 Epoch: 15 Global Step: 87250 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:16,096-Speed 5636.39 samples/sec Loss 2.1200 LearningRate 0.0054 Epoch: 15 Global Step: 87260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:17,917-Speed 5623.84 samples/sec Loss 2.1965 LearningRate 0.0054 Epoch: 15 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:19,730-Speed 5648.91 samples/sec Loss 2.2119 LearningRate 0.0054 Epoch: 15 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:21,567-Speed 5576.66 samples/sec Loss 2.1911 LearningRate 0.0054 Epoch: 15 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:23,397-Speed 5600.90 samples/sec Loss 2.2308 LearningRate 0.0054 Epoch: 15 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:25,205-Speed 5663.39 samples/sec Loss 2.1876 LearningRate 0.0054 Epoch: 15 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:27,010-Speed 5675.37 samples/sec Loss 2.1769 LearningRate 0.0054 Epoch: 15 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:28,840-Speed 5597.89 samples/sec Loss 2.2806 LearningRate 0.0054 Epoch: 15 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:30,688-Speed 5543.68 samples/sec Loss 2.0566 LearningRate 0.0054 Epoch: 15 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:32,501-Speed 5649.86 samples/sec Loss 2.1866 LearningRate 0.0054 Epoch: 15 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:34,326-Speed 5612.56 samples/sec Loss 2.1388 LearningRate 0.0054 Epoch: 15 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:36,157-Speed 5593.10 samples/sec Loss 2.1340 LearningRate 0.0054 Epoch: 15 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:57:37,967-Speed 5660.71 samples/sec Loss 2.3084 LearningRate 0.0054 Epoch: 15 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:39,789-Speed 5621.79 samples/sec Loss 2.1089 LearningRate 0.0054 Epoch: 15 Global Step: 87390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:57:41,629-Speed 5569.01 samples/sec Loss 2.1403 LearningRate 0.0054 Epoch: 15 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:43,483-Speed 5524.90 samples/sec Loss 2.1820 LearningRate 0.0054 Epoch: 15 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:45,309-Speed 5607.91 samples/sec Loss 2.1366 LearningRate 0.0053 Epoch: 15 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:47,162-Speed 5529.07 samples/sec Loss 2.3236 LearningRate 0.0053 Epoch: 15 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:48,971-Speed 5662.01 samples/sec Loss 2.2402 LearningRate 0.0053 Epoch: 15 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:50,794-Speed 5619.83 samples/sec Loss 2.1626 LearningRate 0.0053 Epoch: 15 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:52,618-Speed 5616.34 samples/sec Loss 2.1508 LearningRate 0.0053 Epoch: 15 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:54,433-Speed 5644.51 samples/sec Loss 2.2310 LearningRate 0.0053 Epoch: 15 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:56,248-Speed 5642.38 samples/sec Loss 2.1042 LearningRate 0.0053 Epoch: 15 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:58,083-Speed 5582.48 samples/sec Loss 2.1765 LearningRate 0.0053 Epoch: 15 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:57:59,898-Speed 5645.22 samples/sec Loss 
2.2630 LearningRate 0.0053 Epoch: 15 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:01,725-Speed 5605.06 samples/sec Loss 2.2741 LearningRate 0.0053 Epoch: 15 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:03,556-Speed 5594.84 samples/sec Loss 2.1954 LearningRate 0.0053 Epoch: 15 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:05,381-Speed 5613.16 samples/sec Loss 2.1853 LearningRate 0.0053 Epoch: 15 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:07,217-Speed 5578.45 samples/sec Loss 2.1319 LearningRate 0.0053 Epoch: 15 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:09,065-Speed 5544.26 samples/sec Loss 2.1235 LearningRate 0.0053 Epoch: 15 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:10,895-Speed 5598.22 samples/sec Loss 2.2071 LearningRate 0.0053 Epoch: 15 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:12,723-Speed 5604.46 samples/sec Loss 2.1983 LearningRate 0.0053 Epoch: 15 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:14,549-Speed 5609.85 samples/sec Loss 2.1703 LearningRate 0.0053 Epoch: 15 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:16,372-Speed 5615.88 samples/sec Loss 2.2274 LearningRate 0.0053 Epoch: 15 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:18,183-Speed 5656.67 samples/sec Loss 2.2079 LearningRate 0.0053 Epoch: 15 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:20,007-Speed 5615.83 samples/sec Loss 2.2568 LearningRate 0.0053 Epoch: 15 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:21,825-Speed 5636.94 samples/sec Loss 2.1241 LearningRate 0.0053 Epoch: 15 Global Step: 87620 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:23,652-Speed 5606.07 samples/sec Loss 2.1898 LearningRate 0.0053 Epoch: 15 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:25,472-Speed 5626.32 samples/sec Loss 2.1691 LearningRate 0.0053 Epoch: 15 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:27,284-Speed 5653.28 samples/sec Loss 2.1492 LearningRate 0.0053 Epoch: 15 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:29,102-Speed 5634.08 samples/sec Loss 2.2339 LearningRate 0.0053 Epoch: 15 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:30,910-Speed 5667.73 samples/sec Loss 2.2902 LearningRate 0.0052 Epoch: 15 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:32,756-Speed 5548.21 samples/sec Loss 2.1500 LearningRate 0.0052 Epoch: 15 Global Step: 87680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:34,569-Speed 5649.97 samples/sec Loss 2.1545 LearningRate 0.0052 Epoch: 15 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:36,384-Speed 5645.85 samples/sec Loss 2.2021 LearningRate 0.0052 Epoch: 15 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:38,198-Speed 5646.09 samples/sec Loss 2.0105 LearningRate 0.0052 Epoch: 15 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:40,027-Speed 5599.71 samples/sec Loss 2.2100 LearningRate 0.0052 Epoch: 15 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:41,852-Speed 5615.26 samples/sec Loss 2.1805 LearningRate 0.0052 Epoch: 15 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:43,664-Speed 5650.65 samples/sec Loss 2.2757 LearningRate 0.0052 Epoch: 15 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 06:58:45,470-Speed 5673.24 samples/sec Loss 2.2585 LearningRate 0.0052 Epoch: 15 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:47,300-Speed 5598.12 samples/sec Loss 2.1485 LearningRate 0.0052 Epoch: 15 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:49,114-Speed 5643.91 samples/sec Loss 2.2281 LearningRate 0.0052 Epoch: 15 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:50,987-Speed 5469.85 samples/sec Loss 2.1316 LearningRate 0.0052 Epoch: 15 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:52,817-Speed 5599.58 samples/sec Loss 2.1647 LearningRate 0.0052 Epoch: 15 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:54,643-Speed 5608.66 samples/sec Loss 2.1616 LearningRate 0.0052 Epoch: 15 Global Step: 87800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 06:58:56,445-Speed 5684.99 samples/sec Loss 2.2761 LearningRate 0.0052 Epoch: 15 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:58:58,276-Speed 5594.82 samples/sec Loss 2.2536 LearningRate 0.0052 Epoch: 15 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:00,087-Speed 5655.34 samples/sec Loss 2.2605 LearningRate 0.0052 Epoch: 15 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:01,915-Speed 5603.91 samples/sec Loss 2.2484 LearningRate 0.0052 Epoch: 15 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:03,730-Speed 5645.49 samples/sec Loss 2.2407 LearningRate 0.0052 Epoch: 15 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:05,546-Speed 5638.85 samples/sec Loss 2.1837 LearningRate 0.0052 Epoch: 15 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:07,371-Speed 5612.90 samples/sec Loss 
2.2447 LearningRate 0.0052 Epoch: 15 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:09,198-Speed 5606.02 samples/sec Loss 2.0300 LearningRate 0.0052 Epoch: 15 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:11,043-Speed 5552.29 samples/sec Loss 2.2141 LearningRate 0.0052 Epoch: 15 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:12,867-Speed 5617.78 samples/sec Loss 2.2641 LearningRate 0.0052 Epoch: 15 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:14,674-Speed 5667.81 samples/sec Loss 2.2807 LearningRate 0.0052 Epoch: 15 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:16,501-Speed 5607.74 samples/sec Loss 2.1397 LearningRate 0.0051 Epoch: 15 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:18,316-Speed 5642.91 samples/sec Loss 2.2066 LearningRate 0.0051 Epoch: 15 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:20,156-Speed 5567.46 samples/sec Loss 2.1846 LearningRate 0.0051 Epoch: 15 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:21,992-Speed 5579.00 samples/sec Loss 2.1123 LearningRate 0.0051 Epoch: 15 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:23,823-Speed 5593.92 samples/sec Loss 2.1423 LearningRate 0.0051 Epoch: 15 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:25,655-Speed 5592.28 samples/sec Loss 2.1719 LearningRate 0.0051 Epoch: 15 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:27,533-Speed 5453.76 samples/sec Loss 2.1825 LearningRate 0.0051 Epoch: 15 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 06:59:29,345-Speed 5653.70 samples/sec Loss 2.2140 LearningRate 0.0051 Epoch: 15 Global Step: 87990 
Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-27 06:59:31,147-Speed 5685.52 samples/sec Loss 2.2411 LearningRate 0.0051 Epoch: 15 Global Step: 88000 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 06:59:57,399-[lfw][88000]XNorm: 22.511046
Training: 2022-04-27 06:59:57,399-[lfw][88000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-27 06:59:57,400-[lfw][88000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:00:27,907-[cfp_fp][88000]XNorm: 20.966420
Training: 2022-04-27 07:00:27,907-[cfp_fp][88000]Accuracy-Flip: 0.97357+-0.00754
Training: 2022-04-27 07:00:27,908-[cfp_fp][88000]Accuracy-Highest: 0.97357
Training: 2022-04-27 07:00:54,188-[agedb_30][88000]XNorm: 22.440152
Training: 2022-04-27 07:00:54,189-[agedb_30][88000]Accuracy-Flip: 0.97967+-0.00980
Training: 2022-04-27 07:00:54,189-[agedb_30][88000]Accuracy-Highest: 0.98117
Training: 2022-04-27 07:00:56,043-Speed 120.62 samples/sec Loss 2.0603 LearningRate 0.0051 Epoch: 15 Global Step: 88010 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:00:57,958-Speed 5349.67 samples/sec Loss 2.2235 LearningRate 0.0051 Epoch: 15 Global Step: 88020 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:00:59,797-Speed 5568.87 samples/sec Loss 2.0918 LearningRate 0.0051 Epoch: 15 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:01:01,606-Speed 5663.36 samples/sec Loss 2.1937 LearningRate 0.0051 Epoch: 15 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:01:03,418-Speed 5651.94 samples/sec Loss 2.2387 LearningRate 0.0051 Epoch: 15 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:01:05,224-Speed 5670.78 samples/sec Loss 2.2703 LearningRate 0.0051 Epoch: 15 Global Step: 88060 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:01:07,026-Speed 5684.89 samples/sec Loss 2.1501 LearningRate 0.0051 Epoch: 15 Global Step: 88070 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-27 07:01:08,871-Speed 5551.36 samples/sec Loss 2.2396 LearningRate 0.0051 Epoch: 15 Global Step: 88080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:01:10,677-Speed 5674.31 samples/sec Loss 2.1074 LearningRate 0.0051 Epoch: 15 Global Step: 88090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:01:12,488-Speed 5656.19 samples/sec Loss 2.3323 LearningRate 0.0051 Epoch: 15 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:14,308-Speed 5627.00 samples/sec Loss 2.1219 LearningRate 0.0051 Epoch: 15 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:16,129-Speed 5623.53 samples/sec Loss 2.2364 LearningRate 0.0051 Epoch: 15 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:17,936-Speed 5671.04 samples/sec Loss 2.1872 LearningRate 0.0051 Epoch: 15 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:19,756-Speed 5628.72 samples/sec Loss 2.1690 LearningRate 0.0051 Epoch: 15 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:21,582-Speed 5609.21 samples/sec Loss 2.1099 LearningRate 0.0051 Epoch: 15 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:23,436-Speed 5525.95 samples/sec Loss 2.0679 LearningRate 0.0051 Epoch: 15 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:25,288-Speed 5530.41 samples/sec Loss 2.2667 LearningRate 0.0050 Epoch: 15 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:27,128-Speed 5568.26 samples/sec Loss 2.0958 LearningRate 0.0050 Epoch: 15 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:28,956-Speed 5600.60 samples/sec Loss 2.2113 LearningRate 0.0050 Epoch: 15 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:30,785-Speed 5602.58 
samples/sec Loss 2.1065 LearningRate 0.0050 Epoch: 15 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:32,655-Speed 5477.98 samples/sec Loss 2.1935 LearningRate 0.0050 Epoch: 15 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:34,464-Speed 5660.13 samples/sec Loss 2.2453 LearningRate 0.0050 Epoch: 15 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:36,294-Speed 5597.77 samples/sec Loss 2.2137 LearningRate 0.0050 Epoch: 15 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:38,126-Speed 5592.70 samples/sec Loss 2.3070 LearningRate 0.0050 Epoch: 15 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:39,968-Speed 5561.08 samples/sec Loss 2.2106 LearningRate 0.0050 Epoch: 15 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:41,794-Speed 5608.89 samples/sec Loss 2.2055 LearningRate 0.0050 Epoch: 15 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:43,613-Speed 5630.62 samples/sec Loss 2.2088 LearningRate 0.0050 Epoch: 15 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:45,445-Speed 5591.58 samples/sec Loss 2.0566 LearningRate 0.0050 Epoch: 15 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:47,266-Speed 5625.75 samples/sec Loss 2.1702 LearningRate 0.0050 Epoch: 15 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:49,078-Speed 5653.34 samples/sec Loss 2.2700 LearningRate 0.0050 Epoch: 15 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:51,009-Speed 5304.67 samples/sec Loss 2.2093 LearningRate 0.0050 Epoch: 15 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:52,944-Speed 5294.11 samples/sec Loss 2.1890 LearningRate 0.0050 Epoch: 15 
Global Step: 88320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:54,778-Speed 5586.40 samples/sec Loss 2.2527 LearningRate 0.0050 Epoch: 15 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:56,612-Speed 5584.94 samples/sec Loss 2.0351 LearningRate 0.0050 Epoch: 15 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:01:58,448-Speed 5578.16 samples/sec Loss 2.1924 LearningRate 0.0050 Epoch: 15 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:00,299-Speed 5534.43 samples/sec Loss 2.1343 LearningRate 0.0050 Epoch: 15 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:02,141-Speed 5559.72 samples/sec Loss 2.2169 LearningRate 0.0050 Epoch: 15 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:04,048-Speed 5373.71 samples/sec Loss 2.1402 LearningRate 0.0050 Epoch: 15 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:05,896-Speed 5542.99 samples/sec Loss 2.2397 LearningRate 0.0050 Epoch: 15 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:07,694-Speed 5695.43 samples/sec Loss 2.1295 LearningRate 0.0050 Epoch: 15 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:09,512-Speed 5638.37 samples/sec Loss 2.2286 LearningRate 0.0050 Epoch: 15 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:11,397-Speed 5434.64 samples/sec Loss 2.2194 LearningRate 0.0049 Epoch: 15 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:13,229-Speed 5590.06 samples/sec Loss 2.2124 LearningRate 0.0049 Epoch: 15 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:15,060-Speed 5595.92 samples/sec Loss 2.0515 LearningRate 0.0049 Epoch: 15 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 07:02:16,874-Speed 5644.65 samples/sec Loss 2.1393 LearningRate 0.0049 Epoch: 15 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:18,687-Speed 5651.87 samples/sec Loss 2.2354 LearningRate 0.0049 Epoch: 15 Global Step: 88460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:20,495-Speed 5666.00 samples/sec Loss 2.2680 LearningRate 0.0049 Epoch: 15 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:22,305-Speed 5657.46 samples/sec Loss 2.2333 LearningRate 0.0049 Epoch: 15 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:24,121-Speed 5639.99 samples/sec Loss 2.1917 LearningRate 0.0049 Epoch: 15 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:25,925-Speed 5680.26 samples/sec Loss 2.1831 LearningRate 0.0049 Epoch: 15 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:27,736-Speed 5654.94 samples/sec Loss 2.2320 LearningRate 0.0049 Epoch: 15 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:29,553-Speed 5640.02 samples/sec Loss 2.1810 LearningRate 0.0049 Epoch: 15 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:31,373-Speed 5629.13 samples/sec Loss 2.2225 LearningRate 0.0049 Epoch: 15 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:33,206-Speed 5587.12 samples/sec Loss 2.1590 LearningRate 0.0049 Epoch: 15 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:35,017-Speed 5657.31 samples/sec Loss 2.1568 LearningRate 0.0049 Epoch: 15 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:36,852-Speed 5581.16 samples/sec Loss 2.1863 LearningRate 0.0049 Epoch: 15 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:38,677-Speed 5611.58 
samples/sec Loss 2.1933 LearningRate 0.0049 Epoch: 15 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:40,491-Speed 5648.87 samples/sec Loss 2.1847 LearningRate 0.0049 Epoch: 15 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:42,313-Speed 5620.42 samples/sec Loss 2.1926 LearningRate 0.0049 Epoch: 15 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:44,125-Speed 5653.09 samples/sec Loss 2.1130 LearningRate 0.0049 Epoch: 15 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 07:02:45,942-Speed 5637.20 samples/sec Loss 2.3753 LearningRate 0.0049 Epoch: 15 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:47,766-Speed 5615.53 samples/sec Loss 2.1990 LearningRate 0.0049 Epoch: 15 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:49,580-Speed 5646.95 samples/sec Loss 2.0932 LearningRate 0.0049 Epoch: 15 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:51,398-Speed 5637.07 samples/sec Loss 2.2071 LearningRate 0.0049 Epoch: 15 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:53,215-Speed 5637.06 samples/sec Loss 2.1875 LearningRate 0.0049 Epoch: 15 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:55,044-Speed 5599.74 samples/sec Loss 2.1676 LearningRate 0.0049 Epoch: 15 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:56,894-Speed 5538.27 samples/sec Loss 2.0995 LearningRate 0.0049 Epoch: 15 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:02:58,725-Speed 5593.78 samples/sec Loss 2.2179 LearningRate 0.0048 Epoch: 15 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:00,539-Speed 5647.20 samples/sec Loss 2.1339 LearningRate 0.0048 Epoch: 15 
Global Step: 88690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:02,361-Speed 5622.39 samples/sec Loss 2.1838 LearningRate 0.0048 Epoch: 15 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:04,188-Speed 5605.14 samples/sec Loss 2.0964 LearningRate 0.0048 Epoch: 15 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:05,994-Speed 5671.77 samples/sec Loss 2.1467 LearningRate 0.0048 Epoch: 15 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:07,813-Speed 5631.45 samples/sec Loss 2.1576 LearningRate 0.0048 Epoch: 15 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:09,644-Speed 5594.86 samples/sec Loss 2.1972 LearningRate 0.0048 Epoch: 15 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:11,466-Speed 5623.70 samples/sec Loss 2.1677 LearningRate 0.0048 Epoch: 15 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:13,281-Speed 5642.97 samples/sec Loss 2.2237 LearningRate 0.0048 Epoch: 15 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:15,105-Speed 5616.02 samples/sec Loss 2.1933 LearningRate 0.0048 Epoch: 15 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:16,913-Speed 5666.94 samples/sec Loss 2.1822 LearningRate 0.0048 Epoch: 15 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:18,737-Speed 5613.76 samples/sec Loss 2.2749 LearningRate 0.0048 Epoch: 15 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:20,559-Speed 5623.10 samples/sec Loss 2.1673 LearningRate 0.0048 Epoch: 15 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:22,369-Speed 5660.30 samples/sec Loss 2.1519 LearningRate 0.0048 Epoch: 15 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 07:03:24,189-Speed 5627.17 samples/sec Loss 2.2020 LearningRate 0.0048 Epoch: 15 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:26,007-Speed 5635.29 samples/sec Loss 2.3039 LearningRate 0.0048 Epoch: 15 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:27,820-Speed 5648.64 samples/sec Loss 2.1733 LearningRate 0.0048 Epoch: 15 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:29,637-Speed 5638.42 samples/sec Loss 2.1549 LearningRate 0.0048 Epoch: 15 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:31,450-Speed 5649.48 samples/sec Loss 2.1679 LearningRate 0.0048 Epoch: 15 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:33,268-Speed 5633.67 samples/sec Loss 2.2046 LearningRate 0.0048 Epoch: 15 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:35,082-Speed 5649.10 samples/sec Loss 2.2404 LearningRate 0.0048 Epoch: 15 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:36,934-Speed 5530.67 samples/sec Loss 2.1444 LearningRate 0.0048 Epoch: 15 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:38,747-Speed 5650.77 samples/sec Loss 2.1401 LearningRate 0.0048 Epoch: 15 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:40,554-Speed 5669.72 samples/sec Loss 2.2226 LearningRate 0.0048 Epoch: 15 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:42,383-Speed 5598.74 samples/sec Loss 2.1203 LearningRate 0.0048 Epoch: 15 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:44,195-Speed 5654.69 samples/sec Loss 2.1580 LearningRate 0.0048 Epoch: 15 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:46,018-Speed 5616.59 
samples/sec Loss 2.2465 LearningRate 0.0047 Epoch: 15 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:47,859-Speed 5564.95 samples/sec Loss 2.1916 LearningRate 0.0047 Epoch: 15 Global Step: 88950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:49,686-Speed 5605.45 samples/sec Loss 2.2578 LearningRate 0.0047 Epoch: 15 Global Step: 88960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:51,523-Speed 5576.99 samples/sec Loss 2.1505 LearningRate 0.0047 Epoch: 15 Global Step: 88970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:53,347-Speed 5616.43 samples/sec Loss 2.1417 LearningRate 0.0047 Epoch: 15 Global Step: 88980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:55,183-Speed 5580.35 samples/sec Loss 2.1423 LearningRate 0.0047 Epoch: 15 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:56,997-Speed 5647.42 samples/sec Loss 2.1979 LearningRate 0.0047 Epoch: 15 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:03:58,811-Speed 5645.39 samples/sec Loss 2.1028 LearningRate 0.0047 Epoch: 15 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:00,623-Speed 5654.72 samples/sec Loss 2.2645 LearningRate 0.0047 Epoch: 15 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:02,436-Speed 5648.87 samples/sec Loss 2.1961 LearningRate 0.0047 Epoch: 15 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:04,266-Speed 5598.79 samples/sec Loss 2.0691 LearningRate 0.0047 Epoch: 15 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:06,083-Speed 5638.19 samples/sec Loss 2.2009 LearningRate 0.0047 Epoch: 15 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:07,895-Speed 5652.31 samples/sec Loss 2.2123 LearningRate 0.0047 Epoch: 15 
Global Step: 89060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:09,707-Speed 5652.44 samples/sec Loss 2.0821 LearningRate 0.0047 Epoch: 15 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:11,523-Speed 5642.16 samples/sec Loss 2.2171 LearningRate 0.0047 Epoch: 15 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:13,350-Speed 5605.72 samples/sec Loss 2.2563 LearningRate 0.0047 Epoch: 15 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:15,170-Speed 5626.14 samples/sec Loss 2.1128 LearningRate 0.0047 Epoch: 15 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:16,995-Speed 5615.27 samples/sec Loss 2.1802 LearningRate 0.0047 Epoch: 15 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:18,802-Speed 5668.06 samples/sec Loss 2.1803 LearningRate 0.0047 Epoch: 15 Global Step: 89120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:20,612-Speed 5660.81 samples/sec Loss 2.1434 LearningRate 0.0047 Epoch: 15 Global Step: 89130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:22,430-Speed 5634.18 samples/sec Loss 2.1866 LearningRate 0.0047 Epoch: 15 Global Step: 89140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:24,272-Speed 5560.22 samples/sec Loss 2.1544 LearningRate 0.0047 Epoch: 15 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:26,092-Speed 5628.46 samples/sec Loss 2.1897 LearningRate 0.0047 Epoch: 15 Global Step: 89160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:27,917-Speed 5613.58 samples/sec Loss 2.2370 LearningRate 0.0047 Epoch: 15 Global Step: 89170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:29,734-Speed 5635.51 samples/sec Loss 2.0739 LearningRate 0.0047 Epoch: 15 Global Step: 89180 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 07:04:31,558-Speed 5616.51 samples/sec Loss 2.1992 LearningRate 0.0047 Epoch: 15 Global Step: 89190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:33,420-Speed 5501.76 samples/sec Loss 2.1493 LearningRate 0.0046 Epoch: 15 Global Step: 89200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:35,325-Speed 5376.25 samples/sec Loss 2.1443 LearningRate 0.0046 Epoch: 15 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:37,246-Speed 5334.74 samples/sec Loss 2.1604 LearningRate 0.0046 Epoch: 15 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:39,169-Speed 5325.56 samples/sec Loss 2.2186 LearningRate 0.0046 Epoch: 15 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:41,038-Speed 5481.23 samples/sec Loss 2.1250 LearningRate 0.0046 Epoch: 15 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:42,857-Speed 5629.29 samples/sec Loss 2.1576 LearningRate 0.0046 Epoch: 15 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:44,680-Speed 5620.53 samples/sec Loss 2.2950 LearningRate 0.0046 Epoch: 15 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:46,494-Speed 5647.35 samples/sec Loss 2.2113 LearningRate 0.0046 Epoch: 15 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:48,331-Speed 5573.65 samples/sec Loss 2.1681 LearningRate 0.0046 Epoch: 15 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:50,157-Speed 5611.39 samples/sec Loss 2.2060 LearningRate 0.0046 Epoch: 15 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:51,987-Speed 5596.24 samples/sec Loss 2.1011 LearningRate 0.0046 Epoch: 15 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:53,824-Speed 5576.88 
samples/sec Loss 2.0487 LearningRate 0.0046 Epoch: 15 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:55,641-Speed 5640.54 samples/sec Loss 2.1218 LearningRate 0.0046 Epoch: 15 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:57,474-Speed 5586.34 samples/sec Loss 2.2080 LearningRate 0.0046 Epoch: 15 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:04:59,297-Speed 5617.96 samples/sec Loss 2.1677 LearningRate 0.0046 Epoch: 15 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:01,147-Speed 5538.78 samples/sec Loss 2.1087 LearningRate 0.0046 Epoch: 15 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:02,971-Speed 5614.79 samples/sec Loss 2.1240 LearningRate 0.0046 Epoch: 15 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:04,785-Speed 5648.77 samples/sec Loss 2.1023 LearningRate 0.0046 Epoch: 15 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:06,620-Speed 5582.18 samples/sec Loss 2.1949 LearningRate 0.0046 Epoch: 15 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:08,443-Speed 5616.75 samples/sec Loss 2.1795 LearningRate 0.0046 Epoch: 15 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:10,276-Speed 5590.53 samples/sec Loss 2.1073 LearningRate 0.0046 Epoch: 15 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:12,104-Speed 5602.25 samples/sec Loss 2.2462 LearningRate 0.0046 Epoch: 15 Global Step: 89410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-27 07:05:13,911-Speed 5671.60 samples/sec Loss 2.1129 LearningRate 0.0046 Epoch: 15 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:15,724-Speed 5648.86 samples/sec Loss 2.1681 LearningRate 0.0046 Epoch: 15 
Global Step: 89430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:17,542-Speed 5634.13 samples/sec Loss 2.2560 LearningRate 0.0046 Epoch: 15 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:19,353-Speed 5656.36 samples/sec Loss 2.1352 LearningRate 0.0046 Epoch: 15 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:21,166-Speed 5649.17 samples/sec Loss 2.1240 LearningRate 0.0046 Epoch: 15 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:22,988-Speed 5622.67 samples/sec Loss 2.1389 LearningRate 0.0045 Epoch: 15 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:24,803-Speed 5643.08 samples/sec Loss 2.1804 LearningRate 0.0045 Epoch: 15 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:26,628-Speed 5612.77 samples/sec Loss 2.1456 LearningRate 0.0045 Epoch: 15 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:28,457-Speed 5600.69 samples/sec Loss 2.2088 LearningRate 0.0045 Epoch: 15 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:30,294-Speed 5577.10 samples/sec Loss 2.1387 LearningRate 0.0045 Epoch: 15 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:32,103-Speed 5660.34 samples/sec Loss 2.2782 LearningRate 0.0045 Epoch: 15 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:33,922-Speed 5632.16 samples/sec Loss 2.1381 LearningRate 0.0045 Epoch: 15 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:35,736-Speed 5646.56 samples/sec Loss 2.2067 LearningRate 0.0045 Epoch: 15 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:37,557-Speed 5626.80 samples/sec Loss 2.1803 LearningRate 0.0045 Epoch: 15 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-27 07:05:39,392-Speed 5582.70 samples/sec Loss 2.1121 LearningRate 0.0045 Epoch: 15 Global Step: 89560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:41,210-Speed 5634.35 samples/sec Loss 2.0794 LearningRate 0.0045 Epoch: 15 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:43,022-Speed 5652.62 samples/sec Loss 2.2160 LearningRate 0.0045 Epoch: 15 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:44,876-Speed 5524.94 samples/sec Loss 2.1227 LearningRate 0.0045 Epoch: 15 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:46,693-Speed 5636.89 samples/sec Loss 2.1621 LearningRate 0.0045 Epoch: 15 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:48,529-Speed 5581.07 samples/sec Loss 2.1941 LearningRate 0.0045 Epoch: 15 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:50,357-Speed 5602.51 samples/sec Loss 2.1043 LearningRate 0.0045 Epoch: 15 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:52,192-Speed 5581.27 samples/sec Loss 2.0759 LearningRate 0.0045 Epoch: 15 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:54,055-Speed 5498.41 samples/sec Loss 2.0997 LearningRate 0.0045 Epoch: 15 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:55,893-Speed 5572.71 samples/sec Loss 2.2201 LearningRate 0.0045 Epoch: 15 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:57,748-Speed 5524.49 samples/sec Loss 2.1604 LearningRate 0.0045 Epoch: 15 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:05:59,642-Speed 5408.78 samples/sec Loss 2.1815 LearningRate 0.0045 Epoch: 15 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:01,456-Speed 5645.04 
samples/sec Loss 2.1824 LearningRate 0.0045 Epoch: 15 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:03,277-Speed 5625.13 samples/sec Loss 2.1279 LearningRate 0.0045 Epoch: 15 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:05,095-Speed 5633.66 samples/sec Loss 2.1115 LearningRate 0.0045 Epoch: 15 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:06,919-Speed 5618.33 samples/sec Loss 2.0863 LearningRate 0.0045 Epoch: 15 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:08,732-Speed 5650.40 samples/sec Loss 2.1267 LearningRate 0.0045 Epoch: 15 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:10,554-Speed 5620.02 samples/sec Loss 2.1286 LearningRate 0.0045 Epoch: 15 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:12,374-Speed 5628.35 samples/sec Loss 2.1876 LearningRate 0.0044 Epoch: 15 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:14,194-Speed 5628.36 samples/sec Loss 2.0218 LearningRate 0.0044 Epoch: 15 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:16,017-Speed 5619.92 samples/sec Loss 2.2426 LearningRate 0.0044 Epoch: 15 Global Step: 89760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:17,849-Speed 5593.43 samples/sec Loss 2.2036 LearningRate 0.0044 Epoch: 15 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 07:06:19,658-Speed 5661.67 samples/sec Loss 2.0495 LearningRate 0.0044 Epoch: 15 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:06:21,473-Speed 5643.86 samples/sec Loss 2.1373 LearningRate 0.0044 Epoch: 15 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:06:23,289-Speed 5640.15 samples/sec Loss 2.1373 LearningRate 0.0044 Epoch: 15 
Global Step: 89800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:06:25,105-Speed 5639.83 samples/sec Loss 2.0650 LearningRate 0.0044 Epoch: 15 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 07:06:26,914-Speed 5662.16 samples/sec Loss 2.0896 LearningRate 0.0044 Epoch: 15 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:28,734-Speed 5628.68 samples/sec Loss 2.0425 LearningRate 0.0044 Epoch: 15 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:30,558-Speed 5616.17 samples/sec Loss 2.1810 LearningRate 0.0044 Epoch: 15 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:32,380-Speed 5623.04 samples/sec Loss 2.1892 LearningRate 0.0044 Epoch: 15 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:34,217-Speed 5574.87 samples/sec Loss 2.1874 LearningRate 0.0044 Epoch: 15 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:36,043-Speed 5609.38 samples/sec Loss 2.1301 LearningRate 0.0044 Epoch: 15 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:37,941-Speed 5399.13 samples/sec Loss 2.1391 LearningRate 0.0044 Epoch: 15 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:39,884-Speed 5272.92 samples/sec Loss 2.1430 LearningRate 0.0044 Epoch: 15 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:41,813-Speed 5307.73 samples/sec Loss 2.0907 LearningRate 0.0044 Epoch: 15 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:43,662-Speed 5539.56 samples/sec Loss 2.0996 LearningRate 0.0044 Epoch: 15 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:45,494-Speed 5594.35 samples/sec Loss 2.1382 LearningRate 0.0044 Epoch: 15 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:06:47,310-Speed 5640.47 samples/sec Loss 2.1569 LearningRate 0.0044 Epoch: 15 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:49,143-Speed 5585.91 samples/sec Loss 2.1118 LearningRate 0.0044 Epoch: 15 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:06:50,962-Speed 5631.93 samples/sec Loss 2.2027 LearningRate 0.0044 Epoch: 15 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:52,791-Speed 5599.62 samples/sec Loss 2.1602 LearningRate 0.0044 Epoch: 15 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:54,608-Speed 5637.58 samples/sec Loss 2.1506 LearningRate 0.0044 Epoch: 15 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:56,492-Speed 5436.73 samples/sec Loss 2.0516 LearningRate 0.0044 Epoch: 15 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:06:58,416-Speed 5325.52 samples/sec Loss 2.1733 LearningRate 0.0044 Epoch: 15 Global Step: 89990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:07:00,312-Speed 5403.50 samples/sec Loss 2.0500 LearningRate 0.0044 Epoch: 15 Global Step: 90000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:07:26,372-[lfw][90000]XNorm: 22.569485 Training: 2022-04-27 07:07:26,373-[lfw][90000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-27 07:07:26,373-[lfw][90000]Accuracy-Highest: 0.99800 Training: 2022-04-27 07:07:56,594-[cfp_fp][90000]XNorm: 21.390203 Training: 2022-04-27 07:07:56,595-[cfp_fp][90000]Accuracy-Flip: 0.97486+-0.00743 Training: 2022-04-27 07:07:56,595-[cfp_fp][90000]Accuracy-Highest: 0.97486 Training: 2022-04-27 07:08:22,687-[agedb_30][90000]XNorm: 22.610529 Training: 2022-04-27 07:08:22,687-[agedb_30][90000]Accuracy-Flip: 0.98083+-0.00647 Training: 2022-04-27 07:08:22,688-[agedb_30][90000]Accuracy-Highest: 0.98117 Training: 2022-04-27 
07:08:24,514-Speed 121.61 samples/sec Loss 2.2206 LearningRate 0.0043 Epoch: 15 Global Step: 90010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:08:26,334-Speed 5626.76 samples/sec Loss 2.2108 LearningRate 0.0043 Epoch: 15 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:08:28,138-Speed 5680.23 samples/sec Loss 2.1805 LearningRate 0.0043 Epoch: 15 Global Step: 90030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:08:29,965-Speed 5606.49 samples/sec Loss 2.0648 LearningRate 0.0043 Epoch: 15 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:08:31,777-Speed 5652.73 samples/sec Loss 2.1073 LearningRate 0.0043 Epoch: 15 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:33,611-Speed 5584.56 samples/sec Loss 2.1777 LearningRate 0.0043 Epoch: 15 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:35,420-Speed 5662.89 samples/sec Loss 2.2701 LearningRate 0.0043 Epoch: 15 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:37,240-Speed 5629.39 samples/sec Loss 2.1318 LearningRate 0.0043 Epoch: 15 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:39,052-Speed 5653.04 samples/sec Loss 2.2030 LearningRate 0.0043 Epoch: 15 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:40,863-Speed 5656.24 samples/sec Loss 2.1717 LearningRate 0.0043 Epoch: 15 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:42,673-Speed 5656.81 samples/sec Loss 2.1975 LearningRate 0.0043 Epoch: 15 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:44,505-Speed 5593.11 samples/sec Loss 2.1097 LearningRate 0.0043 Epoch: 15 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:46,334-Speed 5600.92 samples/sec Loss 2.1265 
LearningRate 0.0043 Epoch: 15 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:48,147-Speed 5650.57 samples/sec Loss 2.0930 LearningRate 0.0043 Epoch: 15 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:49,960-Speed 5649.33 samples/sec Loss 2.2121 LearningRate 0.0043 Epoch: 15 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:51,784-Speed 5615.63 samples/sec Loss 2.0657 LearningRate 0.0043 Epoch: 15 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:53,630-Speed 5549.27 samples/sec Loss 2.2076 LearningRate 0.0043 Epoch: 15 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:55,441-Speed 5655.29 samples/sec Loss 2.0857 LearningRate 0.0043 Epoch: 15 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:57,255-Speed 5649.47 samples/sec Loss 2.0827 LearningRate 0.0043 Epoch: 15 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:08:59,060-Speed 5674.08 samples/sec Loss 2.0122 LearningRate 0.0043 Epoch: 15 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:00,876-Speed 5640.00 samples/sec Loss 2.1522 LearningRate 0.0043 Epoch: 15 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:02,701-Speed 5612.37 samples/sec Loss 2.1135 LearningRate 0.0043 Epoch: 15 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:04,518-Speed 5637.98 samples/sec Loss 2.1075 LearningRate 0.0043 Epoch: 15 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:06,344-Speed 5611.12 samples/sec Loss 2.1578 LearningRate 0.0043 Epoch: 15 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:08,195-Speed 5533.52 samples/sec Loss 2.0528 LearningRate 0.0043 Epoch: 15 Global Step: 90250 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:10,031-Speed 5578.28 samples/sec Loss 2.0443 LearningRate 0.0043 Epoch: 15 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:11,872-Speed 5565.22 samples/sec Loss 2.1692 LearningRate 0.0043 Epoch: 15 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:13,691-Speed 5632.21 samples/sec Loss 2.2213 LearningRate 0.0042 Epoch: 15 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:15,580-Speed 5421.71 samples/sec Loss 2.1127 LearningRate 0.0042 Epoch: 15 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:17,409-Speed 5601.85 samples/sec Loss 2.1247 LearningRate 0.0042 Epoch: 15 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:19,226-Speed 5637.12 samples/sec Loss 2.2434 LearningRate 0.0042 Epoch: 15 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:21,051-Speed 5611.36 samples/sec Loss 2.0607 LearningRate 0.0042 Epoch: 15 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:22,869-Speed 5634.07 samples/sec Loss 2.1706 LearningRate 0.0042 Epoch: 15 Global Step: 90330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:24,690-Speed 5625.97 samples/sec Loss 2.2333 LearningRate 0.0042 Epoch: 15 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:26,509-Speed 5629.83 samples/sec Loss 2.2057 LearningRate 0.0042 Epoch: 15 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:28,328-Speed 5634.48 samples/sec Loss 2.1170 LearningRate 0.0042 Epoch: 15 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:30,162-Speed 5585.54 samples/sec Loss 2.1409 LearningRate 0.0042 Epoch: 15 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:09:31,970-Speed 5663.92 samples/sec Loss 2.1448 LearningRate 0.0042 Epoch: 15 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:33,800-Speed 5597.36 samples/sec Loss 2.0587 LearningRate 0.0042 Epoch: 15 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:35,620-Speed 5628.35 samples/sec Loss 2.0762 LearningRate 0.0042 Epoch: 15 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:37,442-Speed 5622.84 samples/sec Loss 2.1334 LearningRate 0.0042 Epoch: 15 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:39,255-Speed 5648.81 samples/sec Loss 2.1749 LearningRate 0.0042 Epoch: 15 Global Step: 90420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:41,085-Speed 5598.76 samples/sec Loss 2.1865 LearningRate 0.0042 Epoch: 15 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:42,926-Speed 5564.38 samples/sec Loss 2.1375 LearningRate 0.0042 Epoch: 15 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:44,745-Speed 5634.56 samples/sec Loss 2.1456 LearningRate 0.0042 Epoch: 15 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:46,562-Speed 5636.46 samples/sec Loss 2.1855 LearningRate 0.0042 Epoch: 15 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:48,402-Speed 5567.42 samples/sec Loss 2.1926 LearningRate 0.0042 Epoch: 15 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:50,244-Speed 5562.43 samples/sec Loss 2.1604 LearningRate 0.0042 Epoch: 15 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:52,095-Speed 5532.42 samples/sec Loss 2.0872 LearningRate 0.0042 Epoch: 15 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:53,920-Speed 5613.13 samples/sec Loss 
2.1719 LearningRate 0.0042 Epoch: 15 Global Step: 90500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:55,741-Speed 5624.63 samples/sec Loss 2.1324 LearningRate 0.0042 Epoch: 15 Global Step: 90510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:09:57,551-Speed 5658.95 samples/sec Loss 2.1847 LearningRate 0.0042 Epoch: 15 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:09:59,375-Speed 5616.59 samples/sec Loss 2.1708 LearningRate 0.0042 Epoch: 15 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:01,214-Speed 5571.20 samples/sec Loss 2.1270 LearningRate 0.0042 Epoch: 15 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:03,047-Speed 5588.75 samples/sec Loss 2.1205 LearningRate 0.0042 Epoch: 15 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:04,865-Speed 5633.75 samples/sec Loss 2.1618 LearningRate 0.0041 Epoch: 15 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:06,703-Speed 5571.75 samples/sec Loss 2.1094 LearningRate 0.0041 Epoch: 15 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:08,614-Speed 5362.00 samples/sec Loss 2.1500 LearningRate 0.0041 Epoch: 15 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:10,437-Speed 5617.72 samples/sec Loss 2.2123 LearningRate 0.0041 Epoch: 15 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:12,272-Speed 5582.03 samples/sec Loss 2.2056 LearningRate 0.0041 Epoch: 15 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:14,088-Speed 5643.23 samples/sec Loss 2.1722 LearningRate 0.0041 Epoch: 15 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:15,901-Speed 5649.37 samples/sec Loss 2.0862 LearningRate 0.0041 Epoch: 15 Global Step: 90620 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:17,726-Speed 5613.83 samples/sec Loss 2.0219 LearningRate 0.0041 Epoch: 15 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:19,540-Speed 5644.27 samples/sec Loss 2.0988 LearningRate 0.0041 Epoch: 15 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:21,359-Speed 5633.28 samples/sec Loss 2.1196 LearningRate 0.0041 Epoch: 15 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:23,166-Speed 5666.42 samples/sec Loss 2.1189 LearningRate 0.0041 Epoch: 15 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:25,000-Speed 5586.46 samples/sec Loss 2.1146 LearningRate 0.0041 Epoch: 15 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:26,828-Speed 5602.69 samples/sec Loss 2.1872 LearningRate 0.0041 Epoch: 15 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:28,675-Speed 5548.41 samples/sec Loss 2.1295 LearningRate 0.0041 Epoch: 15 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:30,490-Speed 5641.97 samples/sec Loss 2.1416 LearningRate 0.0041 Epoch: 15 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:32,308-Speed 5633.00 samples/sec Loss 2.0347 LearningRate 0.0041 Epoch: 15 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:34,119-Speed 5657.64 samples/sec Loss 2.0514 LearningRate 0.0041 Epoch: 15 Global Step: 90720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:35,964-Speed 5554.21 samples/sec Loss 2.1951 LearningRate 0.0041 Epoch: 15 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:37,789-Speed 5611.45 samples/sec Loss 2.0957 LearningRate 0.0041 Epoch: 15 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-27 07:10:39,601-Speed 5653.53 samples/sec Loss 2.1840 LearningRate 0.0041 Epoch: 15 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:41,428-Speed 5605.46 samples/sec Loss 2.1375 LearningRate 0.0041 Epoch: 15 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:43,321-Speed 5412.66 samples/sec Loss 2.2279 LearningRate 0.0041 Epoch: 15 Global Step: 90770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:45,171-Speed 5537.47 samples/sec Loss 2.1111 LearningRate 0.0041 Epoch: 15 Global Step: 90780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:47,018-Speed 5546.41 samples/sec Loss 2.0359 LearningRate 0.0041 Epoch: 15 Global Step: 90790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:48,844-Speed 5608.16 samples/sec Loss 2.1851 LearningRate 0.0041 Epoch: 15 Global Step: 90800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:50,660-Speed 5639.66 samples/sec Loss 2.0921 LearningRate 0.0041 Epoch: 15 Global Step: 90810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:10:52,472-Speed 5652.73 samples/sec Loss 2.1306 LearningRate 0.0041 Epoch: 15 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:54,278-Speed 5674.36 samples/sec Loss 2.1177 LearningRate 0.0041 Epoch: 15 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:56,098-Speed 5628.44 samples/sec Loss 2.0502 LearningRate 0.0040 Epoch: 15 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:57,912-Speed 5646.96 samples/sec Loss 2.1705 LearningRate 0.0040 Epoch: 15 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:10:59,726-Speed 5645.32 samples/sec Loss 2.2531 LearningRate 0.0040 Epoch: 15 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:01,556-Speed 5597.45 samples/sec Loss 
2.1588 LearningRate 0.0040 Epoch: 15 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:03,380-Speed 5617.59 samples/sec Loss 2.0937 LearningRate 0.0040 Epoch: 15 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:05,208-Speed 5604.19 samples/sec Loss 2.1950 LearningRate 0.0040 Epoch: 15 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:07,028-Speed 5628.78 samples/sec Loss 2.1942 LearningRate 0.0040 Epoch: 15 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:08,840-Speed 5651.15 samples/sec Loss 2.0421 LearningRate 0.0040 Epoch: 15 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:10,641-Speed 5689.92 samples/sec Loss 2.1458 LearningRate 0.0040 Epoch: 15 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:12,454-Speed 5649.05 samples/sec Loss 2.1688 LearningRate 0.0040 Epoch: 15 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:14,284-Speed 5598.08 samples/sec Loss 2.1743 LearningRate 0.0040 Epoch: 15 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:16,121-Speed 5574.89 samples/sec Loss 2.1602 LearningRate 0.0040 Epoch: 15 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:17,929-Speed 5666.21 samples/sec Loss 2.1208 LearningRate 0.0040 Epoch: 15 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:19,892-Speed 5218.15 samples/sec Loss 2.0837 LearningRate 0.0040 Epoch: 15 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:36,846-Speed 604.04 samples/sec Loss 1.9688 LearningRate 0.0040 Epoch: 16 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:38,845-Speed 5125.33 samples/sec Loss 1.6200 LearningRate 0.0040 Epoch: 16 Global Step: 90990 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:40,847-Speed 5117.63 samples/sec Loss 1.5882 LearningRate 0.0040 Epoch: 16 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:42,824-Speed 5181.61 samples/sec Loss 1.6368 LearningRate 0.0040 Epoch: 16 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:44,639-Speed 5643.66 samples/sec Loss 1.6377 LearningRate 0.0040 Epoch: 16 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:11:46,478-Speed 5570.50 samples/sec Loss 1.5363 LearningRate 0.0040 Epoch: 16 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:48,308-Speed 5599.75 samples/sec Loss 1.6753 LearningRate 0.0040 Epoch: 16 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:50,115-Speed 5667.24 samples/sec Loss 1.5501 LearningRate 0.0040 Epoch: 16 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:51,936-Speed 5624.60 samples/sec Loss 1.5906 LearningRate 0.0040 Epoch: 16 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:53,750-Speed 5647.07 samples/sec Loss 1.5742 LearningRate 0.0040 Epoch: 16 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:55,573-Speed 5617.24 samples/sec Loss 1.6411 LearningRate 0.0040 Epoch: 16 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:57,387-Speed 5648.53 samples/sec Loss 1.5176 LearningRate 0.0040 Epoch: 16 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:11:59,216-Speed 5600.99 samples/sec Loss 1.6433 LearningRate 0.0040 Epoch: 16 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:01,047-Speed 5594.05 samples/sec Loss 1.6306 LearningRate 0.0040 Epoch: 16 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:12:02,880-Speed 5590.52 samples/sec Loss 1.5547 LearningRate 0.0039 Epoch: 16 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:04,725-Speed 5551.17 samples/sec Loss 1.6565 LearningRate 0.0039 Epoch: 16 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:06,605-Speed 5447.70 samples/sec Loss 1.5125 LearningRate 0.0039 Epoch: 16 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:08,466-Speed 5508.97 samples/sec Loss 1.5511 LearningRate 0.0039 Epoch: 16 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:10,291-Speed 5612.41 samples/sec Loss 1.5416 LearningRate 0.0039 Epoch: 16 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:12,110-Speed 5630.10 samples/sec Loss 1.5992 LearningRate 0.0039 Epoch: 16 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:13,940-Speed 5596.80 samples/sec Loss 1.6185 LearningRate 0.0039 Epoch: 16 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:15,772-Speed 5591.50 samples/sec Loss 1.5716 LearningRate 0.0039 Epoch: 16 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:17,609-Speed 5577.85 samples/sec Loss 1.5453 LearningRate 0.0039 Epoch: 16 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:19,441-Speed 5592.28 samples/sec Loss 1.6863 LearningRate 0.0039 Epoch: 16 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:21,270-Speed 5600.38 samples/sec Loss 1.5394 LearningRate 0.0039 Epoch: 16 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:23,090-Speed 5625.40 samples/sec Loss 1.5662 LearningRate 0.0039 Epoch: 16 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:24,947-Speed 5516.44 samples/sec Loss 
1.7360 LearningRate 0.0039 Epoch: 16 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:26,793-Speed 5549.98 samples/sec Loss 1.7222 LearningRate 0.0039 Epoch: 16 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:28,630-Speed 5577.00 samples/sec Loss 1.5906 LearningRate 0.0039 Epoch: 16 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:30,464-Speed 5584.06 samples/sec Loss 1.5800 LearningRate 0.0039 Epoch: 16 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:32,278-Speed 5645.91 samples/sec Loss 1.5693 LearningRate 0.0039 Epoch: 16 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:34,106-Speed 5605.37 samples/sec Loss 1.5725 LearningRate 0.0039 Epoch: 16 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:35,950-Speed 5552.56 samples/sec Loss 1.6078 LearningRate 0.0039 Epoch: 16 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:37,787-Speed 5577.81 samples/sec Loss 1.5708 LearningRate 0.0039 Epoch: 16 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:39,615-Speed 5604.58 samples/sec Loss 1.7357 LearningRate 0.0039 Epoch: 16 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:41,454-Speed 5570.68 samples/sec Loss 1.6105 LearningRate 0.0039 Epoch: 16 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:43,371-Speed 5344.13 samples/sec Loss 1.6054 LearningRate 0.0039 Epoch: 16 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:45,259-Speed 5425.72 samples/sec Loss 1.5896 LearningRate 0.0039 Epoch: 16 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:47,074-Speed 5642.90 samples/sec Loss 1.6103 LearningRate 0.0039 Epoch: 16 Global Step: 91360 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:48,913-Speed 5569.75 samples/sec Loss 1.6286 LearningRate 0.0039 Epoch: 16 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:50,735-Speed 5622.66 samples/sec Loss 1.5725 LearningRate 0.0039 Epoch: 16 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:52,573-Speed 5570.12 samples/sec Loss 1.6096 LearningRate 0.0039 Epoch: 16 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:54,395-Speed 5623.13 samples/sec Loss 1.6176 LearningRate 0.0039 Epoch: 16 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:56,228-Speed 5586.90 samples/sec Loss 1.6386 LearningRate 0.0038 Epoch: 16 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:58,052-Speed 5618.21 samples/sec Loss 1.6779 LearningRate 0.0038 Epoch: 16 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:12:59,885-Speed 5586.95 samples/sec Loss 1.7031 LearningRate 0.0038 Epoch: 16 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:13:01,706-Speed 5625.09 samples/sec Loss 1.6120 LearningRate 0.0038 Epoch: 16 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:03,529-Speed 5621.44 samples/sec Loss 1.5984 LearningRate 0.0038 Epoch: 16 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:05,344-Speed 5642.02 samples/sec Loss 1.5723 LearningRate 0.0038 Epoch: 16 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:07,163-Speed 5632.22 samples/sec Loss 1.6198 LearningRate 0.0038 Epoch: 16 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:08,985-Speed 5621.42 samples/sec Loss 1.6454 LearningRate 0.0038 Epoch: 16 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:13:10,794-Speed 5662.14 samples/sec Loss 1.5867 LearningRate 0.0038 Epoch: 16 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:12,602-Speed 5667.38 samples/sec Loss 1.7237 LearningRate 0.0038 Epoch: 16 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:14,422-Speed 5626.97 samples/sec Loss 1.6510 LearningRate 0.0038 Epoch: 16 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:16,229-Speed 5669.86 samples/sec Loss 1.5818 LearningRate 0.0038 Epoch: 16 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:18,038-Speed 5663.46 samples/sec Loss 1.6330 LearningRate 0.0038 Epoch: 16 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:19,841-Speed 5680.08 samples/sec Loss 1.6647 LearningRate 0.0038 Epoch: 16 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:21,651-Speed 5657.37 samples/sec Loss 1.6413 LearningRate 0.0038 Epoch: 16 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:23,471-Speed 5628.15 samples/sec Loss 1.6247 LearningRate 0.0038 Epoch: 16 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:25,311-Speed 5567.53 samples/sec Loss 1.5541 LearningRate 0.0038 Epoch: 16 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:27,122-Speed 5657.40 samples/sec Loss 1.6735 LearningRate 0.0038 Epoch: 16 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:28,936-Speed 5647.58 samples/sec Loss 1.6268 LearningRate 0.0038 Epoch: 16 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:30,741-Speed 5674.54 samples/sec Loss 1.6493 LearningRate 0.0038 Epoch: 16 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:32,561-Speed 5629.77 samples/sec Loss 
1.7138 LearningRate 0.0038 Epoch: 16 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:34,389-Speed 5602.69 samples/sec Loss 1.6276 LearningRate 0.0038 Epoch: 16 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:36,213-Speed 5616.47 samples/sec Loss 1.5259 LearningRate 0.0038 Epoch: 16 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:38,034-Speed 5623.84 samples/sec Loss 1.6847 LearningRate 0.0038 Epoch: 16 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:39,860-Speed 5611.96 samples/sec Loss 1.7511 LearningRate 0.0038 Epoch: 16 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:41,680-Speed 5627.40 samples/sec Loss 1.6074 LearningRate 0.0038 Epoch: 16 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:43,505-Speed 5612.94 samples/sec Loss 1.6431 LearningRate 0.0038 Epoch: 16 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:45,319-Speed 5646.30 samples/sec Loss 1.6317 LearningRate 0.0038 Epoch: 16 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:47,136-Speed 5637.06 samples/sec Loss 1.7238 LearningRate 0.0038 Epoch: 16 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:48,979-Speed 5557.64 samples/sec Loss 1.7042 LearningRate 0.0037 Epoch: 16 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:50,805-Speed 5611.83 samples/sec Loss 1.6109 LearningRate 0.0037 Epoch: 16 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:52,618-Speed 5650.63 samples/sec Loss 1.6300 LearningRate 0.0037 Epoch: 16 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:54,435-Speed 5635.07 samples/sec Loss 1.6588 LearningRate 0.0037 Epoch: 16 Global Step: 91730 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:56,241-Speed 5671.44 samples/sec Loss 1.5928 LearningRate 0.0037 Epoch: 16 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:58,049-Speed 5666.26 samples/sec Loss 1.6902 LearningRate 0.0037 Epoch: 16 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:13:59,859-Speed 5659.83 samples/sec Loss 1.6639 LearningRate 0.0037 Epoch: 16 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:01,690-Speed 5595.27 samples/sec Loss 1.6048 LearningRate 0.0037 Epoch: 16 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:03,500-Speed 5657.31 samples/sec Loss 1.6389 LearningRate 0.0037 Epoch: 16 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:05,312-Speed 5653.08 samples/sec Loss 1.6450 LearningRate 0.0037 Epoch: 16 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:07,128-Speed 5643.90 samples/sec Loss 1.6385 LearningRate 0.0037 Epoch: 16 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:08,959-Speed 5593.86 samples/sec Loss 1.7345 LearningRate 0.0037 Epoch: 16 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:10,794-Speed 5581.67 samples/sec Loss 1.6705 LearningRate 0.0037 Epoch: 16 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:12,603-Speed 5662.03 samples/sec Loss 1.6531 LearningRate 0.0037 Epoch: 16 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:14,411-Speed 5666.36 samples/sec Loss 1.6645 LearningRate 0.0037 Epoch: 16 Global Step: 91840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:14:16,210-Speed 5695.24 samples/sec Loss 1.5756 LearningRate 0.0037 Epoch: 16 Global Step: 91850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:14:18,023-Speed 5649.96 samples/sec Loss 1.6789 LearningRate 0.0037 Epoch: 16 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:19,832-Speed 5660.02 samples/sec Loss 1.6856 LearningRate 0.0037 Epoch: 16 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:21,676-Speed 5554.50 samples/sec Loss 1.5757 LearningRate 0.0037 Epoch: 16 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:23,505-Speed 5601.50 samples/sec Loss 1.6535 LearningRate 0.0037 Epoch: 16 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:25,317-Speed 5652.35 samples/sec Loss 1.6464 LearningRate 0.0037 Epoch: 16 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:27,131-Speed 5647.51 samples/sec Loss 1.6703 LearningRate 0.0037 Epoch: 16 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:28,967-Speed 5580.84 samples/sec Loss 1.7428 LearningRate 0.0037 Epoch: 16 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:30,787-Speed 5626.54 samples/sec Loss 1.5945 LearningRate 0.0037 Epoch: 16 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:32,630-Speed 5560.50 samples/sec Loss 1.6669 LearningRate 0.0037 Epoch: 16 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:34,448-Speed 5632.87 samples/sec Loss 1.6298 LearningRate 0.0037 Epoch: 16 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:36,290-Speed 5562.04 samples/sec Loss 1.7162 LearningRate 0.0037 Epoch: 16 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:38,138-Speed 5541.47 samples/sec Loss 1.6404 LearningRate 0.0037 Epoch: 16 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:14:39,956-Speed 5635.16 samples/sec Loss 
1.6579 LearningRate 0.0037 Epoch: 16 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:14:41,773-Speed 5637.43 samples/sec Loss 1.6880 LearningRate 0.0037 Epoch: 16 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:14:43,607-Speed 5584.84 samples/sec Loss 1.6948 LearningRate 0.0036 Epoch: 16 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:15:09,658-[lfw][92000]XNorm: 21.828823
Training: 2022-04-27 07:15:09,659-[lfw][92000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-27 07:15:09,659-[lfw][92000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:15:39,836-[cfp_fp][92000]XNorm: 20.339098
Training: 2022-04-27 07:15:39,836-[cfp_fp][92000]Accuracy-Flip: 0.97557+-0.00594
Training: 2022-04-27 07:15:39,837-[cfp_fp][92000]Accuracy-Highest: 0.97557
Training: 2022-04-27 07:16:05,869-[agedb_30][92000]XNorm: 21.785522
Training: 2022-04-27 07:16:05,870-[agedb_30][92000]Accuracy-Flip: 0.98167+-0.00715
Training: 2022-04-27 07:16:05,870-[agedb_30][92000]Accuracy-Highest: 0.98167
Training: 2022-04-27 07:16:07,702-Speed 121.77 samples/sec Loss 1.6111 LearningRate 0.0036 Epoch: 16 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:16:09,513-Speed 5656.55 samples/sec Loss 1.6362 LearningRate 0.0036 Epoch: 16 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:16:11,322-Speed 5660.90 samples/sec Loss 1.6244 LearningRate 0.0036 Epoch: 16 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:16:13,149-Speed 5605.17 samples/sec Loss 1.7146 LearningRate 0.0036 Epoch: 16 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:16:14,940-Speed 5720.46 samples/sec Loss 1.6954 LearningRate 0.0036 Epoch: 16 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-27 07:16:16,750-Speed 5659.51 samples/sec Loss 1.6277 LearningRate 0.0036 Epoch: 16
Global Step: 92060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:18,559-Speed 5661.83 samples/sec Loss 1.6821 LearningRate 0.0036 Epoch: 16 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:20,397-Speed 5573.70 samples/sec Loss 1.5939 LearningRate 0.0036 Epoch: 16 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:22,238-Speed 5565.48 samples/sec Loss 1.5971 LearningRate 0.0036 Epoch: 16 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:24,073-Speed 5580.65 samples/sec Loss 1.6517 LearningRate 0.0036 Epoch: 16 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:25,891-Speed 5634.57 samples/sec Loss 1.6631 LearningRate 0.0036 Epoch: 16 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:27,704-Speed 5649.85 samples/sec Loss 1.6765 LearningRate 0.0036 Epoch: 16 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:29,547-Speed 5558.00 samples/sec Loss 1.7684 LearningRate 0.0036 Epoch: 16 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:31,375-Speed 5605.79 samples/sec Loss 1.6865 LearningRate 0.0036 Epoch: 16 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:33,188-Speed 5649.29 samples/sec Loss 1.7359 LearningRate 0.0036 Epoch: 16 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:35,003-Speed 5644.18 samples/sec Loss 1.6810 LearningRate 0.0036 Epoch: 16 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:36,824-Speed 5626.51 samples/sec Loss 1.7166 LearningRate 0.0036 Epoch: 16 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:38,657-Speed 5588.68 samples/sec Loss 1.6172 LearningRate 0.0036 Epoch: 16 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:16:40,506-Speed 5538.06 samples/sec Loss 1.6924 LearningRate 0.0036 Epoch: 16 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:42,315-Speed 5664.67 samples/sec Loss 1.5996 LearningRate 0.0036 Epoch: 16 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:44,137-Speed 5619.63 samples/sec Loss 1.6688 LearningRate 0.0036 Epoch: 16 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:45,969-Speed 5592.08 samples/sec Loss 1.7416 LearningRate 0.0036 Epoch: 16 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:47,784-Speed 5645.49 samples/sec Loss 1.5631 LearningRate 0.0036 Epoch: 16 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:49,600-Speed 5638.30 samples/sec Loss 1.7200 LearningRate 0.0036 Epoch: 16 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:51,416-Speed 5640.65 samples/sec Loss 1.7085 LearningRate 0.0036 Epoch: 16 Global Step: 92250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:16:53,219-Speed 5682.35 samples/sec Loss 1.6513 LearningRate 0.0036 Epoch: 16 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:55,026-Speed 5670.88 samples/sec Loss 1.6588 LearningRate 0.0036 Epoch: 16 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:56,838-Speed 5651.94 samples/sec Loss 1.6068 LearningRate 0.0036 Epoch: 16 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:16:58,644-Speed 5671.69 samples/sec Loss 1.6333 LearningRate 0.0036 Epoch: 16 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:00,447-Speed 5681.94 samples/sec Loss 1.6847 LearningRate 0.0035 Epoch: 16 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:02,269-Speed 5622.30 
samples/sec Loss 1.7292 LearningRate 0.0035 Epoch: 16 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:04,088-Speed 5629.57 samples/sec Loss 1.6766 LearningRate 0.0035 Epoch: 16 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:05,916-Speed 5603.94 samples/sec Loss 1.7493 LearningRate 0.0035 Epoch: 16 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:07,743-Speed 5606.78 samples/sec Loss 1.6163 LearningRate 0.0035 Epoch: 16 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:09,577-Speed 5585.35 samples/sec Loss 1.6090 LearningRate 0.0035 Epoch: 16 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:11,387-Speed 5657.97 samples/sec Loss 1.6838 LearningRate 0.0035 Epoch: 16 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:13,201-Speed 5649.30 samples/sec Loss 1.7548 LearningRate 0.0035 Epoch: 16 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:15,026-Speed 5610.41 samples/sec Loss 1.6908 LearningRate 0.0035 Epoch: 16 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:16,855-Speed 5602.89 samples/sec Loss 1.7784 LearningRate 0.0035 Epoch: 16 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:18,669-Speed 5647.51 samples/sec Loss 1.6892 LearningRate 0.0035 Epoch: 16 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:20,492-Speed 5619.74 samples/sec Loss 1.7082 LearningRate 0.0035 Epoch: 16 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:22,364-Speed 5471.82 samples/sec Loss 1.7807 LearningRate 0.0035 Epoch: 16 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:24,189-Speed 5613.06 samples/sec Loss 1.6377 LearningRate 0.0035 Epoch: 16 
Global Step: 92430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:25,996-Speed 5666.97 samples/sec Loss 1.6889 LearningRate 0.0035 Epoch: 16 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:27,805-Speed 5663.12 samples/sec Loss 1.7323 LearningRate 0.0035 Epoch: 16 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:29,606-Speed 5686.25 samples/sec Loss 1.7498 LearningRate 0.0035 Epoch: 16 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:31,423-Speed 5638.40 samples/sec Loss 1.6852 LearningRate 0.0035 Epoch: 16 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:33,261-Speed 5572.56 samples/sec Loss 1.7173 LearningRate 0.0035 Epoch: 16 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:35,094-Speed 5588.44 samples/sec Loss 1.7468 LearningRate 0.0035 Epoch: 16 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:36,911-Speed 5637.78 samples/sec Loss 1.7346 LearningRate 0.0035 Epoch: 16 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:38,750-Speed 5570.07 samples/sec Loss 1.6346 LearningRate 0.0035 Epoch: 16 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:40,576-Speed 5610.24 samples/sec Loss 1.6308 LearningRate 0.0035 Epoch: 16 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:42,404-Speed 5604.49 samples/sec Loss 1.5990 LearningRate 0.0035 Epoch: 16 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:44,213-Speed 5661.29 samples/sec Loss 1.7661 LearningRate 0.0035 Epoch: 16 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:46,027-Speed 5647.00 samples/sec Loss 1.7564 LearningRate 0.0035 Epoch: 16 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 07:17:47,844-Speed 5637.51 samples/sec Loss 1.6152 LearningRate 0.0035 Epoch: 16 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:49,662-Speed 5633.91 samples/sec Loss 1.7465 LearningRate 0.0035 Epoch: 16 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:51,489-Speed 5608.53 samples/sec Loss 1.7232 LearningRate 0.0035 Epoch: 16 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:17:53,350-Speed 5502.81 samples/sec Loss 1.7102 LearningRate 0.0035 Epoch: 16 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:55,206-Speed 5518.51 samples/sec Loss 1.6396 LearningRate 0.0034 Epoch: 16 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:57,067-Speed 5504.84 samples/sec Loss 1.6939 LearningRate 0.0034 Epoch: 16 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:17:58,901-Speed 5584.81 samples/sec Loss 1.6584 LearningRate 0.0034 Epoch: 16 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:00,706-Speed 5675.34 samples/sec Loss 1.7467 LearningRate 0.0034 Epoch: 16 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:02,526-Speed 5630.12 samples/sec Loss 1.6319 LearningRate 0.0034 Epoch: 16 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:04,365-Speed 5569.99 samples/sec Loss 1.7578 LearningRate 0.0034 Epoch: 16 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:06,278-Speed 5355.46 samples/sec Loss 1.7108 LearningRate 0.0034 Epoch: 16 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:08,089-Speed 5656.05 samples/sec Loss 1.6885 LearningRate 0.0034 Epoch: 16 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:09,905-Speed 5638.38 
samples/sec Loss 1.6332 LearningRate 0.0034 Epoch: 16 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:11,711-Speed 5673.06 samples/sec Loss 1.7025 LearningRate 0.0034 Epoch: 16 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:13,556-Speed 5552.51 samples/sec Loss 1.7002 LearningRate 0.0034 Epoch: 16 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:15,385-Speed 5598.59 samples/sec Loss 1.7151 LearningRate 0.0034 Epoch: 16 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:17,219-Speed 5586.17 samples/sec Loss 1.7687 LearningRate 0.0034 Epoch: 16 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:19,041-Speed 5620.92 samples/sec Loss 1.7380 LearningRate 0.0034 Epoch: 16 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:20,876-Speed 5583.06 samples/sec Loss 1.7389 LearningRate 0.0034 Epoch: 16 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:22,692-Speed 5640.23 samples/sec Loss 1.5988 LearningRate 0.0034 Epoch: 16 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:24,534-Speed 5563.92 samples/sec Loss 1.6560 LearningRate 0.0034 Epoch: 16 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:26,363-Speed 5599.68 samples/sec Loss 1.6916 LearningRate 0.0034 Epoch: 16 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:28,195-Speed 5592.90 samples/sec Loss 1.6527 LearningRate 0.0034 Epoch: 16 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:30,011-Speed 5638.98 samples/sec Loss 1.7753 LearningRate 0.0034 Epoch: 16 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:31,893-Speed 5441.79 samples/sec Loss 1.7874 LearningRate 0.0034 Epoch: 16 
Global Step: 92800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:33,741-Speed 5543.15 samples/sec Loss 1.6815 LearningRate 0.0034 Epoch: 16 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:35,554-Speed 5649.15 samples/sec Loss 1.8089 LearningRate 0.0034 Epoch: 16 Global Step: 92820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:37,389-Speed 5583.56 samples/sec Loss 1.7326 LearningRate 0.0034 Epoch: 16 Global Step: 92830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:39,210-Speed 5623.96 samples/sec Loss 1.6815 LearningRate 0.0034 Epoch: 16 Global Step: 92840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:41,073-Speed 5500.40 samples/sec Loss 1.6738 LearningRate 0.0034 Epoch: 16 Global Step: 92850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:42,907-Speed 5583.14 samples/sec Loss 1.7099 LearningRate 0.0034 Epoch: 16 Global Step: 92860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:44,725-Speed 5635.58 samples/sec Loss 1.7205 LearningRate 0.0034 Epoch: 16 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:46,556-Speed 5594.59 samples/sec Loss 1.7082 LearningRate 0.0034 Epoch: 16 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:48,371-Speed 5643.38 samples/sec Loss 1.6862 LearningRate 0.0034 Epoch: 16 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:50,202-Speed 5596.57 samples/sec Loss 1.6918 LearningRate 0.0034 Epoch: 16 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:52,014-Speed 5652.24 samples/sec Loss 1.7086 LearningRate 0.0033 Epoch: 16 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:53,843-Speed 5600.63 samples/sec Loss 1.6854 LearningRate 0.0033 Epoch: 16 Global Step: 92920 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:18:55,676-Speed 5589.54 samples/sec Loss 1.6696 LearningRate 0.0033 Epoch: 16 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:57,489-Speed 5649.07 samples/sec Loss 1.7124 LearningRate 0.0033 Epoch: 16 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:18:59,315-Speed 5608.02 samples/sec Loss 1.7395 LearningRate 0.0033 Epoch: 16 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:01,132-Speed 5639.11 samples/sec Loss 1.6658 LearningRate 0.0033 Epoch: 16 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:02,941-Speed 5662.35 samples/sec Loss 1.7845 LearningRate 0.0033 Epoch: 16 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:04,757-Speed 5638.17 samples/sec Loss 1.7378 LearningRate 0.0033 Epoch: 16 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:06,585-Speed 5606.16 samples/sec Loss 1.6982 LearningRate 0.0033 Epoch: 16 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:08,408-Speed 5620.47 samples/sec Loss 1.7654 LearningRate 0.0033 Epoch: 16 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:10,235-Speed 5607.03 samples/sec Loss 1.7113 LearningRate 0.0033 Epoch: 16 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:12,064-Speed 5597.95 samples/sec Loss 1.6741 LearningRate 0.0033 Epoch: 16 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:13,887-Speed 5618.73 samples/sec Loss 1.7552 LearningRate 0.0033 Epoch: 16 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:15,707-Speed 5629.52 samples/sec Loss 1.7276 LearningRate 0.0033 Epoch: 16 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:17,521-Speed 5647.92 
samples/sec Loss 1.8275 LearningRate 0.0033 Epoch: 16 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:19,339-Speed 5634.08 samples/sec Loss 1.7635 LearningRate 0.0033 Epoch: 16 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:21,171-Speed 5590.41 samples/sec Loss 1.7143 LearningRate 0.0033 Epoch: 16 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:22,987-Speed 5639.38 samples/sec Loss 1.7373 LearningRate 0.0033 Epoch: 16 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:24,805-Speed 5635.75 samples/sec Loss 1.7013 LearningRate 0.0033 Epoch: 16 Global Step: 93090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:19:26,623-Speed 5636.03 samples/sec Loss 1.7521 LearningRate 0.0033 Epoch: 16 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:28,425-Speed 5683.44 samples/sec Loss 1.6906 LearningRate 0.0033 Epoch: 16 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:30,242-Speed 5638.78 samples/sec Loss 1.6373 LearningRate 0.0033 Epoch: 16 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:32,058-Speed 5638.27 samples/sec Loss 1.7364 LearningRate 0.0033 Epoch: 16 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:33,877-Speed 5634.05 samples/sec Loss 1.8087 LearningRate 0.0033 Epoch: 16 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:35,687-Speed 5659.30 samples/sec Loss 1.7876 LearningRate 0.0033 Epoch: 16 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:37,510-Speed 5619.84 samples/sec Loss 1.6330 LearningRate 0.0033 Epoch: 16 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:39,339-Speed 5598.53 samples/sec Loss 1.7263 LearningRate 0.0033 Epoch: 16 
Global Step: 93170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:41,160-Speed 5624.19 samples/sec Loss 1.6928 LearningRate 0.0033 Epoch: 16 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:42,974-Speed 5646.82 samples/sec Loss 1.7167 LearningRate 0.0033 Epoch: 16 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:44,789-Speed 5643.91 samples/sec Loss 1.7484 LearningRate 0.0033 Epoch: 16 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:19:46,601-Speed 5654.88 samples/sec Loss 1.7247 LearningRate 0.0033 Epoch: 16 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:48,410-Speed 5661.75 samples/sec Loss 1.8342 LearningRate 0.0032 Epoch: 16 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:50,232-Speed 5621.23 samples/sec Loss 1.6730 LearningRate 0.0032 Epoch: 16 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:52,053-Speed 5624.54 samples/sec Loss 1.6808 LearningRate 0.0032 Epoch: 16 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:53,877-Speed 5619.31 samples/sec Loss 1.7570 LearningRate 0.0032 Epoch: 16 Global Step: 93250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:55,686-Speed 5660.42 samples/sec Loss 1.7706 LearningRate 0.0032 Epoch: 16 Global Step: 93260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:57,508-Speed 5623.12 samples/sec Loss 1.7493 LearningRate 0.0032 Epoch: 16 Global Step: 93270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:19:59,326-Speed 5635.35 samples/sec Loss 1.6599 LearningRate 0.0032 Epoch: 16 Global Step: 93280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:01,148-Speed 5621.91 samples/sec Loss 1.6860 LearningRate 0.0032 Epoch: 16 Global Step: 93290 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:20:02,960-Speed 5653.18 samples/sec Loss 1.7500 LearningRate 0.0032 Epoch: 16 Global Step: 93300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:04,757-Speed 5698.71 samples/sec Loss 1.7686 LearningRate 0.0032 Epoch: 16 Global Step: 93310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:06,593-Speed 5582.82 samples/sec Loss 1.6586 LearningRate 0.0032 Epoch: 16 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:08,409-Speed 5640.11 samples/sec Loss 1.6728 LearningRate 0.0032 Epoch: 16 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:10,222-Speed 5648.91 samples/sec Loss 1.6980 LearningRate 0.0032 Epoch: 16 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:12,041-Speed 5631.01 samples/sec Loss 1.6548 LearningRate 0.0032 Epoch: 16 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:13,864-Speed 5620.70 samples/sec Loss 1.6852 LearningRate 0.0032 Epoch: 16 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:15,695-Speed 5594.42 samples/sec Loss 1.6653 LearningRate 0.0032 Epoch: 16 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:17,530-Speed 5582.34 samples/sec Loss 1.7695 LearningRate 0.0032 Epoch: 16 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:19,338-Speed 5664.94 samples/sec Loss 1.6916 LearningRate 0.0032 Epoch: 16 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:21,149-Speed 5656.99 samples/sec Loss 1.6751 LearningRate 0.0032 Epoch: 16 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:22,960-Speed 5654.93 samples/sec Loss 1.6728 LearningRate 0.0032 Epoch: 16 Global Step: 93410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:20:24,784-Speed 5617.44 
samples/sec Loss 1.7158 LearningRate 0.0032 Epoch: 16 Global Step: 93420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:26,597-Speed 5650.72 samples/sec Loss 1.7127 LearningRate 0.0032 Epoch: 16 Global Step: 93430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:28,413-Speed 5639.92 samples/sec Loss 1.6824 LearningRate 0.0032 Epoch: 16 Global Step: 93440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:30,225-Speed 5651.33 samples/sec Loss 1.7477 LearningRate 0.0032 Epoch: 16 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:32,040-Speed 5643.38 samples/sec Loss 1.7508 LearningRate 0.0032 Epoch: 16 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:33,877-Speed 5577.00 samples/sec Loss 1.7419 LearningRate 0.0032 Epoch: 16 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:35,692-Speed 5644.38 samples/sec Loss 1.7826 LearningRate 0.0032 Epoch: 16 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:37,510-Speed 5637.12 samples/sec Loss 1.6385 LearningRate 0.0032 Epoch: 16 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:39,333-Speed 5616.70 samples/sec Loss 1.6758 LearningRate 0.0032 Epoch: 16 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:41,163-Speed 5598.80 samples/sec Loss 1.6635 LearningRate 0.0032 Epoch: 16 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:42,986-Speed 5617.26 samples/sec Loss 1.7185 LearningRate 0.0032 Epoch: 16 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:44,797-Speed 5657.67 samples/sec Loss 1.8146 LearningRate 0.0032 Epoch: 16 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:46,606-Speed 5661.93 samples/sec Loss 1.6634 LearningRate 0.0031 Epoch: 16 
Global Step: 93540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:48,437-Speed 5594.94 samples/sec Loss 1.6444 LearningRate 0.0031 Epoch: 16 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:50,256-Speed 5629.46 samples/sec Loss 1.7872 LearningRate 0.0031 Epoch: 16 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:52,076-Speed 5628.55 samples/sec Loss 1.7383 LearningRate 0.0031 Epoch: 16 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:53,911-Speed 5582.89 samples/sec Loss 1.6435 LearningRate 0.0031 Epoch: 16 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:55,721-Speed 5658.97 samples/sec Loss 1.6152 LearningRate 0.0031 Epoch: 16 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:57,545-Speed 5617.61 samples/sec Loss 1.7199 LearningRate 0.0031 Epoch: 16 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:20:59,360-Speed 5642.78 samples/sec Loss 1.7264 LearningRate 0.0031 Epoch: 16 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:01,166-Speed 5672.11 samples/sec Loss 1.7307 LearningRate 0.0031 Epoch: 16 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:03,004-Speed 5572.76 samples/sec Loss 1.7653 LearningRate 0.0031 Epoch: 16 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:04,838-Speed 5586.32 samples/sec Loss 1.7261 LearningRate 0.0031 Epoch: 16 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:06,671-Speed 5588.71 samples/sec Loss 1.6677 LearningRate 0.0031 Epoch: 16 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:08,492-Speed 5623.12 samples/sec Loss 1.7162 LearningRate 0.0031 Epoch: 16 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:21:10,313-Speed 5625.15 samples/sec Loss 1.6759 LearningRate 0.0031 Epoch: 16 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:12,127-Speed 5646.86 samples/sec Loss 1.7332 LearningRate 0.0031 Epoch: 16 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:13,946-Speed 5631.38 samples/sec Loss 1.8190 LearningRate 0.0031 Epoch: 16 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:15,786-Speed 5565.92 samples/sec Loss 1.7273 LearningRate 0.0031 Epoch: 16 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:17,615-Speed 5602.30 samples/sec Loss 1.7585 LearningRate 0.0031 Epoch: 16 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:19,429-Speed 5648.95 samples/sec Loss 1.6866 LearningRate 0.0031 Epoch: 16 Global Step: 93720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:21:21,230-Speed 5685.50 samples/sec Loss 1.7202 LearningRate 0.0031 Epoch: 16 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:23,045-Speed 5646.00 samples/sec Loss 1.7427 LearningRate 0.0031 Epoch: 16 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:24,858-Speed 5649.68 samples/sec Loss 1.6992 LearningRate 0.0031 Epoch: 16 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:26,697-Speed 5568.43 samples/sec Loss 1.7654 LearningRate 0.0031 Epoch: 16 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:28,531-Speed 5585.19 samples/sec Loss 1.6850 LearningRate 0.0031 Epoch: 16 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:30,359-Speed 5603.47 samples/sec Loss 1.6620 LearningRate 0.0031 Epoch: 16 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:32,179-Speed 5628.05 
samples/sec Loss 1.7143 LearningRate 0.0031 Epoch: 16 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:33,989-Speed 5661.50 samples/sec Loss 1.7068 LearningRate 0.0031 Epoch: 16 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:35,819-Speed 5597.26 samples/sec Loss 1.6784 LearningRate 0.0031 Epoch: 16 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:37,669-Speed 5538.17 samples/sec Loss 1.7509 LearningRate 0.0031 Epoch: 16 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:39,598-Speed 5309.36 samples/sec Loss 1.7797 LearningRate 0.0031 Epoch: 16 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:41,466-Speed 5484.55 samples/sec Loss 1.7519 LearningRate 0.0031 Epoch: 16 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:43,289-Speed 5617.46 samples/sec Loss 1.7404 LearningRate 0.0031 Epoch: 16 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:45,104-Speed 5644.11 samples/sec Loss 1.6503 LearningRate 0.0030 Epoch: 16 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:46,927-Speed 5620.96 samples/sec Loss 1.7022 LearningRate 0.0030 Epoch: 16 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:48,759-Speed 5589.81 samples/sec Loss 1.7387 LearningRate 0.0030 Epoch: 16 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:50,573-Speed 5648.65 samples/sec Loss 1.7865 LearningRate 0.0030 Epoch: 16 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:21:52,384-Speed 5655.46 samples/sec Loss 1.7271 LearningRate 0.0030 Epoch: 16 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:54,202-Speed 5633.71 samples/sec Loss 1.7649 LearningRate 0.0030 Epoch: 16 
Global Step: 93910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:56,038-Speed 5577.86 samples/sec Loss 1.7537 LearningRate 0.0030 Epoch: 16 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:57,854-Speed 5642.39 samples/sec Loss 1.6784 LearningRate 0.0030 Epoch: 16 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:21:59,676-Speed 5622.87 samples/sec Loss 1.7278 LearningRate 0.0030 Epoch: 16 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:01,487-Speed 5653.30 samples/sec Loss 1.7055 LearningRate 0.0030 Epoch: 16 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:03,303-Speed 5641.69 samples/sec Loss 1.7704 LearningRate 0.0030 Epoch: 16 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:05,142-Speed 5571.26 samples/sec Loss 1.7379 LearningRate 0.0030 Epoch: 16 Global Step: 93970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:06,971-Speed 5601.99 samples/sec Loss 1.7499 LearningRate 0.0030 Epoch: 16 Global Step: 93980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:08,841-Speed 5477.04 samples/sec Loss 1.8010 LearningRate 0.0030 Epoch: 16 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:10,730-Speed 5421.82 samples/sec Loss 1.7082 LearningRate 0.0030 Epoch: 16 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:22:37,013-[lfw][94000]XNorm: 21.855137 Training: 2022-04-27 07:22:37,014-[lfw][94000]Accuracy-Flip: 0.99750+-0.00327 Training: 2022-04-27 07:22:37,014-[lfw][94000]Accuracy-Highest: 0.99800 Training: 2022-04-27 07:23:07,511-[cfp_fp][94000]XNorm: 20.663132 Training: 2022-04-27 07:23:07,512-[cfp_fp][94000]Accuracy-Flip: 0.97557+-0.00744 Training: 2022-04-27 07:23:07,512-[cfp_fp][94000]Accuracy-Highest: 0.97557 Training: 2022-04-27 
07:23:33,833-[agedb_30][94000]XNorm: 21.931744 Training: 2022-04-27 07:23:33,833-[agedb_30][94000]Accuracy-Flip: 0.97983+-0.00728 Training: 2022-04-27 07:23:33,834-[agedb_30][94000]Accuracy-Highest: 0.98167 Training: 2022-04-27 07:23:35,646-Speed 120.59 samples/sec Loss 1.6699 LearningRate 0.0030 Epoch: 16 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:37,469-Speed 5619.81 samples/sec Loss 1.7534 LearningRate 0.0030 Epoch: 16 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:39,285-Speed 5642.19 samples/sec Loss 1.7101 LearningRate 0.0030 Epoch: 16 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:41,105-Speed 5626.16 samples/sec Loss 1.6266 LearningRate 0.0030 Epoch: 16 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:42,934-Speed 5600.31 samples/sec Loss 1.6716 LearningRate 0.0030 Epoch: 16 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:44,755-Speed 5627.08 samples/sec Loss 1.7798 LearningRate 0.0030 Epoch: 16 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:46,587-Speed 5590.64 samples/sec Loss 1.6104 LearningRate 0.0030 Epoch: 16 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:48,395-Speed 5665.34 samples/sec Loss 1.6588 LearningRate 0.0030 Epoch: 16 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:50,203-Speed 5664.54 samples/sec Loss 1.7501 LearningRate 0.0030 Epoch: 16 Global Step: 94090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:51,994-Speed 5720.54 samples/sec Loss 1.7712 LearningRate 0.0030 Epoch: 16 Global Step: 94100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:53,800-Speed 5671.99 samples/sec Loss 1.6006 LearningRate 0.0030 Epoch: 16 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:23:55,602-Speed 5684.12 samples/sec Loss 1.6904 LearningRate 0.0030 Epoch: 16 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:57,413-Speed 5656.40 samples/sec Loss 1.6522 LearningRate 0.0030 Epoch: 16 Global Step: 94130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:23:59,217-Speed 5676.48 samples/sec Loss 1.7095 LearningRate 0.0030 Epoch: 16 Global Step: 94140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:01,036-Speed 5631.57 samples/sec Loss 1.6906 LearningRate 0.0030 Epoch: 16 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:02,837-Speed 5689.80 samples/sec Loss 1.6736 LearningRate 0.0030 Epoch: 16 Global Step: 94160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:04,643-Speed 5672.60 samples/sec Loss 1.6390 LearningRate 0.0030 Epoch: 16 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:06,447-Speed 5677.09 samples/sec Loss 1.6715 LearningRate 0.0030 Epoch: 16 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:08,278-Speed 5594.62 samples/sec Loss 1.7514 LearningRate 0.0029 Epoch: 16 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:10,074-Speed 5702.36 samples/sec Loss 1.6907 LearningRate 0.0029 Epoch: 16 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:11,915-Speed 5565.17 samples/sec Loss 1.5860 LearningRate 0.0029 Epoch: 16 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:13,760-Speed 5550.61 samples/sec Loss 1.7616 LearningRate 0.0029 Epoch: 16 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:15,576-Speed 5641.64 samples/sec Loss 1.7647 LearningRate 0.0029 Epoch: 16 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:17,394-Speed 5634.79 samples/sec Loss 
1.7141 LearningRate 0.0029 Epoch: 16 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:19,203-Speed 5662.13 samples/sec Loss 1.7859 LearningRate 0.0029 Epoch: 16 Global Step: 94250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:21,021-Speed 5633.82 samples/sec Loss 1.7211 LearningRate 0.0029 Epoch: 16 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:22,843-Speed 5624.22 samples/sec Loss 1.7356 LearningRate 0.0029 Epoch: 16 Global Step: 94270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:24,654-Speed 5656.01 samples/sec Loss 1.7352 LearningRate 0.0029 Epoch: 16 Global Step: 94280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:26,465-Speed 5655.65 samples/sec Loss 1.7116 LearningRate 0.0029 Epoch: 16 Global Step: 94290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:28,284-Speed 5630.40 samples/sec Loss 1.6494 LearningRate 0.0029 Epoch: 16 Global Step: 94300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:30,092-Speed 5668.58 samples/sec Loss 1.8207 LearningRate 0.0029 Epoch: 16 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:31,902-Speed 5658.71 samples/sec Loss 1.8033 LearningRate 0.0029 Epoch: 16 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:33,715-Speed 5650.11 samples/sec Loss 1.6853 LearningRate 0.0029 Epoch: 16 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:35,536-Speed 5624.46 samples/sec Loss 1.7151 LearningRate 0.0029 Epoch: 16 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:37,359-Speed 5619.06 samples/sec Loss 1.6994 LearningRate 0.0029 Epoch: 16 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:39,181-Speed 5622.21 samples/sec Loss 1.6958 LearningRate 0.0029 Epoch: 16 Global Step: 94360 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:41,005-Speed 5616.70 samples/sec Loss 1.7912 LearningRate 0.0029 Epoch: 16 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:42,819-Speed 5646.54 samples/sec Loss 1.7727 LearningRate 0.0029 Epoch: 16 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:44,632-Speed 5649.60 samples/sec Loss 1.7188 LearningRate 0.0029 Epoch: 16 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:46,454-Speed 5624.10 samples/sec Loss 1.7800 LearningRate 0.0029 Epoch: 16 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:48,272-Speed 5632.41 samples/sec Loss 1.7912 LearningRate 0.0029 Epoch: 16 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:50,096-Speed 5617.79 samples/sec Loss 1.6485 LearningRate 0.0029 Epoch: 16 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:51,903-Speed 5667.04 samples/sec Loss 1.7291 LearningRate 0.0029 Epoch: 16 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:53,722-Speed 5630.48 samples/sec Loss 1.8040 LearningRate 0.0029 Epoch: 16 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:55,537-Speed 5643.60 samples/sec Loss 1.8031 LearningRate 0.0029 Epoch: 16 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:57,388-Speed 5533.54 samples/sec Loss 1.7253 LearningRate 0.0029 Epoch: 16 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:24:59,202-Speed 5648.70 samples/sec Loss 1.7432 LearningRate 0.0029 Epoch: 16 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:01,025-Speed 5619.12 samples/sec Loss 1.8390 LearningRate 0.0029 Epoch: 16 Global Step: 94480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:25:02,848-Speed 5618.61 samples/sec Loss 1.8297 LearningRate 0.0029 Epoch: 16 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:04,692-Speed 5557.59 samples/sec Loss 1.6210 LearningRate 0.0029 Epoch: 16 Global Step: 94500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:25:06,489-Speed 5699.96 samples/sec Loss 1.7042 LearningRate 0.0029 Epoch: 16 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:08,297-Speed 5665.71 samples/sec Loss 1.6707 LearningRate 0.0029 Epoch: 16 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:10,138-Speed 5563.33 samples/sec Loss 1.7029 LearningRate 0.0028 Epoch: 16 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:12,029-Speed 5416.85 samples/sec Loss 1.7375 LearningRate 0.0028 Epoch: 16 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:13,866-Speed 5576.97 samples/sec Loss 1.7536 LearningRate 0.0028 Epoch: 16 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:15,700-Speed 5584.75 samples/sec Loss 1.8786 LearningRate 0.0028 Epoch: 16 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:17,566-Speed 5490.86 samples/sec Loss 1.6440 LearningRate 0.0028 Epoch: 16 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:19,485-Speed 5337.19 samples/sec Loss 1.6845 LearningRate 0.0028 Epoch: 16 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:21,334-Speed 5542.77 samples/sec Loss 1.7150 LearningRate 0.0028 Epoch: 16 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:23,149-Speed 5642.09 samples/sec Loss 1.7406 LearningRate 0.0028 Epoch: 16 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:24,967-Speed 5633.36 samples/sec Loss 
1.7869 LearningRate 0.0028 Epoch: 16 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:26,788-Speed 5627.93 samples/sec Loss 1.6550 LearningRate 0.0028 Epoch: 16 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:28,615-Speed 5606.42 samples/sec Loss 1.6972 LearningRate 0.0028 Epoch: 16 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:30,437-Speed 5619.79 samples/sec Loss 1.6563 LearningRate 0.0028 Epoch: 16 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:32,251-Speed 5648.53 samples/sec Loss 1.6897 LearningRate 0.0028 Epoch: 16 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:34,072-Speed 5625.85 samples/sec Loss 1.7183 LearningRate 0.0028 Epoch: 16 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:35,891-Speed 5631.00 samples/sec Loss 1.6819 LearningRate 0.0028 Epoch: 16 Global Step: 94670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:37,736-Speed 5550.18 samples/sec Loss 1.7807 LearningRate 0.0028 Epoch: 16 Global Step: 94680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:39,557-Speed 5628.30 samples/sec Loss 1.7289 LearningRate 0.0028 Epoch: 16 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:41,394-Speed 5575.60 samples/sec Loss 1.7045 LearningRate 0.0028 Epoch: 16 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:43,210-Speed 5641.29 samples/sec Loss 1.6732 LearningRate 0.0028 Epoch: 16 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:45,026-Speed 5638.54 samples/sec Loss 1.7216 LearningRate 0.0028 Epoch: 16 Global Step: 94720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:46,848-Speed 5622.16 samples/sec Loss 1.6822 LearningRate 0.0028 Epoch: 16 Global Step: 94730 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:48,664-Speed 5640.64 samples/sec Loss 1.7453 LearningRate 0.0028 Epoch: 16 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:50,481-Speed 5637.87 samples/sec Loss 1.6609 LearningRate 0.0028 Epoch: 16 Global Step: 94750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:52,303-Speed 5622.73 samples/sec Loss 1.7235 LearningRate 0.0028 Epoch: 16 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:54,116-Speed 5651.12 samples/sec Loss 1.7500 LearningRate 0.0028 Epoch: 16 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:55,930-Speed 5644.51 samples/sec Loss 1.6932 LearningRate 0.0028 Epoch: 16 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:57,745-Speed 5643.02 samples/sec Loss 1.7243 LearningRate 0.0028 Epoch: 16 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:25:59,578-Speed 5590.07 samples/sec Loss 1.7186 LearningRate 0.0028 Epoch: 16 Global Step: 94800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:01,435-Speed 5516.81 samples/sec Loss 1.6923 LearningRate 0.0028 Epoch: 16 Global Step: 94810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:03,269-Speed 5584.94 samples/sec Loss 1.7113 LearningRate 0.0028 Epoch: 16 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:05,109-Speed 5567.17 samples/sec Loss 1.6567 LearningRate 0.0028 Epoch: 16 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:06,914-Speed 5676.13 samples/sec Loss 1.7465 LearningRate 0.0028 Epoch: 16 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:08,720-Speed 5670.81 samples/sec Loss 1.6889 LearningRate 0.0028 Epoch: 16 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:26:10,533-Speed 5649.95 samples/sec Loss 1.7374 LearningRate 0.0028 Epoch: 16 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:12,356-Speed 5620.67 samples/sec Loss 1.6166 LearningRate 0.0027 Epoch: 16 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:14,175-Speed 5631.17 samples/sec Loss 1.7820 LearningRate 0.0027 Epoch: 16 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:15,996-Speed 5622.83 samples/sec Loss 1.7833 LearningRate 0.0027 Epoch: 16 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:17,811-Speed 5651.23 samples/sec Loss 1.7161 LearningRate 0.0027 Epoch: 16 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:19,613-Speed 5682.41 samples/sec Loss 1.7164 LearningRate 0.0027 Epoch: 16 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:21,423-Speed 5660.01 samples/sec Loss 1.6735 LearningRate 0.0027 Epoch: 16 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:23,233-Speed 5657.91 samples/sec Loss 1.6381 LearningRate 0.0027 Epoch: 16 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:25,043-Speed 5660.82 samples/sec Loss 1.6882 LearningRate 0.0027 Epoch: 16 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:26,853-Speed 5660.94 samples/sec Loss 1.7207 LearningRate 0.0027 Epoch: 16 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:28,663-Speed 5656.89 samples/sec Loss 1.6800 LearningRate 0.0027 Epoch: 16 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:30,488-Speed 5614.61 samples/sec Loss 1.6614 LearningRate 0.0027 Epoch: 16 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:32,298-Speed 5657.65 samples/sec Loss 
1.7135 LearningRate 0.0027 Epoch: 16 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:34,116-Speed 5634.68 samples/sec Loss 1.6953 LearningRate 0.0027 Epoch: 16 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:35,925-Speed 5664.22 samples/sec Loss 1.6684 LearningRate 0.0027 Epoch: 16 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:37,735-Speed 5656.83 samples/sec Loss 1.7072 LearningRate 0.0027 Epoch: 16 Global Step: 95010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:26:39,535-Speed 5693.03 samples/sec Loss 1.6350 LearningRate 0.0027 Epoch: 16 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:41,348-Speed 5648.23 samples/sec Loss 1.7025 LearningRate 0.0027 Epoch: 16 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:43,167-Speed 5632.50 samples/sec Loss 1.6005 LearningRate 0.0027 Epoch: 16 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:44,980-Speed 5648.94 samples/sec Loss 1.6682 LearningRate 0.0027 Epoch: 16 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:46,788-Speed 5669.12 samples/sec Loss 1.5989 LearningRate 0.0027 Epoch: 16 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:48,602-Speed 5644.99 samples/sec Loss 1.7315 LearningRate 0.0027 Epoch: 16 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:50,422-Speed 5628.65 samples/sec Loss 1.6410 LearningRate 0.0027 Epoch: 16 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:52,229-Speed 5669.15 samples/sec Loss 1.7984 LearningRate 0.0027 Epoch: 16 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:54,035-Speed 5670.30 samples/sec Loss 1.6860 LearningRate 0.0027 Epoch: 16 Global Step: 
95100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:55,847-Speed 5654.00 samples/sec Loss 1.6326 LearningRate 0.0027 Epoch: 16 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:57,659-Speed 5653.25 samples/sec Loss 1.5589 LearningRate 0.0027 Epoch: 16 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:26:59,483-Speed 5616.73 samples/sec Loss 1.6679 LearningRate 0.0027 Epoch: 16 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:01,299-Speed 5639.99 samples/sec Loss 1.6998 LearningRate 0.0027 Epoch: 16 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:03,112-Speed 5648.87 samples/sec Loss 1.7520 LearningRate 0.0027 Epoch: 16 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:04,918-Speed 5672.12 samples/sec Loss 1.6982 LearningRate 0.0027 Epoch: 16 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:06,726-Speed 5666.50 samples/sec Loss 1.5724 LearningRate 0.0027 Epoch: 16 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:08,539-Speed 5650.42 samples/sec Loss 1.7767 LearningRate 0.0027 Epoch: 16 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:10,358-Speed 5631.95 samples/sec Loss 1.7642 LearningRate 0.0027 Epoch: 16 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:12,172-Speed 5645.52 samples/sec Loss 1.7138 LearningRate 0.0027 Epoch: 16 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:13,982-Speed 5660.72 samples/sec Loss 1.6734 LearningRate 0.0026 Epoch: 16 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:15,809-Speed 5605.78 samples/sec Loss 1.7914 LearningRate 0.0026 Epoch: 16 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 07:27:17,618-Speed 5662.64 samples/sec Loss 1.6700 LearningRate 0.0026 Epoch: 16 Global Step: 95230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:19,449-Speed 5593.97 samples/sec Loss 1.6492 LearningRate 0.0026 Epoch: 16 Global Step: 95240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:21,284-Speed 5581.89 samples/sec Loss 1.8269 LearningRate 0.0026 Epoch: 16 Global Step: 95250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:23,107-Speed 5618.92 samples/sec Loss 1.7101 LearningRate 0.0026 Epoch: 16 Global Step: 95260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:24,919-Speed 5655.63 samples/sec Loss 1.7398 LearningRate 0.0026 Epoch: 16 Global Step: 95270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:26,744-Speed 5612.23 samples/sec Loss 1.5651 LearningRate 0.0026 Epoch: 16 Global Step: 95280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:28,562-Speed 5633.29 samples/sec Loss 1.7572 LearningRate 0.0026 Epoch: 16 Global Step: 95290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:30,367-Speed 5676.14 samples/sec Loss 1.6439 LearningRate 0.0026 Epoch: 16 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:32,188-Speed 5624.14 samples/sec Loss 1.6972 LearningRate 0.0026 Epoch: 16 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:33,994-Speed 5672.35 samples/sec Loss 1.7093 LearningRate 0.0026 Epoch: 16 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:35,808-Speed 5648.03 samples/sec Loss 1.6451 LearningRate 0.0026 Epoch: 16 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:37,615-Speed 5669.52 samples/sec Loss 1.7144 LearningRate 0.0026 Epoch: 16 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:39,422-Speed 5668.76 
samples/sec Loss 1.7203 LearningRate 0.0026 Epoch: 16 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:41,238-Speed 5638.84 samples/sec Loss 1.6276 LearningRate 0.0026 Epoch: 16 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:27:43,067-Speed 5599.79 samples/sec Loss 1.7180 LearningRate 0.0026 Epoch: 16 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:44,989-Speed 5331.77 samples/sec Loss 1.5989 LearningRate 0.0026 Epoch: 16 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:46,876-Speed 5426.75 samples/sec Loss 1.6926 LearningRate 0.0026 Epoch: 16 Global Step: 95390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:48,682-Speed 5671.68 samples/sec Loss 1.6426 LearningRate 0.0026 Epoch: 16 Global Step: 95400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:50,492-Speed 5661.78 samples/sec Loss 1.6701 LearningRate 0.0026 Epoch: 16 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:52,301-Speed 5662.31 samples/sec Loss 1.7381 LearningRate 0.0026 Epoch: 16 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:54,109-Speed 5667.00 samples/sec Loss 1.6864 LearningRate 0.0026 Epoch: 16 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:55,924-Speed 5642.42 samples/sec Loss 1.6100 LearningRate 0.0026 Epoch: 16 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:57,734-Speed 5660.15 samples/sec Loss 1.7506 LearningRate 0.0026 Epoch: 16 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:27:59,542-Speed 5666.23 samples/sec Loss 1.8065 LearningRate 0.0026 Epoch: 16 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:01,363-Speed 5623.58 samples/sec Loss 1.7122 LearningRate 0.0026 Epoch: 16 
Global Step: 95470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:03,184-Speed 5625.72 samples/sec Loss 1.6505 LearningRate 0.0026 Epoch: 16 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:05,020-Speed 5580.64 samples/sec Loss 1.6542 LearningRate 0.0026 Epoch: 16 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:06,837-Speed 5635.25 samples/sec Loss 1.6824 LearningRate 0.0026 Epoch: 16 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:08,654-Speed 5638.38 samples/sec Loss 1.6069 LearningRate 0.0026 Epoch: 16 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:10,469-Speed 5643.94 samples/sec Loss 1.7059 LearningRate 0.0026 Epoch: 16 Global Step: 95520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:12,291-Speed 5622.03 samples/sec Loss 1.7807 LearningRate 0.0026 Epoch: 16 Global Step: 95530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:14,134-Speed 5558.14 samples/sec Loss 1.6918 LearningRate 0.0026 Epoch: 16 Global Step: 95540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:15,949-Speed 5642.95 samples/sec Loss 1.7778 LearningRate 0.0026 Epoch: 16 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:17,785-Speed 5579.30 samples/sec Loss 1.7191 LearningRate 0.0026 Epoch: 16 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:19,646-Speed 5505.91 samples/sec Loss 1.6781 LearningRate 0.0025 Epoch: 16 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:21,462-Speed 5641.89 samples/sec Loss 1.7217 LearningRate 0.0025 Epoch: 16 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:23,276-Speed 5644.05 samples/sec Loss 1.6039 LearningRate 0.0025 Epoch: 16 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 07:28:25,099-Speed 5620.79 samples/sec Loss 1.6606 LearningRate 0.0025 Epoch: 16 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:26,937-Speed 5573.25 samples/sec Loss 1.7246 LearningRate 0.0025 Epoch: 16 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:28:28,772-Speed 5581.17 samples/sec Loss 1.6491 LearningRate 0.0025 Epoch: 16 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:30,596-Speed 5615.62 samples/sec Loss 1.7601 LearningRate 0.0025 Epoch: 16 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:32,418-Speed 5621.66 samples/sec Loss 1.5783 LearningRate 0.0025 Epoch: 16 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:34,227-Speed 5663.98 samples/sec Loss 1.6130 LearningRate 0.0025 Epoch: 16 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:36,043-Speed 5639.00 samples/sec Loss 1.5986 LearningRate 0.0025 Epoch: 16 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:37,867-Speed 5616.98 samples/sec Loss 1.7091 LearningRate 0.0025 Epoch: 16 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:39,674-Speed 5669.56 samples/sec Loss 1.7549 LearningRate 0.0025 Epoch: 16 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:41,485-Speed 5655.29 samples/sec Loss 1.7229 LearningRate 0.0025 Epoch: 16 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:43,295-Speed 5658.56 samples/sec Loss 1.6087 LearningRate 0.0025 Epoch: 16 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:45,109-Speed 5648.88 samples/sec Loss 1.7617 LearningRate 0.0025 Epoch: 16 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:46,939-Speed 5596.02 
samples/sec Loss 1.7335 LearningRate 0.0025 Epoch: 16 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:48,755-Speed 5641.16 samples/sec Loss 1.7092 LearningRate 0.0025 Epoch: 16 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:50,576-Speed 5625.72 samples/sec Loss 1.6399 LearningRate 0.0025 Epoch: 16 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:52,385-Speed 5661.67 samples/sec Loss 1.6922 LearningRate 0.0025 Epoch: 16 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:54,189-Speed 5677.31 samples/sec Loss 1.6971 LearningRate 0.0025 Epoch: 16 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:55,997-Speed 5667.63 samples/sec Loss 1.7405 LearningRate 0.0025 Epoch: 16 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:57,819-Speed 5621.88 samples/sec Loss 1.7328 LearningRate 0.0025 Epoch: 16 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:28:59,650-Speed 5594.60 samples/sec Loss 1.7028 LearningRate 0.0025 Epoch: 16 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:01,471-Speed 5624.57 samples/sec Loss 1.5968 LearningRate 0.0025 Epoch: 16 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:03,291-Speed 5628.91 samples/sec Loss 1.7580 LearningRate 0.0025 Epoch: 16 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:05,105-Speed 5647.35 samples/sec Loss 1.6651 LearningRate 0.0025 Epoch: 16 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:06,924-Speed 5630.29 samples/sec Loss 1.7638 LearningRate 0.0025 Epoch: 16 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:08,730-Speed 5672.49 samples/sec Loss 1.6734 LearningRate 0.0025 Epoch: 16 
Global Step: 95840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:10,540-Speed 5660.20 samples/sec Loss 1.8053 LearningRate 0.0025 Epoch: 16 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:12,349-Speed 5660.60 samples/sec Loss 1.6434 LearningRate 0.0025 Epoch: 16 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:14,159-Speed 5659.95 samples/sec Loss 1.6611 LearningRate 0.0025 Epoch: 16 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:15,972-Speed 5649.43 samples/sec Loss 1.8022 LearningRate 0.0025 Epoch: 16 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:17,794-Speed 5622.50 samples/sec Loss 1.6671 LearningRate 0.0025 Epoch: 16 Global Step: 95890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:19,618-Speed 5615.62 samples/sec Loss 1.7142 LearningRate 0.0025 Epoch: 16 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:21,433-Speed 5645.21 samples/sec Loss 1.7199 LearningRate 0.0025 Epoch: 16 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:23,260-Speed 5605.70 samples/sec Loss 1.6306 LearningRate 0.0025 Epoch: 16 Global Step: 95920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:25,067-Speed 5670.38 samples/sec Loss 1.7600 LearningRate 0.0024 Epoch: 16 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:26,905-Speed 5573.80 samples/sec Loss 1.7312 LearningRate 0.0024 Epoch: 16 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:28,714-Speed 5659.67 samples/sec Loss 1.7305 LearningRate 0.0024 Epoch: 16 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:29:30,521-Speed 5668.95 samples/sec Loss 1.7465 LearningRate 0.0024 Epoch: 16 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:29:32,333-Speed 5654.24 samples/sec Loss 1.6571 LearningRate 0.0024 Epoch: 16 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:34,143-Speed 5657.28 samples/sec Loss 1.6967 LearningRate 0.0024 Epoch: 16 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:35,955-Speed 5653.86 samples/sec Loss 1.5713 LearningRate 0.0024 Epoch: 16 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:29:37,761-Speed 5673.32 samples/sec Loss 1.5802 LearningRate 0.0024 Epoch: 16 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:30:03,939-[lfw][96000]XNorm: 22.476795 Training: 2022-04-27 07:30:03,940-[lfw][96000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-27 07:30:03,940-[lfw][96000]Accuracy-Highest: 0.99800 Training: 2022-04-27 07:30:34,160-[cfp_fp][96000]XNorm: 21.217829 Training: 2022-04-27 07:30:34,161-[cfp_fp][96000]Accuracy-Flip: 0.97614+-0.00642 Training: 2022-04-27 07:30:34,161-[cfp_fp][96000]Accuracy-Highest: 0.97614 Training: 2022-04-27 07:31:00,243-[agedb_30][96000]XNorm: 22.530527 Training: 2022-04-27 07:31:00,244-[agedb_30][96000]Accuracy-Flip: 0.98050+-0.00799 Training: 2022-04-27 07:31:00,244-[agedb_30][96000]Accuracy-Highest: 0.98167 Training: 2022-04-27 07:31:02,083-Speed 121.44 samples/sec Loss 1.7249 LearningRate 0.0024 Epoch: 16 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:03,894-Speed 5657.16 samples/sec Loss 1.6462 LearningRate 0.0024 Epoch: 16 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:05,720-Speed 5608.13 samples/sec Loss 1.6859 LearningRate 0.0024 Epoch: 16 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:07,562-Speed 5562.89 samples/sec Loss 1.6874 LearningRate 0.0024 Epoch: 16 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
07:31:09,368-Speed 5672.06 samples/sec Loss 1.6453 LearningRate 0.0024 Epoch: 16 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:11,181-Speed 5648.66 samples/sec Loss 1.6614 LearningRate 0.0024 Epoch: 16 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:12,996-Speed 5643.09 samples/sec Loss 1.7513 LearningRate 0.0024 Epoch: 16 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:14,824-Speed 5606.14 samples/sec Loss 1.6619 LearningRate 0.0024 Epoch: 16 Global Step: 96080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:16,631-Speed 5667.22 samples/sec Loss 1.7497 LearningRate 0.0024 Epoch: 16 Global Step: 96090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:18,444-Speed 5649.65 samples/sec Loss 1.5791 LearningRate 0.0024 Epoch: 16 Global Step: 96100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:20,263-Speed 5631.49 samples/sec Loss 1.6241 LearningRate 0.0024 Epoch: 16 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:22,065-Speed 5686.17 samples/sec Loss 1.7423 LearningRate 0.0024 Epoch: 16 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:23,904-Speed 5569.84 samples/sec Loss 1.7259 LearningRate 0.0024 Epoch: 16 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:25,712-Speed 5662.48 samples/sec Loss 1.6485 LearningRate 0.0024 Epoch: 16 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:27,582-Speed 5479.14 samples/sec Loss 1.6065 LearningRate 0.0024 Epoch: 16 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:29,407-Speed 5612.65 samples/sec Loss 1.7213 LearningRate 0.0024 Epoch: 16 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:31,233-Speed 5611.36 samples/sec Loss 1.6510 
LearningRate 0.0024 Epoch: 16 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:33,041-Speed 5666.60 samples/sec Loss 1.6226 LearningRate 0.0024 Epoch: 16 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:34,857-Speed 5640.52 samples/sec Loss 1.6994 LearningRate 0.0024 Epoch: 16 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:36,664-Speed 5667.59 samples/sec Loss 1.7110 LearningRate 0.0024 Epoch: 16 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:38,472-Speed 5667.92 samples/sec Loss 1.6883 LearningRate 0.0024 Epoch: 16 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:40,284-Speed 5652.75 samples/sec Loss 1.6940 LearningRate 0.0024 Epoch: 16 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:42,108-Speed 5613.68 samples/sec Loss 1.6559 LearningRate 0.0024 Epoch: 16 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:44,019-Speed 5360.58 samples/sec Loss 1.6693 LearningRate 0.0024 Epoch: 16 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:45,892-Speed 5469.79 samples/sec Loss 1.6954 LearningRate 0.0024 Epoch: 16 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:47,699-Speed 5667.81 samples/sec Loss 1.6699 LearningRate 0.0024 Epoch: 16 Global Step: 96260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:31:49,519-Speed 5628.15 samples/sec Loss 1.7031 LearningRate 0.0024 Epoch: 16 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:51,332-Speed 5650.58 samples/sec Loss 1.5457 LearningRate 0.0024 Epoch: 16 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:53,143-Speed 5657.72 samples/sec Loss 1.6649 LearningRate 0.0023 Epoch: 16 Global Step: 96290 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:54,964-Speed 5625.08 samples/sec Loss 1.7667 LearningRate 0.0023 Epoch: 16 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:56,782-Speed 5634.05 samples/sec Loss 1.6235 LearningRate 0.0023 Epoch: 16 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:31:58,594-Speed 5652.59 samples/sec Loss 1.7142 LearningRate 0.0023 Epoch: 16 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:00,414-Speed 5627.88 samples/sec Loss 1.7010 LearningRate 0.0023 Epoch: 16 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:02,228-Speed 5648.59 samples/sec Loss 1.7594 LearningRate 0.0023 Epoch: 16 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:04,050-Speed 5621.15 samples/sec Loss 1.7276 LearningRate 0.0023 Epoch: 16 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:05,886-Speed 5580.04 samples/sec Loss 1.7376 LearningRate 0.0023 Epoch: 16 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:07,692-Speed 5670.05 samples/sec Loss 1.7051 LearningRate 0.0023 Epoch: 16 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:09,509-Speed 5639.41 samples/sec Loss 1.7042 LearningRate 0.0023 Epoch: 16 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:11,327-Speed 5633.58 samples/sec Loss 1.7085 LearningRate 0.0023 Epoch: 16 Global Step: 96390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:13,140-Speed 5649.85 samples/sec Loss 1.6737 LearningRate 0.0023 Epoch: 16 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:14,949-Speed 5666.19 samples/sec Loss 1.6595 LearningRate 0.0023 Epoch: 16 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-27 07:32:16,774-Speed 5611.25 samples/sec Loss 1.7193 LearningRate 0.0023 Epoch: 16 Global Step: 96420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:18,608-Speed 5584.13 samples/sec Loss 1.5853 LearningRate 0.0023 Epoch: 16 Global Step: 96430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:20,413-Speed 5677.38 samples/sec Loss 1.6699 LearningRate 0.0023 Epoch: 16 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:22,235-Speed 5619.55 samples/sec Loss 1.6857 LearningRate 0.0023 Epoch: 16 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:24,060-Speed 5614.31 samples/sec Loss 1.7666 LearningRate 0.0023 Epoch: 16 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:25,905-Speed 5551.63 samples/sec Loss 1.6387 LearningRate 0.0023 Epoch: 16 Global Step: 96470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:27,715-Speed 5659.23 samples/sec Loss 1.6910 LearningRate 0.0023 Epoch: 16 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:29,534-Speed 5630.85 samples/sec Loss 1.6210 LearningRate 0.0023 Epoch: 16 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:32:31,373-Speed 5569.31 samples/sec Loss 1.7376 LearningRate 0.0023 Epoch: 16 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:33,196-Speed 5622.04 samples/sec Loss 1.6775 LearningRate 0.0023 Epoch: 16 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:35,027-Speed 5593.09 samples/sec Loss 1.6996 LearningRate 0.0023 Epoch: 16 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:36,847-Speed 5629.65 samples/sec Loss 1.7240 LearningRate 0.0023 Epoch: 16 Global Step: 96530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:38,680-Speed 5586.58 samples/sec Loss 
1.6999 LearningRate 0.0023 Epoch: 16 Global Step: 96540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:40,505-Speed 5614.63 samples/sec Loss 1.6617 LearningRate 0.0023 Epoch: 16 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:42,333-Speed 5604.11 samples/sec Loss 1.6656 LearningRate 0.0023 Epoch: 16 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:44,161-Speed 5601.74 samples/sec Loss 1.6267 LearningRate 0.0023 Epoch: 16 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:45,977-Speed 5640.27 samples/sec Loss 1.7105 LearningRate 0.0023 Epoch: 16 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:47,794-Speed 5637.46 samples/sec Loss 1.6743 LearningRate 0.0023 Epoch: 16 Global Step: 96590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:49,680-Speed 5431.55 samples/sec Loss 1.7301 LearningRate 0.0023 Epoch: 16 Global Step: 96600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:51,626-Speed 5264.98 samples/sec Loss 1.6317 LearningRate 0.0023 Epoch: 16 Global Step: 96610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:53,460-Speed 5584.14 samples/sec Loss 1.6320 LearningRate 0.0023 Epoch: 16 Global Step: 96620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:55,293-Speed 5590.83 samples/sec Loss 1.6568 LearningRate 0.0023 Epoch: 16 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:57,107-Speed 5645.08 samples/sec Loss 1.6649 LearningRate 0.0023 Epoch: 16 Global Step: 96640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:32:58,933-Speed 5611.62 samples/sec Loss 1.6638 LearningRate 0.0023 Epoch: 16 Global Step: 96650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:00,804-Speed 5474.30 samples/sec Loss 1.6025 LearningRate 0.0023 Epoch: 16 Global Step: 96660 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:12,919-Speed 845.31 samples/sec Loss 1.3002 LearningRate 0.0022 Epoch: 17 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:14,775-Speed 5521.62 samples/sec Loss 1.3107 LearningRate 0.0022 Epoch: 17 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:16,788-Speed 5089.37 samples/sec Loss 1.1992 LearningRate 0.0022 Epoch: 17 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:18,632-Speed 5554.67 samples/sec Loss 1.2697 LearningRate 0.0022 Epoch: 17 Global Step: 96700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:33:20,578-Speed 5263.58 samples/sec Loss 1.2583 LearningRate 0.0022 Epoch: 17 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:22,422-Speed 5554.09 samples/sec Loss 1.2437 LearningRate 0.0022 Epoch: 17 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:24,259-Speed 5574.95 samples/sec Loss 1.1716 LearningRate 0.0022 Epoch: 17 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:26,096-Speed 5578.35 samples/sec Loss 1.2721 LearningRate 0.0022 Epoch: 17 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:27,928-Speed 5590.11 samples/sec Loss 1.1610 LearningRate 0.0022 Epoch: 17 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:29,756-Speed 5606.40 samples/sec Loss 1.2472 LearningRate 0.0022 Epoch: 17 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:31,590-Speed 5582.75 samples/sec Loss 1.2153 LearningRate 0.0022 Epoch: 17 Global Step: 96770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:33,441-Speed 5534.39 samples/sec Loss 1.3194 LearningRate 0.0022 Epoch: 17 Global Step: 96780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:33:35,309-Speed 5483.48 samples/sec Loss 1.2484 LearningRate 0.0022 Epoch: 17 Global Step: 96790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:37,143-Speed 5588.14 samples/sec Loss 1.1824 LearningRate 0.0022 Epoch: 17 Global Step: 96800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:38,995-Speed 5531.06 samples/sec Loss 1.2667 LearningRate 0.0022 Epoch: 17 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:40,813-Speed 5632.91 samples/sec Loss 1.1601 LearningRate 0.0022 Epoch: 17 Global Step: 96820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:42,636-Speed 5617.15 samples/sec Loss 1.2921 LearningRate 0.0022 Epoch: 17 Global Step: 96830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:44,462-Speed 5610.59 samples/sec Loss 1.3051 LearningRate 0.0022 Epoch: 17 Global Step: 96840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:46,291-Speed 5601.24 samples/sec Loss 1.2392 LearningRate 0.0022 Epoch: 17 Global Step: 96850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:48,140-Speed 5540.49 samples/sec Loss 1.2897 LearningRate 0.0022 Epoch: 17 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:49,969-Speed 5599.53 samples/sec Loss 1.2265 LearningRate 0.0022 Epoch: 17 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:33:51,779-Speed 5661.39 samples/sec Loss 1.2452 LearningRate 0.0022 Epoch: 17 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:33:53,601-Speed 5621.25 samples/sec Loss 1.1845 LearningRate 0.0022 Epoch: 17 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:33:55,426-Speed 5614.43 samples/sec Loss 1.2852 LearningRate 0.0022 Epoch: 17 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:33:57,262-Speed 5576.37 samples/sec Loss 
1.3398 LearningRate 0.0022 Epoch: 17 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:33:59,099-Speed 5577.20 samples/sec Loss 1.2652 LearningRate 0.0022 Epoch: 17 Global Step: 96920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:00,930-Speed 5594.71 samples/sec Loss 1.2483 LearningRate 0.0022 Epoch: 17 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:02,757-Speed 5605.41 samples/sec Loss 1.1881 LearningRate 0.0022 Epoch: 17 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:04,592-Speed 5582.48 samples/sec Loss 1.3068 LearningRate 0.0022 Epoch: 17 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:06,431-Speed 5570.68 samples/sec Loss 1.2932 LearningRate 0.0022 Epoch: 17 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:08,259-Speed 5606.45 samples/sec Loss 1.2477 LearningRate 0.0022 Epoch: 17 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:10,104-Speed 5549.43 samples/sec Loss 1.3392 LearningRate 0.0022 Epoch: 17 Global Step: 96980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:11,931-Speed 5606.40 samples/sec Loss 1.1695 LearningRate 0.0022 Epoch: 17 Global Step: 96990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:13,777-Speed 5551.10 samples/sec Loss 1.3732 LearningRate 0.0022 Epoch: 17 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:15,639-Speed 5499.36 samples/sec Loss 1.2661 LearningRate 0.0022 Epoch: 17 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:17,476-Speed 5576.23 samples/sec Loss 1.2625 LearningRate 0.0022 Epoch: 17 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:19,298-Speed 5621.74 samples/sec Loss 1.3229 LearningRate 0.0022 Epoch: 17 Global Step: 97030 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:21,149-Speed 5535.06 samples/sec Loss 1.2562 LearningRate 0.0022 Epoch: 17 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:22,990-Speed 5565.43 samples/sec Loss 1.2702 LearningRate 0.0021 Epoch: 17 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:24,817-Speed 5606.26 samples/sec Loss 1.3009 LearningRate 0.0021 Epoch: 17 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:26,715-Speed 5396.59 samples/sec Loss 1.2674 LearningRate 0.0021 Epoch: 17 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:28,533-Speed 5635.49 samples/sec Loss 1.2800 LearningRate 0.0021 Epoch: 17 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:30,357-Speed 5616.47 samples/sec Loss 1.2677 LearningRate 0.0021 Epoch: 17 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:32,195-Speed 5574.61 samples/sec Loss 1.2243 LearningRate 0.0021 Epoch: 17 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:34,012-Speed 5635.12 samples/sec Loss 1.2002 LearningRate 0.0021 Epoch: 17 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:35,835-Speed 5618.69 samples/sec Loss 1.2346 LearningRate 0.0021 Epoch: 17 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:37,684-Speed 5541.18 samples/sec Loss 1.3136 LearningRate 0.0021 Epoch: 17 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:39,552-Speed 5481.75 samples/sec Loss 1.3275 LearningRate 0.0021 Epoch: 17 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:34:41,452-Speed 5392.83 samples/sec Loss 1.2408 LearningRate 0.0021 Epoch: 17 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-27 07:34:43,294-Speed 5560.79 samples/sec Loss 1.3131 LearningRate 0.0021 Epoch: 17 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:45,149-Speed 5520.68 samples/sec Loss 1.2782 LearningRate 0.0021 Epoch: 17 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:46,995-Speed 5548.78 samples/sec Loss 1.3043 LearningRate 0.0021 Epoch: 17 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:48,841-Speed 5551.77 samples/sec Loss 1.3228 LearningRate 0.0021 Epoch: 17 Global Step: 97190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:50,719-Speed 5455.51 samples/sec Loss 1.2443 LearningRate 0.0021 Epoch: 17 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:52,607-Speed 5424.48 samples/sec Loss 1.1875 LearningRate 0.0021 Epoch: 17 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:54,448-Speed 5563.76 samples/sec Loss 1.2715 LearningRate 0.0021 Epoch: 17 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:56,309-Speed 5503.75 samples/sec Loss 1.3186 LearningRate 0.0021 Epoch: 17 Global Step: 97230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:58,133-Speed 5617.10 samples/sec Loss 1.2190 LearningRate 0.0021 Epoch: 17 Global Step: 97240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:34:59,985-Speed 5530.10 samples/sec Loss 1.3437 LearningRate 0.0021 Epoch: 17 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:01,840-Speed 5522.11 samples/sec Loss 1.2527 LearningRate 0.0021 Epoch: 17 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:03,668-Speed 5603.61 samples/sec Loss 1.2762 LearningRate 0.0021 Epoch: 17 Global Step: 97270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:05,515-Speed 5546.03 samples/sec Loss 
1.2859 LearningRate 0.0021 Epoch: 17 Global Step: 97280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:07,356-Speed 5565.84 samples/sec Loss 1.3372 LearningRate 0.0021 Epoch: 17 Global Step: 97290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:09,197-Speed 5561.72 samples/sec Loss 1.2303 LearningRate 0.0021 Epoch: 17 Global Step: 97300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:11,076-Speed 5453.96 samples/sec Loss 1.2529 LearningRate 0.0021 Epoch: 17 Global Step: 97310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:12,913-Speed 5574.16 samples/sec Loss 1.2128 LearningRate 0.0021 Epoch: 17 Global Step: 97320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:14,749-Speed 5580.86 samples/sec Loss 1.2748 LearningRate 0.0021 Epoch: 17 Global Step: 97330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:16,585-Speed 5579.85 samples/sec Loss 1.3097 LearningRate 0.0021 Epoch: 17 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:18,415-Speed 5594.80 samples/sec Loss 1.1626 LearningRate 0.0021 Epoch: 17 Global Step: 97350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:20,265-Speed 5538.19 samples/sec Loss 1.3627 LearningRate 0.0021 Epoch: 17 Global Step: 97360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:35:22,105-Speed 5566.33 samples/sec Loss 1.2750 LearningRate 0.0021 Epoch: 17 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:23,959-Speed 5525.03 samples/sec Loss 1.3242 LearningRate 0.0021 Epoch: 17 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:25,807-Speed 5544.35 samples/sec Loss 1.2799 LearningRate 0.0021 Epoch: 17 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:27,630-Speed 5617.52 samples/sec Loss 1.3131 LearningRate 0.0021 Epoch: 17 Global Step: 97400 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:29,470-Speed 5568.22 samples/sec Loss 1.2840 LearningRate 0.0021 Epoch: 17 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:31,296-Speed 5608.21 samples/sec Loss 1.2326 LearningRate 0.0021 Epoch: 17 Global Step: 97420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:33,126-Speed 5599.44 samples/sec Loss 1.2945 LearningRate 0.0021 Epoch: 17 Global Step: 97430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:34,956-Speed 5598.30 samples/sec Loss 1.3289 LearningRate 0.0020 Epoch: 17 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:36,793-Speed 5574.62 samples/sec Loss 1.3437 LearningRate 0.0020 Epoch: 17 Global Step: 97450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:38,666-Speed 5471.01 samples/sec Loss 1.2491 LearningRate 0.0020 Epoch: 17 Global Step: 97460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:40,570-Speed 5378.65 samples/sec Loss 1.2939 LearningRate 0.0020 Epoch: 17 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:42,417-Speed 5546.34 samples/sec Loss 1.3328 LearningRate 0.0020 Epoch: 17 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:44,255-Speed 5571.36 samples/sec Loss 1.2946 LearningRate 0.0020 Epoch: 17 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:46,079-Speed 5617.54 samples/sec Loss 1.2653 LearningRate 0.0020 Epoch: 17 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:47,900-Speed 5624.06 samples/sec Loss 1.2428 LearningRate 0.0020 Epoch: 17 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:49,740-Speed 5567.92 samples/sec Loss 1.2196 LearningRate 0.0020 Epoch: 17 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:35:51,580-Speed 5567.78 samples/sec Loss 1.2269 LearningRate 0.0020 Epoch: 17 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:53,416-Speed 5576.96 samples/sec Loss 1.2854 LearningRate 0.0020 Epoch: 17 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:55,294-Speed 5456.50 samples/sec Loss 1.2571 LearningRate 0.0020 Epoch: 17 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:57,137-Speed 5558.00 samples/sec Loss 1.3102 LearningRate 0.0020 Epoch: 17 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:35:58,965-Speed 5606.26 samples/sec Loss 1.2908 LearningRate 0.0020 Epoch: 17 Global Step: 97570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:36:00,789-Speed 5614.18 samples/sec Loss 1.3055 LearningRate 0.0020 Epoch: 17 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:02,638-Speed 5541.46 samples/sec Loss 1.2589 LearningRate 0.0020 Epoch: 17 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:04,484-Speed 5548.97 samples/sec Loss 1.2938 LearningRate 0.0020 Epoch: 17 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:06,341-Speed 5515.17 samples/sec Loss 1.2903 LearningRate 0.0020 Epoch: 17 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:08,219-Speed 5454.24 samples/sec Loss 1.1949 LearningRate 0.0020 Epoch: 17 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:10,127-Speed 5368.63 samples/sec Loss 1.2806 LearningRate 0.0020 Epoch: 17 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:11,961-Speed 5585.88 samples/sec Loss 1.3294 LearningRate 0.0020 Epoch: 17 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:13,829-Speed 5483.83 samples/sec Loss 
1.3213 LearningRate 0.0020 Epoch: 17 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:15,744-Speed 5351.04 samples/sec Loss 1.2484 LearningRate 0.0020 Epoch: 17 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:17,609-Speed 5490.81 samples/sec Loss 1.2291 LearningRate 0.0020 Epoch: 17 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:19,425-Speed 5640.35 samples/sec Loss 1.3257 LearningRate 0.0020 Epoch: 17 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:21,287-Speed 5502.34 samples/sec Loss 1.3012 LearningRate 0.0020 Epoch: 17 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:23,115-Speed 5602.11 samples/sec Loss 1.2342 LearningRate 0.0020 Epoch: 17 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:24,949-Speed 5585.30 samples/sec Loss 1.3076 LearningRate 0.0020 Epoch: 17 Global Step: 97710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:26,790-Speed 5564.85 samples/sec Loss 1.2492 LearningRate 0.0020 Epoch: 17 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:28,618-Speed 5603.44 samples/sec Loss 1.2582 LearningRate 0.0020 Epoch: 17 Global Step: 97730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:30,463-Speed 5550.69 samples/sec Loss 1.1899 LearningRate 0.0020 Epoch: 17 Global Step: 97740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:32,328-Speed 5494.80 samples/sec Loss 1.3297 LearningRate 0.0020 Epoch: 17 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:34,188-Speed 5505.25 samples/sec Loss 1.2309 LearningRate 0.0020 Epoch: 17 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:36,013-Speed 5614.33 samples/sec Loss 1.3133 LearningRate 0.0020 Epoch: 17 Global Step: 97770 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:37,833-Speed 5627.78 samples/sec Loss 1.3590 LearningRate 0.0020 Epoch: 17 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:39,664-Speed 5593.61 samples/sec Loss 1.3091 LearningRate 0.0020 Epoch: 17 Global Step: 97790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:41,518-Speed 5527.66 samples/sec Loss 1.2165 LearningRate 0.0020 Epoch: 17 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:43,352-Speed 5583.65 samples/sec Loss 1.2979 LearningRate 0.0020 Epoch: 17 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:45,222-Speed 5478.76 samples/sec Loss 1.3133 LearningRate 0.0020 Epoch: 17 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:47,042-Speed 5626.33 samples/sec Loss 1.2149 LearningRate 0.0020 Epoch: 17 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:48,862-Speed 5628.42 samples/sec Loss 1.2668 LearningRate 0.0019 Epoch: 17 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:36:50,673-Speed 5657.23 samples/sec Loss 1.3308 LearningRate 0.0019 Epoch: 17 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:36:52,498-Speed 5612.80 samples/sec Loss 1.2635 LearningRate 0.0019 Epoch: 17 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:36:54,408-Speed 5362.49 samples/sec Loss 1.2944 LearningRate 0.0019 Epoch: 17 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:36:56,241-Speed 5589.43 samples/sec Loss 1.3054 LearningRate 0.0019 Epoch: 17 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:36:58,064-Speed 5616.68 samples/sec Loss 1.2533 LearningRate 0.0019 Epoch: 17 Global Step: 97890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-27 07:36:59,921-Speed 5516.87 samples/sec Loss 1.3249 LearningRate 0.0019 Epoch: 17 Global Step: 97900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:37:01,762-Speed 5564.24 samples/sec Loss 1.2586 LearningRate 0.0019 Epoch: 17 Global Step: 97910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:37:03,602-Speed 5567.73 samples/sec Loss 1.2930 LearningRate 0.0019 Epoch: 17 Global Step: 97920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:37:05,421-Speed 5632.28 samples/sec Loss 1.2627 LearningRate 0.0019 Epoch: 17 Global Step: 97930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:37:07,248-Speed 5604.84 samples/sec Loss 1.2996 LearningRate 0.0019 Epoch: 17 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:37:09,068-Speed 5630.31 samples/sec Loss 1.2713 LearningRate 0.0019 Epoch: 17 Global Step: 97950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:10,899-Speed 5594.81 samples/sec Loss 1.2432 LearningRate 0.0019 Epoch: 17 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:12,754-Speed 5522.70 samples/sec Loss 1.3261 LearningRate 0.0019 Epoch: 17 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:14,630-Speed 5457.41 samples/sec Loss 1.2472 LearningRate 0.0019 Epoch: 17 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:16,497-Speed 5487.22 samples/sec Loss 1.3022 LearningRate 0.0019 Epoch: 17 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:18,326-Speed 5601.60 samples/sec Loss 1.2835 LearningRate 0.0019 Epoch: 17 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:37:44,383-[lfw][98000]XNorm: 22.298069 Training: 2022-04-27 07:37:44,383-[lfw][98000]Accuracy-Flip: 0.99800+-0.00221 Training: 2022-04-27 07:37:44,384-[lfw][98000]Accuracy-Highest: 0.99800 Training: 
2022-04-27 07:38:14,592-[cfp_fp][98000]XNorm: 21.089435 Training: 2022-04-27 07:38:14,592-[cfp_fp][98000]Accuracy-Flip: 0.97929+-0.00674 Training: 2022-04-27 07:38:14,593-[cfp_fp][98000]Accuracy-Highest: 0.97929 Training: 2022-04-27 07:38:40,682-[agedb_30][98000]XNorm: 22.387062 Training: 2022-04-27 07:38:40,682-[agedb_30][98000]Accuracy-Flip: 0.98183+-0.00529 Training: 2022-04-27 07:38:40,683-[agedb_30][98000]Accuracy-Highest: 0.98183 Training: 2022-04-27 07:38:42,530-Speed 121.61 samples/sec Loss 1.2492 LearningRate 0.0019 Epoch: 17 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:44,361-Speed 5593.36 samples/sec Loss 1.3550 LearningRate 0.0019 Epoch: 17 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:46,188-Speed 5607.85 samples/sec Loss 1.2428 LearningRate 0.0019 Epoch: 17 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:48,022-Speed 5584.24 samples/sec Loss 1.2606 LearningRate 0.0019 Epoch: 17 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:49,838-Speed 5642.88 samples/sec Loss 1.3021 LearningRate 0.0019 Epoch: 17 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:51,686-Speed 5541.89 samples/sec Loss 1.3133 LearningRate 0.0019 Epoch: 17 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:53,518-Speed 5591.88 samples/sec Loss 1.3800 LearningRate 0.0019 Epoch: 17 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:55,364-Speed 5548.19 samples/sec Loss 1.2590 LearningRate 0.0019 Epoch: 17 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:57,183-Speed 5630.11 samples/sec Loss 1.3365 LearningRate 0.0019 Epoch: 17 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:38:59,014-Speed 5596.00 samples/sec Loss 1.2442 LearningRate 0.0019 Epoch: 17 
Global Step: 98100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:00,839-Speed 5614.91 samples/sec Loss 1.3150 LearningRate 0.0019 Epoch: 17 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:02,679-Speed 5564.53 samples/sec Loss 1.2674 LearningRate 0.0019 Epoch: 17 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:04,504-Speed 5613.00 samples/sec Loss 1.3033 LearningRate 0.0019 Epoch: 17 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:06,334-Speed 5597.29 samples/sec Loss 1.2358 LearningRate 0.0019 Epoch: 17 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:08,169-Speed 5583.82 samples/sec Loss 1.2367 LearningRate 0.0019 Epoch: 17 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:10,017-Speed 5543.60 samples/sec Loss 1.3444 LearningRate 0.0019 Epoch: 17 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:11,837-Speed 5625.84 samples/sec Loss 1.3116 LearningRate 0.0019 Epoch: 17 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:13,667-Speed 5596.95 samples/sec Loss 1.2066 LearningRate 0.0019 Epoch: 17 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:15,497-Speed 5597.48 samples/sec Loss 1.2792 LearningRate 0.0019 Epoch: 17 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:17,356-Speed 5510.26 samples/sec Loss 1.3050 LearningRate 0.0019 Epoch: 17 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:19,237-Speed 5448.02 samples/sec Loss 1.2645 LearningRate 0.0019 Epoch: 17 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:39:21,079-Speed 5562.07 samples/sec Loss 1.3254 LearningRate 0.0019 Epoch: 17 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 07:39:22,911-Speed 5590.28 samples/sec Loss 1.2327 LearningRate 0.0019 Epoch: 17 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:24,749-Speed 5574.18 samples/sec Loss 1.3114 LearningRate 0.0019 Epoch: 17 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:26,565-Speed 5640.19 samples/sec Loss 1.3041 LearningRate 0.0019 Epoch: 17 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:28,406-Speed 5563.91 samples/sec Loss 1.3206 LearningRate 0.0018 Epoch: 17 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:30,268-Speed 5502.34 samples/sec Loss 1.2936 LearningRate 0.0018 Epoch: 17 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:32,091-Speed 5618.14 samples/sec Loss 1.2433 LearningRate 0.0018 Epoch: 17 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:33,932-Speed 5562.69 samples/sec Loss 1.3531 LearningRate 0.0018 Epoch: 17 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:35,781-Speed 5540.12 samples/sec Loss 1.2284 LearningRate 0.0018 Epoch: 17 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:37,596-Speed 5643.84 samples/sec Loss 1.2622 LearningRate 0.0018 Epoch: 17 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:39,436-Speed 5566.55 samples/sec Loss 1.3174 LearningRate 0.0018 Epoch: 17 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:41,289-Speed 5529.76 samples/sec Loss 1.3257 LearningRate 0.0018 Epoch: 17 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:43,115-Speed 5608.29 samples/sec Loss 1.3406 LearningRate 0.0018 Epoch: 17 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:44,947-Speed 5592.25 
samples/sec Loss 1.2976 LearningRate 0.0018 Epoch: 17 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:46,804-Speed 5516.19 samples/sec Loss 1.3832 LearningRate 0.0018 Epoch: 17 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:48,655-Speed 5533.15 samples/sec Loss 1.3060 LearningRate 0.0018 Epoch: 17 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:50,476-Speed 5627.11 samples/sec Loss 1.2981 LearningRate 0.0018 Epoch: 17 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:52,317-Speed 5563.73 samples/sec Loss 1.2955 LearningRate 0.0018 Epoch: 17 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:54,145-Speed 5601.93 samples/sec Loss 1.2935 LearningRate 0.0018 Epoch: 17 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:55,966-Speed 5626.07 samples/sec Loss 1.3753 LearningRate 0.0018 Epoch: 17 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:57,793-Speed 5605.20 samples/sec Loss 1.3170 LearningRate 0.0018 Epoch: 17 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:39:59,613-Speed 5630.02 samples/sec Loss 1.1996 LearningRate 0.0018 Epoch: 17 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:01,452-Speed 5570.65 samples/sec Loss 1.3549 LearningRate 0.0018 Epoch: 17 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:03,276-Speed 5616.85 samples/sec Loss 1.1759 LearningRate 0.0018 Epoch: 17 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:05,103-Speed 5607.45 samples/sec Loss 1.3662 LearningRate 0.0018 Epoch: 17 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:06,926-Speed 5617.54 samples/sec Loss 1.2885 LearningRate 0.0018 Epoch: 17 
Global Step: 98470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:08,749-Speed 5621.41 samples/sec Loss 1.3065 LearningRate 0.0018 Epoch: 17 Global Step: 98480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:10,590-Speed 5562.18 samples/sec Loss 1.2946 LearningRate 0.0018 Epoch: 17 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:12,408-Speed 5635.01 samples/sec Loss 1.3078 LearningRate 0.0018 Epoch: 17 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:14,241-Speed 5588.74 samples/sec Loss 1.3384 LearningRate 0.0018 Epoch: 17 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:16,060-Speed 5630.75 samples/sec Loss 1.3110 LearningRate 0.0018 Epoch: 17 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:17,887-Speed 5605.41 samples/sec Loss 1.3242 LearningRate 0.0018 Epoch: 17 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:19,716-Speed 5601.37 samples/sec Loss 1.2069 LearningRate 0.0018 Epoch: 17 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:21,530-Speed 5649.70 samples/sec Loss 1.2603 LearningRate 0.0018 Epoch: 17 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:23,363-Speed 5589.04 samples/sec Loss 1.3171 LearningRate 0.0018 Epoch: 17 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:25,187-Speed 5615.79 samples/sec Loss 1.4091 LearningRate 0.0018 Epoch: 17 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:27,023-Speed 5579.69 samples/sec Loss 1.2954 LearningRate 0.0018 Epoch: 17 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:28,871-Speed 5543.33 samples/sec Loss 1.3214 LearningRate 0.0018 Epoch: 17 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 07:40:30,691-Speed 5628.09 samples/sec Loss 1.3229 LearningRate 0.0018 Epoch: 17 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:32,507-Speed 5639.57 samples/sec Loss 1.3132 LearningRate 0.0018 Epoch: 17 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:34,324-Speed 5637.70 samples/sec Loss 1.3014 LearningRate 0.0018 Epoch: 17 Global Step: 98620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:36,144-Speed 5629.49 samples/sec Loss 1.3285 LearningRate 0.0018 Epoch: 17 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:37,964-Speed 5626.70 samples/sec Loss 1.1698 LearningRate 0.0018 Epoch: 17 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:39,799-Speed 5584.84 samples/sec Loss 1.2358 LearningRate 0.0018 Epoch: 17 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:41,638-Speed 5567.37 samples/sec Loss 1.3179 LearningRate 0.0018 Epoch: 17 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:43,504-Speed 5490.11 samples/sec Loss 1.2742 LearningRate 0.0018 Epoch: 17 Global Step: 98670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:45,350-Speed 5551.39 samples/sec Loss 1.3358 LearningRate 0.0017 Epoch: 17 Global Step: 98680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:47,196-Speed 5548.14 samples/sec Loss 1.3580 LearningRate 0.0017 Epoch: 17 Global Step: 98690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:49,045-Speed 5541.58 samples/sec Loss 1.3151 LearningRate 0.0017 Epoch: 17 Global Step: 98700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:40:50,855-Speed 5659.95 samples/sec Loss 1.3889 LearningRate 0.0017 Epoch: 17 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:52,749-Speed 5406.73 
samples/sec Loss 1.2734 LearningRate 0.0017 Epoch: 17 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:54,603-Speed 5526.38 samples/sec Loss 1.2922 LearningRate 0.0017 Epoch: 17 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:56,463-Speed 5508.51 samples/sec Loss 1.3051 LearningRate 0.0017 Epoch: 17 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:40:58,287-Speed 5614.48 samples/sec Loss 1.2489 LearningRate 0.0017 Epoch: 17 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:00,147-Speed 5506.73 samples/sec Loss 1.3409 LearningRate 0.0017 Epoch: 17 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:01,977-Speed 5596.60 samples/sec Loss 1.3676 LearningRate 0.0017 Epoch: 17 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:03,815-Speed 5575.21 samples/sec Loss 1.2685 LearningRate 0.0017 Epoch: 17 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:05,662-Speed 5546.90 samples/sec Loss 1.3131 LearningRate 0.0017 Epoch: 17 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:07,492-Speed 5597.53 samples/sec Loss 1.2961 LearningRate 0.0017 Epoch: 17 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:09,323-Speed 5593.60 samples/sec Loss 1.4315 LearningRate 0.0017 Epoch: 17 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:11,160-Speed 5577.63 samples/sec Loss 1.3493 LearningRate 0.0017 Epoch: 17 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:13,042-Speed 5440.76 samples/sec Loss 1.3737 LearningRate 0.0017 Epoch: 17 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:14,900-Speed 5513.28 samples/sec Loss 1.3014 LearningRate 0.0017 Epoch: 17 
Global Step: 98840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:16,726-Speed 5611.27 samples/sec Loss 1.3413 LearningRate 0.0017 Epoch: 17 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:18,544-Speed 5632.86 samples/sec Loss 1.2841 LearningRate 0.0017 Epoch: 17 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:20,362-Speed 5636.28 samples/sec Loss 1.3820 LearningRate 0.0017 Epoch: 17 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:22,196-Speed 5583.64 samples/sec Loss 1.2383 LearningRate 0.0017 Epoch: 17 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:24,032-Speed 5580.45 samples/sec Loss 1.2663 LearningRate 0.0017 Epoch: 17 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:25,853-Speed 5623.77 samples/sec Loss 1.3051 LearningRate 0.0017 Epoch: 17 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:27,697-Speed 5556.20 samples/sec Loss 1.2837 LearningRate 0.0017 Epoch: 17 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:29,546-Speed 5538.87 samples/sec Loss 1.3631 LearningRate 0.0017 Epoch: 17 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:31,387-Speed 5564.56 samples/sec Loss 1.3501 LearningRate 0.0017 Epoch: 17 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:33,228-Speed 5564.75 samples/sec Loss 1.2666 LearningRate 0.0017 Epoch: 17 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:35,057-Speed 5600.35 samples/sec Loss 1.2433 LearningRate 0.0017 Epoch: 17 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:36,865-Speed 5667.15 samples/sec Loss 1.3612 LearningRate 0.0017 Epoch: 17 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-27 07:41:38,688-Speed 5618.92 samples/sec Loss 1.3319 LearningRate 0.0017 Epoch: 17 Global Step: 98970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:40,506-Speed 5631.78 samples/sec Loss 1.2259 LearningRate 0.0017 Epoch: 17 Global Step: 98980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:42,336-Speed 5599.71 samples/sec Loss 1.3287 LearningRate 0.0017 Epoch: 17 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:44,156-Speed 5628.57 samples/sec Loss 1.2853 LearningRate 0.0017 Epoch: 17 Global Step: 99000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:45,991-Speed 5579.36 samples/sec Loss 1.3143 LearningRate 0.0017 Epoch: 17 Global Step: 99010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:47,828-Speed 5577.05 samples/sec Loss 1.2461 LearningRate 0.0017 Epoch: 17 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:49,644-Speed 5640.50 samples/sec Loss 1.3418 LearningRate 0.0017 Epoch: 17 Global Step: 99030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:51,467-Speed 5619.83 samples/sec Loss 1.2857 LearningRate 0.0017 Epoch: 17 Global Step: 99040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:53,287-Speed 5630.15 samples/sec Loss 1.3209 LearningRate 0.0017 Epoch: 17 Global Step: 99050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:41:55,104-Speed 5637.15 samples/sec Loss 1.2910 LearningRate 0.0017 Epoch: 17 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:56,919-Speed 5645.96 samples/sec Loss 1.3311 LearningRate 0.0017 Epoch: 17 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:41:58,818-Speed 5394.16 samples/sec Loss 1.2527 LearningRate 0.0017 Epoch: 17 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:00,641-Speed 5618.96 
samples/sec Loss 1.2427 LearningRate 0.0017 Epoch: 17 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:02,478-Speed 5574.24 samples/sec Loss 1.2682 LearningRate 0.0017 Epoch: 17 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:04,326-Speed 5544.28 samples/sec Loss 1.2853 LearningRate 0.0017 Epoch: 17 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:06,188-Speed 5502.44 samples/sec Loss 1.3831 LearningRate 0.0016 Epoch: 17 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:08,028-Speed 5565.29 samples/sec Loss 1.4121 LearningRate 0.0016 Epoch: 17 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:09,855-Speed 5608.62 samples/sec Loss 1.2986 LearningRate 0.0016 Epoch: 17 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:11,698-Speed 5558.51 samples/sec Loss 1.3315 LearningRate 0.0016 Epoch: 17 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:13,503-Speed 5673.17 samples/sec Loss 1.3619 LearningRate 0.0016 Epoch: 17 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:15,343-Speed 5568.34 samples/sec Loss 1.3670 LearningRate 0.0016 Epoch: 17 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:17,182-Speed 5570.52 samples/sec Loss 1.2413 LearningRate 0.0016 Epoch: 17 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:19,032-Speed 5536.30 samples/sec Loss 1.3004 LearningRate 0.0016 Epoch: 17 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:20,884-Speed 5528.75 samples/sec Loss 1.2808 LearningRate 0.0016 Epoch: 17 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:22,710-Speed 5611.24 samples/sec Loss 1.2948 LearningRate 0.0016 Epoch: 17 
Global Step: 99210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:24,530-Speed 5627.22 samples/sec Loss 1.2813 LearningRate 0.0016 Epoch: 17 Global Step: 99220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:26,368-Speed 5574.56 samples/sec Loss 1.4011 LearningRate 0.0016 Epoch: 17 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:28,238-Speed 5476.51 samples/sec Loss 1.3691 LearningRate 0.0016 Epoch: 17 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:30,084-Speed 5550.37 samples/sec Loss 1.3215 LearningRate 0.0016 Epoch: 17 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:31,930-Speed 5547.82 samples/sec Loss 1.3273 LearningRate 0.0016 Epoch: 17 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:33,764-Speed 5586.31 samples/sec Loss 1.2487 LearningRate 0.0016 Epoch: 17 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:35,597-Speed 5590.19 samples/sec Loss 1.3617 LearningRate 0.0016 Epoch: 17 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:37,425-Speed 5602.89 samples/sec Loss 1.3588 LearningRate 0.0016 Epoch: 17 Global Step: 99290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:39,259-Speed 5584.36 samples/sec Loss 1.3203 LearningRate 0.0016 Epoch: 17 Global Step: 99300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:41,092-Speed 5589.04 samples/sec Loss 1.2628 LearningRate 0.0016 Epoch: 17 Global Step: 99310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:42:42,911-Speed 5632.09 samples/sec Loss 1.3324 LearningRate 0.0016 Epoch: 17 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:44,748-Speed 5575.01 samples/sec Loss 1.2717 LearningRate 0.0016 Epoch: 17 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:42:46,569-Speed 5625.52 samples/sec Loss 1.3742 LearningRate 0.0016 Epoch: 17 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:48,397-Speed 5603.26 samples/sec Loss 1.4195 LearningRate 0.0016 Epoch: 17 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:50,244-Speed 5547.10 samples/sec Loss 1.2568 LearningRate 0.0016 Epoch: 17 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:52,080-Speed 5577.64 samples/sec Loss 1.3350 LearningRate 0.0016 Epoch: 17 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:53,917-Speed 5578.41 samples/sec Loss 1.2238 LearningRate 0.0016 Epoch: 17 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:55,761-Speed 5553.39 samples/sec Loss 1.2396 LearningRate 0.0016 Epoch: 17 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:57,594-Speed 5590.56 samples/sec Loss 1.3374 LearningRate 0.0016 Epoch: 17 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:42:59,449-Speed 5519.96 samples/sec Loss 1.3870 LearningRate 0.0016 Epoch: 17 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:01,286-Speed 5575.98 samples/sec Loss 1.3480 LearningRate 0.0016 Epoch: 17 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:03,114-Speed 5606.16 samples/sec Loss 1.3655 LearningRate 0.0016 Epoch: 17 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:04,940-Speed 5608.85 samples/sec Loss 1.3463 LearningRate 0.0016 Epoch: 17 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:06,773-Speed 5587.46 samples/sec Loss 1.3159 LearningRate 0.0016 Epoch: 17 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:08,591-Speed 5636.40 
samples/sec Loss 1.3219 LearningRate 0.0016 Epoch: 17 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:10,415-Speed 5612.84 samples/sec Loss 1.2232 LearningRate 0.0016 Epoch: 17 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:12,287-Speed 5472.87 samples/sec Loss 1.3597 LearningRate 0.0016 Epoch: 17 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:14,122-Speed 5583.55 samples/sec Loss 1.3451 LearningRate 0.0016 Epoch: 17 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:15,979-Speed 5517.53 samples/sec Loss 1.3170 LearningRate 0.0016 Epoch: 17 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:17,818-Speed 5568.90 samples/sec Loss 1.3264 LearningRate 0.0016 Epoch: 17 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:19,652-Speed 5586.05 samples/sec Loss 1.3640 LearningRate 0.0016 Epoch: 17 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:21,470-Speed 5632.64 samples/sec Loss 1.3341 LearningRate 0.0016 Epoch: 17 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:23,296-Speed 5612.00 samples/sec Loss 1.3204 LearningRate 0.0016 Epoch: 17 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:25,126-Speed 5596.77 samples/sec Loss 1.2834 LearningRate 0.0016 Epoch: 17 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:26,964-Speed 5573.64 samples/sec Loss 1.2001 LearningRate 0.0016 Epoch: 17 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:28,795-Speed 5596.80 samples/sec Loss 1.3205 LearningRate 0.0015 Epoch: 17 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:30,666-Speed 5472.84 samples/sec Loss 1.3253 LearningRate 0.0015 Epoch: 17 
Global Step: 99580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:32,520-Speed 5526.02 samples/sec Loss 1.2860 LearningRate 0.0015 Epoch: 17 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:34,407-Speed 5426.90 samples/sec Loss 1.4130 LearningRate 0.0015 Epoch: 17 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:36,332-Speed 5322.68 samples/sec Loss 1.3356 LearningRate 0.0015 Epoch: 17 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:38,199-Speed 5486.14 samples/sec Loss 1.2507 LearningRate 0.0015 Epoch: 17 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:40,054-Speed 5524.28 samples/sec Loss 1.2647 LearningRate 0.0015 Epoch: 17 Global Step: 99630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:41,894-Speed 5565.64 samples/sec Loss 1.2895 LearningRate 0.0015 Epoch: 17 Global Step: 99640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:43,737-Speed 5557.84 samples/sec Loss 1.3781 LearningRate 0.0015 Epoch: 17 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:45,569-Speed 5593.25 samples/sec Loss 1.2576 LearningRate 0.0015 Epoch: 17 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:47,390-Speed 5624.40 samples/sec Loss 1.3955 LearningRate 0.0015 Epoch: 17 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:49,293-Speed 5381.51 samples/sec Loss 1.3700 LearningRate 0.0015 Epoch: 17 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:51,122-Speed 5601.67 samples/sec Loss 1.2964 LearningRate 0.0015 Epoch: 17 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:52,970-Speed 5540.58 samples/sec Loss 1.2668 LearningRate 0.0015 Epoch: 17 Global Step: 99700 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:43:54,816-Speed 5551.63 samples/sec Loss 1.3788 LearningRate 0.0015 Epoch: 17 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:43:56,667-Speed 5535.97 samples/sec Loss 1.2635 LearningRate 0.0015 Epoch: 17 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:43:58,506-Speed 5569.59 samples/sec Loss 1.2001 LearningRate 0.0015 Epoch: 17 Global Step: 99730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:00,334-Speed 5604.29 samples/sec Loss 1.2708 LearningRate 0.0015 Epoch: 17 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:02,161-Speed 5606.48 samples/sec Loss 1.3301 LearningRate 0.0015 Epoch: 17 Global Step: 99750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:03,988-Speed 5606.63 samples/sec Loss 1.3844 LearningRate 0.0015 Epoch: 17 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:05,820-Speed 5591.80 samples/sec Loss 1.2266 LearningRate 0.0015 Epoch: 17 Global Step: 99770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:07,684-Speed 5495.66 samples/sec Loss 1.3464 LearningRate 0.0015 Epoch: 17 Global Step: 99780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:09,574-Speed 5420.78 samples/sec Loss 1.2322 LearningRate 0.0015 Epoch: 17 Global Step: 99790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:11,405-Speed 5592.36 samples/sec Loss 1.2722 LearningRate 0.0015 Epoch: 17 Global Step: 99800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:13,237-Speed 5592.46 samples/sec Loss 1.4131 LearningRate 0.0015 Epoch: 17 Global Step: 99810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:15,065-Speed 5604.88 samples/sec Loss 1.3067 LearningRate 0.0015 Epoch: 17 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:16,888-Speed 5616.41 
samples/sec Loss 1.3885 LearningRate 0.0015 Epoch: 17 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:18,711-Speed 5619.41 samples/sec Loss 1.3565 LearningRate 0.0015 Epoch: 17 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:20,542-Speed 5594.16 samples/sec Loss 1.3163 LearningRate 0.0015 Epoch: 17 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:22,367-Speed 5612.94 samples/sec Loss 1.2958 LearningRate 0.0015 Epoch: 17 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:24,196-Speed 5601.05 samples/sec Loss 1.3414 LearningRate 0.0015 Epoch: 17 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:26,029-Speed 5588.60 samples/sec Loss 1.3774 LearningRate 0.0015 Epoch: 17 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:27,875-Speed 5549.51 samples/sec Loss 1.2510 LearningRate 0.0015 Epoch: 17 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:29,722-Speed 5545.19 samples/sec Loss 1.3722 LearningRate 0.0015 Epoch: 17 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:31,557-Speed 5582.87 samples/sec Loss 1.3486 LearningRate 0.0015 Epoch: 17 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:33,387-Speed 5596.92 samples/sec Loss 1.3087 LearningRate 0.0015 Epoch: 17 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:35,212-Speed 5614.09 samples/sec Loss 1.3121 LearningRate 0.0015 Epoch: 17 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:44:37,034-Speed 5621.18 samples/sec Loss 1.3030 LearningRate 0.0015 Epoch: 17 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:44:38,886-Speed 5531.68 samples/sec Loss 1.3618 LearningRate 0.0015 Epoch: 17 
Global Step: 99950 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:44:40,722-Speed 5578.43 samples/sec Loss 1.2842 LearningRate 0.0015 Epoch: 17 Global Step: 99960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:44:42,570-Speed 5542.76 samples/sec Loss 1.3603 LearningRate 0.0015 Epoch: 17 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:44:44,412-Speed 5561.76 samples/sec Loss 1.2993 LearningRate 0.0015 Epoch: 17 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:44:46,249-Speed 5576.55 samples/sec Loss 1.2814 LearningRate 0.0015 Epoch: 17 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:44:48,080-Speed 5596.27 samples/sec Loss 1.2462 LearningRate 0.0015 Epoch: 17 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:45:14,464-[lfw][100000]XNorm: 22.185568
Training: 2022-04-27 07:45:14,464-[lfw][100000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 07:45:14,465-[lfw][100000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:45:45,005-[cfp_fp][100000]XNorm: 21.126163
Training: 2022-04-27 07:45:45,005-[cfp_fp][100000]Accuracy-Flip: 0.97743+-0.00635
Training: 2022-04-27 07:45:45,006-[cfp_fp][100000]Accuracy-Highest: 0.97929
Training: 2022-04-27 07:46:11,427-[agedb_30][100000]XNorm: 22.106058
Training: 2022-04-27 07:46:11,428-[agedb_30][100000]Accuracy-Flip: 0.98083+-0.00574
Training: 2022-04-27 07:46:11,428-[agedb_30][100000]Accuracy-Highest: 0.98183
Training: 2022-04-27 07:46:13,256-Speed 120.22 samples/sec Loss 1.3943 LearningRate 0.0015 Epoch: 17 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:46:15,075-Speed 5630.50 samples/sec Loss 1.2623 LearningRate 0.0015 Epoch: 17 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-27 07:46:16,899-Speed 5618.00 samples/sec Loss 1.2216 LearningRate 0.0014 Epoch: 17 Global Step: 100030 Fp16 Grad
Scale: 32768 Required: 1 hours Training: 2022-04-27 07:46:18,751-Speed 5529.86 samples/sec Loss 1.3304 LearningRate 0.0014 Epoch: 17 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:20,581-Speed 5597.37 samples/sec Loss 1.2703 LearningRate 0.0014 Epoch: 17 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:22,438-Speed 5515.55 samples/sec Loss 1.3174 LearningRate 0.0014 Epoch: 17 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:24,275-Speed 5576.68 samples/sec Loss 1.2212 LearningRate 0.0014 Epoch: 17 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:26,095-Speed 5629.04 samples/sec Loss 1.3692 LearningRate 0.0014 Epoch: 17 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:27,936-Speed 5561.91 samples/sec Loss 1.3343 LearningRate 0.0014 Epoch: 17 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:29,775-Speed 5569.36 samples/sec Loss 1.3914 LearningRate 0.0014 Epoch: 17 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:31,618-Speed 5557.79 samples/sec Loss 1.2761 LearningRate 0.0014 Epoch: 17 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:33,442-Speed 5617.18 samples/sec Loss 1.3358 LearningRate 0.0014 Epoch: 17 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:35,264-Speed 5623.74 samples/sec Loss 1.2907 LearningRate 0.0014 Epoch: 17 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:37,102-Speed 5571.46 samples/sec Loss 1.3200 LearningRate 0.0014 Epoch: 17 Global Step: 100140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:46:38,928-Speed 5608.32 samples/sec Loss 1.3584 LearningRate 0.0014 Epoch: 17 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:46:40,774-Speed 5549.73 samples/sec Loss 1.3133 LearningRate 0.0014 Epoch: 17 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:42,600-Speed 5610.92 samples/sec Loss 1.3226 LearningRate 0.0014 Epoch: 17 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:44,442-Speed 5561.34 samples/sec Loss 1.2271 LearningRate 0.0014 Epoch: 17 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:46,263-Speed 5624.51 samples/sec Loss 1.3029 LearningRate 0.0014 Epoch: 17 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:48,091-Speed 5603.06 samples/sec Loss 1.3209 LearningRate 0.0014 Epoch: 17 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:49,926-Speed 5583.36 samples/sec Loss 1.3278 LearningRate 0.0014 Epoch: 17 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:51,757-Speed 5594.34 samples/sec Loss 1.3711 LearningRate 0.0014 Epoch: 17 Global Step: 100220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:53,592-Speed 5583.37 samples/sec Loss 1.3412 LearningRate 0.0014 Epoch: 17 Global Step: 100230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:55,419-Speed 5604.87 samples/sec Loss 1.2360 LearningRate 0.0014 Epoch: 17 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:57,269-Speed 5536.04 samples/sec Loss 1.3593 LearningRate 0.0014 Epoch: 17 Global Step: 100250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:46:59,108-Speed 5569.95 samples/sec Loss 1.2622 LearningRate 0.0014 Epoch: 17 Global Step: 100260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:00,955-Speed 5546.07 samples/sec Loss 1.2898 LearningRate 0.0014 Epoch: 17 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:02,789-Speed 5585.79 
samples/sec Loss 1.2542 LearningRate 0.0014 Epoch: 17 Global Step: 100280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:04,643-Speed 5525.80 samples/sec Loss 1.3821 LearningRate 0.0014 Epoch: 17 Global Step: 100290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:06,498-Speed 5524.19 samples/sec Loss 1.3843 LearningRate 0.0014 Epoch: 17 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:08,351-Speed 5525.88 samples/sec Loss 1.3222 LearningRate 0.0014 Epoch: 17 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:10,196-Speed 5554.48 samples/sec Loss 1.3129 LearningRate 0.0014 Epoch: 17 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:12,029-Speed 5588.35 samples/sec Loss 1.2951 LearningRate 0.0014 Epoch: 17 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:13,853-Speed 5615.76 samples/sec Loss 1.4157 LearningRate 0.0014 Epoch: 17 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:15,688-Speed 5580.27 samples/sec Loss 1.2913 LearningRate 0.0014 Epoch: 17 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:17,524-Speed 5578.89 samples/sec Loss 1.3769 LearningRate 0.0014 Epoch: 17 Global Step: 100360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:19,358-Speed 5586.93 samples/sec Loss 1.3675 LearningRate 0.0014 Epoch: 17 Global Step: 100370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:21,194-Speed 5579.59 samples/sec Loss 1.4061 LearningRate 0.0014 Epoch: 17 Global Step: 100380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:23,035-Speed 5564.27 samples/sec Loss 1.3748 LearningRate 0.0014 Epoch: 17 Global Step: 100390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:24,869-Speed 5583.16 samples/sec Loss 1.3242 LearningRate 0.0014 
Epoch: 17 Global Step: 100400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:26,715-Speed 5551.93 samples/sec Loss 1.4535 LearningRate 0.0014 Epoch: 17 Global Step: 100410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:28,544-Speed 5599.08 samples/sec Loss 1.2930 LearningRate 0.0014 Epoch: 17 Global Step: 100420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:30,388-Speed 5556.56 samples/sec Loss 1.3225 LearningRate 0.0014 Epoch: 17 Global Step: 100430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:32,214-Speed 5609.96 samples/sec Loss 1.3121 LearningRate 0.0014 Epoch: 17 Global Step: 100440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:34,050-Speed 5576.49 samples/sec Loss 1.3775 LearningRate 0.0014 Epoch: 17 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:47:35,898-Speed 5545.12 samples/sec Loss 1.2907 LearningRate 0.0014 Epoch: 17 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:37,737-Speed 5568.74 samples/sec Loss 1.3150 LearningRate 0.0014 Epoch: 17 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:39,566-Speed 5599.00 samples/sec Loss 1.3028 LearningRate 0.0014 Epoch: 17 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:41,390-Speed 5617.81 samples/sec Loss 1.3158 LearningRate 0.0014 Epoch: 17 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:43,244-Speed 5523.43 samples/sec Loss 1.3201 LearningRate 0.0014 Epoch: 17 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:45,077-Speed 5590.47 samples/sec Loss 1.2457 LearningRate 0.0013 Epoch: 17 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:46,911-Speed 5584.52 samples/sec Loss 1.3441 LearningRate 0.0013 Epoch: 17 Global Step: 100520 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:48,737-Speed 5609.46 samples/sec Loss 1.3649 LearningRate 0.0013 Epoch: 17 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:50,608-Speed 5475.67 samples/sec Loss 1.3140 LearningRate 0.0013 Epoch: 17 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:52,439-Speed 5594.90 samples/sec Loss 1.2967 LearningRate 0.0013 Epoch: 17 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:54,257-Speed 5632.67 samples/sec Loss 1.2827 LearningRate 0.0013 Epoch: 17 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:56,101-Speed 5556.90 samples/sec Loss 1.2916 LearningRate 0.0013 Epoch: 17 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:57,937-Speed 5577.64 samples/sec Loss 1.3139 LearningRate 0.0013 Epoch: 17 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:47:59,766-Speed 5599.71 samples/sec Loss 1.2469 LearningRate 0.0013 Epoch: 17 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:01,609-Speed 5558.60 samples/sec Loss 1.2795 LearningRate 0.0013 Epoch: 17 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:03,456-Speed 5546.47 samples/sec Loss 1.3656 LearningRate 0.0013 Epoch: 17 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:05,287-Speed 5595.13 samples/sec Loss 1.3438 LearningRate 0.0013 Epoch: 17 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:07,130-Speed 5557.26 samples/sec Loss 1.2827 LearningRate 0.0013 Epoch: 17 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:08,955-Speed 5613.67 samples/sec Loss 1.3138 LearningRate 0.0013 Epoch: 17 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:48:10,788-Speed 5586.21 samples/sec Loss 1.3306 LearningRate 0.0013 Epoch: 17 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:12,625-Speed 5579.25 samples/sec Loss 1.3371 LearningRate 0.0013 Epoch: 17 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:14,473-Speed 5541.51 samples/sec Loss 1.2516 LearningRate 0.0013 Epoch: 17 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:16,297-Speed 5615.10 samples/sec Loss 1.2286 LearningRate 0.0013 Epoch: 17 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:18,147-Speed 5538.57 samples/sec Loss 1.2598 LearningRate 0.0013 Epoch: 17 Global Step: 100690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:20,021-Speed 5464.97 samples/sec Loss 1.2625 LearningRate 0.0013 Epoch: 17 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:21,869-Speed 5544.22 samples/sec Loss 1.3644 LearningRate 0.0013 Epoch: 17 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:23,713-Speed 5555.15 samples/sec Loss 1.3557 LearningRate 0.0013 Epoch: 17 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:25,552-Speed 5568.05 samples/sec Loss 1.3482 LearningRate 0.0013 Epoch: 17 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:27,394-Speed 5561.23 samples/sec Loss 1.2623 LearningRate 0.0013 Epoch: 17 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:29,229-Speed 5581.36 samples/sec Loss 1.2936 LearningRate 0.0013 Epoch: 17 Global Step: 100750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:31,064-Speed 5584.04 samples/sec Loss 1.4124 LearningRate 0.0013 Epoch: 17 Global Step: 100760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:48:32,880-Speed 5643.88 
samples/sec Loss 1.3482 LearningRate 0.0013 Epoch: 17 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:34,740-Speed 5508.02 samples/sec Loss 1.3176 LearningRate 0.0013 Epoch: 17 Global Step: 100780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:36,563-Speed 5617.78 samples/sec Loss 1.4013 LearningRate 0.0013 Epoch: 17 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:38,393-Speed 5597.06 samples/sec Loss 1.2633 LearningRate 0.0013 Epoch: 17 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:40,218-Speed 5612.66 samples/sec Loss 1.2977 LearningRate 0.0013 Epoch: 17 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:42,048-Speed 5599.76 samples/sec Loss 1.3016 LearningRate 0.0013 Epoch: 17 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:43,884-Speed 5579.59 samples/sec Loss 1.2840 LearningRate 0.0013 Epoch: 17 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:45,725-Speed 5563.49 samples/sec Loss 1.3377 LearningRate 0.0013 Epoch: 17 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:47,587-Speed 5500.63 samples/sec Loss 1.3472 LearningRate 0.0013 Epoch: 17 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:49,428-Speed 5563.38 samples/sec Loss 1.3070 LearningRate 0.0013 Epoch: 17 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:51,251-Speed 5621.17 samples/sec Loss 1.3210 LearningRate 0.0013 Epoch: 17 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:53,120-Speed 5477.96 samples/sec Loss 1.3416 LearningRate 0.0013 Epoch: 17 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:54,958-Speed 5576.34 samples/sec Loss 1.3318 LearningRate 0.0013 
Epoch: 17 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:56,794-Speed 5577.70 samples/sec Loss 1.3251 LearningRate 0.0013 Epoch: 17 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:48:58,621-Speed 5608.69 samples/sec Loss 1.2766 LearningRate 0.0013 Epoch: 17 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:00,464-Speed 5557.55 samples/sec Loss 1.2846 LearningRate 0.0013 Epoch: 17 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:02,308-Speed 5554.31 samples/sec Loss 1.3429 LearningRate 0.0013 Epoch: 17 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:04,132-Speed 5614.77 samples/sec Loss 1.3078 LearningRate 0.0013 Epoch: 17 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:05,992-Speed 5506.37 samples/sec Loss 1.3674 LearningRate 0.0013 Epoch: 17 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:07,842-Speed 5537.30 samples/sec Loss 1.2956 LearningRate 0.0013 Epoch: 17 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:09,679-Speed 5577.57 samples/sec Loss 1.3269 LearningRate 0.0013 Epoch: 17 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:11,521-Speed 5559.33 samples/sec Loss 1.2376 LearningRate 0.0013 Epoch: 17 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:13,359-Speed 5573.19 samples/sec Loss 1.2968 LearningRate 0.0013 Epoch: 17 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:15,199-Speed 5567.30 samples/sec Loss 1.2417 LearningRate 0.0013 Epoch: 17 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:17,065-Speed 5489.46 samples/sec Loss 1.3243 LearningRate 0.0012 Epoch: 17 Global Step: 101010 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:18,923-Speed 5515.01 samples/sec Loss 1.2962 LearningRate 0.0012 Epoch: 17 Global Step: 101020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:20,776-Speed 5527.85 samples/sec Loss 1.2754 LearningRate 0.0012 Epoch: 17 Global Step: 101030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:22,624-Speed 5542.95 samples/sec Loss 1.3226 LearningRate 0.0012 Epoch: 17 Global Step: 101040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:24,465-Speed 5565.91 samples/sec Loss 1.3305 LearningRate 0.0012 Epoch: 17 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:26,296-Speed 5594.45 samples/sec Loss 1.3132 LearningRate 0.0012 Epoch: 17 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:28,111-Speed 5642.03 samples/sec Loss 1.2856 LearningRate 0.0012 Epoch: 17 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:29,956-Speed 5553.38 samples/sec Loss 1.2697 LearningRate 0.0012 Epoch: 17 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:31,801-Speed 5552.17 samples/sec Loss 1.2965 LearningRate 0.0012 Epoch: 17 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:33,636-Speed 5579.63 samples/sec Loss 1.2813 LearningRate 0.0012 Epoch: 17 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:35,468-Speed 5592.59 samples/sec Loss 1.2883 LearningRate 0.0012 Epoch: 17 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:37,291-Speed 5618.03 samples/sec Loss 1.3188 LearningRate 0.0012 Epoch: 17 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:39,141-Speed 5537.56 samples/sec Loss 1.3167 LearningRate 0.0012 Epoch: 17 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:49:40,995-Speed 5525.76 samples/sec Loss 1.2976 LearningRate 0.0012 Epoch: 17 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:42,853-Speed 5513.02 samples/sec Loss 1.3406 LearningRate 0.0012 Epoch: 17 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:44,686-Speed 5587.56 samples/sec Loss 1.3105 LearningRate 0.0012 Epoch: 17 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:46,519-Speed 5589.57 samples/sec Loss 1.2606 LearningRate 0.0012 Epoch: 17 Global Step: 101170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:48,344-Speed 5612.78 samples/sec Loss 1.3382 LearningRate 0.0012 Epoch: 17 Global Step: 101180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:50,203-Speed 5510.90 samples/sec Loss 1.3176 LearningRate 0.0012 Epoch: 17 Global Step: 101190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:52,081-Speed 5452.42 samples/sec Loss 1.3371 LearningRate 0.0012 Epoch: 17 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:53,935-Speed 5525.51 samples/sec Loss 1.2766 LearningRate 0.0012 Epoch: 17 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:55,768-Speed 5587.54 samples/sec Loss 1.3043 LearningRate 0.0012 Epoch: 17 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:57,594-Speed 5611.06 samples/sec Loss 1.2039 LearningRate 0.0012 Epoch: 17 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:49:59,433-Speed 5567.81 samples/sec Loss 1.3275 LearningRate 0.0012 Epoch: 17 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:01,360-Speed 5315.59 samples/sec Loss 1.3170 LearningRate 0.0012 Epoch: 17 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:03,247-Speed 5430.02 
samples/sec Loss 1.3538 LearningRate 0.0012 Epoch: 17 Global Step: 101260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:05,076-Speed 5600.28 samples/sec Loss 1.2048 LearningRate 0.0012 Epoch: 17 Global Step: 101270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:06,923-Speed 5546.33 samples/sec Loss 1.2973 LearningRate 0.0012 Epoch: 17 Global Step: 101280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:08,761-Speed 5573.56 samples/sec Loss 1.3456 LearningRate 0.0012 Epoch: 17 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:10,603-Speed 5562.47 samples/sec Loss 1.3258 LearningRate 0.0012 Epoch: 17 Global Step: 101300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:12,442-Speed 5570.69 samples/sec Loss 1.3215 LearningRate 0.0012 Epoch: 17 Global Step: 101310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:14,285-Speed 5555.38 samples/sec Loss 1.2832 LearningRate 0.0012 Epoch: 17 Global Step: 101320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:16,126-Speed 5565.45 samples/sec Loss 1.3893 LearningRate 0.0012 Epoch: 17 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:17,953-Speed 5605.41 samples/sec Loss 1.2385 LearningRate 0.0012 Epoch: 17 Global Step: 101340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:19,794-Speed 5563.75 samples/sec Loss 1.3038 LearningRate 0.0012 Epoch: 17 Global Step: 101350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:21,621-Speed 5606.52 samples/sec Loss 1.2151 LearningRate 0.0012 Epoch: 17 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:23,464-Speed 5559.52 samples/sec Loss 1.2573 LearningRate 0.0012 Epoch: 17 Global Step: 101370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:50:25,286-Speed 5622.57 samples/sec Loss 1.2451 LearningRate 
0.0012 Epoch: 17 Global Step: 101380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:27,147-Speed 5504.36 samples/sec Loss 1.3611 LearningRate 0.0012 Epoch: 17 Global Step: 101390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:28,991-Speed 5553.61 samples/sec Loss 1.3413 LearningRate 0.0012 Epoch: 17 Global Step: 101400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:30,835-Speed 5554.56 samples/sec Loss 1.2812 LearningRate 0.0012 Epoch: 17 Global Step: 101410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:32,689-Speed 5526.81 samples/sec Loss 1.3450 LearningRate 0.0012 Epoch: 17 Global Step: 101420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:34,522-Speed 5587.44 samples/sec Loss 1.2456 LearningRate 0.0012 Epoch: 17 Global Step: 101430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:36,373-Speed 5534.96 samples/sec Loss 1.2920 LearningRate 0.0012 Epoch: 17 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:38,209-Speed 5577.85 samples/sec Loss 1.3146 LearningRate 0.0012 Epoch: 17 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:40,038-Speed 5600.29 samples/sec Loss 1.2891 LearningRate 0.0012 Epoch: 17 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:41,891-Speed 5532.99 samples/sec Loss 1.2711 LearningRate 0.0012 Epoch: 17 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:43,704-Speed 5648.94 samples/sec Loss 1.3292 LearningRate 0.0012 Epoch: 17 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:45,547-Speed 5557.15 samples/sec Loss 1.3181 LearningRate 0.0012 Epoch: 17 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:47,399-Speed 5531.45 samples/sec Loss 1.3226 LearningRate 0.0012 Epoch: 17 Global Step: 101500 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:49,237-Speed 5574.04 samples/sec Loss 1.3415 LearningRate 0.0012 Epoch: 17 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:51,072-Speed 5583.18 samples/sec Loss 1.3064 LearningRate 0.0012 Epoch: 17 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:52,914-Speed 5561.92 samples/sec Loss 1.3729 LearningRate 0.0011 Epoch: 17 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:54,842-Speed 5313.42 samples/sec Loss 1.3272 LearningRate 0.0011 Epoch: 17 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:56,690-Speed 5540.85 samples/sec Loss 1.3183 LearningRate 0.0011 Epoch: 17 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:50:58,583-Speed 5411.40 samples/sec Loss 1.3633 LearningRate 0.0011 Epoch: 17 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:00,414-Speed 5595.32 samples/sec Loss 1.1846 LearningRate 0.0011 Epoch: 17 Global Step: 101570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:02,245-Speed 5593.15 samples/sec Loss 1.2991 LearningRate 0.0011 Epoch: 17 Global Step: 101580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:04,078-Speed 5590.44 samples/sec Loss 1.2920 LearningRate 0.0011 Epoch: 17 Global Step: 101590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:05,974-Speed 5400.83 samples/sec Loss 1.2465 LearningRate 0.0011 Epoch: 17 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:07,822-Speed 5542.57 samples/sec Loss 1.2933 LearningRate 0.0011 Epoch: 17 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:09,654-Speed 5592.06 samples/sec Loss 1.3737 LearningRate 0.0011 Epoch: 17 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 07:51:11,491-Speed 5578.69 samples/sec Loss 1.3083 LearningRate 0.0011 Epoch: 17 Global Step: 101630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:13,328-Speed 5574.37 samples/sec Loss 1.3706 LearningRate 0.0011 Epoch: 17 Global Step: 101640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:15,167-Speed 5572.01 samples/sec Loss 1.3454 LearningRate 0.0011 Epoch: 17 Global Step: 101650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:16,992-Speed 5612.85 samples/sec Loss 1.2992 LearningRate 0.0011 Epoch: 17 Global Step: 101660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:18,831-Speed 5572.64 samples/sec Loss 1.3467 LearningRate 0.0011 Epoch: 17 Global Step: 101670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:20,672-Speed 5561.83 samples/sec Loss 1.2248 LearningRate 0.0011 Epoch: 17 Global Step: 101680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:22,519-Speed 5547.57 samples/sec Loss 1.2716 LearningRate 0.0011 Epoch: 17 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:24,359-Speed 5566.64 samples/sec Loss 1.2760 LearningRate 0.0011 Epoch: 17 Global Step: 101700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:26,189-Speed 5597.07 samples/sec Loss 1.3035 LearningRate 0.0011 Epoch: 17 Global Step: 101710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:28,042-Speed 5526.80 samples/sec Loss 1.3466 LearningRate 0.0011 Epoch: 17 Global Step: 101720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:29,886-Speed 5554.90 samples/sec Loss 1.3070 LearningRate 0.0011 Epoch: 17 Global Step: 101730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:31,748-Speed 5501.24 samples/sec Loss 1.3161 LearningRate 0.0011 Epoch: 17 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:33,584-Speed 
5580.31 samples/sec Loss 1.2989 LearningRate 0.0011 Epoch: 17 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:35,410-Speed 5610.02 samples/sec Loss 1.2263 LearningRate 0.0011 Epoch: 17 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:37,246-Speed 5580.54 samples/sec Loss 1.3112 LearningRate 0.0011 Epoch: 17 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:51:39,081-Speed 5580.54 samples/sec Loss 1.3905 LearningRate 0.0011 Epoch: 17 Global Step: 101780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:51:40,902-Speed 5625.23 samples/sec Loss 1.3653 LearningRate 0.0011 Epoch: 17 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:42,730-Speed 5605.63 samples/sec Loss 1.3171 LearningRate 0.0011 Epoch: 17 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:44,575-Speed 5549.67 samples/sec Loss 1.2688 LearningRate 0.0011 Epoch: 17 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:46,412-Speed 5575.50 samples/sec Loss 1.3315 LearningRate 0.0011 Epoch: 17 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:48,241-Speed 5601.95 samples/sec Loss 1.3750 LearningRate 0.0011 Epoch: 17 Global Step: 101830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:50,104-Speed 5498.40 samples/sec Loss 1.3373 LearningRate 0.0011 Epoch: 17 Global Step: 101840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:51,945-Speed 5562.82 samples/sec Loss 1.2829 LearningRate 0.0011 Epoch: 17 Global Step: 101850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:53,776-Speed 5594.95 samples/sec Loss 1.2874 LearningRate 0.0011 Epoch: 17 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:55,601-Speed 5613.57 samples/sec Loss 1.2610 
LearningRate 0.0011 Epoch: 17 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:57,435-Speed 5586.62 samples/sec Loss 1.2058 LearningRate 0.0011 Epoch: 17 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:51:59,279-Speed 5554.17 samples/sec Loss 1.2537 LearningRate 0.0011 Epoch: 17 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:01,134-Speed 5521.95 samples/sec Loss 1.3111 LearningRate 0.0011 Epoch: 17 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:02,995-Speed 5505.90 samples/sec Loss 1.2884 LearningRate 0.0011 Epoch: 17 Global Step: 101910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:04,856-Speed 5503.79 samples/sec Loss 1.3263 LearningRate 0.0011 Epoch: 17 Global Step: 101920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:06,690-Speed 5584.70 samples/sec Loss 1.2549 LearningRate 0.0011 Epoch: 17 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:08,546-Speed 5518.34 samples/sec Loss 1.2506 LearningRate 0.0011 Epoch: 17 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:10,398-Speed 5531.49 samples/sec Loss 1.3138 LearningRate 0.0011 Epoch: 17 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:12,230-Speed 5591.80 samples/sec Loss 1.3516 LearningRate 0.0011 Epoch: 17 Global Step: 101960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:14,066-Speed 5578.30 samples/sec Loss 1.2858 LearningRate 0.0011 Epoch: 17 Global Step: 101970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:15,913-Speed 5547.48 samples/sec Loss 1.3012 LearningRate 0.0011 Epoch: 17 Global Step: 101980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:17,730-Speed 5637.00 samples/sec Loss 1.2669 LearningRate 0.0011 Epoch: 17 Global Step: 
101990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:19,568-Speed 5571.81 samples/sec Loss 1.3307 LearningRate 0.0011 Epoch: 17 Global Step: 102000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:52:45,797-[lfw][102000]XNorm: 21.973692 Training: 2022-04-27 07:52:45,798-[lfw][102000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-27 07:52:45,798-[lfw][102000]Accuracy-Highest: 0.99800 Training: 2022-04-27 07:53:16,214-[cfp_fp][102000]XNorm: 21.134001 Training: 2022-04-27 07:53:16,215-[cfp_fp][102000]Accuracy-Flip: 0.97886+-0.00682 Training: 2022-04-27 07:53:16,215-[cfp_fp][102000]Accuracy-Highest: 0.97929 Training: 2022-04-27 07:53:42,529-[agedb_30][102000]XNorm: 22.020916 Training: 2022-04-27 07:53:42,529-[agedb_30][102000]Accuracy-Flip: 0.98100+-0.00680 Training: 2022-04-27 07:53:42,529-[agedb_30][102000]Accuracy-Highest: 0.98183 Training: 2022-04-27 07:53:44,384-Speed 120.73 samples/sec Loss 1.2003 LearningRate 0.0011 Epoch: 17 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:46,214-Speed 5598.73 samples/sec Loss 1.2926 LearningRate 0.0011 Epoch: 17 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:48,060-Speed 5548.92 samples/sec Loss 1.3242 LearningRate 0.0011 Epoch: 17 Global Step: 102030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:49,877-Speed 5635.07 samples/sec Loss 1.3168 LearningRate 0.0011 Epoch: 17 Global Step: 102040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:51,697-Speed 5629.21 samples/sec Loss 1.2361 LearningRate 0.0011 Epoch: 17 Global Step: 102050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:53,517-Speed 5629.16 samples/sec Loss 1.3409 LearningRate 0.0011 Epoch: 17 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:55,332-Speed 5644.27 samples/sec Loss 1.2373 LearningRate 0.0010 Epoch: 17 Global Step: 102070 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-27 07:53:57,155-Speed 5619.05 samples/sec Loss 1.3487 LearningRate 0.0010 Epoch: 17 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:53:58,961-Speed 5670.48 samples/sec Loss 1.4083 LearningRate 0.0010 Epoch: 17 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:00,786-Speed 5612.94 samples/sec Loss 1.3311 LearningRate 0.0010 Epoch: 17 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:02,599-Speed 5650.62 samples/sec Loss 1.2817 LearningRate 0.0010 Epoch: 17 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:04,413-Speed 5647.02 samples/sec Loss 1.2352 LearningRate 0.0010 Epoch: 17 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:06,243-Speed 5596.21 samples/sec Loss 1.2806 LearningRate 0.0010 Epoch: 17 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:08,071-Speed 5605.47 samples/sec Loss 1.3994 LearningRate 0.0010 Epoch: 17 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:09,890-Speed 5630.46 samples/sec Loss 1.2987 LearningRate 0.0010 Epoch: 17 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:11,730-Speed 5567.32 samples/sec Loss 1.2510 LearningRate 0.0010 Epoch: 17 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:13,547-Speed 5636.33 samples/sec Loss 1.2541 LearningRate 0.0010 Epoch: 17 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:15,385-Speed 5572.82 samples/sec Loss 1.1958 LearningRate 0.0010 Epoch: 17 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:17,215-Speed 5597.73 samples/sec Loss 1.4077 LearningRate 0.0010 Epoch: 17 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:54:19,068-Speed 5533.53 samples/sec Loss 1.2842 LearningRate 0.0010 Epoch: 17 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:20,891-Speed 5616.57 samples/sec Loss 1.2704 LearningRate 0.0010 Epoch: 17 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:22,728-Speed 5577.38 samples/sec Loss 1.3264 LearningRate 0.0010 Epoch: 17 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:24,570-Speed 5559.57 samples/sec Loss 1.2319 LearningRate 0.0010 Epoch: 17 Global Step: 102230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:26,397-Speed 5608.46 samples/sec Loss 1.3643 LearningRate 0.0010 Epoch: 17 Global Step: 102240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:28,228-Speed 5594.22 samples/sec Loss 1.3350 LearningRate 0.0010 Epoch: 17 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:30,047-Speed 5630.23 samples/sec Loss 1.3264 LearningRate 0.0010 Epoch: 17 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:31,875-Speed 5604.90 samples/sec Loss 1.3136 LearningRate 0.0010 Epoch: 17 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:33,718-Speed 5556.78 samples/sec Loss 1.2868 LearningRate 0.0010 Epoch: 17 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:35,540-Speed 5621.98 samples/sec Loss 1.2896 LearningRate 0.0010 Epoch: 17 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:37,358-Speed 5636.39 samples/sec Loss 1.3198 LearningRate 0.0010 Epoch: 17 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:39,182-Speed 5619.00 samples/sec Loss 1.2137 LearningRate 0.0010 Epoch: 17 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:41,021-Speed 5571.98 
samples/sec Loss 1.2610 LearningRate 0.0010 Epoch: 17 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:42,844-Speed 5618.53 samples/sec Loss 1.3642 LearningRate 0.0010 Epoch: 17 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:44,743-Speed 5393.39 samples/sec Loss 1.3162 LearningRate 0.0010 Epoch: 17 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:56,150-Speed 897.76 samples/sec Loss 1.1820 LearningRate 0.0010 Epoch: 18 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:58,045-Speed 5404.57 samples/sec Loss 1.0219 LearningRate 0.0010 Epoch: 18 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:54:59,918-Speed 5470.60 samples/sec Loss 1.0493 LearningRate 0.0010 Epoch: 18 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:55:01,735-Speed 5635.58 samples/sec Loss 0.9767 LearningRate 0.0010 Epoch: 18 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:55:03,601-Speed 5491.61 samples/sec Loss 1.0362 LearningRate 0.0010 Epoch: 18 Global Step: 102390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:55:05,456-Speed 5522.72 samples/sec Loss 1.0232 LearningRate 0.0010 Epoch: 18 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:55:07,281-Speed 5613.62 samples/sec Loss 0.9464 LearningRate 0.0010 Epoch: 18 Global Step: 102410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:09,106-Speed 5611.71 samples/sec Loss 1.0031 LearningRate 0.0010 Epoch: 18 Global Step: 102420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:10,944-Speed 5574.21 samples/sec Loss 1.0077 LearningRate 0.0010 Epoch: 18 Global Step: 102430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:12,810-Speed 5487.23 samples/sec Loss 0.9662 LearningRate 0.0010 
Epoch: 18 Global Step: 102440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:14,657-Speed 5546.97 samples/sec Loss 0.9396 LearningRate 0.0010 Epoch: 18 Global Step: 102450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:16,484-Speed 5605.72 samples/sec Loss 1.0221 LearningRate 0.0010 Epoch: 18 Global Step: 102460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:18,318-Speed 5585.08 samples/sec Loss 1.0043 LearningRate 0.0010 Epoch: 18 Global Step: 102470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:20,162-Speed 5554.80 samples/sec Loss 1.0858 LearningRate 0.0010 Epoch: 18 Global Step: 102480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:22,006-Speed 5555.48 samples/sec Loss 0.9606 LearningRate 0.0010 Epoch: 18 Global Step: 102490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:23,859-Speed 5529.26 samples/sec Loss 1.0015 LearningRate 0.0010 Epoch: 18 Global Step: 102500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:25,685-Speed 5608.83 samples/sec Loss 1.0054 LearningRate 0.0010 Epoch: 18 Global Step: 102510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:27,521-Speed 5578.99 samples/sec Loss 1.0659 LearningRate 0.0010 Epoch: 18 Global Step: 102520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:29,354-Speed 5590.70 samples/sec Loss 1.0562 LearningRate 0.0010 Epoch: 18 Global Step: 102530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:31,174-Speed 5625.91 samples/sec Loss 0.9357 LearningRate 0.0010 Epoch: 18 Global Step: 102540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:33,028-Speed 5525.82 samples/sec Loss 1.0102 LearningRate 0.0010 Epoch: 18 Global Step: 102550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:34,865-Speed 5574.68 samples/sec Loss 1.1214 LearningRate 0.0010 Epoch: 18 Global Step: 102560 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-04-27 07:55:36,685-Speed 5630.53 samples/sec Loss 0.9694 LearningRate 0.0010 Epoch: 18 Global Step: 102570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:38,529-Speed 5555.06 samples/sec Loss 1.0175 LearningRate 0.0010 Epoch: 18 Global Step: 102580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:40,379-Speed 5537.05 samples/sec Loss 1.0447 LearningRate 0.0010 Epoch: 18 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:42,201-Speed 5622.22 samples/sec Loss 0.9637 LearningRate 0.0010 Epoch: 18 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:44,040-Speed 5569.02 samples/sec Loss 0.9533 LearningRate 0.0010 Epoch: 18 Global Step: 102610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:45,906-Speed 5488.96 samples/sec Loss 0.9830 LearningRate 0.0010 Epoch: 18 Global Step: 102620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:47,750-Speed 5556.41 samples/sec Loss 1.0114 LearningRate 0.0010 Epoch: 18 Global Step: 102630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:49,599-Speed 5540.30 samples/sec Loss 0.9802 LearningRate 0.0009 Epoch: 18 Global Step: 102640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:51,437-Speed 5571.18 samples/sec Loss 0.9988 LearningRate 0.0009 Epoch: 18 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:53,262-Speed 5613.02 samples/sec Loss 1.0013 LearningRate 0.0009 Epoch: 18 Global Step: 102660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:55:55,132-Speed 5480.19 samples/sec Loss 0.9888 LearningRate 0.0009 Epoch: 18 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:55:56,990-Speed 5511.97 samples/sec Loss 1.0504 LearningRate 0.0009 Epoch: 18 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-27 07:55:58,861-Speed 5474.41 samples/sec Loss 0.9441 LearningRate 0.0009 Epoch: 18 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:00,725-Speed 5496.86 samples/sec Loss 0.9987 LearningRate 0.0009 Epoch: 18 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:02,578-Speed 5527.26 samples/sec Loss 1.0021 LearningRate 0.0009 Epoch: 18 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:04,419-Speed 5562.08 samples/sec Loss 1.0044 LearningRate 0.0009 Epoch: 18 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:06,252-Speed 5589.58 samples/sec Loss 1.0582 LearningRate 0.0009 Epoch: 18 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:08,106-Speed 5524.15 samples/sec Loss 1.0269 LearningRate 0.0009 Epoch: 18 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:09,957-Speed 5535.03 samples/sec Loss 0.9887 LearningRate 0.0009 Epoch: 18 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:11,777-Speed 5630.70 samples/sec Loss 1.0588 LearningRate 0.0009 Epoch: 18 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:13,595-Speed 5631.94 samples/sec Loss 0.9912 LearningRate 0.0009 Epoch: 18 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:15,423-Speed 5604.48 samples/sec Loss 1.0604 LearningRate 0.0009 Epoch: 18 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:17,257-Speed 5584.72 samples/sec Loss 0.9339 LearningRate 0.0009 Epoch: 18 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:19,115-Speed 5514.86 samples/sec Loss 1.0507 LearningRate 0.0009 Epoch: 18 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:20,977-Speed 5498.48 
samples/sec Loss 1.0599 LearningRate 0.0009 Epoch: 18 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:22,816-Speed 5571.80 samples/sec Loss 1.0483 LearningRate 0.0009 Epoch: 18 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:24,689-Speed 5466.79 samples/sec Loss 1.0886 LearningRate 0.0009 Epoch: 18 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:26,532-Speed 5559.53 samples/sec Loss 1.1020 LearningRate 0.0009 Epoch: 18 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:28,361-Speed 5601.88 samples/sec Loss 0.9842 LearningRate 0.0009 Epoch: 18 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:30,196-Speed 5580.39 samples/sec Loss 0.9914 LearningRate 0.0009 Epoch: 18 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:32,031-Speed 5582.63 samples/sec Loss 1.0278 LearningRate 0.0009 Epoch: 18 Global Step: 102870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:56:33,852-Speed 5624.34 samples/sec Loss 1.0549 LearningRate 0.0009 Epoch: 18 Global Step: 102880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:56:35,697-Speed 5552.37 samples/sec Loss 1.0151 LearningRate 0.0009 Epoch: 18 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:37,543-Speed 5550.83 samples/sec Loss 1.0708 LearningRate 0.0009 Epoch: 18 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:39,391-Speed 5541.45 samples/sec Loss 0.9517 LearningRate 0.0009 Epoch: 18 Global Step: 102910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:41,481-Speed 4901.94 samples/sec Loss 0.9987 LearningRate 0.0009 Epoch: 18 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:43,337-Speed 5517.84 samples/sec Loss 0.9706 LearningRate 
0.0009 Epoch: 18 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:45,166-Speed 5601.36 samples/sec Loss 0.9552 LearningRate 0.0009 Epoch: 18 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:47,011-Speed 5552.55 samples/sec Loss 0.9950 LearningRate 0.0009 Epoch: 18 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:48,832-Speed 5623.64 samples/sec Loss 1.0199 LearningRate 0.0009 Epoch: 18 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:50,664-Speed 5592.93 samples/sec Loss 1.0077 LearningRate 0.0009 Epoch: 18 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:52,504-Speed 5565.79 samples/sec Loss 1.0684 LearningRate 0.0009 Epoch: 18 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:54,332-Speed 5603.75 samples/sec Loss 1.0763 LearningRate 0.0009 Epoch: 18 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:56,166-Speed 5586.59 samples/sec Loss 1.0240 LearningRate 0.0009 Epoch: 18 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:58,019-Speed 5528.61 samples/sec Loss 0.9381 LearningRate 0.0009 Epoch: 18 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:56:59,861-Speed 5561.25 samples/sec Loss 1.0730 LearningRate 0.0009 Epoch: 18 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:01,702-Speed 5564.91 samples/sec Loss 1.0282 LearningRate 0.0009 Epoch: 18 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:03,537-Speed 5580.67 samples/sec Loss 1.0208 LearningRate 0.0009 Epoch: 18 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:05,373-Speed 5579.51 samples/sec Loss 0.9601 LearningRate 0.0009 Epoch: 18 Global Step: 103050 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:07,219-Speed 5547.25 samples/sec Loss 1.0599 LearningRate 0.0009 Epoch: 18 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:09,051-Speed 5594.23 samples/sec Loss 1.0006 LearningRate 0.0009 Epoch: 18 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:10,886-Speed 5580.79 samples/sec Loss 0.9659 LearningRate 0.0009 Epoch: 18 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:12,700-Speed 5646.62 samples/sec Loss 0.9316 LearningRate 0.0009 Epoch: 18 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:14,531-Speed 5596.06 samples/sec Loss 1.1055 LearningRate 0.0009 Epoch: 18 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:16,368-Speed 5573.21 samples/sec Loss 1.0659 LearningRate 0.0009 Epoch: 18 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:18,187-Speed 5634.58 samples/sec Loss 1.0158 LearningRate 0.0009 Epoch: 18 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:20,038-Speed 5532.09 samples/sec Loss 1.0933 LearningRate 0.0009 Epoch: 18 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:21,870-Speed 5592.48 samples/sec Loss 1.0709 LearningRate 0.0009 Epoch: 18 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:23,731-Speed 5504.83 samples/sec Loss 0.9931 LearningRate 0.0009 Epoch: 18 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:25,572-Speed 5564.14 samples/sec Loss 1.0661 LearningRate 0.0009 Epoch: 18 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:27,418-Speed 5549.12 samples/sec Loss 1.0308 LearningRate 0.0009 Epoch: 18 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 07:57:29,252-Speed 5584.73 samples/sec Loss 1.0587 LearningRate 0.0009 Epoch: 18 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:31,072-Speed 5627.83 samples/sec Loss 1.0299 LearningRate 0.0009 Epoch: 18 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:32,890-Speed 5634.73 samples/sec Loss 0.9866 LearningRate 0.0009 Epoch: 18 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:34,739-Speed 5538.84 samples/sec Loss 0.9706 LearningRate 0.0009 Epoch: 18 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:36,586-Speed 5545.40 samples/sec Loss 1.0151 LearningRate 0.0009 Epoch: 18 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:38,431-Speed 5552.16 samples/sec Loss 1.0217 LearningRate 0.0009 Epoch: 18 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:40,271-Speed 5566.47 samples/sec Loss 1.0064 LearningRate 0.0008 Epoch: 18 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:42,125-Speed 5527.76 samples/sec Loss 1.0164 LearningRate 0.0008 Epoch: 18 Global Step: 103250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:43,964-Speed 5570.09 samples/sec Loss 0.9826 LearningRate 0.0008 Epoch: 18 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:45,808-Speed 5553.37 samples/sec Loss 1.0281 LearningRate 0.0008 Epoch: 18 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:47,629-Speed 5626.43 samples/sec Loss 1.0529 LearningRate 0.0008 Epoch: 18 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:49,448-Speed 5630.92 samples/sec Loss 1.0731 LearningRate 0.0008 Epoch: 18 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:51,285-Speed 
5576.94 samples/sec Loss 1.0377 LearningRate 0.0008 Epoch: 18 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:53,118-Speed 5587.85 samples/sec Loss 1.0118 LearningRate 0.0008 Epoch: 18 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:54,981-Speed 5498.41 samples/sec Loss 1.0225 LearningRate 0.0008 Epoch: 18 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:56,844-Speed 5498.32 samples/sec Loss 1.0088 LearningRate 0.0008 Epoch: 18 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:57:58,685-Speed 5565.07 samples/sec Loss 1.0463 LearningRate 0.0008 Epoch: 18 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:00,514-Speed 5599.71 samples/sec Loss 1.0218 LearningRate 0.0008 Epoch: 18 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:02,356-Speed 5562.11 samples/sec Loss 1.0113 LearningRate 0.0008 Epoch: 18 Global Step: 103360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:04,190-Speed 5584.07 samples/sec Loss 0.9756 LearningRate 0.0008 Epoch: 18 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:06,011-Speed 5624.74 samples/sec Loss 1.0271 LearningRate 0.0008 Epoch: 18 Global Step: 103380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:07,840-Speed 5601.80 samples/sec Loss 1.0817 LearningRate 0.0008 Epoch: 18 Global Step: 103390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:58:09,653-Speed 5651.18 samples/sec Loss 1.0502 LearningRate 0.0008 Epoch: 18 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:11,481-Speed 5601.37 samples/sec Loss 1.0251 LearningRate 0.0008 Epoch: 18 Global Step: 103410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:13,332-Speed 5535.84 samples/sec Loss 0.9920 
LearningRate 0.0008 Epoch: 18 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:15,171-Speed 5570.19 samples/sec Loss 1.0174 LearningRate 0.0008 Epoch: 18 Global Step: 103430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:17,015-Speed 5554.81 samples/sec Loss 0.9810 LearningRate 0.0008 Epoch: 18 Global Step: 103440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:18,855-Speed 5567.40 samples/sec Loss 1.0782 LearningRate 0.0008 Epoch: 18 Global Step: 103450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:20,711-Speed 5518.06 samples/sec Loss 1.0463 LearningRate 0.0008 Epoch: 18 Global Step: 103460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:22,534-Speed 5619.64 samples/sec Loss 1.0155 LearningRate 0.0008 Epoch: 18 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:24,376-Speed 5560.88 samples/sec Loss 1.0037 LearningRate 0.0008 Epoch: 18 Global Step: 103480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:26,204-Speed 5603.17 samples/sec Loss 1.0388 LearningRate 0.0008 Epoch: 18 Global Step: 103490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:28,031-Speed 5607.06 samples/sec Loss 1.0062 LearningRate 0.0008 Epoch: 18 Global Step: 103500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 07:58:29,855-Speed 5617.00 samples/sec Loss 1.0292 LearningRate 0.0008 Epoch: 18 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:31,694-Speed 5570.25 samples/sec Loss 1.0865 LearningRate 0.0008 Epoch: 18 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:33,524-Speed 5597.02 samples/sec Loss 0.9890 LearningRate 0.0008 Epoch: 18 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:35,355-Speed 5592.64 samples/sec Loss 1.0452 LearningRate 0.0008 Epoch: 18 Global Step: 
103540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:37,193-Speed 5572.76 samples/sec Loss 1.0462 LearningRate 0.0008 Epoch: 18 Global Step: 103550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:39,022-Speed 5603.01 samples/sec Loss 1.0016 LearningRate 0.0008 Epoch: 18 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:40,856-Speed 5583.54 samples/sec Loss 1.0388 LearningRate 0.0008 Epoch: 18 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:42,677-Speed 5626.25 samples/sec Loss 0.9499 LearningRate 0.0008 Epoch: 18 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:44,499-Speed 5620.95 samples/sec Loss 1.0483 LearningRate 0.0008 Epoch: 18 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:46,325-Speed 5610.00 samples/sec Loss 1.0364 LearningRate 0.0008 Epoch: 18 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:48,148-Speed 5623.99 samples/sec Loss 0.9702 LearningRate 0.0008 Epoch: 18 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:58:49,974-Speed 5609.19 samples/sec Loss 1.0615 LearningRate 0.0008 Epoch: 18 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:51,816-Speed 5561.71 samples/sec Loss 0.9431 LearningRate 0.0008 Epoch: 18 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:53,656-Speed 5564.22 samples/sec Loss 1.0484 LearningRate 0.0008 Epoch: 18 Global Step: 103640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:55,490-Speed 5587.37 samples/sec Loss 1.0075 LearningRate 0.0008 Epoch: 18 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:58:57,316-Speed 5609.04 samples/sec Loss 1.1037 LearningRate 0.0008 Epoch: 18 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 07:58:59,154-Speed 5574.61 samples/sec Loss 1.0753 LearningRate 0.0008 Epoch: 18 Global Step: 103670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:01,062-Speed 5368.03 samples/sec Loss 0.9682 LearningRate 0.0008 Epoch: 18 Global Step: 103680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:02,925-Speed 5497.72 samples/sec Loss 1.0394 LearningRate 0.0008 Epoch: 18 Global Step: 103690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:04,760-Speed 5583.38 samples/sec Loss 1.0374 LearningRate 0.0008 Epoch: 18 Global Step: 103700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:06,612-Speed 5529.52 samples/sec Loss 1.0018 LearningRate 0.0008 Epoch: 18 Global Step: 103710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:08,439-Speed 5607.40 samples/sec Loss 1.0032 LearningRate 0.0008 Epoch: 18 Global Step: 103720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:10,269-Speed 5596.38 samples/sec Loss 1.0681 LearningRate 0.0008 Epoch: 18 Global Step: 103730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:12,100-Speed 5596.73 samples/sec Loss 1.0889 LearningRate 0.0008 Epoch: 18 Global Step: 103740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:13,940-Speed 5565.66 samples/sec Loss 1.0668 LearningRate 0.0008 Epoch: 18 Global Step: 103750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:15,789-Speed 5539.07 samples/sec Loss 1.0242 LearningRate 0.0008 Epoch: 18 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:17,630-Speed 5564.86 samples/sec Loss 1.0376 LearningRate 0.0008 Epoch: 18 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:19,463-Speed 5589.17 samples/sec Loss 1.0415 LearningRate 0.0008 Epoch: 18 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
07:59:21,335-Speed 5470.91 samples/sec Loss 1.1068 LearningRate 0.0008 Epoch: 18 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:23,160-Speed 5612.69 samples/sec Loss 1.0472 LearningRate 0.0008 Epoch: 18 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:24,986-Speed 5609.08 samples/sec Loss 0.9841 LearningRate 0.0008 Epoch: 18 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:26,808-Speed 5623.46 samples/sec Loss 1.0483 LearningRate 0.0008 Epoch: 18 Global Step: 103820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 07:59:28,620-Speed 5653.85 samples/sec Loss 1.0729 LearningRate 0.0008 Epoch: 18 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:30,459-Speed 5569.69 samples/sec Loss 0.9850 LearningRate 0.0008 Epoch: 18 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:32,293-Speed 5584.94 samples/sec Loss 1.0240 LearningRate 0.0008 Epoch: 18 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:34,119-Speed 5612.28 samples/sec Loss 1.0729 LearningRate 0.0008 Epoch: 18 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:35,976-Speed 5514.21 samples/sec Loss 0.9934 LearningRate 0.0008 Epoch: 18 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:37,811-Speed 5583.62 samples/sec Loss 1.0409 LearningRate 0.0007 Epoch: 18 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:39,649-Speed 5572.90 samples/sec Loss 1.0355 LearningRate 0.0007 Epoch: 18 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:41,484-Speed 5581.84 samples/sec Loss 1.1386 LearningRate 0.0007 Epoch: 18 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:43,384-Speed 5391.43 samples/sec 
Loss 0.9868 LearningRate 0.0007 Epoch: 18 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:45,307-Speed 5326.16 samples/sec Loss 0.9636 LearningRate 0.0007 Epoch: 18 Global Step: 103920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:47,143-Speed 5578.27 samples/sec Loss 1.1160 LearningRate 0.0007 Epoch: 18 Global Step: 103930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:48,964-Speed 5627.16 samples/sec Loss 0.9392 LearningRate 0.0007 Epoch: 18 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:50,787-Speed 5616.18 samples/sec Loss 1.0570 LearningRate 0.0007 Epoch: 18 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:52,624-Speed 5576.80 samples/sec Loss 1.0191 LearningRate 0.0007 Epoch: 18 Global Step: 103960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:54,458-Speed 5584.57 samples/sec Loss 1.0025 LearningRate 0.0007 Epoch: 18 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:56,287-Speed 5601.53 samples/sec Loss 1.1198 LearningRate 0.0007 Epoch: 18 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:58,113-Speed 5609.87 samples/sec Loss 1.0771 LearningRate 0.0007 Epoch: 18 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 07:59:59,929-Speed 5642.91 samples/sec Loss 1.0455 LearningRate 0.0007 Epoch: 18 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:00:26,056-[lfw][104000]XNorm: 21.937785 Training: 2022-04-27 08:00:26,057-[lfw][104000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-04-27 08:00:26,057-[lfw][104000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:00:56,312-[cfp_fp][104000]XNorm: 21.038418 Training: 2022-04-27 08:00:56,312-[cfp_fp][104000]Accuracy-Flip: 0.97814+-0.00645 Training: 2022-04-27 
08:00:56,313-[cfp_fp][104000]Accuracy-Highest: 0.97929 Training: 2022-04-27 08:01:22,464-[agedb_30][104000]XNorm: 22.024103 Training: 2022-04-27 08:01:22,465-[agedb_30][104000]Accuracy-Flip: 0.98133+-0.00640 Training: 2022-04-27 08:01:22,465-[agedb_30][104000]Accuracy-Highest: 0.98183 Training: 2022-04-27 08:01:24,296-Speed 121.37 samples/sec Loss 0.9541 LearningRate 0.0007 Epoch: 18 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:26,113-Speed 5638.03 samples/sec Loss 1.0262 LearningRate 0.0007 Epoch: 18 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:27,929-Speed 5641.27 samples/sec Loss 0.9317 LearningRate 0.0007 Epoch: 18 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:29,746-Speed 5637.71 samples/sec Loss 0.9943 LearningRate 0.0007 Epoch: 18 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:31,562-Speed 5640.02 samples/sec Loss 1.0729 LearningRate 0.0007 Epoch: 18 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:33,398-Speed 5578.29 samples/sec Loss 1.0019 LearningRate 0.0007 Epoch: 18 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:35,211-Speed 5652.37 samples/sec Loss 1.0129 LearningRate 0.0007 Epoch: 18 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:37,029-Speed 5631.47 samples/sec Loss 1.0375 LearningRate 0.0007 Epoch: 18 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:38,851-Speed 5623.80 samples/sec Loss 0.9718 LearningRate 0.0007 Epoch: 18 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:40,712-Speed 5503.48 samples/sec Loss 0.9917 LearningRate 0.0007 Epoch: 18 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:42,543-Speed 5596.25 samples/sec Loss 1.0478 LearningRate 
0.0007 Epoch: 18 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:44,358-Speed 5641.79 samples/sec Loss 1.0711 LearningRate 0.0007 Epoch: 18 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:46,167-Speed 5663.16 samples/sec Loss 1.0105 LearningRate 0.0007 Epoch: 18 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:47,991-Speed 5614.77 samples/sec Loss 1.0131 LearningRate 0.0007 Epoch: 18 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:49,806-Speed 5642.86 samples/sec Loss 0.9935 LearningRate 0.0007 Epoch: 18 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:51,630-Speed 5619.32 samples/sec Loss 1.0330 LearningRate 0.0007 Epoch: 18 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:53,484-Speed 5523.32 samples/sec Loss 0.9973 LearningRate 0.0007 Epoch: 18 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:55,311-Speed 5609.05 samples/sec Loss 1.0635 LearningRate 0.0007 Epoch: 18 Global Step: 104180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:57,135-Speed 5614.50 samples/sec Loss 1.0349 LearningRate 0.0007 Epoch: 18 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:01:58,955-Speed 5628.23 samples/sec Loss 1.1048 LearningRate 0.0007 Epoch: 18 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:00,773-Speed 5632.63 samples/sec Loss 0.9975 LearningRate 0.0007 Epoch: 18 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:02,610-Speed 5578.87 samples/sec Loss 1.0452 LearningRate 0.0007 Epoch: 18 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:04,417-Speed 5666.73 samples/sec Loss 0.9872 LearningRate 0.0007 Epoch: 18 Global Step: 104230 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:06,247-Speed 5598.66 samples/sec Loss 1.0215 LearningRate 0.0007 Epoch: 18 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:08,088-Speed 5564.22 samples/sec Loss 1.0248 LearningRate 0.0007 Epoch: 18 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:09,924-Speed 5577.62 samples/sec Loss 1.0174 LearningRate 0.0007 Epoch: 18 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:11,740-Speed 5642.43 samples/sec Loss 1.0753 LearningRate 0.0007 Epoch: 18 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:13,567-Speed 5604.63 samples/sec Loss 1.0515 LearningRate 0.0007 Epoch: 18 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:15,393-Speed 5610.04 samples/sec Loss 1.0877 LearningRate 0.0007 Epoch: 18 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:17,213-Speed 5630.22 samples/sec Loss 1.0131 LearningRate 0.0007 Epoch: 18 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:19,054-Speed 5564.28 samples/sec Loss 1.0918 LearningRate 0.0007 Epoch: 18 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:20,875-Speed 5623.94 samples/sec Loss 1.0005 LearningRate 0.0007 Epoch: 18 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:22,704-Speed 5600.53 samples/sec Loss 0.9662 LearningRate 0.0007 Epoch: 18 Global Step: 104330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 08:02:24,517-Speed 5651.15 samples/sec Loss 0.9910 LearningRate 0.0007 Epoch: 18 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:26,342-Speed 5613.00 samples/sec Loss 1.0847 LearningRate 0.0007 Epoch: 18 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-27 08:02:28,164-Speed 5620.78 samples/sec Loss 1.0312 LearningRate 0.0007 Epoch: 18 Global Step: 104360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:29,980-Speed 5640.70 samples/sec Loss 1.0021 LearningRate 0.0007 Epoch: 18 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:31,799-Speed 5630.86 samples/sec Loss 0.9874 LearningRate 0.0007 Epoch: 18 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:33,622-Speed 5621.31 samples/sec Loss 1.0183 LearningRate 0.0007 Epoch: 18 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:35,442-Speed 5627.77 samples/sec Loss 1.0111 LearningRate 0.0007 Epoch: 18 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:37,278-Speed 5579.01 samples/sec Loss 1.0107 LearningRate 0.0007 Epoch: 18 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:39,103-Speed 5614.24 samples/sec Loss 1.1240 LearningRate 0.0007 Epoch: 18 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:40,936-Speed 5585.77 samples/sec Loss 1.0056 LearningRate 0.0007 Epoch: 18 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:42,746-Speed 5659.99 samples/sec Loss 1.0209 LearningRate 0.0007 Epoch: 18 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:44,572-Speed 5609.24 samples/sec Loss 1.0343 LearningRate 0.0007 Epoch: 18 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:46,392-Speed 5628.54 samples/sec Loss 1.0150 LearningRate 0.0007 Epoch: 18 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:48,236-Speed 5554.48 samples/sec Loss 1.0381 LearningRate 0.0007 Epoch: 18 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:50,058-Speed 
5622.95 samples/sec Loss 1.0003 LearningRate 0.0007 Epoch: 18 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:51,882-Speed 5615.18 samples/sec Loss 1.0416 LearningRate 0.0007 Epoch: 18 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:53,706-Speed 5616.26 samples/sec Loss 0.9945 LearningRate 0.0007 Epoch: 18 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:55,542-Speed 5580.36 samples/sec Loss 1.0468 LearningRate 0.0007 Epoch: 18 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:57,372-Speed 5596.22 samples/sec Loss 0.9960 LearningRate 0.0007 Epoch: 18 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:02:59,198-Speed 5611.07 samples/sec Loss 0.9860 LearningRate 0.0007 Epoch: 18 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:01,012-Speed 5647.31 samples/sec Loss 1.0803 LearningRate 0.0007 Epoch: 18 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:02,846-Speed 5584.68 samples/sec Loss 1.0506 LearningRate 0.0007 Epoch: 18 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:04,668-Speed 5621.60 samples/sec Loss 1.0778 LearningRate 0.0006 Epoch: 18 Global Step: 104560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:06,503-Speed 5582.74 samples/sec Loss 1.0833 LearningRate 0.0006 Epoch: 18 Global Step: 104570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:08,342-Speed 5571.12 samples/sec Loss 0.9907 LearningRate 0.0006 Epoch: 18 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:10,166-Speed 5616.30 samples/sec Loss 0.9890 LearningRate 0.0006 Epoch: 18 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:11,995-Speed 5599.31 samples/sec Loss 0.9655 
LearningRate 0.0006 Epoch: 18 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:13,817-Speed 5623.29 samples/sec Loss 1.1457 LearningRate 0.0006 Epoch: 18 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:15,645-Speed 5601.71 samples/sec Loss 1.0292 LearningRate 0.0006 Epoch: 18 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:17,465-Speed 5628.06 samples/sec Loss 0.9896 LearningRate 0.0006 Epoch: 18 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:19,281-Speed 5640.91 samples/sec Loss 1.0467 LearningRate 0.0006 Epoch: 18 Global Step: 104640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:21,108-Speed 5605.20 samples/sec Loss 0.9798 LearningRate 0.0006 Epoch: 18 Global Step: 104650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:22,922-Speed 5649.49 samples/sec Loss 1.0690 LearningRate 0.0006 Epoch: 18 Global Step: 104660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:03:24,782-Speed 5506.54 samples/sec Loss 1.0619 LearningRate 0.0006 Epoch: 18 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:26,601-Speed 5631.77 samples/sec Loss 1.0440 LearningRate 0.0006 Epoch: 18 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:28,422-Speed 5626.13 samples/sec Loss 1.0287 LearningRate 0.0006 Epoch: 18 Global Step: 104690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:30,261-Speed 5569.71 samples/sec Loss 1.0182 LearningRate 0.0006 Epoch: 18 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:32,081-Speed 5626.76 samples/sec Loss 1.0969 LearningRate 0.0006 Epoch: 18 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:33,900-Speed 5631.71 samples/sec Loss 1.0050 LearningRate 0.0006 Epoch: 18 Global Step: 
104720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:35,726-Speed 5610.01 samples/sec Loss 1.0993 LearningRate 0.0006 Epoch: 18 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:37,559-Speed 5587.00 samples/sec Loss 0.9770 LearningRate 0.0006 Epoch: 18 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:39,384-Speed 5614.04 samples/sec Loss 1.0506 LearningRate 0.0006 Epoch: 18 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:41,205-Speed 5626.29 samples/sec Loss 1.0049 LearningRate 0.0006 Epoch: 18 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:43,020-Speed 5643.84 samples/sec Loss 1.1262 LearningRate 0.0006 Epoch: 18 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:44,849-Speed 5599.26 samples/sec Loss 0.9807 LearningRate 0.0006 Epoch: 18 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:46,680-Speed 5595.59 samples/sec Loss 0.9978 LearningRate 0.0006 Epoch: 18 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:48,509-Speed 5601.50 samples/sec Loss 1.0036 LearningRate 0.0006 Epoch: 18 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:50,331-Speed 5620.51 samples/sec Loss 1.0675 LearningRate 0.0006 Epoch: 18 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:52,174-Speed 5557.12 samples/sec Loss 0.9902 LearningRate 0.0006 Epoch: 18 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:53,995-Speed 5625.32 samples/sec Loss 0.9928 LearningRate 0.0006 Epoch: 18 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:55,826-Speed 5594.29 samples/sec Loss 1.0614 LearningRate 0.0006 Epoch: 18 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 08:03:57,653-Speed 5608.84 samples/sec Loss 1.0938 LearningRate 0.0006 Epoch: 18 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:03:59,488-Speed 5582.28 samples/sec Loss 1.0971 LearningRate 0.0006 Epoch: 18 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:01,302-Speed 5646.32 samples/sec Loss 1.0277 LearningRate 0.0006 Epoch: 18 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:03,136-Speed 5584.39 samples/sec Loss 1.1356 LearningRate 0.0006 Epoch: 18 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:04,960-Speed 5615.76 samples/sec Loss 1.0617 LearningRate 0.0006 Epoch: 18 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:06,780-Speed 5630.04 samples/sec Loss 1.0217 LearningRate 0.0006 Epoch: 18 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:08,622-Speed 5564.41 samples/sec Loss 0.9526 LearningRate 0.0006 Epoch: 18 Global Step: 104910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:10,469-Speed 5544.67 samples/sec Loss 1.0019 LearningRate 0.0006 Epoch: 18 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:12,290-Speed 5625.08 samples/sec Loss 0.9927 LearningRate 0.0006 Epoch: 18 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:14,125-Speed 5582.88 samples/sec Loss 1.0255 LearningRate 0.0006 Epoch: 18 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:15,951-Speed 5609.14 samples/sec Loss 1.0332 LearningRate 0.0006 Epoch: 18 Global Step: 104950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:17,776-Speed 5613.14 samples/sec Loss 0.9997 LearningRate 0.0006 Epoch: 18 Global Step: 104960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
08:04:19,618-Speed 5562.25 samples/sec Loss 0.9670 LearningRate 0.0006 Epoch: 18 Global Step: 104970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:21,450-Speed 5590.14 samples/sec Loss 0.9671 LearningRate 0.0006 Epoch: 18 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:23,298-Speed 5542.00 samples/sec Loss 0.9880 LearningRate 0.0006 Epoch: 18 Global Step: 104990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:25,139-Speed 5565.52 samples/sec Loss 1.0014 LearningRate 0.0006 Epoch: 18 Global Step: 105000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:26,959-Speed 5627.26 samples/sec Loss 1.0303 LearningRate 0.0006 Epoch: 18 Global Step: 105010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:28,799-Speed 5569.60 samples/sec Loss 1.0750 LearningRate 0.0006 Epoch: 18 Global Step: 105020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:30,629-Speed 5595.29 samples/sec Loss 1.0340 LearningRate 0.0006 Epoch: 18 Global Step: 105030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:32,449-Speed 5629.81 samples/sec Loss 1.0645 LearningRate 0.0006 Epoch: 18 Global Step: 105040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:34,268-Speed 5631.94 samples/sec Loss 0.9922 LearningRate 0.0006 Epoch: 18 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:36,094-Speed 5609.21 samples/sec Loss 1.0084 LearningRate 0.0006 Epoch: 18 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:37,918-Speed 5614.28 samples/sec Loss 1.0241 LearningRate 0.0006 Epoch: 18 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:39,749-Speed 5595.74 samples/sec Loss 1.0481 LearningRate 0.0006 Epoch: 18 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:41,609-Speed 5507.53 samples/sec Loss 
1.0029 LearningRate 0.0006 Epoch: 18 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:43,514-Speed 5376.94 samples/sec Loss 1.0933 LearningRate 0.0006 Epoch: 18 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:45,351-Speed 5574.97 samples/sec Loss 0.9843 LearningRate 0.0006 Epoch: 18 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:47,168-Speed 5640.76 samples/sec Loss 1.0433 LearningRate 0.0006 Epoch: 18 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:49,014-Speed 5547.49 samples/sec Loss 1.0215 LearningRate 0.0006 Epoch: 18 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:50,834-Speed 5628.32 samples/sec Loss 1.0326 LearningRate 0.0006 Epoch: 18 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:52,673-Speed 5570.59 samples/sec Loss 1.1659 LearningRate 0.0006 Epoch: 18 Global Step: 105150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:54,494-Speed 5626.57 samples/sec Loss 1.0376 LearningRate 0.0006 Epoch: 18 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:56,312-Speed 5634.35 samples/sec Loss 1.0401 LearningRate 0.0006 Epoch: 18 Global Step: 105170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-27 08:04:58,129-Speed 5636.97 samples/sec Loss 1.0202 LearningRate 0.0006 Epoch: 18 Global Step: 105180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:04:59,980-Speed 5534.31 samples/sec Loss 0.9740 LearningRate 0.0006 Epoch: 18 Global Step: 105190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:01,818-Speed 5571.93 samples/sec Loss 1.0278 LearningRate 0.0006 Epoch: 18 Global Step: 105200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:03,659-Speed 5563.06 samples/sec Loss 1.0775 LearningRate 0.0006 Epoch: 18 
Global Step: 105210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:05,489-Speed 5599.16 samples/sec Loss 1.0972 LearningRate 0.0006 Epoch: 18 Global Step: 105220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:07,311-Speed 5621.77 samples/sec Loss 1.0403 LearningRate 0.0006 Epoch: 18 Global Step: 105230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:09,157-Speed 5550.18 samples/sec Loss 0.9780 LearningRate 0.0006 Epoch: 18 Global Step: 105240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:10,979-Speed 5622.13 samples/sec Loss 1.0080 LearningRate 0.0006 Epoch: 18 Global Step: 105250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:12,797-Speed 5634.90 samples/sec Loss 1.0856 LearningRate 0.0006 Epoch: 18 Global Step: 105260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:14,662-Speed 5491.78 samples/sec Loss 1.0630 LearningRate 0.0006 Epoch: 18 Global Step: 105270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:16,521-Speed 5510.97 samples/sec Loss 1.0912 LearningRate 0.0006 Epoch: 18 Global Step: 105280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:18,347-Speed 5609.36 samples/sec Loss 1.0675 LearningRate 0.0005 Epoch: 18 Global Step: 105290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:20,176-Speed 5599.52 samples/sec Loss 0.9936 LearningRate 0.0005 Epoch: 18 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:22,023-Speed 5546.30 samples/sec Loss 0.9425 LearningRate 0.0005 Epoch: 18 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:23,859-Speed 5577.66 samples/sec Loss 1.0334 LearningRate 0.0005 Epoch: 18 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:25,689-Speed 5597.74 samples/sec Loss 1.0573 LearningRate 0.0005 Epoch: 18 Global Step: 105330 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-27 08:05:27,529-Speed 5569.27 samples/sec Loss 0.9516 LearningRate 0.0005 Epoch: 18 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:29,364-Speed 5580.24 samples/sec Loss 1.0418 LearningRate 0.0005 Epoch: 18 Global Step: 105350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:31,184-Speed 5630.18 samples/sec Loss 1.0445 LearningRate 0.0005 Epoch: 18 Global Step: 105360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:33,010-Speed 5611.37 samples/sec Loss 1.0557 LearningRate 0.0005 Epoch: 18 Global Step: 105370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:34,837-Speed 5604.23 samples/sec Loss 0.9727 LearningRate 0.0005 Epoch: 18 Global Step: 105380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:36,690-Speed 5529.73 samples/sec Loss 0.9136 LearningRate 0.0005 Epoch: 18 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:38,548-Speed 5510.87 samples/sec Loss 1.0495 LearningRate 0.0005 Epoch: 18 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:40,397-Speed 5541.34 samples/sec Loss 0.9163 LearningRate 0.0005 Epoch: 18 Global Step: 105410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:42,270-Speed 5469.90 samples/sec Loss 0.9643 LearningRate 0.0005 Epoch: 18 Global Step: 105420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:44,186-Speed 5346.49 samples/sec Loss 1.0354 LearningRate 0.0005 Epoch: 18 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:46,028-Speed 5561.04 samples/sec Loss 0.9942 LearningRate 0.0005 Epoch: 18 Global Step: 105440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:47,897-Speed 5479.93 samples/sec Loss 1.0187 LearningRate 0.0005 Epoch: 18 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 
08:05:49,745-Speed 5540.70 samples/sec Loss 1.0810 LearningRate 0.0005 Epoch: 18 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:05:51,569-Speed 5616.73 samples/sec Loss 1.0546 LearningRate 0.0005 Epoch: 18 Global Step: 105470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:53,457-Speed 5426.94 samples/sec Loss 1.0643 LearningRate 0.0005 Epoch: 18 Global Step: 105480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:55,278-Speed 5624.91 samples/sec Loss 1.1060 LearningRate 0.0005 Epoch: 18 Global Step: 105490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:57,099-Speed 5624.48 samples/sec Loss 0.9120 LearningRate 0.0005 Epoch: 18 Global Step: 105500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:05:58,940-Speed 5566.02 samples/sec Loss 1.0676 LearningRate 0.0005 Epoch: 18 Global Step: 105510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:00,759-Speed 5629.82 samples/sec Loss 0.9861 LearningRate 0.0005 Epoch: 18 Global Step: 105520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:02,590-Speed 5595.65 samples/sec Loss 1.1058 LearningRate 0.0005 Epoch: 18 Global Step: 105530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:04,407-Speed 5636.50 samples/sec Loss 1.0489 LearningRate 0.0005 Epoch: 18 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:06,246-Speed 5570.58 samples/sec Loss 1.0468 LearningRate 0.0005 Epoch: 18 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:08,075-Speed 5599.94 samples/sec Loss 1.0385 LearningRate 0.0005 Epoch: 18 Global Step: 105560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 08:06:09,902-Speed 5605.95 samples/sec Loss 0.9901 LearningRate 0.0005 Epoch: 18 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:11,750-Speed 5545.18 samples/sec Loss 
0.9225 LearningRate 0.0005 Epoch: 18 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:13,591-Speed 5561.97 samples/sec Loss 1.0231 LearningRate 0.0005 Epoch: 18 Global Step: 105590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:15,420-Speed 5602.19 samples/sec Loss 1.0238 LearningRate 0.0005 Epoch: 18 Global Step: 105600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:17,249-Speed 5601.79 samples/sec Loss 1.0517 LearningRate 0.0005 Epoch: 18 Global Step: 105610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:19,128-Speed 5449.84 samples/sec Loss 1.0521 LearningRate 0.0005 Epoch: 18 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:20,955-Speed 5608.37 samples/sec Loss 1.0500 LearningRate 0.0005 Epoch: 18 Global Step: 105630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:22,820-Speed 5492.28 samples/sec Loss 1.0501 LearningRate 0.0005 Epoch: 18 Global Step: 105640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:24,636-Speed 5640.84 samples/sec Loss 1.0125 LearningRate 0.0005 Epoch: 18 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:26,460-Speed 5614.06 samples/sec Loss 0.9783 LearningRate 0.0005 Epoch: 18 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:28,299-Speed 5571.04 samples/sec Loss 1.0494 LearningRate 0.0005 Epoch: 18 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:30,124-Speed 5613.06 samples/sec Loss 1.0361 LearningRate 0.0005 Epoch: 18 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:31,968-Speed 5555.91 samples/sec Loss 1.0821 LearningRate 0.0005 Epoch: 18 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:33,806-Speed 5570.27 samples/sec Loss 1.0110 LearningRate 0.0005 Epoch: 18 Global 
Step: 105700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:35,648-Speed 5563.39 samples/sec Loss 1.0151 LearningRate 0.0005 Epoch: 18 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:37,480-Speed 5590.93 samples/sec Loss 0.9386 LearningRate 0.0005 Epoch: 18 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:39,312-Speed 5592.76 samples/sec Loss 1.0082 LearningRate 0.0005 Epoch: 18 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:41,135-Speed 5618.66 samples/sec Loss 0.9919 LearningRate 0.0005 Epoch: 18 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:42,954-Speed 5631.90 samples/sec Loss 1.0752 LearningRate 0.0005 Epoch: 18 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 08:06:44,783-Speed 5597.61 samples/sec Loss 1.0834 LearningRate 0.0005 Epoch: 18 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:46,661-Speed 5454.77 samples/sec Loss 1.0170 LearningRate 0.0005 Epoch: 18 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:48,488-Speed 5608.03 samples/sec Loss 1.0505 LearningRate 0.0005 Epoch: 18 Global Step: 105780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:50,318-Speed 5597.99 samples/sec Loss 1.0219 LearningRate 0.0005 Epoch: 18 Global Step: 105790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:52,145-Speed 5606.19 samples/sec Loss 1.0387 LearningRate 0.0005 Epoch: 18 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:53,982-Speed 5575.42 samples/sec Loss 0.9854 LearningRate 0.0005 Epoch: 18 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:55,821-Speed 5570.22 samples/sec Loss 1.0980 LearningRate 0.0005 Epoch: 18 Global Step: 105820 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:06:57,642-Speed 5625.17 samples/sec Loss 1.0979 LearningRate 0.0005 Epoch: 18 Global Step: 105830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:06:59,489-Speed 5544.75 samples/sec Loss 1.0566 LearningRate 0.0005 Epoch: 18 Global Step: 105840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:01,324-Speed 5582.44 samples/sec Loss 1.0832 LearningRate 0.0005 Epoch: 18 Global Step: 105850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:03,163-Speed 5572.71 samples/sec Loss 1.0498 LearningRate 0.0005 Epoch: 18 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:04,980-Speed 5637.46 samples/sec Loss 0.9651 LearningRate 0.0005 Epoch: 18 Global Step: 105870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:07:06,788-Speed 5663.05 samples/sec Loss 1.0346 LearningRate 0.0005 Epoch: 18 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:08,612-Speed 5618.12 samples/sec Loss 0.9743 LearningRate 0.0005 Epoch: 18 Global Step: 105890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:10,437-Speed 5613.51 samples/sec Loss 1.1036 LearningRate 0.0005 Epoch: 18 Global Step: 105900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:12,260-Speed 5618.55 samples/sec Loss 1.0265 LearningRate 0.0005 Epoch: 18 Global Step: 105910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:14,081-Speed 5624.67 samples/sec Loss 0.9915 LearningRate 0.0005 Epoch: 18 Global Step: 105920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:15,895-Speed 5645.74 samples/sec Loss 1.0326 LearningRate 0.0005 Epoch: 18 Global Step: 105930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:07:17,713-Speed 5634.58 samples/sec Loss 1.0514 LearningRate 0.0005 Epoch: 18 Global Step: 105940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:07:19,531-Speed 5633.53 samples/sec Loss 1.1153 LearningRate 0.0005 Epoch: 18 Global Step: 105950 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:21,364-Speed 5590.89 samples/sec Loss 1.0830 LearningRate 0.0005 Epoch: 18 Global Step: 105960 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:23,204-Speed 5565.92 samples/sec Loss 1.0930 LearningRate 0.0005 Epoch: 18 Global Step: 105970 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:25,018-Speed 5648.11 samples/sec Loss 1.0692 LearningRate 0.0005 Epoch: 18 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:26,835-Speed 5636.46 samples/sec Loss 1.0666 LearningRate 0.0005 Epoch: 18 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:28,659-Speed 5614.79 samples/sec Loss 1.0788 LearningRate 0.0005 Epoch: 18 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:07:55,096-[lfw][106000]XNorm: 21.998031
Training: 2022-04-27 08:07:55,096-[lfw][106000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-27 08:07:55,097-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:08:25,657-[cfp_fp][106000]XNorm: 21.255755
Training: 2022-04-27 08:08:25,658-[cfp_fp][106000]Accuracy-Flip: 0.97814+-0.00645
Training: 2022-04-27 08:08:25,658-[cfp_fp][106000]Accuracy-Highest: 0.97929
Training: 2022-04-27 08:08:52,051-[agedb_30][106000]XNorm: 22.189983
Training: 2022-04-27 08:08:52,052-[agedb_30][106000]Accuracy-Flip: 0.98233+-0.00620
Training: 2022-04-27 08:08:52,052-[agedb_30][106000]Accuracy-Highest: 0.98233
Training: 2022-04-27 08:08:53,918-Speed 120.11 samples/sec Loss 0.9803 LearningRate 0.0005 Epoch: 18 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:08:55,738-Speed 5627.29 samples/sec Loss 1.0809 LearningRate 0.0005 Epoch: 18 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:08:57,606-Speed 
5483.49 samples/sec Loss 1.0784 LearningRate 0.0005 Epoch: 18 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:08:59,415-Speed 5661.80 samples/sec Loss 1.0296 LearningRate 0.0005 Epoch: 18 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:01,230-Speed 5646.02 samples/sec Loss 0.9690 LearningRate 0.0005 Epoch: 18 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:03,052-Speed 5619.95 samples/sec Loss 1.0370 LearningRate 0.0005 Epoch: 18 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:04,882-Speed 5599.18 samples/sec Loss 1.0989 LearningRate 0.0005 Epoch: 18 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:06,682-Speed 5688.59 samples/sec Loss 0.9538 LearningRate 0.0005 Epoch: 18 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:08,503-Speed 5625.39 samples/sec Loss 1.0803 LearningRate 0.0005 Epoch: 18 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:10,331-Speed 5603.72 samples/sec Loss 1.0719 LearningRate 0.0004 Epoch: 18 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:12,156-Speed 5611.83 samples/sec Loss 1.0248 LearningRate 0.0004 Epoch: 18 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:14,004-Speed 5544.52 samples/sec Loss 0.9879 LearningRate 0.0004 Epoch: 18 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:15,820-Speed 5640.30 samples/sec Loss 1.0702 LearningRate 0.0004 Epoch: 18 Global Step: 106130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:17,675-Speed 5522.74 samples/sec Loss 1.1148 LearningRate 0.0004 Epoch: 18 Global Step: 106140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:19,554-Speed 5449.66 samples/sec Loss 0.9949 
LearningRate 0.0004 Epoch: 18 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:21,390-Speed 5580.15 samples/sec Loss 1.0078 LearningRate 0.0004 Epoch: 18 Global Step: 106160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:23,217-Speed 5606.93 samples/sec Loss 0.9958 LearningRate 0.0004 Epoch: 18 Global Step: 106170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:25,044-Speed 5606.09 samples/sec Loss 0.9526 LearningRate 0.0004 Epoch: 18 Global Step: 106180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:26,875-Speed 5595.75 samples/sec Loss 1.1155 LearningRate 0.0004 Epoch: 18 Global Step: 106190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:28,692-Speed 5638.89 samples/sec Loss 0.9941 LearningRate 0.0004 Epoch: 18 Global Step: 106200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:30,522-Speed 5596.37 samples/sec Loss 0.9876 LearningRate 0.0004 Epoch: 18 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:32,361-Speed 5569.18 samples/sec Loss 1.0613 LearningRate 0.0004 Epoch: 18 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:34,199-Speed 5573.02 samples/sec Loss 1.0833 LearningRate 0.0004 Epoch: 18 Global Step: 106230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:36,065-Speed 5490.15 samples/sec Loss 1.0668 LearningRate 0.0004 Epoch: 18 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:37,930-Speed 5492.02 samples/sec Loss 0.9975 LearningRate 0.0004 Epoch: 18 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:39,748-Speed 5633.29 samples/sec Loss 1.0651 LearningRate 0.0004 Epoch: 18 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:41,582-Speed 5585.57 samples/sec Loss 1.0692 LearningRate 0.0004 Epoch: 18 Global Step: 
106270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:43,408-Speed 5612.47 samples/sec Loss 1.0309 LearningRate 0.0004 Epoch: 18 Global Step: 106280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:09:45,226-Speed 5632.43 samples/sec Loss 1.0745 LearningRate 0.0004 Epoch: 18 Global Step: 106290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:09:47,043-Speed 5639.07 samples/sec Loss 0.9964 LearningRate 0.0004 Epoch: 18 Global Step: 106300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:48,869-Speed 5609.21 samples/sec Loss 1.0253 LearningRate 0.0004 Epoch: 18 Global Step: 106310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:50,701-Speed 5592.34 samples/sec Loss 1.0465 LearningRate 0.0004 Epoch: 18 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:52,525-Speed 5614.13 samples/sec Loss 1.0027 LearningRate 0.0004 Epoch: 18 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:54,346-Speed 5626.65 samples/sec Loss 1.0826 LearningRate 0.0004 Epoch: 18 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:56,206-Speed 5507.54 samples/sec Loss 1.0311 LearningRate 0.0004 Epoch: 18 Global Step: 106350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:58,052-Speed 5547.84 samples/sec Loss 1.0584 LearningRate 0.0004 Epoch: 18 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:09:59,875-Speed 5619.35 samples/sec Loss 0.9177 LearningRate 0.0004 Epoch: 18 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:01,722-Speed 5545.31 samples/sec Loss 1.0282 LearningRate 0.0004 Epoch: 18 Global Step: 106380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:03,562-Speed 5567.79 samples/sec Loss 1.0807 LearningRate 0.0004 Epoch: 18 Global Step: 106390 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-27 08:10:05,401-Speed 5569.14 samples/sec Loss 1.0120 LearningRate 0.0004 Epoch: 18 Global Step: 106400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:07,223-Speed 5622.11 samples/sec Loss 1.0571 LearningRate 0.0004 Epoch: 18 Global Step: 106410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:09,054-Speed 5595.19 samples/sec Loss 1.0999 LearningRate 0.0004 Epoch: 18 Global Step: 106420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:10,883-Speed 5601.51 samples/sec Loss 1.0239 LearningRate 0.0004 Epoch: 18 Global Step: 106430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:12,729-Speed 5547.87 samples/sec Loss 1.0654 LearningRate 0.0004 Epoch: 18 Global Step: 106440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:14,556-Speed 5606.62 samples/sec Loss 1.0533 LearningRate 0.0004 Epoch: 18 Global Step: 106450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:16,372-Speed 5641.78 samples/sec Loss 1.0440 LearningRate 0.0004 Epoch: 18 Global Step: 106460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:18,225-Speed 5527.40 samples/sec Loss 1.0377 LearningRate 0.0004 Epoch: 18 Global Step: 106470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:20,065-Speed 5566.44 samples/sec Loss 1.0564 LearningRate 0.0004 Epoch: 18 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:21,901-Speed 5578.70 samples/sec Loss 1.1069 LearningRate 0.0004 Epoch: 18 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:23,728-Speed 5605.38 samples/sec Loss 1.0104 LearningRate 0.0004 Epoch: 18 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:25,581-Speed 5530.63 samples/sec Loss 1.0250 LearningRate 0.0004 Epoch: 18 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:10:27,437-Speed 5516.97 samples/sec Loss 1.0727 LearningRate 0.0004 Epoch: 18 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:29,288-Speed 5537.18 samples/sec Loss 1.0292 LearningRate 0.0004 Epoch: 18 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:31,105-Speed 5636.26 samples/sec Loss 1.0482 LearningRate 0.0004 Epoch: 18 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:32,928-Speed 5619.82 samples/sec Loss 0.9758 LearningRate 0.0004 Epoch: 18 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:34,754-Speed 5609.69 samples/sec Loss 1.0381 LearningRate 0.0004 Epoch: 18 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:36,579-Speed 5611.37 samples/sec Loss 1.0806 LearningRate 0.0004 Epoch: 18 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:38,415-Speed 5579.91 samples/sec Loss 0.9880 LearningRate 0.0004 Epoch: 18 Global Step: 106580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:10:40,232-Speed 5635.85 samples/sec Loss 1.0161 LearningRate 0.0004 Epoch: 18 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:42,063-Speed 5596.08 samples/sec Loss 1.1029 LearningRate 0.0004 Epoch: 18 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:10:43,877-Speed 5647.64 samples/sec Loss 1.0377 LearningRate 0.0004 Epoch: 18 Global Step: 106610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:45,695-Speed 5634.13 samples/sec Loss 1.0157 LearningRate 0.0004 Epoch: 18 Global Step: 106620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:47,544-Speed 5537.81 samples/sec Loss 1.0757 LearningRate 0.0004 Epoch: 18 Global Step: 106630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:49,372-Speed 5604.79 samples/sec 
Loss 1.0606 LearningRate 0.0004 Epoch: 18 Global Step: 106640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:51,204-Speed 5591.33 samples/sec Loss 1.0389 LearningRate 0.0004 Epoch: 18 Global Step: 106650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:53,029-Speed 5613.48 samples/sec Loss 1.0612 LearningRate 0.0004 Epoch: 18 Global Step: 106660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:54,882-Speed 5527.12 samples/sec Loss 1.1131 LearningRate 0.0004 Epoch: 18 Global Step: 106670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:56,699-Speed 5640.19 samples/sec Loss 1.0468 LearningRate 0.0004 Epoch: 18 Global Step: 106680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:10:58,526-Speed 5606.64 samples/sec Loss 1.0429 LearningRate 0.0004 Epoch: 18 Global Step: 106690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:00,347-Speed 5622.81 samples/sec Loss 1.0403 LearningRate 0.0004 Epoch: 18 Global Step: 106700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:02,175-Speed 5603.47 samples/sec Loss 1.0471 LearningRate 0.0004 Epoch: 18 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:03,999-Speed 5616.72 samples/sec Loss 1.0420 LearningRate 0.0004 Epoch: 18 Global Step: 106720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:05,821-Speed 5621.83 samples/sec Loss 1.0623 LearningRate 0.0004 Epoch: 18 Global Step: 106730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:07,665-Speed 5555.91 samples/sec Loss 1.0923 LearningRate 0.0004 Epoch: 18 Global Step: 106740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:09,496-Speed 5595.13 samples/sec Loss 1.0024 LearningRate 0.0004 Epoch: 18 Global Step: 106750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:11,313-Speed 5636.40 samples/sec Loss 1.0662 LearningRate 0.0004 Epoch: 18 
Global Step: 106760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:13,157-Speed 5554.78 samples/sec Loss 1.0078 LearningRate 0.0004 Epoch: 18 Global Step: 106770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:14,987-Speed 5598.90 samples/sec Loss 1.0736 LearningRate 0.0004 Epoch: 18 Global Step: 106780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:16,820-Speed 5588.84 samples/sec Loss 1.0062 LearningRate 0.0004 Epoch: 18 Global Step: 106790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:18,665-Speed 5552.39 samples/sec Loss 1.0178 LearningRate 0.0004 Epoch: 18 Global Step: 106800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:20,487-Speed 5620.48 samples/sec Loss 1.0094 LearningRate 0.0004 Epoch: 18 Global Step: 106810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:11:22,324-Speed 5576.14 samples/sec Loss 1.0517 LearningRate 0.0004 Epoch: 18 Global Step: 106820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:24,174-Speed 5537.27 samples/sec Loss 1.1117 LearningRate 0.0004 Epoch: 18 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:26,015-Speed 5564.52 samples/sec Loss 1.0234 LearningRate 0.0004 Epoch: 18 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:27,887-Speed 5472.03 samples/sec Loss 1.0827 LearningRate 0.0004 Epoch: 18 Global Step: 106850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:29,740-Speed 5526.09 samples/sec Loss 0.9605 LearningRate 0.0004 Epoch: 18 Global Step: 106860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:31,572-Speed 5592.59 samples/sec Loss 0.9996 LearningRate 0.0004 Epoch: 18 Global Step: 106870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:33,403-Speed 5594.79 samples/sec Loss 1.0765 LearningRate 0.0004 Epoch: 18 Global Step: 106880 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:11:35,226-Speed 5618.63 samples/sec Loss 1.0103 LearningRate 0.0004 Epoch: 18 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:37,071-Speed 5551.85 samples/sec Loss 0.9768 LearningRate 0.0004 Epoch: 18 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:38,917-Speed 5549.15 samples/sec Loss 1.0107 LearningRate 0.0004 Epoch: 18 Global Step: 106910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:40,753-Speed 5578.08 samples/sec Loss 1.1139 LearningRate 0.0004 Epoch: 18 Global Step: 106920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:11:42,588-Speed 5584.20 samples/sec Loss 1.0817 LearningRate 0.0004 Epoch: 18 Global Step: 106930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:44,411-Speed 5616.64 samples/sec Loss 1.0882 LearningRate 0.0004 Epoch: 18 Global Step: 106940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:46,230-Speed 5632.03 samples/sec Loss 1.1369 LearningRate 0.0004 Epoch: 18 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:48,059-Speed 5599.96 samples/sec Loss 0.9998 LearningRate 0.0004 Epoch: 18 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:49,893-Speed 5587.92 samples/sec Loss 1.0469 LearningRate 0.0004 Epoch: 18 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:51,734-Speed 5561.38 samples/sec Loss 0.9501 LearningRate 0.0004 Epoch: 18 Global Step: 106980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:53,554-Speed 5627.57 samples/sec Loss 1.1440 LearningRate 0.0004 Epoch: 18 Global Step: 106990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:55,392-Speed 5574.84 samples/sec Loss 1.0401 LearningRate 0.0003 Epoch: 18 Global Step: 107000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:11:57,250-Speed 5515.60 samples/sec Loss 1.0361 LearningRate 0.0003 Epoch: 18 Global Step: 107010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:11:59,113-Speed 5496.11 samples/sec Loss 1.0556 LearningRate 0.0003 Epoch: 18 Global Step: 107020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:00,938-Speed 5614.79 samples/sec Loss 1.0693 LearningRate 0.0003 Epoch: 18 Global Step: 107030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:02,757-Speed 5629.69 samples/sec Loss 1.0598 LearningRate 0.0003 Epoch: 18 Global Step: 107040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:04,591-Speed 5587.64 samples/sec Loss 1.0050 LearningRate 0.0003 Epoch: 18 Global Step: 107050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:06,436-Speed 5551.76 samples/sec Loss 1.0262 LearningRate 0.0003 Epoch: 18 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:08,252-Speed 5638.06 samples/sec Loss 1.0315 LearningRate 0.0003 Epoch: 18 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:10,102-Speed 5537.29 samples/sec Loss 1.0322 LearningRate 0.0003 Epoch: 18 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:11,946-Speed 5556.94 samples/sec Loss 1.0318 LearningRate 0.0003 Epoch: 18 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:13,799-Speed 5527.79 samples/sec Loss 1.0507 LearningRate 0.0003 Epoch: 18 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:15,620-Speed 5623.63 samples/sec Loss 1.0369 LearningRate 0.0003 Epoch: 18 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:17,472-Speed 5533.04 samples/sec Loss 1.0797 LearningRate 0.0003 Epoch: 18 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:19,313-Speed 5562.66 samples/sec Loss 
1.0886 LearningRate 0.0003 Epoch: 18 Global Step: 107130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:12:21,125-Speed 5652.46 samples/sec Loss 1.0655 LearningRate 0.0003 Epoch: 18 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:22,957-Speed 5593.27 samples/sec Loss 1.0535 LearningRate 0.0003 Epoch: 18 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:24,781-Speed 5614.85 samples/sec Loss 1.0903 LearningRate 0.0003 Epoch: 18 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:26,625-Speed 5554.61 samples/sec Loss 1.0142 LearningRate 0.0003 Epoch: 18 Global Step: 107170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:28,466-Speed 5565.79 samples/sec Loss 1.0382 LearningRate 0.0003 Epoch: 18 Global Step: 107180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:30,294-Speed 5603.72 samples/sec Loss 1.0268 LearningRate 0.0003 Epoch: 18 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:32,193-Speed 5391.74 samples/sec Loss 1.0262 LearningRate 0.0003 Epoch: 18 Global Step: 107200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:34,024-Speed 5595.65 samples/sec Loss 1.0823 LearningRate 0.0003 Epoch: 18 Global Step: 107210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:35,848-Speed 5614.66 samples/sec Loss 1.0616 LearningRate 0.0003 Epoch: 18 Global Step: 107220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:37,682-Speed 5585.98 samples/sec Loss 0.9939 LearningRate 0.0003 Epoch: 18 Global Step: 107230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:39,528-Speed 5550.74 samples/sec Loss 1.0026 LearningRate 0.0003 Epoch: 18 Global Step: 107240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:12:41,365-Speed 5576.47 samples/sec Loss 1.0816 LearningRate 0.0003 Epoch: 18 
Global Step: 107250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:43,224-Speed 5509.99 samples/sec Loss 1.0020 LearningRate 0.0003 Epoch: 18 Global Step: 107260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:45,064-Speed 5565.39 samples/sec Loss 1.0026 LearningRate 0.0003 Epoch: 18 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:46,895-Speed 5594.34 samples/sec Loss 1.0181 LearningRate 0.0003 Epoch: 18 Global Step: 107280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:48,739-Speed 5557.28 samples/sec Loss 1.0152 LearningRate 0.0003 Epoch: 18 Global Step: 107290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:50,612-Speed 5467.62 samples/sec Loss 1.0583 LearningRate 0.0003 Epoch: 18 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:52,458-Speed 5551.00 samples/sec Loss 0.9787 LearningRate 0.0003 Epoch: 18 Global Step: 107310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:54,278-Speed 5627.99 samples/sec Loss 1.0319 LearningRate 0.0003 Epoch: 18 Global Step: 107320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:56,119-Speed 5563.57 samples/sec Loss 1.0331 LearningRate 0.0003 Epoch: 18 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:57,949-Speed 5595.63 samples/sec Loss 1.0498 LearningRate 0.0003 Epoch: 18 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:12:59,765-Speed 5642.26 samples/sec Loss 1.0448 LearningRate 0.0003 Epoch: 18 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:01,599-Speed 5583.68 samples/sec Loss 0.9742 LearningRate 0.0003 Epoch: 18 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:03,433-Speed 5588.45 samples/sec Loss 0.9538 LearningRate 0.0003 Epoch: 18 Global Step: 107370 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:13:05,254-Speed 5625.30 samples/sec Loss 1.0130 LearningRate 0.0003 Epoch: 18 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:07,079-Speed 5613.06 samples/sec Loss 0.9747 LearningRate 0.0003 Epoch: 18 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:08,918-Speed 5569.21 samples/sec Loss 1.0746 LearningRate 0.0003 Epoch: 18 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:10,746-Speed 5602.57 samples/sec Loss 0.9647 LearningRate 0.0003 Epoch: 18 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:12,588-Speed 5562.25 samples/sec Loss 1.0677 LearningRate 0.0003 Epoch: 18 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:14,411-Speed 5616.58 samples/sec Loss 1.0584 LearningRate 0.0003 Epoch: 18 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:16,231-Speed 5629.36 samples/sec Loss 0.9686 LearningRate 0.0003 Epoch: 18 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:18,039-Speed 5666.36 samples/sec Loss 0.9924 LearningRate 0.0003 Epoch: 18 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:19,870-Speed 5592.75 samples/sec Loss 1.0704 LearningRate 0.0003 Epoch: 18 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:21,695-Speed 5613.94 samples/sec Loss 1.0538 LearningRate 0.0003 Epoch: 18 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:23,540-Speed 5551.78 samples/sec Loss 0.9912 LearningRate 0.0003 Epoch: 18 Global Step: 107480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:25,375-Speed 5582.70 samples/sec Loss 1.0384 LearningRate 0.0003 Epoch: 18 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:13:27,203-Speed 5604.64 samples/sec Loss 1.0242 LearningRate 0.0003 Epoch: 18 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:29,033-Speed 5598.60 samples/sec Loss 1.0494 LearningRate 0.0003 Epoch: 18 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:30,894-Speed 5503.36 samples/sec Loss 1.1280 LearningRate 0.0003 Epoch: 18 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:32,733-Speed 5568.81 samples/sec Loss 1.0061 LearningRate 0.0003 Epoch: 18 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:34,568-Speed 5581.99 samples/sec Loss 1.0555 LearningRate 0.0003 Epoch: 18 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:36,396-Speed 5605.30 samples/sec Loss 1.0451 LearningRate 0.0003 Epoch: 18 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:38,240-Speed 5552.74 samples/sec Loss 1.1147 LearningRate 0.0003 Epoch: 18 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:40,086-Speed 5551.32 samples/sec Loss 1.0554 LearningRate 0.0003 Epoch: 18 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:41,934-Speed 5542.73 samples/sec Loss 1.0652 LearningRate 0.0003 Epoch: 18 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:43,768-Speed 5584.92 samples/sec Loss 1.0545 LearningRate 0.0003 Epoch: 18 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:45,596-Speed 5603.42 samples/sec Loss 0.9781 LearningRate 0.0003 Epoch: 18 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:47,425-Speed 5600.90 samples/sec Loss 0.9775 LearningRate 0.0003 Epoch: 18 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:49,266-Speed 5565.60 samples/sec Loss 
1.0621 LearningRate 0.0003 Epoch: 18 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:51,117-Speed 5532.81 samples/sec Loss 0.9984 LearningRate 0.0003 Epoch: 18 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:52,940-Speed 5618.86 samples/sec Loss 1.0506 LearningRate 0.0003 Epoch: 18 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:54,775-Speed 5582.91 samples/sec Loss 0.9917 LearningRate 0.0003 Epoch: 18 Global Step: 107650 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:13:56,609-Speed 5584.79 samples/sec Loss 1.0591 LearningRate 0.0003 Epoch: 18 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:13:58,448-Speed 5569.80 samples/sec Loss 0.9825 LearningRate 0.0003 Epoch: 18 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:00,273-Speed 5611.77 samples/sec Loss 1.0730 LearningRate 0.0003 Epoch: 18 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:02,110-Speed 5576.59 samples/sec Loss 1.1013 LearningRate 0.0003 Epoch: 18 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:03,933-Speed 5620.64 samples/sec Loss 1.0336 LearningRate 0.0003 Epoch: 18 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:05,778-Speed 5551.87 samples/sec Loss 1.0449 LearningRate 0.0003 Epoch: 18 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:07,634-Speed 5517.48 samples/sec Loss 1.0360 LearningRate 0.0003 Epoch: 18 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:09,459-Speed 5614.47 samples/sec Loss 0.9924 LearningRate 0.0003 Epoch: 18 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:11,277-Speed 5635.68 samples/sec Loss 1.0640 LearningRate 0.0003 Epoch: 18 
Global Step: 107740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:13,104-Speed 5606.52 samples/sec Loss 0.9713 LearningRate 0.0003 Epoch: 18 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:14,944-Speed 5565.46 samples/sec Loss 1.0014 LearningRate 0.0003 Epoch: 18 Global Step: 107760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:14:16,760-Speed 5641.78 samples/sec Loss 1.0198 LearningRate 0.0003 Epoch: 18 Global Step: 107770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:18,607-Speed 5543.76 samples/sec Loss 1.0988 LearningRate 0.0003 Epoch: 18 Global Step: 107780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:20,447-Speed 5567.97 samples/sec Loss 1.0816 LearningRate 0.0003 Epoch: 18 Global Step: 107790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:22,276-Speed 5599.59 samples/sec Loss 1.0192 LearningRate 0.0003 Epoch: 18 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:24,132-Speed 5520.24 samples/sec Loss 0.9998 LearningRate 0.0003 Epoch: 18 Global Step: 107810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:25,960-Speed 5602.02 samples/sec Loss 1.0488 LearningRate 0.0003 Epoch: 18 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:27,798-Speed 5574.78 samples/sec Loss 1.0845 LearningRate 0.0003 Epoch: 18 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:29,624-Speed 5608.26 samples/sec Loss 1.0183 LearningRate 0.0003 Epoch: 18 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:31,449-Speed 5613.43 samples/sec Loss 1.0963 LearningRate 0.0003 Epoch: 18 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:33,301-Speed 5533.60 samples/sec Loss 1.0606 LearningRate 0.0003 Epoch: 18 Global Step: 107860 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:14:35,130-Speed 5598.05 samples/sec Loss 1.0552 LearningRate 0.0003 Epoch: 18 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:36,970-Speed 5567.52 samples/sec Loss 1.0606 LearningRate 0.0003 Epoch: 18 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:38,813-Speed 5559.75 samples/sec Loss 0.9883 LearningRate 0.0003 Epoch: 18 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:40,638-Speed 5612.43 samples/sec Loss 0.9943 LearningRate 0.0003 Epoch: 18 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:42,460-Speed 5622.20 samples/sec Loss 1.0836 LearningRate 0.0003 Epoch: 18 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:44,292-Speed 5590.54 samples/sec Loss 1.0847 LearningRate 0.0003 Epoch: 18 Global Step: 107920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:46,134-Speed 5561.93 samples/sec Loss 0.9946 LearningRate 0.0003 Epoch: 18 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:47,983-Speed 5539.32 samples/sec Loss 1.0120 LearningRate 0.0003 Epoch: 18 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:49,827-Speed 5554.16 samples/sec Loss 1.0175 LearningRate 0.0003 Epoch: 18 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:51,648-Speed 5627.27 samples/sec Loss 1.0610 LearningRate 0.0003 Epoch: 18 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:53,474-Speed 5608.18 samples/sec Loss 0.9859 LearningRate 0.0003 Epoch: 18 Global Step: 107970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:14:55,317-Speed 5559.45 samples/sec Loss 1.0916 LearningRate 0.0003 Epoch: 18 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:14:57,157-Speed 5566.53 samples/sec Loss 1.0393 LearningRate 0.0003 Epoch: 18 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:14:59,011-Speed 5525.82 samples/sec Loss 1.0476 LearningRate 0.0003 Epoch: 18 Global Step: 108000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:15:25,010-[lfw][108000]XNorm: 21.932818 Training: 2022-04-27 08:15:25,010-[lfw][108000]Accuracy-Flip: 0.99750+-0.00250 Training: 2022-04-27 08:15:25,011-[lfw][108000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:15:55,192-[cfp_fp][108000]XNorm: 21.231755 Training: 2022-04-27 08:15:55,193-[cfp_fp][108000]Accuracy-Flip: 0.98100+-0.00599 Training: 2022-04-27 08:15:55,193-[cfp_fp][108000]Accuracy-Highest: 0.98100 Training: 2022-04-27 08:16:21,245-[agedb_30][108000]XNorm: 22.029147 Training: 2022-04-27 08:16:21,246-[agedb_30][108000]Accuracy-Flip: 0.98100+-0.00638 Training: 2022-04-27 08:16:21,246-[agedb_30][108000]Accuracy-Highest: 0.98233 Training: 2022-04-27 08:16:23,088-Speed 121.79 samples/sec Loss 1.0637 LearningRate 0.0003 Epoch: 18 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:24,925-Speed 5575.79 samples/sec Loss 0.9437 LearningRate 0.0003 Epoch: 18 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:26,811-Speed 5431.78 samples/sec Loss 1.0033 LearningRate 0.0003 Epoch: 18 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:40,657-Speed 739.64 samples/sec Loss 0.8670 LearningRate 0.0002 Epoch: 19 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:42,535-Speed 5453.01 samples/sec Loss 0.8399 LearningRate 0.0002 Epoch: 19 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:44,384-Speed 5540.80 samples/sec Loss 0.9233 LearningRate 0.0002 Epoch: 19 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:46,226-Speed 
5562.63 samples/sec Loss 0.8606 LearningRate 0.0002 Epoch: 19 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:48,052-Speed 5607.99 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 108080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:49,903-Speed 5533.89 samples/sec Loss 0.8394 LearningRate 0.0002 Epoch: 19 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:51,717-Speed 5648.47 samples/sec Loss 0.8780 LearningRate 0.0002 Epoch: 19 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:53,550-Speed 5587.56 samples/sec Loss 0.9224 LearningRate 0.0002 Epoch: 19 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:55,393-Speed 5557.25 samples/sec Loss 0.8704 LearningRate 0.0002 Epoch: 19 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:57,219-Speed 5610.71 samples/sec Loss 0.9583 LearningRate 0.0002 Epoch: 19 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:16:59,049-Speed 5596.70 samples/sec Loss 0.8984 LearningRate 0.0002 Epoch: 19 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:00,886-Speed 5575.25 samples/sec Loss 0.8742 LearningRate 0.0002 Epoch: 19 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:02,710-Speed 5616.54 samples/sec Loss 0.9216 LearningRate 0.0002 Epoch: 19 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:04,546-Speed 5582.32 samples/sec Loss 0.8863 LearningRate 0.0002 Epoch: 19 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:06,363-Speed 5636.78 samples/sec Loss 0.8983 LearningRate 0.0002 Epoch: 19 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:08,198-Speed 5581.74 samples/sec Loss 0.8668 
LearningRate 0.0002 Epoch: 19 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:10,023-Speed 5614.47 samples/sec Loss 0.9647 LearningRate 0.0002 Epoch: 19 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:11,856-Speed 5585.70 samples/sec Loss 0.9405 LearningRate 0.0002 Epoch: 19 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:13,692-Speed 5580.31 samples/sec Loss 0.8785 LearningRate 0.0002 Epoch: 19 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:15,558-Speed 5487.85 samples/sec Loss 0.8598 LearningRate 0.0002 Epoch: 19 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:17,379-Speed 5626.49 samples/sec Loss 0.8992 LearningRate 0.0002 Epoch: 19 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:19,221-Speed 5562.18 samples/sec Loss 0.8982 LearningRate 0.0002 Epoch: 19 Global Step: 108250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:21,064-Speed 5556.07 samples/sec Loss 0.8872 LearningRate 0.0002 Epoch: 19 Global Step: 108260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:22,919-Speed 5522.34 samples/sec Loss 0.9153 LearningRate 0.0002 Epoch: 19 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:24,760-Speed 5563.22 samples/sec Loss 0.9313 LearningRate 0.0002 Epoch: 19 Global Step: 108280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:17:26,579-Speed 5635.72 samples/sec Loss 0.9736 LearningRate 0.0002 Epoch: 19 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:28,415-Speed 5577.33 samples/sec Loss 0.8999 LearningRate 0.0002 Epoch: 19 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:30,258-Speed 5558.45 samples/sec Loss 0.8726 LearningRate 0.0002 Epoch: 19 Global Step: 
108310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:32,098-Speed 5568.33 samples/sec Loss 0.9330 LearningRate 0.0002 Epoch: 19 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:33,941-Speed 5558.02 samples/sec Loss 0.9114 LearningRate 0.0002 Epoch: 19 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:35,758-Speed 5637.70 samples/sec Loss 0.7691 LearningRate 0.0002 Epoch: 19 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:37,581-Speed 5617.68 samples/sec Loss 0.8953 LearningRate 0.0002 Epoch: 19 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:39,405-Speed 5614.88 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:41,236-Speed 5594.33 samples/sec Loss 0.8610 LearningRate 0.0002 Epoch: 19 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:43,065-Speed 5602.73 samples/sec Loss 0.8533 LearningRate 0.0002 Epoch: 19 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:44,896-Speed 5594.39 samples/sec Loss 0.8775 LearningRate 0.0002 Epoch: 19 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:46,737-Speed 5563.76 samples/sec Loss 0.9518 LearningRate 0.0002 Epoch: 19 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:48,577-Speed 5567.03 samples/sec Loss 0.8608 LearningRate 0.0002 Epoch: 19 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:50,410-Speed 5587.53 samples/sec Loss 0.9605 LearningRate 0.0002 Epoch: 19 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:52,245-Speed 5582.85 samples/sec Loss 0.9138 LearningRate 0.0002 Epoch: 19 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-27 08:17:54,073-Speed 5603.37 samples/sec Loss 0.8719 LearningRate 0.0002 Epoch: 19 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:55,892-Speed 5632.42 samples/sec Loss 0.8855 LearningRate 0.0002 Epoch: 19 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:57,729-Speed 5576.90 samples/sec Loss 0.8039 LearningRate 0.0002 Epoch: 19 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:17:59,563-Speed 5582.52 samples/sec Loss 0.8880 LearningRate 0.0002 Epoch: 19 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:01,412-Speed 5539.70 samples/sec Loss 0.9461 LearningRate 0.0002 Epoch: 19 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:03,231-Speed 5632.50 samples/sec Loss 0.8824 LearningRate 0.0002 Epoch: 19 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:05,071-Speed 5566.98 samples/sec Loss 0.9179 LearningRate 0.0002 Epoch: 19 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:06,887-Speed 5640.77 samples/sec Loss 0.9244 LearningRate 0.0002 Epoch: 19 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:08,703-Speed 5642.66 samples/sec Loss 0.9006 LearningRate 0.0002 Epoch: 19 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:10,533-Speed 5595.31 samples/sec Loss 0.8868 LearningRate 0.0002 Epoch: 19 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:12,372-Speed 5572.56 samples/sec Loss 0.8840 LearningRate 0.0002 Epoch: 19 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:14,183-Speed 5654.97 samples/sec Loss 0.8696 LearningRate 0.0002 Epoch: 19 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:18:16,004-Speed 5626.01 samples/sec Loss 0.8631 LearningRate 0.0002 Epoch: 19 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:17,835-Speed 5595.05 samples/sec Loss 0.8580 LearningRate 0.0002 Epoch: 19 Global Step: 108570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:19,689-Speed 5524.69 samples/sec Loss 0.8466 LearningRate 0.0002 Epoch: 19 Global Step: 108580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:21,583-Speed 5408.53 samples/sec Loss 0.8953 LearningRate 0.0002 Epoch: 19 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:23,469-Speed 5429.91 samples/sec Loss 0.9242 LearningRate 0.0002 Epoch: 19 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:25,291-Speed 5621.33 samples/sec Loss 0.9093 LearningRate 0.0002 Epoch: 19 Global Step: 108610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:27,114-Speed 5620.58 samples/sec Loss 0.8455 LearningRate 0.0002 Epoch: 19 Global Step: 108620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:28,945-Speed 5594.38 samples/sec Loss 0.9136 LearningRate 0.0002 Epoch: 19 Global Step: 108630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:30,759-Speed 5646.18 samples/sec Loss 0.9262 LearningRate 0.0002 Epoch: 19 Global Step: 108640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:32,586-Speed 5607.98 samples/sec Loss 0.8771 LearningRate 0.0002 Epoch: 19 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:34,410-Speed 5616.61 samples/sec Loss 0.9548 LearningRate 0.0002 Epoch: 19 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:36,232-Speed 5621.02 samples/sec Loss 0.9270 LearningRate 0.0002 Epoch: 19 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:38,053-Speed 5626.19 samples/sec Loss 
0.9657 LearningRate 0.0002 Epoch: 19 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:39,862-Speed 5660.31 samples/sec Loss 0.9177 LearningRate 0.0002 Epoch: 19 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:41,684-Speed 5622.73 samples/sec Loss 0.9406 LearningRate 0.0002 Epoch: 19 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:43,500-Speed 5638.98 samples/sec Loss 0.9163 LearningRate 0.0002 Epoch: 19 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:45,315-Speed 5644.21 samples/sec Loss 0.8440 LearningRate 0.0002 Epoch: 19 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:47,148-Speed 5587.96 samples/sec Loss 0.8117 LearningRate 0.0002 Epoch: 19 Global Step: 108730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:48,966-Speed 5635.88 samples/sec Loss 0.8772 LearningRate 0.0002 Epoch: 19 Global Step: 108740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:50,785-Speed 5630.83 samples/sec Loss 0.9066 LearningRate 0.0002 Epoch: 19 Global Step: 108750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:52,623-Speed 5573.98 samples/sec Loss 0.8991 LearningRate 0.0002 Epoch: 19 Global Step: 108760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:54,449-Speed 5608.31 samples/sec Loss 0.8419 LearningRate 0.0002 Epoch: 19 Global Step: 108770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:56,268-Speed 5631.38 samples/sec Loss 0.9772 LearningRate 0.0002 Epoch: 19 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:18:58,118-Speed 5540.03 samples/sec Loss 0.8814 LearningRate 0.0002 Epoch: 19 Global Step: 108790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:18:59,951-Speed 5588.18 samples/sec Loss 0.8269 LearningRate 0.0002 Epoch: 19 
Global Step: 108800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:01,803-Speed 5530.76 samples/sec Loss 0.9024 LearningRate 0.0002 Epoch: 19 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:03,641-Speed 5570.86 samples/sec Loss 0.8698 LearningRate 0.0002 Epoch: 19 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:05,461-Speed 5630.40 samples/sec Loss 0.8965 LearningRate 0.0002 Epoch: 19 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:07,283-Speed 5620.81 samples/sec Loss 0.9131 LearningRate 0.0002 Epoch: 19 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:09,106-Speed 5620.29 samples/sec Loss 0.9285 LearningRate 0.0002 Epoch: 19 Global Step: 108850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:10,930-Speed 5614.86 samples/sec Loss 0.8512 LearningRate 0.0002 Epoch: 19 Global Step: 108860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:12,759-Speed 5601.56 samples/sec Loss 0.8355 LearningRate 0.0002 Epoch: 19 Global Step: 108870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:14,582-Speed 5618.79 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 108880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:16,423-Speed 5564.62 samples/sec Loss 0.9087 LearningRate 0.0002 Epoch: 19 Global Step: 108890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:18,249-Speed 5608.82 samples/sec Loss 0.9667 LearningRate 0.0002 Epoch: 19 Global Step: 108900 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:19:20,062-Speed 5651.11 samples/sec Loss 0.9572 LearningRate 0.0002 Epoch: 19 Global Step: 108910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:21,884-Speed 5622.83 samples/sec Loss 0.9335 LearningRate 0.0002 Epoch: 19 Global Step: 108920 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:19:23,716-Speed 5589.91 samples/sec Loss 0.9312 LearningRate 0.0002 Epoch: 19 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:25,534-Speed 5635.88 samples/sec Loss 0.9264 LearningRate 0.0002 Epoch: 19 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:27,382-Speed 5541.98 samples/sec Loss 0.9123 LearningRate 0.0002 Epoch: 19 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:29,226-Speed 5553.56 samples/sec Loss 0.9676 LearningRate 0.0002 Epoch: 19 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:31,058-Speed 5592.72 samples/sec Loss 0.9674 LearningRate 0.0002 Epoch: 19 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:32,887-Speed 5600.74 samples/sec Loss 0.9429 LearningRate 0.0002 Epoch: 19 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:34,718-Speed 5594.75 samples/sec Loss 0.9039 LearningRate 0.0002 Epoch: 19 Global Step: 108990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:36,550-Speed 5592.93 samples/sec Loss 0.9261 LearningRate 0.0002 Epoch: 19 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:38,377-Speed 5606.29 samples/sec Loss 0.8783 LearningRate 0.0002 Epoch: 19 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:40,205-Speed 5603.59 samples/sec Loss 0.9323 LearningRate 0.0002 Epoch: 19 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:42,033-Speed 5602.44 samples/sec Loss 0.9315 LearningRate 0.0002 Epoch: 19 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:43,874-Speed 5563.42 samples/sec Loss 0.8709 LearningRate 0.0002 Epoch: 19 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:19:45,700-Speed 5609.52 samples/sec Loss 0.8251 LearningRate 0.0002 Epoch: 19 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:47,526-Speed 5609.74 samples/sec Loss 0.8730 LearningRate 0.0002 Epoch: 19 Global Step: 109060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:49,347-Speed 5627.00 samples/sec Loss 0.8494 LearningRate 0.0002 Epoch: 19 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:51,167-Speed 5625.98 samples/sec Loss 0.9895 LearningRate 0.0002 Epoch: 19 Global Step: 109080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:53,014-Speed 5548.02 samples/sec Loss 0.9301 LearningRate 0.0002 Epoch: 19 Global Step: 109090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:54,834-Speed 5626.57 samples/sec Loss 0.9094 LearningRate 0.0002 Epoch: 19 Global Step: 109100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:56,644-Speed 5660.88 samples/sec Loss 0.8488 LearningRate 0.0002 Epoch: 19 Global Step: 109110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:19:58,480-Speed 5577.74 samples/sec Loss 0.9009 LearningRate 0.0002 Epoch: 19 Global Step: 109120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:00,308-Speed 5603.86 samples/sec Loss 0.9650 LearningRate 0.0002 Epoch: 19 Global Step: 109130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:02,143-Speed 5582.63 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 109140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:03,958-Speed 5644.12 samples/sec Loss 0.8563 LearningRate 0.0002 Epoch: 19 Global Step: 109150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:05,781-Speed 5620.19 samples/sec Loss 0.8959 LearningRate 0.0002 Epoch: 19 Global Step: 109160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:07,615-Speed 5584.27 samples/sec Loss 
0.8654 LearningRate 0.0002 Epoch: 19 Global Step: 109170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:09,444-Speed 5600.14 samples/sec Loss 0.9284 LearningRate 0.0002 Epoch: 19 Global Step: 109180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:11,268-Speed 5617.10 samples/sec Loss 0.8738 LearningRate 0.0002 Epoch: 19 Global Step: 109190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:13,100-Speed 5589.36 samples/sec Loss 0.8241 LearningRate 0.0002 Epoch: 19 Global Step: 109200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:14,925-Speed 5614.34 samples/sec Loss 0.8661 LearningRate 0.0002 Epoch: 19 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:16,747-Speed 5620.15 samples/sec Loss 0.9100 LearningRate 0.0002 Epoch: 19 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:18,580-Speed 5589.54 samples/sec Loss 0.9288 LearningRate 0.0002 Epoch: 19 Global Step: 109230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:20,426-Speed 5550.94 samples/sec Loss 0.9168 LearningRate 0.0002 Epoch: 19 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:20:22,249-Speed 5617.15 samples/sec Loss 0.8380 LearningRate 0.0002 Epoch: 19 Global Step: 109250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:24,083-Speed 5585.80 samples/sec Loss 0.8275 LearningRate 0.0002 Epoch: 19 Global Step: 109260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:25,928-Speed 5552.79 samples/sec Loss 0.8250 LearningRate 0.0002 Epoch: 19 Global Step: 109270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:27,758-Speed 5596.95 samples/sec Loss 0.9467 LearningRate 0.0002 Epoch: 19 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:29,591-Speed 5588.10 samples/sec Loss 0.8830 LearningRate 0.0002 Epoch: 19 Global 
Step: 109290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:31,432-Speed 5564.43 samples/sec Loss 0.8516 LearningRate 0.0002 Epoch: 19 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:33,273-Speed 5562.33 samples/sec Loss 0.8467 LearningRate 0.0002 Epoch: 19 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:35,122-Speed 5552.58 samples/sec Loss 0.8999 LearningRate 0.0001 Epoch: 19 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:36,970-Speed 5542.50 samples/sec Loss 0.8416 LearningRate 0.0001 Epoch: 19 Global Step: 109330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:38,802-Speed 5589.07 samples/sec Loss 0.9442 LearningRate 0.0001 Epoch: 19 Global Step: 109340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:40,622-Speed 5631.35 samples/sec Loss 0.9613 LearningRate 0.0001 Epoch: 19 Global Step: 109350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:20:42,438-Speed 5640.65 samples/sec Loss 0.8851 LearningRate 0.0001 Epoch: 19 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:44,261-Speed 5618.59 samples/sec Loss 0.9271 LearningRate 0.0001 Epoch: 19 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:46,080-Speed 5631.50 samples/sec Loss 0.9331 LearningRate 0.0001 Epoch: 19 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:47,932-Speed 5529.20 samples/sec Loss 0.8566 LearningRate 0.0001 Epoch: 19 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:49,780-Speed 5544.64 samples/sec Loss 0.9146 LearningRate 0.0001 Epoch: 19 Global Step: 109400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:51,612-Speed 5591.13 samples/sec Loss 0.9149 LearningRate 0.0001 Epoch: 19 Global Step: 109410 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:20:53,454-Speed 5560.41 samples/sec Loss 0.8773 LearningRate 0.0001 Epoch: 19 Global Step: 109420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:55,351-Speed 5399.70 samples/sec Loss 0.8473 LearningRate 0.0001 Epoch: 19 Global Step: 109430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:57,189-Speed 5573.58 samples/sec Loss 0.9404 LearningRate 0.0001 Epoch: 19 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:20:59,020-Speed 5595.47 samples/sec Loss 0.9023 LearningRate 0.0001 Epoch: 19 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:00,867-Speed 5544.56 samples/sec Loss 0.9103 LearningRate 0.0001 Epoch: 19 Global Step: 109460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:21:02,691-Speed 5617.16 samples/sec Loss 0.8987 LearningRate 0.0001 Epoch: 19 Global Step: 109470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:04,517-Speed 5611.14 samples/sec Loss 0.9311 LearningRate 0.0001 Epoch: 19 Global Step: 109480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:06,345-Speed 5603.48 samples/sec Loss 0.8869 LearningRate 0.0001 Epoch: 19 Global Step: 109490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:08,166-Speed 5623.39 samples/sec Loss 0.8800 LearningRate 0.0001 Epoch: 19 Global Step: 109500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:10,001-Speed 5584.26 samples/sec Loss 0.8930 LearningRate 0.0001 Epoch: 19 Global Step: 109510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:11,832-Speed 5593.82 samples/sec Loss 0.8770 LearningRate 0.0001 Epoch: 19 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:13,650-Speed 5635.17 samples/sec Loss 0.8493 LearningRate 0.0001 Epoch: 19 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:21:15,483-Speed 5586.08 samples/sec Loss 0.8870 LearningRate 0.0001 Epoch: 19 Global Step: 109540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:17,304-Speed 5625.38 samples/sec Loss 0.8504 LearningRate 0.0001 Epoch: 19 Global Step: 109550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:19,126-Speed 5623.04 samples/sec Loss 0.8983 LearningRate 0.0001 Epoch: 19 Global Step: 109560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:20,981-Speed 5523.00 samples/sec Loss 0.8813 LearningRate 0.0001 Epoch: 19 Global Step: 109570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:21:22,807-Speed 5609.52 samples/sec Loss 0.9535 LearningRate 0.0001 Epoch: 19 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:24,655-Speed 5542.30 samples/sec Loss 0.9411 LearningRate 0.0001 Epoch: 19 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:26,487-Speed 5591.20 samples/sec Loss 0.8634 LearningRate 0.0001 Epoch: 19 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:28,323-Speed 5579.54 samples/sec Loss 0.9083 LearningRate 0.0001 Epoch: 19 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:30,149-Speed 5609.82 samples/sec Loss 0.9143 LearningRate 0.0001 Epoch: 19 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:31,986-Speed 5576.47 samples/sec Loss 0.8787 LearningRate 0.0001 Epoch: 19 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:33,818-Speed 5591.50 samples/sec Loss 0.9038 LearningRate 0.0001 Epoch: 19 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:35,638-Speed 5628.93 samples/sec Loss 0.8910 LearningRate 0.0001 Epoch: 19 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:37,461-Speed 5616.77 samples/sec 
Loss 0.8881 LearningRate 0.0001 Epoch: 19 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:39,320-Speed 5511.74 samples/sec Loss 0.9127 LearningRate 0.0001 Epoch: 19 Global Step: 109670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:41,151-Speed 5594.30 samples/sec Loss 0.9372 LearningRate 0.0001 Epoch: 19 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:42,993-Speed 5559.99 samples/sec Loss 0.8548 LearningRate 0.0001 Epoch: 19 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:44,831-Speed 5572.62 samples/sec Loss 0.9021 LearningRate 0.0001 Epoch: 19 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:46,671-Speed 5567.62 samples/sec Loss 0.9176 LearningRate 0.0001 Epoch: 19 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:48,513-Speed 5559.95 samples/sec Loss 0.8574 LearningRate 0.0001 Epoch: 19 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:50,344-Speed 5595.60 samples/sec Loss 0.9404 LearningRate 0.0001 Epoch: 19 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:52,180-Speed 5580.50 samples/sec Loss 0.9608 LearningRate 0.0001 Epoch: 19 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:54,044-Speed 5495.50 samples/sec Loss 0.8677 LearningRate 0.0001 Epoch: 19 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:55,894-Speed 5536.24 samples/sec Loss 0.9225 LearningRate 0.0001 Epoch: 19 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:57,744-Speed 5537.31 samples/sec Loss 0.9428 LearningRate 0.0001 Epoch: 19 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:21:59,571-Speed 5606.41 samples/sec Loss 0.8972 LearningRate 0.0001 Epoch: 19 
Global Step: 109780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:22:01,390-Speed 5632.09 samples/sec Loss 0.8742 LearningRate 0.0001 Epoch: 19 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:03,242-Speed 5529.37 samples/sec Loss 0.9105 LearningRate 0.0001 Epoch: 19 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:05,072-Speed 5599.25 samples/sec Loss 0.8888 LearningRate 0.0001 Epoch: 19 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:06,900-Speed 5604.36 samples/sec Loss 0.8566 LearningRate 0.0001 Epoch: 19 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:08,739-Speed 5567.44 samples/sec Loss 0.8975 LearningRate 0.0001 Epoch: 19 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:10,594-Speed 5525.10 samples/sec Loss 0.8667 LearningRate 0.0001 Epoch: 19 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:12,450-Speed 5516.65 samples/sec Loss 0.9446 LearningRate 0.0001 Epoch: 19 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:14,286-Speed 5579.62 samples/sec Loss 0.9225 LearningRate 0.0001 Epoch: 19 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:16,127-Speed 5564.33 samples/sec Loss 0.8698 LearningRate 0.0001 Epoch: 19 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:17,946-Speed 5631.09 samples/sec Loss 0.8461 LearningRate 0.0001 Epoch: 19 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:19,773-Speed 5608.77 samples/sec Loss 0.8857 LearningRate 0.0001 Epoch: 19 Global Step: 109890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:21,616-Speed 5555.64 samples/sec Loss 0.8835 LearningRate 0.0001 Epoch: 19 Global Step: 109900 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:22:23,490-Speed 5467.89 samples/sec Loss 0.9047 LearningRate 0.0001 Epoch: 19 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:25,314-Speed 5616.34 samples/sec Loss 0.8615 LearningRate 0.0001 Epoch: 19 Global Step: 109920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:27,157-Speed 5557.07 samples/sec Loss 0.9030 LearningRate 0.0001 Epoch: 19 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:29,013-Speed 5519.32 samples/sec Loss 0.9220 LearningRate 0.0001 Epoch: 19 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:30,847-Speed 5584.96 samples/sec Loss 0.9238 LearningRate 0.0001 Epoch: 19 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:32,679-Speed 5591.39 samples/sec Loss 0.8983 LearningRate 0.0001 Epoch: 19 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:34,523-Speed 5556.71 samples/sec Loss 0.9055 LearningRate 0.0001 Epoch: 19 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:36,391-Speed 5482.02 samples/sec Loss 0.9275 LearningRate 0.0001 Epoch: 19 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:38,212-Speed 5625.69 samples/sec Loss 0.9376 LearningRate 0.0001 Epoch: 19 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:22:40,066-Speed 5524.04 samples/sec Loss 0.9827 LearningRate 0.0001 Epoch: 19 Global Step: 110000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:23:06,239-[lfw][110000]XNorm: 21.958611 Training: 2022-04-27 08:23:06,240-[lfw][110000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-27 08:23:06,240-[lfw][110000]Accuracy-Highest: 0.99817 Training: 2022-04-27 08:23:36,577-[cfp_fp][110000]XNorm: 21.303466 Training: 2022-04-27 08:23:36,578-[cfp_fp][110000]Accuracy-Flip: 
0.97971+-0.00556 Training: 2022-04-27 08:23:36,578-[cfp_fp][110000]Accuracy-Highest: 0.98100 Training: 2022-04-27 08:24:02,790-[agedb_30][110000]XNorm: 22.110064 Training: 2022-04-27 08:24:02,790-[agedb_30][110000]Accuracy-Flip: 0.98217+-0.00610 Training: 2022-04-27 08:24:02,791-[agedb_30][110000]Accuracy-Highest: 0.98233 Training: 2022-04-27 08:24:04,623-Speed 121.10 samples/sec Loss 0.8766 LearningRate 0.0001 Epoch: 19 Global Step: 110010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:06,478-Speed 5521.27 samples/sec Loss 0.7726 LearningRate 0.0001 Epoch: 19 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:08,378-Speed 5391.09 samples/sec Loss 0.8861 LearningRate 0.0001 Epoch: 19 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:10,228-Speed 5535.66 samples/sec Loss 0.9379 LearningRate 0.0001 Epoch: 19 Global Step: 110040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:12,090-Speed 5503.42 samples/sec Loss 0.9194 LearningRate 0.0001 Epoch: 19 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:13,947-Speed 5515.78 samples/sec Loss 0.9042 LearningRate 0.0001 Epoch: 19 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:15,779-Speed 5590.37 samples/sec Loss 0.8612 LearningRate 0.0001 Epoch: 19 Global Step: 110070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:17,593-Speed 5646.18 samples/sec Loss 0.9473 LearningRate 0.0001 Epoch: 19 Global Step: 110080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:19,435-Speed 5561.96 samples/sec Loss 0.9098 LearningRate 0.0001 Epoch: 19 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:21,249-Speed 5646.48 samples/sec Loss 0.9094 LearningRate 0.0001 Epoch: 19 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:23,078-Speed 
5601.87 samples/sec Loss 0.9111 LearningRate 0.0001 Epoch: 19 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:24,900-Speed 5621.59 samples/sec Loss 0.8879 LearningRate 0.0001 Epoch: 19 Global Step: 110120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:26,737-Speed 5575.82 samples/sec Loss 0.9179 LearningRate 0.0001 Epoch: 19 Global Step: 110130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:28,566-Speed 5600.06 samples/sec Loss 0.8814 LearningRate 0.0001 Epoch: 19 Global Step: 110140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:30,406-Speed 5567.11 samples/sec Loss 0.8566 LearningRate 0.0001 Epoch: 19 Global Step: 110150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:32,233-Speed 5607.95 samples/sec Loss 0.8863 LearningRate 0.0001 Epoch: 19 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:34,052-Speed 5632.03 samples/sec Loss 0.8666 LearningRate 0.0001 Epoch: 19 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:35,889-Speed 5575.17 samples/sec Loss 0.8920 LearningRate 0.0001 Epoch: 19 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:37,727-Speed 5573.01 samples/sec Loss 0.8435 LearningRate 0.0001 Epoch: 19 Global Step: 110190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:24:39,534-Speed 5668.07 samples/sec Loss 0.8592 LearningRate 0.0001 Epoch: 19 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:41,408-Speed 5467.20 samples/sec Loss 1.0199 LearningRate 0.0001 Epoch: 19 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:43,233-Speed 5613.55 samples/sec Loss 0.8814 LearningRate 0.0001 Epoch: 19 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:45,099-Speed 5487.74 samples/sec Loss 0.9026 
LearningRate 0.0001 Epoch: 19 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:46,926-Speed 5607.83 samples/sec Loss 0.8704 LearningRate 0.0001 Epoch: 19 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:48,762-Speed 5577.45 samples/sec Loss 0.9268 LearningRate 0.0001 Epoch: 19 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:50,592-Speed 5600.12 samples/sec Loss 0.8913 LearningRate 0.0001 Epoch: 19 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:52,421-Speed 5598.85 samples/sec Loss 0.8649 LearningRate 0.0001 Epoch: 19 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:54,314-Speed 5410.33 samples/sec Loss 0.8818 LearningRate 0.0001 Epoch: 19 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:56,163-Speed 5543.00 samples/sec Loss 0.8715 LearningRate 0.0001 Epoch: 19 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:57,992-Speed 5599.57 samples/sec Loss 0.9493 LearningRate 0.0001 Epoch: 19 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:24:59,827-Speed 5582.61 samples/sec Loss 0.9414 LearningRate 0.0001 Epoch: 19 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:01,708-Speed 5444.70 samples/sec Loss 0.8249 LearningRate 0.0001 Epoch: 19 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:03,557-Speed 5541.04 samples/sec Loss 0.8857 LearningRate 0.0001 Epoch: 19 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:05,382-Speed 5613.80 samples/sec Loss 0.9497 LearningRate 0.0001 Epoch: 19 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:07,204-Speed 5621.29 samples/sec Loss 0.9075 LearningRate 0.0001 Epoch: 19 Global Step: 
110350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:09,041-Speed 5575.92 samples/sec Loss 0.9098 LearningRate 0.0001 Epoch: 19 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:10,874-Speed 5586.86 samples/sec Loss 0.9704 LearningRate 0.0001 Epoch: 19 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:12,751-Speed 5457.87 samples/sec Loss 0.8967 LearningRate 0.0001 Epoch: 19 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:14,579-Speed 5605.18 samples/sec Loss 0.9691 LearningRate 0.0001 Epoch: 19 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:16,417-Speed 5570.98 samples/sec Loss 0.8612 LearningRate 0.0001 Epoch: 19 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:18,256-Speed 5572.06 samples/sec Loss 0.9367 LearningRate 0.0001 Epoch: 19 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:20,106-Speed 5538.30 samples/sec Loss 0.9410 LearningRate 0.0001 Epoch: 19 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:21,937-Speed 5594.82 samples/sec Loss 0.9191 LearningRate 0.0001 Epoch: 19 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:23,765-Speed 5602.51 samples/sec Loss 0.9018 LearningRate 0.0001 Epoch: 19 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:25,598-Speed 5586.28 samples/sec Loss 0.9166 LearningRate 0.0001 Epoch: 19 Global Step: 110450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:27,440-Speed 5562.74 samples/sec Loss 0.9060 LearningRate 0.0001 Epoch: 19 Global Step: 110460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:29,253-Speed 5647.96 samples/sec Loss 0.9390 LearningRate 0.0001 Epoch: 19 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-27 08:25:31,075-Speed 5623.11 samples/sec Loss 0.8355 LearningRate 0.0001 Epoch: 19 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:32,892-Speed 5638.11 samples/sec Loss 0.8555 LearningRate 0.0001 Epoch: 19 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:34,711-Speed 5632.58 samples/sec Loss 0.8456 LearningRate 0.0001 Epoch: 19 Global Step: 110500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:25:36,536-Speed 5612.68 samples/sec Loss 0.8690 LearningRate 0.0001 Epoch: 19 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:38,374-Speed 5572.78 samples/sec Loss 0.8647 LearningRate 0.0001 Epoch: 19 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:40,204-Speed 5597.57 samples/sec Loss 0.9714 LearningRate 0.0001 Epoch: 19 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:42,031-Speed 5607.72 samples/sec Loss 0.8980 LearningRate 0.0001 Epoch: 19 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:43,853-Speed 5622.85 samples/sec Loss 0.9194 LearningRate 0.0001 Epoch: 19 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:45,683-Speed 5596.81 samples/sec Loss 0.8756 LearningRate 0.0001 Epoch: 19 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:47,521-Speed 5572.87 samples/sec Loss 0.9355 LearningRate 0.0001 Epoch: 19 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:49,353-Speed 5590.26 samples/sec Loss 0.9309 LearningRate 0.0001 Epoch: 19 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:51,178-Speed 5612.06 samples/sec Loss 0.8916 LearningRate 0.0001 Epoch: 19 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:25:53,004-Speed 5611.16 samples/sec Loss 0.9394 LearningRate 0.0001 Epoch: 19 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:54,836-Speed 5591.41 samples/sec Loss 0.9227 LearningRate 0.0001 Epoch: 19 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:56,661-Speed 5612.37 samples/sec Loss 0.9691 LearningRate 0.0001 Epoch: 19 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:25:58,494-Speed 5588.37 samples/sec Loss 0.9187 LearningRate 0.0001 Epoch: 19 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:00,322-Speed 5604.68 samples/sec Loss 0.8837 LearningRate 0.0001 Epoch: 19 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:02,149-Speed 5606.93 samples/sec Loss 0.9076 LearningRate 0.0001 Epoch: 19 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:03,987-Speed 5573.75 samples/sec Loss 0.9188 LearningRate 0.0001 Epoch: 19 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:05,806-Speed 5629.58 samples/sec Loss 0.9303 LearningRate 0.0001 Epoch: 19 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:07,646-Speed 5567.64 samples/sec Loss 0.8511 LearningRate 0.0001 Epoch: 19 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:09,502-Speed 5518.26 samples/sec Loss 0.8993 LearningRate 0.0001 Epoch: 19 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:11,325-Speed 5619.60 samples/sec Loss 0.9135 LearningRate 0.0001 Epoch: 19 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:13,149-Speed 5616.13 samples/sec Loss 0.8594 LearningRate 0.0001 Epoch: 19 Global Step: 110710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:26:15,001-Speed 5531.30 samples/sec 
Loss 0.9683 LearningRate 0.0001 Epoch: 19 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:16,830-Speed 5601.59 samples/sec Loss 0.8446 LearningRate 0.0001 Epoch: 19 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:18,712-Speed 5440.89 samples/sec Loss 0.9203 LearningRate 0.0001 Epoch: 19 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:20,566-Speed 5526.75 samples/sec Loss 0.8919 LearningRate 0.0001 Epoch: 19 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:22,407-Speed 5563.17 samples/sec Loss 0.9479 LearningRate 0.0001 Epoch: 19 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:24,243-Speed 5579.91 samples/sec Loss 0.8953 LearningRate 0.0001 Epoch: 19 Global Step: 110770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:26,074-Speed 5593.17 samples/sec Loss 0.8906 LearningRate 0.0001 Epoch: 19 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:27,917-Speed 5560.56 samples/sec Loss 0.8807 LearningRate 0.0001 Epoch: 19 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:29,756-Speed 5570.19 samples/sec Loss 0.9166 LearningRate 0.0001 Epoch: 19 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:31,578-Speed 5620.36 samples/sec Loss 0.9278 LearningRate 0.0001 Epoch: 19 Global Step: 110810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:33,421-Speed 5559.77 samples/sec Loss 0.8910 LearningRate 0.0001 Epoch: 19 Global Step: 110820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:35,244-Speed 5617.97 samples/sec Loss 0.8909 LearningRate 0.0001 Epoch: 19 Global Step: 110830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:37,096-Speed 5531.05 samples/sec Loss 0.9296 LearningRate 0.0001 Epoch: 19 
Global Step: 110840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:38,928-Speed 5591.72 samples/sec Loss 0.9313 LearningRate 0.0001 Epoch: 19 Global Step: 110850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:40,767-Speed 5569.39 samples/sec Loss 0.9419 LearningRate 0.0001 Epoch: 19 Global Step: 110860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:42,620-Speed 5529.28 samples/sec Loss 0.8723 LearningRate 0.0001 Epoch: 19 Global Step: 110870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:44,444-Speed 5614.57 samples/sec Loss 0.8563 LearningRate 0.0001 Epoch: 19 Global Step: 110880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:46,268-Speed 5615.89 samples/sec Loss 0.9053 LearningRate 0.0001 Epoch: 19 Global Step: 110890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:48,098-Speed 5598.65 samples/sec Loss 0.9774 LearningRate 0.0001 Epoch: 19 Global Step: 110900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:26:49,922-Speed 5616.26 samples/sec Loss 0.9089 LearningRate 0.0001 Epoch: 19 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:51,776-Speed 5524.25 samples/sec Loss 0.9271 LearningRate 0.0001 Epoch: 19 Global Step: 110920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:53,621-Speed 5552.98 samples/sec Loss 0.9354 LearningRate 0.0001 Epoch: 19 Global Step: 110930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:55,493-Speed 5471.84 samples/sec Loss 0.8515 LearningRate 0.0001 Epoch: 19 Global Step: 110940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:57,368-Speed 5463.68 samples/sec Loss 0.9065 LearningRate 0.0001 Epoch: 19 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:26:59,213-Speed 5551.09 samples/sec Loss 0.8591 LearningRate 0.0001 Epoch: 19 Global Step: 110960 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:27:01,052-Speed 5568.29 samples/sec Loss 0.9632 LearningRate 0.0001 Epoch: 19 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:02,924-Speed 5474.84 samples/sec Loss 0.9051 LearningRate 0.0001 Epoch: 19 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:04,806-Speed 5440.66 samples/sec Loss 0.8835 LearningRate 0.0001 Epoch: 19 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:06,643-Speed 5575.81 samples/sec Loss 0.9247 LearningRate 0.0001 Epoch: 19 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:08,466-Speed 5621.72 samples/sec Loss 0.9027 LearningRate 0.0001 Epoch: 19 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:10,301-Speed 5581.89 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:12,136-Speed 5583.66 samples/sec Loss 0.8511 LearningRate 0.0001 Epoch: 19 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:13,968-Speed 5589.77 samples/sec Loss 0.9330 LearningRate 0.0001 Epoch: 19 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:15,797-Speed 5601.77 samples/sec Loss 0.8921 LearningRate 0.0001 Epoch: 19 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:17,626-Speed 5598.74 samples/sec Loss 0.9150 LearningRate 0.0001 Epoch: 19 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:19,452-Speed 5610.11 samples/sec Loss 0.9331 LearningRate 0.0001 Epoch: 19 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:21,295-Speed 5558.30 samples/sec Loss 0.9796 LearningRate 0.0001 Epoch: 19 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:27:23,133-Speed 5575.10 samples/sec Loss 0.9005 LearningRate 0.0001 Epoch: 19 Global Step: 111090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:24,963-Speed 5596.75 samples/sec Loss 0.9393 LearningRate 0.0001 Epoch: 19 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:26,781-Speed 5633.73 samples/sec Loss 0.8956 LearningRate 0.0001 Epoch: 19 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:28,606-Speed 5612.71 samples/sec Loss 0.9044 LearningRate 0.0001 Epoch: 19 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:30,453-Speed 5546.03 samples/sec Loss 0.8717 LearningRate 0.0001 Epoch: 19 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:32,276-Speed 5620.42 samples/sec Loss 0.8219 LearningRate 0.0001 Epoch: 19 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:34,101-Speed 5613.17 samples/sec Loss 0.8775 LearningRate 0.0001 Epoch: 19 Global Step: 111150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:35,936-Speed 5580.19 samples/sec Loss 0.9228 LearningRate 0.0001 Epoch: 19 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:37,773-Speed 5577.86 samples/sec Loss 0.8968 LearningRate 0.0001 Epoch: 19 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:39,618-Speed 5550.16 samples/sec Loss 0.8865 LearningRate 0.0000 Epoch: 19 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:41,462-Speed 5555.59 samples/sec Loss 0.9415 LearningRate 0.0000 Epoch: 19 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:43,298-Speed 5579.10 samples/sec Loss 0.9041 LearningRate 0.0000 Epoch: 19 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:45,127-Speed 5602.53 samples/sec Loss 
0.9769 LearningRate 0.0000 Epoch: 19 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:46,945-Speed 5632.46 samples/sec Loss 0.8960 LearningRate 0.0000 Epoch: 19 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:48,774-Speed 5600.87 samples/sec Loss 0.8915 LearningRate 0.0000 Epoch: 19 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:50,615-Speed 5563.95 samples/sec Loss 0.9478 LearningRate 0.0000 Epoch: 19 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:27:52,430-Speed 5644.11 samples/sec Loss 0.8053 LearningRate 0.0000 Epoch: 19 Global Step: 111250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:27:54,263-Speed 5590.99 samples/sec Loss 0.8750 LearningRate 0.0000 Epoch: 19 Global Step: 111260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:27:56,087-Speed 5614.34 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 111270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:27:57,913-Speed 5610.75 samples/sec Loss 0.9309 LearningRate 0.0000 Epoch: 19 Global Step: 111280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:27:59,756-Speed 5557.92 samples/sec Loss 0.8973 LearningRate 0.0000 Epoch: 19 Global Step: 111290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:01,599-Speed 5557.53 samples/sec Loss 0.8735 LearningRate 0.0000 Epoch: 19 Global Step: 111300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:03,438-Speed 5568.45 samples/sec Loss 0.9159 LearningRate 0.0000 Epoch: 19 Global Step: 111310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:05,267-Speed 5600.30 samples/sec Loss 0.9010 LearningRate 0.0000 Epoch: 19 Global Step: 111320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:07,111-Speed 5557.74 samples/sec Loss 0.8398 LearningRate 0.0000 Epoch: 19 Global 
Step: 111330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:08,953-Speed 5559.36 samples/sec Loss 0.9759 LearningRate 0.0000 Epoch: 19 Global Step: 111340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:10,781-Speed 5602.71 samples/sec Loss 0.9290 LearningRate 0.0000 Epoch: 19 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:12,610-Speed 5601.11 samples/sec Loss 0.8718 LearningRate 0.0000 Epoch: 19 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:14,434-Speed 5617.83 samples/sec Loss 0.8892 LearningRate 0.0000 Epoch: 19 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:16,288-Speed 5525.88 samples/sec Loss 0.9132 LearningRate 0.0000 Epoch: 19 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:18,146-Speed 5511.72 samples/sec Loss 0.9157 LearningRate 0.0000 Epoch: 19 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:19,997-Speed 5534.61 samples/sec Loss 0.9428 LearningRate 0.0000 Epoch: 19 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:21,833-Speed 5579.44 samples/sec Loss 0.8195 LearningRate 0.0000 Epoch: 19 Global Step: 111410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:23,689-Speed 5519.28 samples/sec Loss 0.8908 LearningRate 0.0000 Epoch: 19 Global Step: 111420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:25,539-Speed 5539.94 samples/sec Loss 0.9257 LearningRate 0.0000 Epoch: 19 Global Step: 111430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:27,398-Speed 5510.80 samples/sec Loss 0.9160 LearningRate 0.0000 Epoch: 19 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:29,225-Speed 5605.57 samples/sec Loss 0.9201 LearningRate 0.0000 Epoch: 19 Global Step: 111450 Fp16 Grad Scale: 131072 
Required: 0 hours Training: 2022-04-27 08:28:31,054-Speed 5599.41 samples/sec Loss 0.8807 LearningRate 0.0000 Epoch: 19 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:32,881-Speed 5607.08 samples/sec Loss 0.9434 LearningRate 0.0000 Epoch: 19 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:34,763-Speed 5443.47 samples/sec Loss 0.9207 LearningRate 0.0000 Epoch: 19 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:36,594-Speed 5593.70 samples/sec Loss 0.8088 LearningRate 0.0000 Epoch: 19 Global Step: 111490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:38,430-Speed 5582.79 samples/sec Loss 0.8132 LearningRate 0.0000 Epoch: 19 Global Step: 111500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:40,254-Speed 5614.42 samples/sec Loss 0.9575 LearningRate 0.0000 Epoch: 19 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:42,084-Speed 5598.64 samples/sec Loss 0.9361 LearningRate 0.0000 Epoch: 19 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:43,908-Speed 5614.85 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 19 Global Step: 111530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:45,743-Speed 5581.33 samples/sec Loss 0.8848 LearningRate 0.0000 Epoch: 19 Global Step: 111540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:47,588-Speed 5552.47 samples/sec Loss 0.9151 LearningRate 0.0000 Epoch: 19 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:49,418-Speed 5597.92 samples/sec Loss 0.9495 LearningRate 0.0000 Epoch: 19 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:28:51,279-Speed 5504.66 samples/sec Loss 0.9116 LearningRate 0.0000 Epoch: 19 Global Step: 111570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 
08:28:53,139-Speed 5506.96 samples/sec Loss 0.8282 LearningRate 0.0000 Epoch: 19 Global Step: 111580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:54,969-Speed 5596.53 samples/sec Loss 0.8986 LearningRate 0.0000 Epoch: 19 Global Step: 111590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:56,795-Speed 5611.78 samples/sec Loss 0.9341 LearningRate 0.0000 Epoch: 19 Global Step: 111600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:28:58,631-Speed 5578.56 samples/sec Loss 0.8484 LearningRate 0.0000 Epoch: 19 Global Step: 111610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:00,466-Speed 5583.24 samples/sec Loss 0.8265 LearningRate 0.0000 Epoch: 19 Global Step: 111620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:02,298-Speed 5590.37 samples/sec Loss 0.9095 LearningRate 0.0000 Epoch: 19 Global Step: 111630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:04,118-Speed 5627.17 samples/sec Loss 0.8208 LearningRate 0.0000 Epoch: 19 Global Step: 111640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:05,941-Speed 5620.62 samples/sec Loss 0.9348 LearningRate 0.0000 Epoch: 19 Global Step: 111650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:07,768-Speed 5608.02 samples/sec Loss 0.8771 LearningRate 0.0000 Epoch: 19 Global Step: 111660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:09,589-Speed 5623.26 samples/sec Loss 0.9045 LearningRate 0.0000 Epoch: 19 Global Step: 111670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:11,408-Speed 5631.85 samples/sec Loss 0.8665 LearningRate 0.0000 Epoch: 19 Global Step: 111680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:13,249-Speed 5564.80 samples/sec Loss 0.8781 LearningRate 0.0000 Epoch: 19 Global Step: 111690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:15,076-Speed 5604.49 samples/sec Loss 
0.9119 LearningRate 0.0000 Epoch: 19 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:16,898-Speed 5623.10 samples/sec Loss 0.8675 LearningRate 0.0000 Epoch: 19 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:18,729-Speed 5593.90 samples/sec Loss 0.8610 LearningRate 0.0000 Epoch: 19 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:20,562-Speed 5588.69 samples/sec Loss 0.8887 LearningRate 0.0000 Epoch: 19 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:22,383-Speed 5627.65 samples/sec Loss 0.9565 LearningRate 0.0000 Epoch: 19 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:24,206-Speed 5615.96 samples/sec Loss 0.8903 LearningRate 0.0000 Epoch: 19 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:26,039-Speed 5590.82 samples/sec Loss 0.9298 LearningRate 0.0000 Epoch: 19 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:27,855-Speed 5639.89 samples/sec Loss 0.9452 LearningRate 0.0000 Epoch: 19 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:29,674-Speed 5631.47 samples/sec Loss 0.8579 LearningRate 0.0000 Epoch: 19 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:31,492-Speed 5634.20 samples/sec Loss 0.8802 LearningRate 0.0000 Epoch: 19 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:33,323-Speed 5594.90 samples/sec Loss 0.9401 LearningRate 0.0000 Epoch: 19 Global Step: 111800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:35,140-Speed 5635.43 samples/sec Loss 0.8375 LearningRate 0.0000 Epoch: 19 Global Step: 111810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:37,000-Speed 5507.07 samples/sec Loss 0.8564 LearningRate 0.0000 Epoch: 19 Global 
Step: 111820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:38,834-Speed 5587.88 samples/sec Loss 0.8263 LearningRate 0.0000 Epoch: 19 Global Step: 111830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:40,661-Speed 5607.31 samples/sec Loss 1.0085 LearningRate 0.0000 Epoch: 19 Global Step: 111840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:42,483-Speed 5622.39 samples/sec Loss 0.9175 LearningRate 0.0000 Epoch: 19 Global Step: 111850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:44,330-Speed 5543.59 samples/sec Loss 0.8786 LearningRate 0.0000 Epoch: 19 Global Step: 111860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:46,156-Speed 5611.98 samples/sec Loss 0.8909 LearningRate 0.0000 Epoch: 19 Global Step: 111870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:47,982-Speed 5609.77 samples/sec Loss 0.8728 LearningRate 0.0000 Epoch: 19 Global Step: 111880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:49,806-Speed 5613.68 samples/sec Loss 0.8616 LearningRate 0.0000 Epoch: 19 Global Step: 111890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:51,633-Speed 5607.20 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 19 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:29:53,474-Speed 5564.32 samples/sec Loss 0.9218 LearningRate 0.0000 Epoch: 19 Global Step: 111910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:55,296-Speed 5622.60 samples/sec Loss 0.8335 LearningRate 0.0000 Epoch: 19 Global Step: 111920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:57,137-Speed 5564.95 samples/sec Loss 0.8661 LearningRate 0.0000 Epoch: 19 Global Step: 111930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:29:58,973-Speed 5578.85 samples/sec Loss 0.8876 LearningRate 0.0000 Epoch: 19 Global Step: 111940 Fp16 Grad Scale: 65536 
Required: 0 hours
Training: 2022-04-27 08:30:00,824-Speed 5532.24 samples/sec Loss 0.9738 LearningRate 0.0000 Epoch: 19 Global Step: 111950 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:02,698-Speed 5465.74 samples/sec Loss 0.9293 LearningRate 0.0000 Epoch: 19 Global Step: 111960 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:04,558-Speed 5509.03 samples/sec Loss 0.9188 LearningRate 0.0000 Epoch: 19 Global Step: 111970 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:06,391-Speed 5588.16 samples/sec Loss 0.8895 LearningRate 0.0000 Epoch: 19 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:08,221-Speed 5597.26 samples/sec Loss 0.8981 LearningRate 0.0000 Epoch: 19 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:10,059-Speed 5573.15 samples/sec Loss 0.9471 LearningRate 0.0000 Epoch: 19 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:30:36,366-[lfw][112000]XNorm: 21.959637
Training: 2022-04-27 08:30:36,367-[lfw][112000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-27 08:30:36,367-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:31:06,928-[cfp_fp][112000]XNorm: 21.275295
Training: 2022-04-27 08:31:06,929-[cfp_fp][112000]Accuracy-Flip: 0.97943+-0.00590
Training: 2022-04-27 08:31:06,929-[cfp_fp][112000]Accuracy-Highest: 0.98100
Training: 2022-04-27 08:31:33,334-[agedb_30][112000]XNorm: 22.080918
Training: 2022-04-27 08:31:33,335-[agedb_30][112000]Accuracy-Flip: 0.98250+-0.00651
Training: 2022-04-27 08:31:33,335-[agedb_30][112000]Accuracy-Highest: 0.98250
Training: 2022-04-27 08:31:35,173-Speed 120.31 samples/sec Loss 0.8574 LearningRate 0.0000 Epoch: 19 Global Step: 112010 Fp16 Grad Scale: 131072 Required: 0 hours
Training: 2022-04-27 08:31:36,976-Speed 5681.69 samples/sec Loss 0.9182 LearningRate 0.0000 Epoch: 19 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-04-27 08:31:38,797-Speed 5627.07 samples/sec Loss 0.8749 LearningRate 0.0000 Epoch: 19 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:40,628-Speed 5592.46 samples/sec Loss 0.8665 LearningRate 0.0000 Epoch: 19 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:42,443-Speed 5645.09 samples/sec Loss 0.8959 LearningRate 0.0000 Epoch: 19 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:44,255-Speed 5653.60 samples/sec Loss 0.8637 LearningRate 0.0000 Epoch: 19 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:46,084-Speed 5600.43 samples/sec Loss 0.8189 LearningRate 0.0000 Epoch: 19 Global Step: 112070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:47,911-Speed 5607.05 samples/sec Loss 0.9052 LearningRate 0.0000 Epoch: 19 Global Step: 112080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:49,726-Speed 5644.47 samples/sec Loss 0.8896 LearningRate 0.0000 Epoch: 19 Global Step: 112090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:51,541-Speed 5643.04 samples/sec Loss 0.9044 LearningRate 0.0000 Epoch: 19 Global Step: 112100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:53,362-Speed 5623.31 samples/sec Loss 0.9100 LearningRate 0.0000 Epoch: 19 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:55,168-Speed 5673.90 samples/sec Loss 0.8337 LearningRate 0.0000 Epoch: 19 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:56,994-Speed 5607.76 samples/sec Loss 0.8771 LearningRate 0.0000 Epoch: 19 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:31:58,830-Speed 5581.30 samples/sec Loss 0.9087 LearningRate 0.0000 Epoch: 19 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:00,661-Speed 
5593.87 samples/sec Loss 0.9124 LearningRate 0.0000 Epoch: 19 Global Step: 112150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:02,492-Speed 5595.22 samples/sec Loss 1.0071 LearningRate 0.0000 Epoch: 19 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:04,317-Speed 5610.85 samples/sec Loss 0.9303 LearningRate 0.0000 Epoch: 19 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:06,182-Speed 5494.13 samples/sec Loss 0.9165 LearningRate 0.0000 Epoch: 19 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:08,032-Speed 5536.49 samples/sec Loss 0.8450 LearningRate 0.0000 Epoch: 19 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:09,861-Speed 5601.62 samples/sec Loss 0.9034 LearningRate 0.0000 Epoch: 19 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:11,686-Speed 5613.40 samples/sec Loss 0.9958 LearningRate 0.0000 Epoch: 19 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:13,508-Speed 5620.27 samples/sec Loss 0.8481 LearningRate 0.0000 Epoch: 19 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:15,334-Speed 5610.21 samples/sec Loss 0.9603 LearningRate 0.0000 Epoch: 19 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:17,162-Speed 5604.83 samples/sec Loss 0.8694 LearningRate 0.0000 Epoch: 19 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:18,988-Speed 5610.12 samples/sec Loss 0.9987 LearningRate 0.0000 Epoch: 19 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:20,830-Speed 5560.36 samples/sec Loss 0.8499 LearningRate 0.0000 Epoch: 19 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:22,671-Speed 5564.79 samples/sec Loss 0.8354 
LearningRate 0.0000 Epoch: 19 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:24,501-Speed 5596.37 samples/sec Loss 0.9289 LearningRate 0.0000 Epoch: 19 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:26,335-Speed 5587.10 samples/sec Loss 0.9064 LearningRate 0.0000 Epoch: 19 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:28,159-Speed 5616.54 samples/sec Loss 0.8722 LearningRate 0.0000 Epoch: 19 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:30,000-Speed 5562.92 samples/sec Loss 0.9242 LearningRate 0.0000 Epoch: 19 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:31,841-Speed 5564.63 samples/sec Loss 0.9547 LearningRate 0.0000 Epoch: 19 Global Step: 112320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:32:33,682-Speed 5563.08 samples/sec Loss 0.8941 LearningRate 0.0000 Epoch: 19 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:35,516-Speed 5584.96 samples/sec Loss 0.9459 LearningRate 0.0000 Epoch: 19 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:37,348-Speed 5592.30 samples/sec Loss 0.9163 LearningRate 0.0000 Epoch: 19 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:39,162-Speed 5646.50 samples/sec Loss 0.9591 LearningRate 0.0000 Epoch: 19 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:41,009-Speed 5544.66 samples/sec Loss 0.8865 LearningRate 0.0000 Epoch: 19 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:42,843-Speed 5587.39 samples/sec Loss 0.9209 LearningRate 0.0000 Epoch: 19 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:44,688-Speed 5550.72 samples/sec Loss 0.8776 LearningRate 0.0000 Epoch: 19 Global Step: 
112390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:46,507-Speed 5632.22 samples/sec Loss 0.8926 LearningRate 0.0000 Epoch: 19 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:48,334-Speed 5606.60 samples/sec Loss 0.8581 LearningRate 0.0000 Epoch: 19 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:50,163-Speed 5602.53 samples/sec Loss 0.9561 LearningRate 0.0000 Epoch: 19 Global Step: 112420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:51,984-Speed 5623.62 samples/sec Loss 0.8445 LearningRate 0.0000 Epoch: 19 Global Step: 112430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:53,820-Speed 5579.24 samples/sec Loss 0.8983 LearningRate 0.0000 Epoch: 19 Global Step: 112440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:55,653-Speed 5587.93 samples/sec Loss 0.9231 LearningRate 0.0000 Epoch: 19 Global Step: 112450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:57,493-Speed 5567.84 samples/sec Loss 0.8963 LearningRate 0.0000 Epoch: 19 Global Step: 112460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:32:59,357-Speed 5496.13 samples/sec Loss 0.9008 LearningRate 0.0000 Epoch: 19 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:01,210-Speed 5528.84 samples/sec Loss 0.9411 LearningRate 0.0000 Epoch: 19 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:03,038-Speed 5601.00 samples/sec Loss 0.9205 LearningRate 0.0000 Epoch: 19 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:04,866-Speed 5603.40 samples/sec Loss 0.8858 LearningRate 0.0000 Epoch: 19 Global Step: 112500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:06,695-Speed 5599.92 samples/sec Loss 0.8697 LearningRate 0.0000 Epoch: 19 Global Step: 112510 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-27 08:33:08,513-Speed 5637.62 samples/sec Loss 0.9286 LearningRate 0.0000 Epoch: 19 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:10,337-Speed 5614.07 samples/sec Loss 0.9149 LearningRate 0.0000 Epoch: 19 Global Step: 112530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:12,155-Speed 5635.82 samples/sec Loss 0.9604 LearningRate 0.0000 Epoch: 19 Global Step: 112540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:13,994-Speed 5568.82 samples/sec Loss 0.9078 LearningRate 0.0000 Epoch: 19 Global Step: 112550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:15,813-Speed 5633.83 samples/sec Loss 0.9345 LearningRate 0.0000 Epoch: 19 Global Step: 112560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:17,629-Speed 5640.50 samples/sec Loss 0.9100 LearningRate 0.0000 Epoch: 19 Global Step: 112570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:19,481-Speed 5531.14 samples/sec Loss 0.8602 LearningRate 0.0000 Epoch: 19 Global Step: 112580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:21,306-Speed 5611.52 samples/sec Loss 0.9131 LearningRate 0.0000 Epoch: 19 Global Step: 112590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 08:33:23,132-Speed 5610.28 samples/sec Loss 0.9047 LearningRate 0.0000 Epoch: 19 Global Step: 112600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:24,985-Speed 5525.94 samples/sec Loss 0.9241 LearningRate 0.0000 Epoch: 19 Global Step: 112610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:26,813-Speed 5605.76 samples/sec Loss 0.8346 LearningRate 0.0000 Epoch: 19 Global Step: 112620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:28,630-Speed 5637.32 samples/sec Loss 0.8468 LearningRate 0.0000 Epoch: 19 Global Step: 112630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:33:30,457-Speed 5608.16 samples/sec Loss 0.9342 LearningRate 0.0000 Epoch: 19 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:32,289-Speed 5591.14 samples/sec Loss 0.9711 LearningRate 0.0000 Epoch: 19 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:34,130-Speed 5563.23 samples/sec Loss 0.9092 LearningRate 0.0000 Epoch: 19 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:35,946-Speed 5640.45 samples/sec Loss 0.8250 LearningRate 0.0000 Epoch: 19 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:37,773-Speed 5605.42 samples/sec Loss 0.9400 LearningRate 0.0000 Epoch: 19 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:39,620-Speed 5546.80 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 19 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:41,442-Speed 5622.26 samples/sec Loss 0.9702 LearningRate 0.0000 Epoch: 19 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:43,263-Speed 5625.05 samples/sec Loss 0.8297 LearningRate 0.0000 Epoch: 19 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:45,082-Speed 5632.25 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 19 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:46,902-Speed 5626.73 samples/sec Loss 0.9377 LearningRate 0.0000 Epoch: 19 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:48,725-Speed 5621.16 samples/sec Loss 0.8867 LearningRate 0.0000 Epoch: 19 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:50,542-Speed 5634.76 samples/sec Loss 0.8425 LearningRate 0.0000 Epoch: 19 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:52,370-Speed 5603.81 samples/sec Loss 
0.8879 LearningRate 0.0000 Epoch: 19 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:54,199-Speed 5603.85 samples/sec Loss 0.9413 LearningRate 0.0000 Epoch: 19 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:56,022-Speed 5617.92 samples/sec Loss 0.8532 LearningRate 0.0000 Epoch: 19 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:57,841-Speed 5632.22 samples/sec Loss 0.8907 LearningRate 0.0000 Epoch: 19 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:33:59,663-Speed 5621.81 samples/sec Loss 0.8543 LearningRate 0.0000 Epoch: 19 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:01,482-Speed 5631.85 samples/sec Loss 0.8526 LearningRate 0.0000 Epoch: 19 Global Step: 112810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:03,313-Speed 5592.08 samples/sec Loss 0.9205 LearningRate 0.0000 Epoch: 19 Global Step: 112820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:05,148-Speed 5582.25 samples/sec Loss 0.9352 LearningRate 0.0000 Epoch: 19 Global Step: 112830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:06,999-Speed 5533.82 samples/sec Loss 0.9191 LearningRate 0.0000 Epoch: 19 Global Step: 112840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:08,877-Speed 5454.08 samples/sec Loss 0.9139 LearningRate 0.0000 Epoch: 19 Global Step: 112850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:10,699-Speed 5624.27 samples/sec Loss 0.9960 LearningRate 0.0000 Epoch: 19 Global Step: 112860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:12,560-Speed 5504.04 samples/sec Loss 0.9525 LearningRate 0.0000 Epoch: 19 Global Step: 112870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:14,400-Speed 5567.46 samples/sec Loss 0.9554 LearningRate 0.0000 Epoch: 19 Global 
Step: 112880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:16,254-Speed 5524.20 samples/sec Loss 0.8784 LearningRate 0.0000 Epoch: 19 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:18,092-Speed 5575.73 samples/sec Loss 0.8722 LearningRate 0.0000 Epoch: 19 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:19,915-Speed 5616.02 samples/sec Loss 0.9104 LearningRate 0.0000 Epoch: 19 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:21,780-Speed 5493.77 samples/sec Loss 0.8279 LearningRate 0.0000 Epoch: 19 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:23,613-Speed 5587.34 samples/sec Loss 0.9151 LearningRate 0.0000 Epoch: 19 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:25,447-Speed 5585.73 samples/sec Loss 0.9983 LearningRate 0.0000 Epoch: 19 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:27,283-Speed 5578.87 samples/sec Loss 0.9006 LearningRate 0.0000 Epoch: 19 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:29,100-Speed 5637.49 samples/sec Loss 0.8745 LearningRate 0.0000 Epoch: 19 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:30,942-Speed 5561.98 samples/sec Loss 0.8909 LearningRate 0.0000 Epoch: 19 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:32,780-Speed 5574.46 samples/sec Loss 0.8836 LearningRate 0.0000 Epoch: 19 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:34,622-Speed 5559.99 samples/sec Loss 0.8782 LearningRate 0.0000 Epoch: 19 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:36,463-Speed 5563.31 samples/sec Loss 0.9376 LearningRate 0.0000 Epoch: 19 Global Step: 113000 Fp16 Grad Scale: 131072 
Required: 0 hours Training: 2022-04-27 08:34:38,296-Speed 5589.78 samples/sec Loss 0.9337 LearningRate 0.0000 Epoch: 19 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:40,123-Speed 5607.36 samples/sec Loss 0.8973 LearningRate 0.0000 Epoch: 19 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:41,940-Speed 5635.78 samples/sec Loss 0.9430 LearningRate 0.0000 Epoch: 19 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:43,758-Speed 5635.54 samples/sec Loss 0.8619 LearningRate 0.0000 Epoch: 19 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:45,596-Speed 5572.80 samples/sec Loss 0.9007 LearningRate 0.0000 Epoch: 19 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:47,425-Speed 5601.18 samples/sec Loss 0.8765 LearningRate 0.0000 Epoch: 19 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:49,266-Speed 5563.20 samples/sec Loss 0.9296 LearningRate 0.0000 Epoch: 19 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:51,106-Speed 5565.69 samples/sec Loss 0.9234 LearningRate 0.0000 Epoch: 19 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:52,934-Speed 5605.66 samples/sec Loss 0.8872 LearningRate 0.0000 Epoch: 19 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:54,781-Speed 5546.40 samples/sec Loss 0.8565 LearningRate 0.0000 Epoch: 19 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:56,595-Speed 5644.90 samples/sec Loss 0.8864 LearningRate 0.0000 Epoch: 19 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:34:58,426-Speed 5595.35 samples/sec Loss 0.8888 LearningRate 0.0000 Epoch: 19 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:35:00,256-Speed 5597.70 samples/sec Loss 0.9345 LearningRate 0.0000 Epoch: 19 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:02,081-Speed 5613.81 samples/sec Loss 0.9091 LearningRate 0.0000 Epoch: 19 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:03,917-Speed 5577.68 samples/sec Loss 0.8902 LearningRate 0.0000 Epoch: 19 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:05,757-Speed 5568.95 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 19 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:07,601-Speed 5554.28 samples/sec Loss 0.8963 LearningRate 0.0000 Epoch: 19 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:09,434-Speed 5586.23 samples/sec Loss 0.8957 LearningRate 0.0000 Epoch: 19 Global Step: 113180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:11,272-Speed 5573.01 samples/sec Loss 0.8788 LearningRate 0.0000 Epoch: 19 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:13,100-Speed 5604.13 samples/sec Loss 0.8342 LearningRate 0.0000 Epoch: 19 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:14,928-Speed 5605.84 samples/sec Loss 0.9299 LearningRate 0.0000 Epoch: 19 Global Step: 113210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:35:16,757-Speed 5599.95 samples/sec Loss 0.8394 LearningRate 0.0000 Epoch: 19 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:18,577-Speed 5626.01 samples/sec Loss 0.9374 LearningRate 0.0000 Epoch: 19 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:20,423-Speed 5549.75 samples/sec Loss 0.9211 LearningRate 0.0000 Epoch: 19 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:22,275-Speed 5529.79 samples/sec 
Loss 0.9662 LearningRate 0.0000 Epoch: 19 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:24,103-Speed 5606.57 samples/sec Loss 0.9374 LearningRate 0.0000 Epoch: 19 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:25,946-Speed 5557.39 samples/sec Loss 0.8569 LearningRate 0.0000 Epoch: 19 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:27,779-Speed 5589.87 samples/sec Loss 0.9242 LearningRate 0.0000 Epoch: 19 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:29,615-Speed 5578.75 samples/sec Loss 0.9543 LearningRate 0.0000 Epoch: 19 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:31,457-Speed 5560.38 samples/sec Loss 0.8630 LearningRate 0.0000 Epoch: 19 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:33,298-Speed 5563.95 samples/sec Loss 0.9224 LearningRate 0.0000 Epoch: 19 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:35,159-Speed 5505.35 samples/sec Loss 0.8232 LearningRate 0.0000 Epoch: 19 Global Step: 113320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-27 08:35:36,985-Speed 5608.41 samples/sec Loss 0.9654 LearningRate 0.0000 Epoch: 19 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:38,817-Speed 5590.47 samples/sec Loss 0.9052 LearningRate 0.0000 Epoch: 19 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:40,644-Speed 5607.38 samples/sec Loss 0.9508 LearningRate 0.0000 Epoch: 19 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:42,472-Speed 5603.08 samples/sec Loss 0.9453 LearningRate 0.0000 Epoch: 19 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:44,322-Speed 5539.36 samples/sec Loss 0.8852 LearningRate 0.0000 Epoch: 19 
Global Step: 113370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:46,154-Speed 5592.03 samples/sec Loss 0.8481 LearningRate 0.0000 Epoch: 19 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:47,970-Speed 5641.02 samples/sec Loss 0.9657 LearningRate 0.0000 Epoch: 19 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:49,818-Speed 5542.47 samples/sec Loss 0.9751 LearningRate 0.0000 Epoch: 19 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:51,656-Speed 5570.80 samples/sec Loss 0.9252 LearningRate 0.0000 Epoch: 19 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:53,522-Speed 5492.12 samples/sec Loss 0.8894 LearningRate 0.0000 Epoch: 19 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:55,335-Speed 5648.27 samples/sec Loss 0.9116 LearningRate 0.0000 Epoch: 19 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:57,170-Speed 5584.13 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 19 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:35:58,989-Speed 5629.87 samples/sec Loss 0.9234 LearningRate 0.0000 Epoch: 19 Global Step: 113450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:00,841-Speed 5530.87 samples/sec Loss 0.8781 LearningRate 0.0000 Epoch: 19 Global Step: 113460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:02,687-Speed 5548.18 samples/sec Loss 0.9827 LearningRate 0.0000 Epoch: 19 Global Step: 113470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:04,521-Speed 5587.29 samples/sec Loss 0.9157 LearningRate 0.0000 Epoch: 19 Global Step: 113480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:06,340-Speed 5630.56 samples/sec Loss 0.8460 LearningRate 0.0000 Epoch: 19 Global Step: 113490 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-27 08:36:08,160-Speed 5629.58 samples/sec Loss 0.9083 LearningRate 0.0000 Epoch: 19 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:10,003-Speed 5558.32 samples/sec Loss 0.8861 LearningRate 0.0000 Epoch: 19 Global Step: 113510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:11,824-Speed 5623.51 samples/sec Loss 0.9146 LearningRate 0.0000 Epoch: 19 Global Step: 113520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:13,660-Speed 5578.85 samples/sec Loss 0.8679 LearningRate 0.0000 Epoch: 19 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:15,482-Speed 5622.24 samples/sec Loss 0.9071 LearningRate 0.0000 Epoch: 19 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:17,317-Speed 5583.14 samples/sec Loss 0.8988 LearningRate 0.0000 Epoch: 19 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:19,164-Speed 5545.37 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:21,018-Speed 5526.02 samples/sec Loss 0.8677 LearningRate 0.0000 Epoch: 19 Global Step: 113570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:22,850-Speed 5592.07 samples/sec Loss 0.9298 LearningRate 0.0000 Epoch: 19 Global Step: 113580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:24,684-Speed 5585.04 samples/sec Loss 0.9025 LearningRate 0.0000 Epoch: 19 Global Step: 113590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:26,512-Speed 5602.21 samples/sec Loss 0.8337 LearningRate 0.0000 Epoch: 19 Global Step: 113600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:28,349-Speed 5574.91 samples/sec Loss 0.9337 LearningRate 0.0000 Epoch: 19 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 
08:36:30,169-Speed 5630.80 samples/sec Loss 0.9140 LearningRate 0.0000 Epoch: 19 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:31,989-Speed 5629.82 samples/sec Loss 0.8580 LearningRate 0.0000 Epoch: 19 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:33,823-Speed 5585.44 samples/sec Loss 0.8593 LearningRate 0.0000 Epoch: 19 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:35,665-Speed 5559.58 samples/sec Loss 0.9264 LearningRate 0.0000 Epoch: 19 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:37,513-Speed 5541.37 samples/sec Loss 0.9049 LearningRate 0.0000 Epoch: 19 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:39,344-Speed 5596.70 samples/sec Loss 0.9431 LearningRate 0.0000 Epoch: 19 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:41,178-Speed 5582.49 samples/sec Loss 0.8512 LearningRate 0.0000 Epoch: 19 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:43,024-Speed 5549.93 samples/sec Loss 0.8791 LearningRate 0.0000 Epoch: 19 Global Step: 113690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:44,857-Speed 5588.24 samples/sec Loss 0.9119 LearningRate 0.0000 Epoch: 19 Global Step: 113700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:46,765-Speed 5368.49 samples/sec Loss 0.8956 LearningRate 0.0000 Epoch: 19 Global Step: 113710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-27 08:36:48,596-Speed 5596.34 samples/sec Loss 0.8276 LearningRate 0.0000 Epoch: 19 Global Step: 113720 Fp16 Grad Scale: 65536 Required: -0 hours